Greenplum Database Failure Analysis: semop(id=2000421076,num=11) failed: invalid argument
2022-06-26 16:30:00 【肥叔菌】
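While the business team was installing a Greenplum database, the master node crashed repeatedly. A colleague went through the logs and found the following error: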
2022-06-23 21:01:21.201568 CST,,,p52171,th-761145216,,,0,,,seg-1,,,,,"FATAL","XX000","no free slots in PMChildFlags array",,,,,,,0,,,"pmsignal.c",173,"Stack trace: 1 .... "
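This error was baffling at first sight: there are no free slots in the PMChildFlags array. The error is raised in the AssignPostmasterChildSlot function; for the detailed flow, see the article "PostgreSQL PMsignal: signal communication between backend processes and the postmaster". Even the code comment next to this error says "Out of slots ... should never happen". As the PMSignalShmemSize function shows, the number of PMChildFlags slots is 2 * (MaxConnections + autovacuum_max_workers + 1 + max_worker_processes), which is far larger than MaxConnections. In other words, client connections should be capped by MaxConnections long before they could exhaust the PMChildFlags slots. Moreover, the database had only just started and no client connections had come in yet. It made no sense.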
int AssignPostmasterChildSlot(void) {
    int slot = PMSignalState->next_child_flag;
    int n;
    /* Scan for a free slot. We track the last slot assigned so as not to
     * waste time repeatedly rescanning low-numbered slots. */
    for (n = PMSignalState->num_child_flags; n > 0; n--) {
        if (--slot < 0) slot = PMSignalState->num_child_flags - 1;
        if (PMSignalState->PMChildFlags[slot] == PM_CHILD_UNUSED) {
            PMSignalState->PMChildFlags[slot] = PM_CHILD_ASSIGNED;
            PMSignalState->next_child_flag = slot;
            return slot + 1;
        }
    }
    /* Out of slots ... should never happen, else postmaster.c messed up */
    elog(FATAL, "no free slots in PMChildFlags array");
    return 0; /* keep compiler quiet */
}
Size PMSignalShmemSize(void) {
    Size size;
    size = offsetof(PMSignalData, PMChildFlags);
    size = add_size(size, mul_size(MaxLivePostmasterChildren(), sizeof(sig_atomic_t)));
    return size;
}
int MaxLivePostmasterChildren(void) {
    return 2 * (MaxConnections + autovacuum_max_workers + 1 + max_worker_processes);
}
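To make the headroom concrete, the slot formula can be evaluated for a set of example settings. The following is only a sketch: the function names are made up for illustration, the parameters stand in for the GUCs of the same name, and the values mentioned afterwards are hypothetical, not taken from the failing cluster.

```c
#include <assert.h>
#include <stdio.h>

/* Same arithmetic as MaxLivePostmasterChildren(); the arguments are
 * hypothetical stand-ins for the server settings of the same name. */
int max_live_postmaster_children(int max_connections,
                                 int autovacuum_max_workers,
                                 int max_worker_processes)
{
    assert(max_connections >= 0);
    assert(autovacuum_max_workers >= 0);
    assert(max_worker_processes >= 0);
    return 2 * (max_connections + autovacuum_max_workers + 1 +
                max_worker_processes);
}

/* Prints the slot count next to the connection limit for comparison. */
void print_slot_headroom(int max_connections,
                         int autovacuum_max_workers,
                         int max_worker_processes)
{
    int slots = max_live_postmaster_children(max_connections,
                                              autovacuum_max_workers,
                                              max_worker_processes);

    printf("MaxConnections=%d, PMChildFlags slots=%d\n",
           max_connections, slots);
}
```

With the assumed values 250, 3 and 8, the formula gives 2 * (250 + 3 + 1 + 8) = 524 slots, more than twice the connection limit, which is why running out of slots on a freshly started server points at something other than connection load.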
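With nothing else to go on, and since the error is related to shared memory, I grepped all the logs for related keywords and found the following error in the log from one of the reinstall attempts, appearing right after the error above. A Baidu search turned up a blog post from the HighGo PG lab on a related problem, "semctl(156532736, 0, IPC_RMID, …) failed: Invalid argument causing a database restart", which is excerpted here.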
[[email protected] pg_log] cat * | grep 'sem'
2022-06-23 21:05:36.272884 CST,,,p66817,th-1473353856,,,0,con2,,seg-1,,,,,"FATAL","XX000","semop(id=2000421076,num=11) failed: invalid argument"
The database log shows the following errors at irregular intervals, each time causing the database to restart. Platform: Linux x86-64, Red Hat Enterprise Linux 7. Version: 9.5
FATAL,XX000,semop(id=157450268) failed: Invalid argument
FATAL,XX000,semop(id=157843496) failed: Invalid argument
PANIC,XX000,queueing for lock while waiting on another one
terminating any other active server processes
WARNING,57P02,terminating connection because of crash of another server process,The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.,In a moment you should be able to reconnect to the database and repeat your command.
archiver process (PID 3766) exited with exit code 1
FATAL,57P03,the database system is in recovery mode
all server processes terminated; reinitializing
could not remove shared memory segment /PostgreSQL.44345806: No such file or directory
semctl(156532736, 0, IPC_RMID, ...) failed: Invalid argument
semctl(156565505, 0, IPC_RMID, ...) failed: Invalid argument
FATAL,57P03,the database system is in recovery mode
database system was interrupted; last known up at 2018-12-27 04:54:36 CST
database system was not properly shut down; automatic recovery in progress
redo starts at 5E7/7036BD30
FATAL,57P03,the database system is in recovery mode
invalid record length at 5E7/75359EC8
redo done at 5E7/75359EA0
last completed transaction was at log time 2018-12-27 05:06:26.652179+08
MultiXact member wraparound protections are now enabled
autovacuum launcher started
database system is ready to accept connections
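The cause is that the RemoveIPC parameter is set to yes. RemoveIPC, configured in /etc/systemd/logind.conf, controls whether System V IPC objects are removed when a user fully logs out. The parameter has been on by default since systemd 212 (2014-03-25), and RHEL 7 starts from version 219; the quoted post states that on RHEL 7 the parameter is off by default. When RemoveIPC=yes, the semaphore objects used by the PostgreSQL server are removed at seemingly random times, which crashes the server and leaves log entries such as: LOG: semctl(1234567890, 0, IPC_RMID, ...) failed: Invalid argument. Shared memory segments in the attached state are not cleaned up, so systemd does not remove segments that are in use. Semaphores, however, have no notion of an attached process, so they are removed even while they are actually still in use. Solution: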
(1) Set the "RemoveIPC" field in "/etc/systemd/logind.conf" to "no".
Open logind.conf with vim: vim /etc/systemd/logind.conf
Set the "RemoveIPC" field to "no": RemoveIPC=no
(2) Set the "RemoveIPC" field in "/usr/lib/systemd/system/systemd-logind.service" to "no".
Open systemd-logind.service with vim: vim /usr/lib/systemd/system/systemd-logind.service
Set the "RemoveIPC" field to "no": RemoveIPC=no
(3) Reload the configuration.
systemctl daemon-reload
systemctl restart systemd-logind
(4) Check that the change took effect.
loginctl show-session | grep RemoveIPC
systemctl show systemd-logind | grep RemoveIPC
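The check in step (4) can also be done directly against the text of logind.conf. Below is a minimal sketch, assuming the file content has already been read into a string; `parse_remove_ipc` and `value_is` are hypothetical helper names, and the parser is deliberately simplified (it ignores section headers and does not accept spaces around `=`).

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* True if the value text [v, v+vlen) equals the given word exactly. */
static int value_is(const char *v, size_t vlen, const char *word)
{
    return strlen(word) == vlen && strncmp(v, word, vlen) == 0;
}

/* Scans config text for an uncommented "RemoveIPC=" line.
 * Returns 1 for a true value, 0 for a false value, and -1 if the key
 * is absent or commented out. In this sketch the last matching line wins. */
int parse_remove_ipc(const char *conf)
{
    static const char key[] = "RemoveIPC=";
    const size_t keylen = sizeof key - 1;
    int result = -1;

    assert(conf != NULL);
    while (*conf != '\0') {
        const char *eol = strchr(conf, '\n');
        const char *end = eol ? eol : conf + strlen(conf);
        const char *p = conf;

        /* Skip leading blanks; a '#' comment line will not match the key. */
        while (p < end && isspace((unsigned char) *p))
            p++;
        if ((size_t) (end - p) >= keylen && strncmp(p, key, keylen) == 0) {
            const char *v = p + keylen;
            size_t vlen;

            /* Trim trailing blanks from the value. */
            while (end > v && isspace((unsigned char) end[-1]))
                end--;
            vlen = (size_t) (end - v);
            if (value_is(v, vlen, "yes") || value_is(v, vlen, "true") ||
                value_is(v, vlen, "on") || value_is(v, vlen, "1"))
                result = 1;
            else if (value_is(v, vlen, "no") || value_is(v, vlen, "false") ||
                     value_is(v, vlen, "off") || value_is(v, vlen, "0"))
                result = 0;
        }
        conf = eol ? eol + 1 : conf + strlen(conf);
    }
    return result;
}
```

A return value of -1 means the file leaves the setting at whatever the distribution's built-in default is, so it is worth setting the value explicitly rather than relying on that default.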
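However, the problem persisted after this change, so I had to analyze it myself. According to the article "PostgreSQL semaphore mechanism: how PGSemaphore works underneath", the error semop(id=2000421076,num=11) failed: invalid argument is raised in PGSemaphoreLock, PGSemaphoreLockInterruptable, PGSemaphoreUnlock and PGSemaphoreTryLock. The platform uses the SysV semaphore facilities, and these functions all take a variable of the type shown below. The memory backing PGSemaphoreData variables lives in shared memory, so the semId obtained from the SysV library function semget and the semaphore's number within its set are both stored in that PGSemaphoreData variable.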
#ifdef USE_SYSV_SEMAPHORES
typedef struct PGSemaphoreData
{
    int semId;  /* semaphore set identifier */
    int semNum; /* semaphore number within set */
} PGSemaphoreData;
#endif
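The failure mode itself is easy to reproduce outside the database. The sketch below is not PostgreSQL or Greenplum code: it creates a private System V semaphore set, removes it with IPC_RMID (standing in for whatever removed the real set), and then attempts the kind of decrement a lock operation performs with the now stale identifier. The function names and `union semun_arg` are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* glibc leaves the semctl() argument union for the caller to define. */
union semun_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Non-blocking decrement of one semaphore, in the spirit of a try-lock.
 * Returns 0 on success, otherwise the errno reported by semop(). */
int try_decrement(int sem_id, int sem_num)
{
    struct sembuf op;

    assert(sem_num >= 0);
    op.sem_num = (unsigned short) sem_num;
    op.sem_op = -1;
    op.sem_flg = IPC_NOWAIT;
    if (semop(sem_id, &op, 1) == 0)
        return 0;
    return errno;
}

/* Returns the errno seen when operating on a removed semaphore set,
 * 0 if the operation unexpectedly succeeded, or -1 if setup failed. */
int semop_errno_after_removal(void)
{
    union semun_arg arg;
    int sem_id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

    if (sem_id < 0)
        return -1;

    /* Start at count 1 and confirm the decrement works while the set exists. */
    arg.val = 1;
    if (semctl(sem_id, 0, SETVAL, arg) < 0 || try_decrement(sem_id, 0) != 0) {
        semctl(sem_id, 0, IPC_RMID);
        return -1;
    }

    /* Remove the set behind the holder's back, then reuse the stale id. */
    if (semctl(sem_id, 0, IPC_RMID) < 0)
        return -1;
    return try_decrement(sem_id, 0);
}
```

On Linux the second decrement fails with EINVAL, whose message text is "Invalid argument", the same text that appears in the semop log line. A process holding a stale semId in shared memory therefore cannot tell anything is wrong until its next semaphore operation.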
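PGSemaphoreCreate initializes a PGSemaphore structure to represent a semaphore with a count of 1. It is only ever called in the postmaster daemon; ordinary backend processes are not allowed to create semaphores. Since creation reported no error, creation itself should be fine.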
void PGSemaphoreCreate(PGSemaphore sema) {
    Assert(!IsUnderPostmaster); /* Can't do this in a backend, because static state is postmaster's */
    if (nextSemaNumber >= SEMAS_PER_SET) {
        /* Time to allocate another semaphore set */
        if (numSemaSets >= maxSemaSets) elog(PANIC, "too many semaphores created");
        mySemaSets[numSemaSets] = IpcSemaphoreCreate(SEMAS_PER_SET);
        numSemaSets++;
        nextSemaNumber = 0;
    }
    sema->semId = mySemaSets[numSemaSets - 1]; /* Assign the next free semaphore in the current set */
    sema->semNum = nextSemaNumber++;
    IpcSemaphoreInitialize(sema->semId, sema->semNum, 1); /* Initialize it to count 1 */
}
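So the conclusion is that our System V IPC objects (the semaphores) were removed at the operating system level. With RemoveIPC ruled out, two possibilities remain:
1. Collateral damage from a business-side script, for example one that calls ipcrm. There is precedent: a business script for cleaning up sessions once took down all segment database processes. This can be checked by running ipcs -a in a loop and comparing the time at which the master database's semaphores disappear with the time at which the database crashes.
2. The company's Linux distribution applies special settings to particular system users. For example, the master database process was in session-72887.scope under user-1000.slice, while the segment database processes were under user-71381.slice. 1000 is the UID of ubuntu and 71381 is the UID of gpadmin, and some projects like to apply special settings to the ubuntu account.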
ipcs -a
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x2369b4bc 3309575 gpadmin 640 7535067136 18
------ Semaphore Arrays --------
key semid owner perms nsems
0xd5b0c6f8 1179650 gpadmin 640 154
------ Message Queues --------
key msqid owner perms used-bytes messages
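Instead of parsing repeated ipcs -a output, the same check can be made against one specific semid taken from that output. The following is a sketch only: `sem_set_exists` and `watch_sem_set` are hypothetical names, the one-second poll interval is an arbitrary choice, and the semid to watch has to be supplied by the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* glibc leaves the semctl() argument union for the caller to define. */
union semun_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Returns 1 if the semaphore set still exists (including the case where
 * it exists but we lack read permission), 0 if the kernel no longer
 * knows the id, and -1 on any other error. */
int sem_set_exists(int sem_id)
{
    struct semid_ds ds;
    union semun_arg arg;

    assert(sem_id >= 0);
    arg.buf = &ds;
    if (semctl(sem_id, 0, IPC_STAT, arg) == 0)
        return 1;
    if (errno == EINVAL || errno == EIDRM)
        return 0;
    if (errno == EACCES)
        return 1;
    return -1;
}

/* Polls once per second, up to max_polls times. Logs a timestamp the
 * first time the set is missing and returns the number of polls that
 * saw it alive; returns max_polls if it never disappeared. */
int watch_sem_set(int sem_id, int max_polls)
{
    int i;

    for (i = 0; i < max_polls; i++) {
        if (sem_set_exists(sem_id) == 0) {
            char stamp[32] = "unknown time";
            time_t now = time(NULL);
            struct tm *tm = localtime(&now);

            if (tm != NULL)
                strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", tm);
            printf("%s semid %d is gone\n", stamp, sem_id);
            return i;
        }
        sleep(1);
    }
    return max_polls;
}
```

Run as the database owner against the semid listed under "Semaphore Arrays", the logged timestamp can then be compared with the time of the FATAL semop entry in the database log to see whether the semaphore vanished before the crash.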
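It was late at night and the business team was no longer willing to leave the environment with us for further investigation, so we gave up on that node and deployed on other nodes instead. Having got this far, I was left with a keen sense of how powerless one can feel in the face of a vast and complex operating system.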