Record and Analysis of an Abnormal RAC Heartbeat (IpReasmFails)
2022-07-16 06:38:00 【51CTO】
Environment
Oracle 19.12
Oracle Linux 7.9
1. Symptoms in the production environment
Mild impact:
- Users reported database connection failures that lasted a few minutes before service returned to normal.
- The CRS log contained node communication errors.
- The OSWatcher IpReasmFails counter grew.
Severe impact:
- Users reported database connection failures, and production business went down.
- CRS evicted a node and the operating system rebooted.
- After switching over to the Data Guard (DG) standby, CRS node evictions and OS reboots still occurred.
- With database node 2 stopped, the database ran normally as a single instance, with no errors during two weeks of observation.
- The OSWatcher IpReasmFails counter grew sharply.
2. Actions taken in production before testing
- Raised the kernel parameters ipfrag_high_thresh and ipfrag_low_thresh from 4M and 3M to 16M and 15M (per Oracle Doc ID 2008933.1). The problem persisted.
- Changed the private interconnect to a direct cable connection between the nodes. The problem persisted.
3. Stress test results (using Swingbench)
• Test scenario 1:
Configuration:
Concurrency: 200
Results:
The CRS log contained node communication errors.
The OSWatcher logs showed IpReasmFails growing.
• Test scenario 2:
Configuration:
Concurrency: 600
Results:
The CRS log contained node communication errors.
The OSWatcher logs showed IpReasmFails growing slowly.
• Test scenario 3:
Configuration:
Enabled jumbo frames on the heartbeat (interconnect) switch.
Set the MTU of the heartbeat NICs to 9000.
Concurrency: 1000
Results:
The CRS log was normal.
The OSWatcher logs showed no IpReasmFails growth.
• Test scenario 4:
Configuration:
Upgraded the NIC driver.
Set the MTU of the heartbeat NICs to 1500.
Concurrency: 600
Results:
The CRS log contained node communication errors.
The OSWatcher logs showed IpReasmFails growing slowly.
4. Test summary
- More than 10 stress-test runs were performed.
- No database node evictions occurred.
- The CRS log contained node communication errors.
- The OSWatcher logs showed IpReasmFails growth.
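To watch the same counter outside OSWatcher, the kernel's IP reassembly-failure count can be sampled directly. The sketch below assumes that OSWatcher's IpReasmFails figure reflects the `ReasmFails` field of the `Ip:` rows in /proc/net/snmp (the counter `netstat -s` prints as "packet reassembles failed"); the function names are hypothetical helpers, not part of OSWatcher.

```python
"""Sketch: sample the kernel's IP ReasmFails counter from /proc/net/snmp.

Assumption: OSWatcher's IpReasmFails figure reflects this counter, which
`netstat -s` reports as "packet reassembles failed".
"""
import time
from typing import Dict


def parse_ip_stats(snmp_text: str) -> Dict[str, int]:
    """Map the 'Ip:' field names in /proc/net/snmp to their integer values."""
    rows = [line.split()[1:] for line in snmp_text.splitlines()
            if line.startswith("Ip:")]
    names, values = rows[0], rows[1]  # header row first, then the value row
    return dict(zip(names, map(int, values)))


def read_reasm_fails(path: str = "/proc/net/snmp") -> int:
    """Return the current cumulative ReasmFails value."""
    with open(path) as f:
        return parse_ip_stats(f.read())["ReasmFails"]


def reasm_fails_growth(interval_s: float = 10.0,
                       path: str = "/proc/net/snmp") -> int:
    """Return how much ReasmFails grew over `interval_s` seconds."""
    before = read_reasm_fails(path)
    time.sleep(interval_s)
    return read_reasm_fails(path) - before
```

On a Linux node, `reasm_fails_growth(60)` returns the number of new reassembly failures in one minute; the counter is cumulative since boot, so the growth rate matters more than the absolute value.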
5. Final solution
- Enable jumbo frames on the heartbeat switch.
- Set the MTU of the heartbeat NICs to 9000.
6. Follow-up test plan
- Test with an upgraded OS kernel.
- Test after switching from Oracle Linux to Red Hat Linux.
- Test on new servers.
- If conditions permit, I will add tests on other operating systems and hardware platforms. If not, this plan may get shelved.
7. Problem analysis and summary
• The "IP reassembles failed" phenomenon
- "IP reassembles failed" occurs when the ipfrag_low/high_thresh buffer is full: the operating system can no longer accept new fragments and drops them, so the IpReasmFails value grows.
- The operating system defaults for the ipfrag_low/high_thresh buffers are 3M and 4M, set in:
/proc/sys/net/ipv4/ipfrag_low_thresh
/proc/sys/net/ipv4/ipfrag_high_thresh
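The two files above hold byte counts, not megabytes. A minimal sketch for reading them and showing the values in MiB follows, assuming a Linux host that exposes both files; `read_ipfrag_thresholds` and `describe` are hypothetical helper names. How the kernel uses each limit has changed across kernel versions, so check the ip-sysctl documentation for the kernel actually in use.

```python
"""Sketch: read the ipfrag reassembly thresholds and show them in MiB.

Assumes a Linux host exposing /proc/sys/net/ipv4/ipfrag_{low,high}_thresh.
"""
from pathlib import Path
from typing import Dict

MIB = 1024 * 1024


def read_ipfrag_thresholds(base: str = "/proc/sys/net/ipv4") -> Dict[str, int]:
    """Return {'low': bytes, 'high': bytes} read from the sysctl files."""
    root = Path(base)
    # int() tolerates the trailing newline in the /proc file contents.
    return {kind: int((root / f"ipfrag_{kind}_thresh").read_text())
            for kind in ("low", "high")}


def describe(thresholds: Dict[str, int]) -> str:
    """Format byte counts as MiB, e.g. 'low=3M, high=4M'."""
    return ", ".join(f"{kind}={size / MIB:g}M"
                     for kind, size in thresholds.items())
```

On the defaults described above, `describe(read_ipfrag_thresholds())` would give `low=3M, high=4M`. Because the files take bytes, the 15M and 16M values from section 2 correspond to 15728640 and 16777216.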
• When are packets placed in the ipfrag buffer, causing IpReasmFails to grow?
- On a network, packets are fragmented and reassembled according to the smallest MTU along the path. Operating system NICs and switches default to an MTU of 1500 bytes.
- A packet larger than the MTU is split into multiple fragments, which are transmitted one after another.
- At the destination, the fragments are placed in the ipfrag buffer to wait for reassembly.
- In Oracle, the MTU size matters in two scenarios:
a) Requests and responses between clients and the database server
Related parameter: SDU, default 8K
b) Heartbeat and Cache Fusion block transfers between database servers
Related parameter: DB_BLOCK_SIZE, default 8K
- By default (MTU 1500 bytes), data on both the service network and the heartbeat network is fragmented and reassembled, so fragments are held in the ipfrag buffer.
- When the database reads network I/O, the operating system reassembles the fragments held in the ipfrag buffer into complete SDU-sized or DB_BLOCK-sized packets.
- If the operating system reassembles fragments from the ipfrag buffer more slowly than new fragments arrive, the buffer keeps growing. Once it reaches its maximum size, new fragments are dropped, packets arrive incomplete, and reassembly fails. This is reported as "IP reassembles failed", and the IpReasmFails value grows.
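The overflow mechanism described above can be illustrated with a toy model. This is a deliberate simplification, not the kernel's reassembly algorithm: it only tracks bytes held in a fixed-size buffer, and the parameters (`arrive`, `drain`, `frag_size`) are hypothetical per-tick rates chosen for illustration.

```python
"""Toy model of the buffer-overflow mechanism described above.

NOT the kernel's reassembly algorithm: it only tracks bytes held in a
fixed-size buffer. Each tick, `arrive` fragments of `frag_size` bytes come
in; fragments that would push usage past `capacity` are dropped. Then up to
`drain` fragments' worth of bytes are reassembled and freed.
"""
from typing import Tuple


def simulate(capacity: int, frag_size: int, arrive: int, drain: int,
             ticks: int) -> Tuple[int, int]:
    """Return (bytes still buffered, fragments dropped) after `ticks` steps."""
    used = 0
    dropped = 0
    for _ in range(ticks):
        for _ in range(arrive):
            if used + frag_size > capacity:
                dropped += 1  # buffer full: the fragment is discarded
            else:
                used += frag_size
        # Reassembly frees memory, but usage never goes below zero.
        used = max(0, used - drain * frag_size)
    return used, dropped


FOUR_MIB = 4 * 1024 * 1024
# Reassembly keeps pace with arrivals: nothing is dropped.
print(simulate(FOUR_MIB, 1500, arrive=100, drain=100, ticks=1000))
# Arrivals outpace reassembly by 20%: the buffer fills, then drops begin.
print(simulate(FOUR_MIB, 1500, arrive=120, drain=100, ticks=1000))
```

In this model, a larger capacity (for example 16M) only delays the first drop when arrivals persistently outpace reassembly. That is one possible reading of why raising the thresholds alone did not help in section 2.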
• How can this be avoided?
- Raise the MTU to 9000, which is larger than the 8K DB_BLOCK_SIZE.
- Cache Fusion blocks transferred between database servers are then no longer fragmented and reassembled, so IpReasmFails stops growing.
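The arithmetic behind the MTU 9000 fix can be sketched as follows. Assumptions not stated in the article: a 20-byte IPv4 header without options, UDP (8-byte header) as the interconnect transport, and Oracle's own per-message header ignored, so real messages are slightly larger than the block. `fragment_layout` and `max_ping_payload` are hypothetical helpers. The result also shows that MTU 9000 avoids fragmentation only for blocks up to roughly 8K; a 16K block would still be split.

```python
"""Sketch: IPv4 fragmentation of one Oracle block at a given MTU.

Assumptions (not from the article): IPv4 header without options (20 bytes),
UDP as the interconnect transport (8-byte header), and Oracle's own
message header ignored, so real messages are slightly larger.
"""
from typing import List, Tuple

IPV4_HEADER = 20
UDP_HEADER = 8
ICMP_HEADER = 8


def fragment_layout(block_size: int, mtu: int) -> List[Tuple[int, int, bool]]:
    """Return (offset, length, more_fragments) for each IP payload piece.

    A single entry means the datagram is sent unfragmented.
    """
    ip_payload = block_size + UDP_HEADER
    if ip_payload + IPV4_HEADER <= mtu:
        return [(0, ip_payload, False)]
    # All fragments but the last carry a multiple of 8 bytes because the
    # IPv4 fragment-offset field counts in 8-byte units.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    layout = []
    offset = 0
    while offset < ip_payload:
        length = min(per_fragment, ip_payload - offset)
        layout.append((offset, length, offset + length < ip_payload))
        offset += length
    return layout


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one packet at this MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


for mtu in (1500, 9000):
    for block in (8192, 16384):
        count = len(fragment_layout(block, mtu))
        print(f"MTU {mtu}, block {block}: {count} fragment(s)")
print("largest unfragmented ping payload at MTU 9000:", max_ping_payload(9000))
```

The last value, 8972, is the size commonly used to check jumbo frames end to end on Linux, for example `ping -M do -s 8972 <peer>`, where `-M do` forbids fragmentation and `<peer>` stands for the other node's private IP.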
• Other causes?
The phenomenon is also related to low-level compatibility factors such as the operating system version and the NIC driver version.
Appendix: Official documents and references
- Troubleshooting gc block lost and Poor Network Performance in a RAC Environment (Doc ID 563566.1)
Following this document, the heartbeat NICs were connected directly and the NIC driver was upgraded.
- IPC Send timeout/node eviction etc with high packet reassembles failure (Doc ID 2008933.1)
Following this document, the ipfrag thresholds were adjusted.
- Recommendation for the Real Application Cluster Interconnect and Jumbo Frames (Doc ID 341788.1)
Following this document, the MTU was set to 9000; the problem has not recurred since.
Anyone who has encountered this problem is welcome to discuss it. Some gaps remain: for example, the tests never reproduced a RAC node eviction, and only the private heartbeat network was tuned; traffic between the application and the database was not adjusted.