An accident caused by a MySQL misoperation that even high availability couldn't withstand!
2022-06-24 19:21:00 【Wukong chat architecture】
Hello, I'm Wukong.
Didn't our project get its MySQL high-availability deployment in place last time? MySQL dual-master mode plus Keepalived, to guarantee high availability. Simply put, there are two MySQL master nodes, and a Keepalived instance installed on each host monitors the state of MySQL. Once a problem is found, Keepalived restarts MySQL, and clients automatically connect to the MySQL on the other host.
For details, see this article written by Wukong: MySQL High-Availability Architecture in Practice
This is an accident we ran into in the project. Let's go over it.
The contents of this article are as follows:

The scene of the accident
- [x] Environment: test environment
- [x] Time: 10:30 a.m.
- [x] Reported by: the test team. Things blew up on their side, and after troubleshooting, the development colleagues found that it was probably a database problem.
Then we started looking for the cause. I had deployed this database environment, so I knew it well and took on the investigation.
System deployment diagram
First, let's go over the system's deployment diagram so that everything is easier to follow.
Two databases are deployed on the nodes node55 and node56. Each is both master and slave of the other, which is why it is called dual master.
Two Keepalived instances are deployed on node55 and node56, respectively monitoring the state of the MySQL containers.
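The monitoring role described above can be sketched as a small health-check function that Keepalived runs periodically. This is only a sketch under assumptions: the container name `mysql`, the function name `check_mysql_container`, and the choice of `docker inspect` and `docker start` are illustrative and not taken from the actual deployment.

```shell
#!/bin/sh
# Hypothetical container name; the article does not state the real one.
MYSQL_CONTAINER=${MYSQL_CONTAINER:-mysql}

# Returns 0 when the container is running. Otherwise tries to start it
# and returns 1 so the caller knows this round of the check failed.
check_mysql_container() {
    # Treat an inspect failure (for example, no such container) as "not running".
    running=$(docker inspect --format '{{.State.Running}}' "$MYSQL_CONTAINER" 2>/dev/null) || running=false
    if [ "$running" = "true" ]; then
        return 0
    fi
    # A failed start is ignored here; the next round of the check retries.
    docker start "$MYSQL_CONTAINER" >/dev/null 2>&1 || true
    return 1
}
```

Keepalived could call a wrapper around such a function from a `vrrp_script` block at a fixed interval. Note that a check like this only restarts the container; it cannot help when MySQL exits again right after starting.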
Cause of the error and solutions
① My first thought was: don't we have Keepalived to guarantee high availability? Even if MySQL goes down, Keepalived can restart it automatically. And even if one fails to restart, isn't there still another one to use?
② Next, I went to the servers to check the state of the MySQL containers. On both MySQL servers, I ran the docker ps command and found that neither MySQL container was in the list, which means the containers were not running properly.

③ That shouldn't be possible. I installed the Keepalived high-availability component. Has Keepalived gone down too?
④ I quickly checked Keepalived and found that both instances were running normally. The command to check is: systemctl status keepalived

⑤ What? Keepalived is fine, and it restarts MySQL every few seconds. Maybe I just missed the short window in which the MySQL container was up? I ran another command, docker ps -a, to list the status of all containers. It showed that MySQL had started and then exited, so MySQL really was being restarted over and over.

⑥ So although Keepalived kept restarting the MySQL container, MySQL itself had a problem, and Keepalived alone could not guarantee high availability.
⑦ So how do we fix it? Look at what error MySQL reports. Run docker logs <container id> to view the container log, and find the most recent entries:

⑧ The log says that the mysql-bin.index file does not exist. This file is set up for master-slave synchronization, through the configuration in my.cnf.

With this configuration in place, master-slave synchronization generates multiple mysql-bin.xxx files under the /var/lib/mysql/log directory, plus a mysql-bin.index index file that records where the binlog files are stored.

The contents of the mysql-bin.index file are as follows:
/var/lib/mysql/log/mysql-bin.000001
The mysql-bin.000001 file carries a sequence number. There is a pitfall here, which I'll come back to later.
⑨ The error says that mysql-bin.index is missing. Let's check: it really isn't there! Never mind how the file disappeared for now; create the log folder first, and MySQL will generate the file for us automatically.
Solution: run the following commands to create the folder and grant permissions.
mkdir log
chmod 777 log -R
⑩ Once the log directory exists on both servers, Keepalived automatically restarts the MySQL containers for us. Then I checked the MySQL status on one of the nodes, node56. Huh, another error.
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'
You can see several key messages:
- Slave_IO_Running: No. The replication I/O thread is not running. This I/O thread belongs to the slave; it requests the master's binlog and writes what it receives into the local relay log file. If it is not running, slave synchronization is not working.
- Master_Log_File: mysql-bin.000014. This means the log file currently being synchronized is 000014. Earlier we saw that mysql-bin.index on node56 lists 000001. Since 000014 is not in the index file at all, an error is reported.
This involves the principle of master-slave synchronization:
The slave creates two threads: an I/O thread and an SQL thread;
The I/O thread requests the master's binlog and writes what it receives into the local relay log file;
The master creates a dump thread that sends the binlog to the slave's I/O thread;
The SQL thread reads the log in the relay log file, parses it into SQL statements, and executes them one by one.
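The 1236 error boils down to a lookup: the I/O thread asks for a binlog file by name, and the serving side can only hand it over if that name is listed in mysql-bin.index. A minimal sketch of that lookup, assuming the index file holds one binlog path per line as shown earlier (the function name `binlog_in_index` is hypothetical):

```shell
#!/bin/sh
# binlog_in_index INDEX_FILE BINLOG_NAME
# Succeeds when BINLOG_NAME (for example mysql-bin.000014) is listed
# in INDEX_FILE, comparing only the file name part of each entry.
binlog_in_index() {
    index_file=$1
    wanted=$2
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r entry || [ -n "$entry" ]; do
        if [ "${entry##*/}" = "$wanted" ]; then
            return 0
        fi
    done < "$index_file"
    return 1
}
```

For example, `binlog_in_index /var/lib/mysql/log/mysql-bin.index mysql-bin.000014` fails on a server whose index lists only mysql-bin.000001, which is the situation behind the error here.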
> In that case, we just re-specify which log file to synchronize from, and at which position.
Solution:
Check the log file status on the master, node55.

Write down these two values: File=mysql-bin.000001, Position=117748. (There is another pitfall here: lock the tables first, read these two values, and unlock the tables only after the slave has started synchronizing.)
The specific commands are as follows:
FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;
UNLOCK TABLES;
Then, on the slave node56, re-specify the log file and position to synchronize from:
# Stop synchronization on the slave
STOP SLAVE;
# Set the file and position to synchronize from
CHANGE MASTER TO MASTER_HOST='10.2.1.55',MASTER_PORT=3306,MASTER_USER='vagrant',MASTER_PASSWORD='vagrant',MASTER_LOG_FILE='mysql-bin.000001',MASTER_LOG_POS=117748;
# Start synchronization
START SLAVE;
After that, checking again shows no error, and the I/O thread is running.

Then, with node55 as the slave and node56 as the master, I performed the same steps. The status showed normal, and connecting to the databases with the Navicat tool worked fine too. I reported back to the test team, and the repair was done.
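The status check done after START SLAVE can be scripted. The sketch below reads the output of SHOW SLAVE STATUS\G from standard input and succeeds only when both replication threads report Yes. The function name `replication_healthy` is hypothetical, and how the status output is obtained (for example through the `mysql` client) is an assumption.

```shell
#!/bin/sh
# Reads "SHOW SLAVE STATUS\G" output on stdin.
# Exit status 0 only if both Slave_IO_Running and Slave_SQL_Running are "Yes".
replication_healthy() {
    awk -F': ' '
        # The $ anchor keeps fields such as Slave_SQL_Running_State from matching.
        $1 ~ /Slave_IO_Running$/  { io = $2 }
        $1 ~ /Slave_SQL_Running$/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}
```

A possible use is `mysql -e 'SHOW SLAVE STATUS\G' | replication_healthy && echo "replication OK"`, run on each node, since in a dual-master setup both sides act as slaves.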
But I seem to have left one question unanswered: why was the log folder wiped out in the first place?
Why did the problem occur?
I then asked whether anyone had deleted the /var/lib/mysql/log directory around that time. Nobody would delete this directory casually.
However, I noticed that log's parent directory, /var/lib/mysql, contains many other folders, such as xxcloud, xxcenter, and so on. These are the names of several databases in our project. Any folder in this directory shows up in Navicat as a database, one to one, as shown in the figure below. It even shows a log database.

Could someone have dropped the log database from Navicat? Very likely!
Sure enough, during a migration and upgrade, a colleague had previously run a database synchronization that synced the old database instance onto the new one in a single pass (think of it as an overwrite). The old instance had no log database, so the sync effectively dropped the log database and took the log folder with it. Well, the truth finally came out!
Improvements
Actually, the database sync should not have been done as an overwrite. Synchronizing one database at a time would not have killed the log database. That said, it is a little odd for this log database to appear here at all. Can we keep it from showing up?
We just need to point the log directory somewhere outside /var/lib/mysql.
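Whether a candidate binlog directory would end up next to the database folders can be checked by comparing it with the data directory. A minimal sketch (the function name `is_direct_child` and the alternative path `/var/log/mysql-binlog` mentioned below are hypothetical):

```shell
#!/bin/sh
# is_direct_child DATADIR DIR
# Succeeds when DIR sits directly inside DATADIR, which is where the
# article observed folders being listed as databases.
is_direct_child() {
    datadir=${1%/}   # drop one trailing slash, if any
    dir=${2%/}
    [ "${dir%/*}" = "$datadir" ]
}
```

With this, `is_direct_child /var/lib/mysql /var/lib/mysql/log` succeeds, while `is_direct_child /var/lib/mysql /var/log/mysql-binlog` fails. Pointing the `log-bin` option in my.cnf at a path for which the check fails keeps the binlog folder out of the database list.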
Another question: is our high availability really that highly available?
At the very least, there was no timely alert. The MySQL databases were down and I had no idea; I only found out through feedback from the testers.
Can we detect MySQL anomalies in time?
Here we could use Keepalived's email notification feature, or a log-based alerting system. This is what needs to be improved next.
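One way to notice a restart loop like the one in this incident is to count consecutive failed checks and raise an alert once a threshold is crossed, instead of restarting silently forever. A minimal sketch, assuming a small state file holds the counter (the function name `record_check_result`, the state file, and the alert text are all hypothetical; delivering the alert by email or another channel is left out):

```shell
#!/bin/sh
# record_check_result STATE_FILE RESULT THRESHOLD
# RESULT is "ok" or "fail". Consecutive failures are counted in STATE_FILE.
# An alert line is printed on every failure once the count reaches THRESHOLD.
record_check_result() {
    state_file=$1
    result=$2
    threshold=$3

    if [ "$result" = "ok" ]; then
        echo 0 > "$state_file"      # a healthy check resets the counter
        return 0
    fi

    failures=0
    [ -f "$state_file" ] && failures=$(cat "$state_file")
    failures=${failures:-0}         # an empty state file counts as zero
    failures=$((failures + 1))
    echo "$failures" > "$state_file"

    if [ "$failures" -ge "$threshold" ]; then
        echo "ALERT: MySQL check failed $failures times in a row"
    fi
}
```

The output could be piped into whatever notification channel is available. The design choice is to alert on repeated failure rather than on a single one, because a single restart that succeeds is exactly what the high-availability setup is meant to absorb.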