An incident caused by a MySQL misoperation that even "high availability" could not withstand!
2022-06-24 14:40:00 【Wukong chat architecture】
This is Wukong's 152nd original article.
Official website: www.passjava.cn
Hello, I'm Wukong.
Last time, our project got its MySQL high-availability deployment in place: MySQL in dual-master mode plus Keepalived. Simply put, there are two MySQL master nodes, and a Keepalived instance is installed on each host to monitor MySQL's state. Once a problem is found, Keepalived restarts MySQL, and clients automatically connect to the other MySQL instance.
For details, see this article by Wukong: MySQL High Availability Architecture in Practice
This is an incident we ran into on the project. Let's review it.
The scene of the incident
- Environment: test environment
- Time: 10:30 in the morning
- Reported by: the test team. Things blew up; after a preliminary investigation, R&D colleagues found that it was probably a database problem.
Then I started looking for the cause. Since I had deployed this cluster environment myself, I was well placed to investigate.
System deployment diagram
First, the system's deployment diagram, to make the rest easier to follow.
Two databases are deployed on nodes node55 and node56. Each is both master and slave of the other, which is why this is called dual master.
Two Keepalived instances are deployed on node55 and node56, each monitoring the state of the MySQL container on its own node.
Cause of the errors and solutions
- ① My first thought: don't we have Keepalived to guarantee high availability? Even if MySQL goes down, Keepalived should restart it automatically. And even if one instance fails to restart, isn't there another one available?
- ② So I went to the two MySQL servers and checked the state of the MySQL containers with the docker ps command. Neither MySQL container was in the list, which means the containers were not running.
- ③ That shouldn't be possible. I installed the Keepalived high-availability component. Could Keepalived be down as well?
- ④ I quickly checked Keepalived and found both instances running normally, using the command: systemctl status keepalived
- ⑤ So Keepalived was fine. Keepalived restarts MySQL every few seconds, so maybe I simply missed the short window in which the MySQL container was up? I ran another command, docker ps -a, which lists the status of all containers. It showed MySQL starting and then exiting, which confirmed that MySQL really was being restarted over and over.
- ⑥ In other words, Keepalived was restarting the MySQL container, but MySQL itself had a problem, so Keepalived had no way of keeping it highly available.
- ⑦ So how to fix it? Look at what error MySQL reports. View the container log with docker logs <container id> and look at the most recent entries.
- ⑧ The log reports that the mysql-bin.index file does not exist. This file is part of the master-slave synchronization setup configured in my.cnf.
With this configuration, master-slave synchronization generates multiple mysql-bin.xxx files under the /var/lib/mysql/log directory, plus an index file, mysql-bin.index, which keeps track of the binlog files written so far.
The contents of mysql-bin.index are as follows:
/var/lib/mysql/log/mysql-bin.000001
Note that the mysql-bin.000001 file name carries a sequence number. There is a pitfall here, which I'll come back to later.
- ⑨ The error says mysql-bin.index is missing. I checked, and it really was gone! Never mind for now how the file disappeared. Create the log folder first, and MySQL will generate the file for us automatically.
Solution: run the following commands to create the folder and grant permissions.
mkdir log
chmod -R 777 log
- ⑩ Once the log directory existed on both servers, Keepalived automatically restarted the MySQL containers for us. Then I checked the replication status of MySQL on one of the nodes, node56, and it reported another error:
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'
A few key pieces of information stand out:
- Slave_IO_Running: No. The replication I/O thread is not running. This I/O thread belongs to the slave: it requests the master's binlog and writes what it receives into the local relay log file. If it is not running, the slave is not replicating.
- Master_Log_File: mysql-bin.000014. The slave is currently trying to synchronize from log file 000014, but as we saw earlier, mysql-bin.index on node56 only lists 000001. File 000014 is not in the index file at all, hence the error.
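The mismatch can be checked mechanically. The following is only a minimal sketch, assuming the index file is plain text with one binlog path per line (as in the sample contents shown earlier); `binlog_listed` is a hypothetical helper name, not a MySQL tool.

```python
from pathlib import PurePosixPath


def binlog_listed(index_text: str, wanted: str) -> bool:
    """Return True if the binlog file name `wanted` appears in the index text.

    Assumes one binlog path per line, as in the mysql-bin.index sample.
    """
    names = {
        PurePosixPath(line.strip()).name
        for line in index_text.splitlines()
        if line.strip()
    }
    return wanted in names


# Index contents after the log directory was recreated.
index_text = "/var/lib/mysql/log/mysql-bin.000001\n"

print(binlog_listed(index_text, "mysql-bin.000001"))  # True
print(binlog_listed(index_text, "mysql-bin.000014"))  # False: the requested file is unknown
```

A check like this makes the cause of error 1236 visible: the requested file name simply is not among the names the index knows about.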
This comes down to how master-slave replication works:
The slave creates two threads, an I/O thread and an SQL thread.
The I/O thread requests the master's binlog and writes the binlog events it receives into the local relay log file.
The master creates a dump thread, which sends the binlog to the slave's I/O thread.
The SQL thread reads the relay log, parses it into SQL statements, and executes them one by one.
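The flow of data between these threads can be pictured with a toy model. This is a deliberately simplified, single-threaded sketch and not how MySQL is implemented; `replicate` and the event strings are made up for illustration.

```python
from collections import deque


def replicate(master_binlog, apply):
    """Toy model of one replication round; not MySQL's real implementation."""
    relay_log = deque()

    # Dump thread (master) -> I/O thread (slave): copy events into the relay log.
    for event in master_binlog:
        relay_log.append(event)

    # SQL thread (slave): read the relay log and execute events one by one.
    applied = 0
    while relay_log:
        apply(relay_log.popleft())
        applied += 1
    return applied


slave_rows = []
count = replicate(["INSERT 1", "INSERT 2", "INSERT 3"], slave_rows.append)
print(count, slave_rows)
```

If the first stage cannot find the binlog file it was told to read, nothing reaches the relay log, which is the situation the error above describes.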
So the fix is to re-specify which log file to synchronize from, and at which position.
Solution:
Check the log file status on the master, node55.
Note down these two values: File=mysql-bin.000001, Position=117748. (There is another pitfall here: lock the tables first, read these two values, and only unlock the tables after replication has been started on the slave.)
The commands are as follows:
FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;
UNLOCK TABLES;
Then, on the slave node56, re-specify the log file and position to synchronize from:
# Stop replication on the slave
STOP SLAVE;
# Set the log file and position to synchronize from
CHANGE MASTER TO MASTER_HOST='10.2.1.55',MASTER_PORT=3306,MASTER_USER='vagrant',MASTER_PASSWORD='vagrant',MASTER_LOG_FILE='mysql-bin.000001',MASTER_LOG_POS=117748;
# Start replication
START SLAVE;
Checking again, there are no more errors, and the I/O thread is running.
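This check can also be scripted. Below is a minimal sketch, assuming the text comes from the vertical (`\G`) form of SHOW SLAVE STATUS, where each field is printed as a `key: value` line; `parse_slave_status` and `replication_healthy` are hypothetical helper names, and the sample only contains the fields quoted in this article.

```python
def parse_slave_status(text: str) -> dict:
    """Parse `key: value` lines from the vertical output of SHOW SLAVE STATUS."""
    status = {}
    for line in text.splitlines():
        # Split on the first colon only, so error messages keep their own colons.
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def replication_healthy(status: dict) -> bool:
    """Both replication threads must report Yes."""
    return (
        status.get("Slave_IO_Running") == "Yes"
        and status.get("Slave_SQL_Running") == "Yes"
    )


# Abridged sample modeled on the failure described in this article.
broken = """
             Slave_IO_Running: No
              Master_Log_File: mysql-bin.000014
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'
"""

status = parse_slave_status(broken)
print(status["Master_Log_File"], replication_healthy(status))
```

Requiring both threads to report Yes is a conservative choice: a missing field counts as unhealthy rather than being silently ignored.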
Next, with node55 as the slave and node56 as the master, I performed the same steps, and the status was normal. I then connected to the databases with Navicat and everything looked fine. I reported back to the test team, and the repair was done.
But I seem to have forgotten one question: why was the log folder wiped out in the first place?
Why did the problem occur?
I asked whether anyone had deleted the /var/lib/mysql/log directory around that time. Nobody would delete that directory casually.
But I noticed that log's parent directory, /var/lib/mysql, contains many other folders, such as xxcloud and xxcenter. These are the names of several databases in our project. Every folder in this directory shows up in Navicat as a database, one to one, so a log database is displayed there as well.
Had someone deleted the log database from Navicat? Very likely!
Sure enough, a colleague doing a migration and upgrade noticed that this log database did not exist in the old system and cleaned it up. Dropping the log database also removed the log folder. So that finally explains it! In fact, I had not thought about this log directory issue early on. Yes, this one is on me.
Improvements
Actually, the database synchronization should not have been done in overwrite mode. Synchronizing one database at a time would not have wiped out the log database. Still, it is a little odd for this log database to show up there at all. Can we keep it from appearing?
We just need to configure the log directory so that it no longer sits directly in the /var/lib/mysql data directory.
Dongge suggested isolating the log files from the database data files:
datadir = /var/lib/mysql/data
log_bin = /var/lib/mysql/log
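Since every folder inside the data directory shows up as a database, the layout can be sanity-checked with a simple path comparison. A minimal sketch; `inside` is a hypothetical helper, and the paths are the ones used in this article.

```python
from pathlib import PurePosixPath


def inside(child: str, parent: str) -> bool:
    """Return True if `child` is located somewhere under `parent`."""
    return PurePosixPath(parent) in PurePosixPath(child).parents


log_dir = "/var/lib/mysql/log"

# Old layout: the log directory sits inside the data directory.
print(inside(log_dir, "/var/lib/mysql"))       # True
# Suggested layout: data and logs are siblings.
print(inside(log_dir, "/var/lib/mysql/data"))  # False
```

With the suggested layout the log folder is a sibling of the data directory rather than a child of it, so it is no longer listed alongside the real databases.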
Another question: is our high availability really highly available?
At the very least, nothing alerted us in time. The MySQL databases were down and I did not know; I only found out from the testers' feedback.
Can we detect MySQL anomalies promptly?
Keepalived's email notification feature could be used here, or a log-based alerting system. This is something to improve in the future.
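As one possible starting point, a periodic script could flag MySQL containers that are not up. This is only a sketch under assumptions: the input is expected as lines of container name, a tab, then status (for instance from a `docker ps -a --format` template built on `{{.Names}}` and `{{.Status}}`), the container names are invented, and the `print` call stands in for whatever mail or alerting channel is actually used.

```python
def unhealthy_containers(ps_output: str, name_prefix: str = "mysql") -> list:
    """Return (name, status) pairs for matching containers that are not Up.

    Each input line is expected to be: container name, a tab, then the status.
    """
    problems = []
    for line in ps_output.splitlines():
        name, sep, status = line.partition("\t")
        if not sep or not name.startswith(name_prefix):
            continue  # malformed line or an unrelated container
        if not status.startswith("Up"):
            problems.append((name, status))
    return problems


# Hypothetical sample: one MySQL container stuck in a restart loop.
sample = "mysql-node55\tExited (1) 3 seconds ago\nother-service\tUp 2 days\n"

for name, status in unhealthy_containers(sample):
    # Placeholder for a real notification (email, alerting system, ...).
    print(f"ALERT: container {name} is not running: {status}")
```

Because Keepalived keeps restarting the container, a single check can land in a brief window where the container is up, so in practice the check would need to run repeatedly or also look at restart counts.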
- END -