Detailed explanation of etcd backup and recovery principle and actual record of stepping on the pit
2022-06-24 08:22:00 【Dongdonger】
For work, I spent this week studying etcd's backup and recovery scheme. It looked simple enough, but during the actual drill an operator mistake wiped out all the etcd data. Fortunately this happened in a test environment; had it been production I would already have packed up and left. Here is a brief record of the problems I ran into.
1. Backup and recovery process
Backup uses the etcdctl tool:
ETCDCTL_API=3 etcdctl --endpoints $ENDPOINT snapshot save snapshot.db
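After the snapshot file is written, it is worth a quick integrity check before relying on it, for example (in newer releases the same subcommand also exists under etcdutl):
ETCDCTL_API=3 etcdctl --write-out=table snapshot status snapshot.db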
Recovery uses the etcdutl tool (older versions integrate the restore function into etcdctl). With the following commands you can restore a new etcd cluster from snapshot.db:
$ etcdutl snapshot restore snapshot.db \
--name m1 \
--initial-cluster m1=http://host1:2380,m2=http://host2:2380,m3=http://host3:2380 \
--initial-cluster-token etcd-cluster-1 \
--initial-advertise-peer-urls http://host1:2380
--data-dir
$ etcdutl snapshot restore snapshot.db \
--name m2 \
--initial-cluster m1=http://host1:2380,m2=http://host2:2380,m3=http://host3:2380 \
--initial-cluster-token etcd-cluster-1 \
--initial-advertise-peer-urls http://host2:2380
$ etcdutl snapshot restore snapshot.db \
--name m3 \
--initial-cluster m1=http://host1:2380,m2=http://host2:2380,m3=http://host3:2380 \
--initial-cluster-token etcd-cluster-1 \
--initial-advertise-peer-urls http://host3:2380
- name is the etcd node name; it must be unique within the cluster
- initial-cluster is the configuration of the cluster being restored
- initial-cluster-token affects how the cluster and member ids are calculated; it is not a required parameter
- initial-advertise-peer-urls is the peer address information of the node itself
- data-dir restores the backup data into the specified path
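Restoring only produces new data directories; each member then has to be started on top of its own restored directory. A rough sketch of starting m1, assuming the restore above wrote to the default output directory m1.etcd (etcd's own default data dir is also ${name}.etcd) and that clients are served on port 2379; m2 and m3 are started the same way with their own names and URLs:
etcd --name m1 \
  --listen-peer-urls http://host1:2380 \
  --initial-advertise-peer-urls http://host1:2380 \
  --listen-client-urls http://host1:2379 \
  --advertise-client-urls http://host1:2379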
2. Principle introduction
2.1 Backup principle
After the etcd server receives a snapshot request, it calls the snapshot interface of the backend storage engine to obtain a snapshot, writes the snapshot data into a pipe, and then releases the snapshot (holding a snapshot open for too long would keep expired pages pinned in boltdb and prevent them from being freed). The sending logic that follows reads from the pipe and streams the data back to the client.
func (ms *maintenanceServer) Snapshot(sr *pb.SnapshotRequest, srv pb.Maintenance_SnapshotServer) error {
snap := ms.bg.Backend().Snapshot()
pr, pw := io.Pipe()
defer pr.Close()
go func() {
snap.WriteTo(pw)
if err := snap.Close(); err != nil {
ms.lg.Warn("failed to close snapshot", zap.Error(err))
}
pw.Close()
}()
// send the snapshot data read from the pipe back to the client
...
...
}
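On the client side, etcdctl's snapshot save is essentially a thin wrapper around this streaming RPC. A minimal sketch of consuming the stream directly with the clientv3 Maintenance API (the endpoint and output file name are placeholders, and the import path assumes etcd 3.5+):
package main

import (
	"context"
	"io"
	"log"
	"os"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"http://127.0.0.1:2379"}})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Snapshot streams the backend snapshot from the server as an io.ReadCloser.
	rc, err := cli.Snapshot(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	defer rc.Close()

	f, err := os.Create("snapshot.db")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Copy the stream into a local file; this is the same data the pipe above produces.
	if _, err := io.Copy(f, rc); err != nil {
		log.Fatal(err)
	}
}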
Now look at the snapshot logic of the backend engine. It first calls a transaction commit, which ties into etcd's own transaction handling: transactions committed in etcd are not written to the persistence engine boltdb immediately; they first land in the backend's cache and are flushed to boltdb periodically. To take a snapshot right now, the cached transactions must first be committed into boltdb; then boltdb's transaction interface is used to open a read transaction and hand it back to the upper layer, which can produce a snapshot file from that read transaction. (boltdb itself deserves a separate write-up.)
func (b *backend) Snapshot() Snapshot {
b.batchTx.Commit()
b.mu.RLock()
defer b.mu.RUnlock()
tx, err := b.db.Begin(false)
if err != nil {
b.lg.Fatal("failed to begin tx", zap.Error(err))
}
stopc, donec := make(chan struct{}), make(chan struct{})
dbBytes := tx.Size()
...
...
return &snapshot{tx, stopc, donec}
}
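The snapshot's WriteTo above is ultimately backed by boltdb's read transaction: in bbolt, a read-only Tx can stream the entire database file with Tx.WriteTo. A standalone sketch of that underlying mechanism (the db path is a placeholder; run it against a copy or a stopped instance, since bolt takes an exclusive file lock):
package main

import (
	"log"
	"os"

	bolt "go.etcd.io/bbolt"
)

func main() {
	db, err := bolt.Open("member/snap/db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// A read-only transaction sees a consistent view of the whole file.
	tx, err := db.Begin(false)
	if err != nil {
		log.Fatal(err)
	}
	defer tx.Rollback()

	f, err := os.Create("backup.db")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// WriteTo copies the database pages visible to this transaction into the writer.
	if _, err := tx.WriteTo(f); err != nil {
		log.Fatal(err)
	}
}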
2.2 Recovery principle
The recovery functionality lives in the standalone etcdutl tool, and its core function is restore. Aside from a lot of parameter validation and preparation work, the core consists of the remaining three function calls.
// Restore restores a new etcd data directory from given snapshot file.
func (s *v3Manager) Restore(cfg RestoreConfig) error {
...
...
// Clean up the raft meta information in the backup file
if err = s.saveDB(); err != nil {
return err
}
// Restore from the backup the wal and snapshot files that raft needs in order to start
hardstate, err := s.saveWALAndSnap()
if err != nil {
return err
}
// Update the consistent index information into boltdb
if err := s.updateCIndex(hardstate.Commit, hardstate.Term); err != nil {
return err
}
...
...
}
Let's go through it function by function. The saveDB function copies the backup data into the target directory and then deletes the raft meta information from it. The purpose of backup recovery is to bring up a new raft cluster on top of the existing data set, so only the user data in the backup is needed; the raft-related metadata can simply be erased.
func (s *v3Manager) saveDB() error {
// Put the backup data into the corresponding directory
err := s.copyAndVerifyDB()
if err != nil {
return err
}
be := backend.NewDefaultBackend(s.lg, s.outDbPath())
defer be.Close()
// Delete the raft meta information from the backup data
err = schema.NewMembershipBackend(s.lg, be).TrimMembershipFromBackend()
if err != nil {
return err
}
return nil
}
Now look at the next function, the final part of the process: how the wal files and snapshot file are recovered from the backup data. At a glance, this function does just a few things:
- Write the new raft membership information into boltdb
- Create the wal and write the node's metadata into it, including the node id and cluster id (described in detail later)
- Create one raft configuration-change log entry for each node in the cluster being restored
- Write these log entries, together with the raft hard state, into the wal
- Create a snapshot for the current state machine (the recovered data set). (Food for thought: why does a brand-new cluster need a snapshot at all? Couldn't the nodes simply start as empty raft nodes?)
func (s *v3Manager) saveWALAndSnap() (*raftpb.HardState, error) {
...
...
// Write the raft membership information into boltdb
for _, m := range s.cl.Members() {
s.cl.AddMember(m, true)
}
// Initialize the cluster meta information (nodeID and clusterID), create the wal file and write the metadata into it
m := s.cl.MemberByName(s.name)
md := &etcdserverpb.Metadata{NodeID: uint64(m.ID), ClusterID: uint64(s.cl.ID())}
metadata, merr := md.Marshal()
w, walerr := wal.Create(s.lg, s.walDir, metadata)
// Initialize the configuration change log for each node
ents := make([]raftpb.Entry, len(peers))
nodeIDs := make([]uint64, len(peers))
for i, p := range peers {
nodeIDs[i] = p.ID
cc := raftpb.ConfChange{
Type: raftpb.ConfChangeAddNode,
NodeID: p.ID,
Context: p.Context,
}
d, err := cc.Marshal()
if err != nil {
return nil, err
}
ents[i] = raftpb.Entry{
Type: raftpb.EntryConfChange,
Term: 1,
Index: uint64(i + 1),
Data: d,
}
}
// Initialize raft's term and commit index, and save them into hardState
commit, term := uint64(len(ents)), uint64(1)
hardState := raftpb.HardState{
Term: term,
Vote: peers[0].ID,
Commit: commit,
}
// Persist the log entries and the hard state into the wal
if err := w.Save(hardState, ents); err != nil {
return nil, err
}
// Create a raft snapshot for the current state machine (the recovered data) and write the corresponding snapshot record into the wal log
b, berr := st.Save()
if berr != nil {
return nil, berr
}
confState := raftpb.ConfState{
Voters: nodeIDs,
}
raftSnap := raftpb.Snapshot{
Data: b,
Metadata: raftpb.SnapshotMetadata{
Index: commit,
Term: term,
ConfState: confState,
},
}
sn := snap.New(s.lg, s.snapDir)
if err := sn.SaveSnap(raftSnap); err != nil {
return nil, err
}
snapshot := walpb.Snapshot{Index: commit, Term: term, ConfState: &confState}
return &hardState, w.SaveSnapshot(snapshot)
}
2.3 cluster member id
During the backup and recovery process above, one of the intermediate steps generates a cluster id for the cluster. After an etcd cluster has been deployed incorrectly, you will often see the error message "remote cluster member id mismatch". Let's look in detail at what this cluster id is. According to the official documentation, the cluster id is the identifier of a cluster; every cluster has one, and if the cluster ids of two nodes do not match, they do not belong to the same cluster. Now let's see how the cluster id is generated.
2.3.1 New clusters
For a new cluster, generating the cluster id is straightforward; look directly at the code below. The cluster member information is first built from the user's configuration, and then a hash of that member information is used as the cluster id. So when multiple etcd nodes are started on different machines with the same cluster configuration, they compute the same cluster id, which is why they can communicate with each other and form a raft cluster.
There is also a token parameter, set with the --initial-cluster-token flag when the cluster is created (a default value is used if it is not specified). It effectively salts the hash that produces the cluster id, so in the end: clusterID = hash(initial cluster configuration…, initial-cluster-token).
The etcdutl backup-and-recovery tool described above follows the same logic: the restore tool likewise generates the cluster id from the cluster configuration given in its parameters. (A simplified sketch of this id derivation follows the code below.)
func NewClusterFromURLsMap(lg *zap.Logger, token string, urlsmap types.URLsMap, opts ...ClusterOption) (*RaftCluster, error) {
c := NewCluster(lg, opts...)
// Initialize the cluster information according to the configuration information
for name, urls := range urlsmap {
...
...
c.members[m.ID] = m
}
// Generate a hash of the cluster member information and use it as the cluster id
c.genID()
return c, nil
}
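To make the salting concrete, here is a simplified, self-contained sketch of the derivation, modeled on etcd's membership code rather than copied from it (the real implementation uses sha1 over essentially the same inputs and, for dynamically added members, also mixes in a timestamp): each member id is a hash of its sorted peer URLs plus the cluster token, and the cluster id is a hash over the sorted member ids.
package main

import (
	"crypto/sha1"
	"encoding/binary"
	"fmt"
	"sort"
)

// memberID hashes a member's sorted peer URLs together with the cluster token.
func memberID(peerURLs []string, token string) uint64 {
	sort.Strings(peerURLs)
	b := []byte{}
	for _, u := range peerURLs {
		b = append(b, u...)
	}
	b = append(b, token...)
	h := sha1.Sum(b)
	return binary.BigEndian.Uint64(h[:8])
}

// clusterID hashes the sorted member ids, so the same membership always yields the same id.
func clusterID(ids []uint64) uint64 {
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	b := make([]byte, 8*len(ids))
	for i, id := range ids {
		binary.BigEndian.PutUint64(b[8*i:], id)
	}
	h := sha1.Sum(b)
	return binary.BigEndian.Uint64(h[:8])
}

func main() {
	token := "etcd-cluster-1"
	ids := []uint64{
		memberID([]string{"http://host1:2380"}, token),
		memberID([]string{"http://host2:2380"}, token),
		memberID([]string{"http://host3:2380"}, token),
	}
	// Changing the token or any peer URL changes every member id, and hence the cluster id.
	fmt.Printf("cluster id: %x\n", clusterID(ids))
}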
2.3.2 Restart the node
For a node that already has data, there is no need to recalculate the cluster id after a restart; it is read directly from the wal. You can also see in the etcdutl restore process above that the generated cluster id is written into the wal. In other words, the cluster id is generated only at cluster initialization; once generated it never changes, and even membership changes in the cluster will not affect it.
2.3.3 A node joining an existing cluster
To start a node that joins an existing cluster, you need to set the initial-cluster-state flag to existing at startup. The code checks this flag, and a node newly added to the cluster takes the following path: pull the cluster information from a remote node and assign that cluster id to the local node. There is of course plenty of validation logic along the way, for example checking whether the nodes configured in the remote cluster are consistent with the local configuration. (A sketch of the full join procedure, member add plus starting with the existing flag, follows the code below.)
func getClusterFromRemotePeers(lg *zap.Logger, urls []string, timeout time.Duration, logerr bool, rt http.RoundTripper) (*membership.RaftCluster, error) {
if lg == nil {
lg = zap.NewNop()
}
cc := &http.Client{
Transport: rt,
Timeout: timeout,
}
// Try to get the cluster configuration information from each node in turn
for _, u := range urls {
addr := u + "/members"
resp, err := cc.Get(addr)
...
...
// Initialize the local node with the remote cluster configuration; a lot of validation happens in between, and on success the remote cluster id is assigned to the local node
return membership.NewClusterFromMembers(lg, id, membs), nil
...
}
return nil, fmt.Errorf("could not retrieve cluster information from the given URLs")
}
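For completeness, the usual procedure for adding a brand-new member to a running cluster is to register it with etcdctl member add first, then start the new node with --initial-cluster-state existing so it takes the code path above and adopts the remote cluster id. A rough sketch, with host names and URLs as placeholder assumptions:
# register the new member on the existing cluster
ETCDCTL_API=3 etcdctl --endpoints http://host1:2379 member add m4 --peer-urls=http://host4:2380
# start the new node; it pulls the cluster information (including the cluster id) from the existing peers
etcd --name m4 \
  --listen-peer-urls http://host4:2380 \
  --initial-advertise-peer-urls http://host4:2380 \
  --listen-client-urls http://host4:2379 \
  --advertise-client-urls http://host4:2379 \
  --initial-cluster m1=http://host1:2380,m2=http://host2:2380,m3=http://host3:2380,m4=http://host4:2380 \
  --initial-cluster-state existing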
3. Pitfall record: improper backup and recovery operations led to etcd data loss
I maintain a 3-replica etcd cluster with three nodes, e1, e2 and e3. Due to an accident, e1 and e3 went down and their data files were corrupted, so those two processes could not be restarted. The repair then began:
- At first I did not know about etcd's member id mechanism. My plan was to simply delete the data on node e1 and restart e1 as an empty raft node; e1 should then be able to talk to e2, form a two-replica raft group, and restore cluster service. After trying this, however, all communication between e1 and e2 reported "cluster member id mismatch". Investigation showed that my cluster had gone through membership changes: the original three nodes were e0, e1, e2, and later they became e1, e2, e3. In other words the cluster's cluster id was hash(e0, e1, e2), while e1, started with the current configuration after its data was deleted, computed a new cluster id, namely hash(e1, e2, e3), so the two sides did not match.
- With no better option, I then turned to the backup-and-restore scheme: take a snapshot file from the surviving node e2, run the restore on e1, and restart e1. Still, communication between e1 and e2 kept reporting cluster id mismatch, for the same reason as above: the restore tool calculates the cluster id from the current configuration, which does not match the cluster id on e2.
- No way around it: I stopped the surviving e2 as well, deleted its data files, ran it through the backup-and-restore process, and started e2 again. At this point e1 and e2 could communicate normally and formed a two-replica raft group.
- Everything so far was fine, and that is when I let my guard down, which led to the mistake in the last step: after cleaning up e3's data files I forgot to restore its data with the backup-and-restore tool and simply started e3, and at this point e3 also started normally. Because e3 started as an empty node, its cluster id was calculated from the configuration, i.e. hash(e1, e2, e3), which matches the cluster id of the restored e1 and e2, but e3's data was empty.
- After the cluster had run normally for a while, an operations task switched the etcd leader to e3, and only then did I discover that etcd contained no data. The reason is that e3's data was empty.
Question: a puzzling point arises here. Why, after e3 joined the raft cluster, did it not sync the data from e1 or e2? Isn't raft supposed to guarantee strong consistency?
Answer: raft really isn't to blame here. Because e1 and e2 were themselves nodes restored from the backup, you can see from the recovery logic above that a restored node has only a handful of log entries. When e3 started, it synced the log from the leader and finished quickly, without going through install snapshot to transfer data. Only if e1 and e2 had run for a while, so that the log got compacted, would restarting e3 trigger raft's install snapshot logic and finally give e3 the full data.