Detailed explanation of etcd backup and recovery principle and actual record of stepping on the pit
2022-06-24 08:22:00 【Dongdonger】
For work, I spent this week studying etcd's backup and recovery scheme. It looked simple enough, but during the actual drill an operator mistake wiped out all the etcd data. Fortunately this happened in a test environment; had it been production I would already have packed up and left. Here is a brief record of the problems I ran into.
1. Backup and recovery process
Backup uses the etcdctl tool:
ETCDCTL_API=3 etcdctl --endpoints $ENDPOINT snapshot save snapshot.db
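After the snapshot file is written, it is worth a quick integrity check before relying on it, for example (in newer releases the same subcommand also exists under etcdutl):
ETCDCTL_API=3 etcdctl --write-out=table snapshot status snapshot.db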
Recovery uses the etcdutl tool (older versions integrate the restore function into etcdctl). With the following commands you can restore a new etcd cluster from snapshot.db:
$ etcdutl snapshot restore snapshot.db \
--name m1 \
--initial-cluster m1=http://host1:2380,m2=http://host2:2380,m3=http://host3:2380 \
--initial-cluster-token etcd-cluster-1 \
--initial-advertise-peer-urls http://host1:2380
--data-dir
$ etcdutl snapshot restore snapshot.db \
--name m2 \
--initial-cluster m1=http://host1:2380,m2=http://host2:2380,m3=http://host3:2380 \
--initial-cluster-token etcd-cluster-1 \
--initial-advertise-peer-urls http://host2:2380
$ etcdutl snapshot restore snapshot.db \
--name m3 \
--initial-cluster m1=http://host1:2380,m2=http://host2:2380,m3=http://host3:2380 \
--initial-cluster-token etcd-cluster-1 \
--initial-advertise-peer-urls http://host3:2380
- name is the etcd node name; it must be unique within the cluster
- initial-cluster is the configuration of the cluster being restored
- initial-cluster-token affects how the cluster and member ids are calculated; it is not a required parameter
- initial-advertise-peer-urls is the peer address information of the node itself
- data-dir restores the backup data into the specified path
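Restoring only produces new data directories; each member then has to be started on top of its own restored directory. A rough sketch of starting m1, assuming the restore above wrote to the default output directory m1.etcd (etcd's own default data dir is also ${name}.etcd) and that clients are served on port 2379; m2 and m3 are started the same way with their own names and URLs:
etcd --name m1 \
  --listen-peer-urls http://host1:2380 \
  --initial-advertise-peer-urls http://host1:2380 \
  --listen-client-urls http://host1:2379 \
  --advertise-client-urls http://host1:2379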
2. Principle introduction
2.1 Backup principle
After the etcd server receives a snapshot request, it calls the snapshot interface of the backend storage engine to obtain a snapshot, writes the snapshot data into a pipe, and then releases the snapshot (holding a snapshot open for too long would keep expired pages pinned in boltdb and prevent them from being freed). The sending logic that follows reads from the pipe and streams the data back to the client.
func (ms *maintenanceServer) Snapshot(sr *pb.SnapshotRequest, srv pb.Maintenance_SnapshotServer) error {
snap := ms.bg.Backend().Snapshot()
pr, pw := io.Pipe()
defer pr.Close()
go func() {
snap.WriteTo(pw)
if err := snap.Close(); err != nil {
ms.lg.Warn("failed to close snapshot", zap.Error(err))
}
pw.Close()
}()
// send the snapshot data read from the pipe back to the client
...
...
}
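On the client side, etcdctl's snapshot save is essentially a thin wrapper around this streaming RPC. A minimal sketch of consuming the stream directly with the clientv3 Maintenance API (the endpoint and output file name are placeholders, and the import path assumes etcd 3.5+):
package main

import (
	"context"
	"io"
	"log"
	"os"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"http://127.0.0.1:2379"}})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Snapshot streams the backend snapshot from the server as an io.ReadCloser.
	rc, err := cli.Snapshot(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	defer rc.Close()

	f, err := os.Create("snapshot.db")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Copy the stream into a local file; this is the same data the pipe above produces.
	if _, err := io.Copy(f, rc); err != nil {
		log.Fatal(err)
	}
}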
Now look at the snapshot logic of the backend engine. It first calls a transaction commit, which ties into etcd's own transaction handling: transactions committed in etcd are not written to the persistence engine boltdb immediately; they first land in the backend's cache and are flushed to boltdb periodically. To take a snapshot right now, the cached transactions must first be committed into boltdb; then boltdb's transaction interface is used to open a read transaction and hand it back to the upper layer, which can produce a snapshot file from that read transaction. (boltdb itself deserves a separate write-up.)
func (b *backend) Snapshot() Snapshot {
b.batchTx.Commit()
b.mu.RLock()
defer b.mu.RUnlock()
tx, err := b.db.Begin(false)
if err != nil {
b.lg.Fatal("failed to begin tx", zap.Error(err))
}
stopc, donec := make(chan struct{}), make(chan struct{})
dbBytes := tx.Size()
...
...
return &snapshot{tx, stopc, donec}
}
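The snapshot's WriteTo above is ultimately backed by boltdb's read transaction: in bbolt, a read-only Tx can stream the entire database file with Tx.WriteTo. A standalone sketch of that underlying mechanism (the db path is a placeholder; run it against a copy or a stopped instance, since bolt takes an exclusive file lock):
package main

import (
	"log"
	"os"

	bolt "go.etcd.io/bbolt"
)

func main() {
	db, err := bolt.Open("member/snap/db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// A read-only transaction sees a consistent view of the whole file.
	tx, err := db.Begin(false)
	if err != nil {
		log.Fatal(err)
	}
	defer tx.Rollback()

	f, err := os.Create("backup.db")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// WriteTo copies the database pages visible to this transaction into the writer.
	if _, err := tx.WriteTo(f); err != nil {
		log.Fatal(err)
	}
}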
2.2 Recovery principle
The recovery functionality lives in the standalone etcdutl tool, and its core function is restore. Aside from a lot of parameter validation and preparation work, the core consists of the remaining three function calls.
// Restore restores a new etcd data directory from given snapshot file.
func (s *v3Manager) Restore(cfg RestoreConfig) error {
...
...
// Clean up the raft meta information in the backup file
if err = s.saveDB(); err != nil {
return err
}
// Restore from the backup the wal and snapshot files that raft needs in order to start
hardstate, err := s.saveWALAndSnap()
if err != nil {
return err
}
// Update the consistent index information into boltdb
if err := s.updateCIndex(hardstate.Commit, hardstate.Term); err != nil {
return err
}
...
...
}
Let's go through it function by function. The saveDB function copies the backup data into the target directory and then deletes the raft meta information from it. The purpose of backup recovery is to bring up a new raft cluster on top of the existing data set, so only the user data in the backup is needed; the raft-related metadata can simply be erased.
func (s *v3Manager) saveDB() error {
// Put the backup data into the corresponding directory
err := s.copyAndVerifyDB()
if err != nil {
return err
}
be := backend.NewDefaultBackend(s.lg, s.outDbPath())
defer be.Close()
// Delete the raft meta information from the backup data
err = schema.NewMembershipBackend(s.lg, be).TrimMembershipFromBackend()
if err != nil {
return err
}
return nil
}
Now look at the next function, the final part of the process: how the wal files and snapshot file are recovered from the backup data. At a glance, this function does just a few things:
- Write the new raft membership information into boltdb
- Create the wal and write the node's metadata into it, including the node id and cluster id (described in detail later)
- Create one raft configuration-change log entry for each node in the cluster being restored
- Write these log entries, together with the raft hard state, into the wal
- Create a snapshot for the current state machine (the recovered data set). (Food for thought: why does a brand-new cluster need a snapshot at all? Couldn't the nodes simply start as empty raft nodes?)
func (s *v3Manager) saveWALAndSnap() (*raftpb.HardState, error) {
...
...
// Write the raft membership information into boltdb
for _, m := range s.cl.Members() {
s.cl.AddMember(m, true)
}
// Initialize the cluster meta information (nodeID and clusterID), create the wal file and write the metadata into it
m := s.cl.MemberByName(s.name)
md := &etcdserverpb.Metadata{NodeID: uint64(m.ID), ClusterID: uint64(s.cl.ID())}
metadata, merr := md.Marshal()
w, walerr := wal.Create(s.lg, s.walDir, metadata)
// Initialize the configuration change log for each node
ents := make([]raftpb.Entry, len(peers))
nodeIDs := make([]uint64, len(peers))
for i, p := range peers {
nodeIDs[i] = p.ID
cc := raftpb.ConfChange{
Type: raftpb.ConfChangeAddNode,
NodeID: p.ID,
Context: p.Context,
}
d, err := cc.Marshal()
if err != nil {
return nil, err
}
ents[i] = raftpb.Entry{
Type: raftpb.EntryConfChange,
Term: 1,
Index: uint64(i + 1),
Data: d,
}
}
// Initialize raft's term and commit index, and save them into hardState
commit, term := uint64(len(ents)), uint64(1)
hardState := raftpb.HardState{
Term: term,
Vote: peers[0].ID,
Commit: commit,
}
// Persist the log entries and the hard state into the wal
if err := w.Save(hardState, ents); err != nil {
return nil, err
}
// Create a raft snapshot for the current state machine (the recovered data) and write the corresponding snapshot record into the wal log
b, berr := st.Save()
if berr != nil {
return nil, berr
}
confState := raftpb.ConfState{
Voters: nodeIDs,
}
raftSnap := raftpb.Snapshot{
Data: b,
Metadata: raftpb.SnapshotMetadata{
Index: commit,
Term: term,
ConfState: confState,
},
}
sn := snap.New(s.lg, s.snapDir)
if err := sn.SaveSnap(raftSnap); err != nil {
return nil, err
}
snapshot := walpb.Snapshot{Index: commit, Term: term, ConfState: &confState}
return &hardState, w.SaveSnapshot(snapshot)
}
2.3 cluster member id
During the backup and recovery process above, one of the intermediate steps generates a cluster id for the cluster. After an etcd cluster has been deployed incorrectly, you will often see the error message "remote cluster member id mismatch". Let's look in detail at what this cluster id is. According to the official documentation, the cluster id is the identifier of a cluster; every cluster has one, and if the cluster ids of two nodes do not match, they do not belong to the same cluster. Now let's see how the cluster id is generated.
2.3.1 New clusters
For a new cluster, generating the cluster id is straightforward; look directly at the code below. The cluster member information is first built from the user's configuration, and then a hash of that member information is used as the cluster id. So when multiple etcd nodes are started on different machines with the same cluster configuration, they compute the same cluster id, which is why they can communicate with each other and form a raft cluster.
There is also a token parameter, set with the --initial-cluster-token flag when the cluster is created (a default value is used if it is not specified). It effectively salts the hash that produces the cluster id, so in the end: clusterID = hash(initial cluster configuration…, initial-cluster-token).
The etcdutl backup-and-recovery tool described above follows the same logic: the restore tool likewise generates the cluster id from the cluster configuration given in its parameters. (A simplified sketch of this id derivation follows the code below.)
func NewClusterFromURLsMap(lg *zap.Logger, token string, urlsmap types.URLsMap, opts ...ClusterOption) (*RaftCluster, error) {
c := NewCluster(lg, opts...)
// Initialize the cluster information according to the configuration information
for name, urls := range urlsmap {
...
...
c.members[m.ID] = m
}
// Generate a hash of the cluster member information and use it as the cluster id
c.genID()
return c, nil
}
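To make the salting concrete, here is a simplified, self-contained sketch of the derivation, modeled on etcd's membership code rather than copied from it (the real implementation uses sha1 over essentially the same inputs and, for dynamically added members, also mixes in a timestamp): each member id is a hash of its sorted peer URLs plus the cluster token, and the cluster id is a hash over the sorted member ids.
package main

import (
	"crypto/sha1"
	"encoding/binary"
	"fmt"
	"sort"
)

// memberID hashes a member's sorted peer URLs together with the cluster token.
func memberID(peerURLs []string, token string) uint64 {
	sort.Strings(peerURLs)
	b := []byte{}
	for _, u := range peerURLs {
		b = append(b, u...)
	}
	b = append(b, token...)
	h := sha1.Sum(b)
	return binary.BigEndian.Uint64(h[:8])
}

// clusterID hashes the sorted member ids, so the same membership always yields the same id.
func clusterID(ids []uint64) uint64 {
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	b := make([]byte, 8*len(ids))
	for i, id := range ids {
		binary.BigEndian.PutUint64(b[8*i:], id)
	}
	h := sha1.Sum(b)
	return binary.BigEndian.Uint64(h[:8])
}

func main() {
	token := "etcd-cluster-1"
	ids := []uint64{
		memberID([]string{"http://host1:2380"}, token),
		memberID([]string{"http://host2:2380"}, token),
		memberID([]string{"http://host3:2380"}, token),
	}
	// Changing the token or any peer URL changes every member id, and hence the cluster id.
	fmt.Printf("cluster id: %x\n", clusterID(ids))
}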
2.3.2 Restart the node
For a node that already has data, there is no need to recalculate the cluster id after a restart; it is read directly from the wal. You can also see in the etcdutl restore process above that the generated cluster id is written into the wal. In other words, the cluster id is generated only at cluster initialization; once generated it never changes, and even membership changes in the cluster will not affect it.
2.3.3 A node joining an existing cluster
To start a node that joins an existing cluster, you need to set the initial-cluster-state flag to existing at startup. The code checks this flag, and a node newly added to the cluster takes the following path: pull the cluster information from a remote node and assign that cluster id to the local node. There is of course plenty of validation logic along the way, for example checking whether the nodes configured in the remote cluster are consistent with the local configuration. (A sketch of the full join procedure, member add plus starting with the existing flag, follows the code below.)
func getClusterFromRemotePeers(lg *zap.Logger, urls []string, timeout time.Duration, logerr bool, rt http.RoundTripper) (*membership.RaftCluster, error) {
if lg == nil {
lg = zap.NewNop()
}
cc := &http.Client{
Transport: rt,
Timeout: timeout,
}
// Try to get the cluster configuration information from each node in turn
for _, u := range urls {
addr := u + "/members"
resp, err := cc.Get(addr)
...
...
// Initialize the local node with the remote cluster configuration; a lot of validation happens in between, and on success the remote cluster id is assigned to the local node
return membership.NewClusterFromMembers(lg, id, membs), nil
...
}
return nil, fmt.Errorf("could not retrieve cluster information from the given URLs")
}
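For completeness, the usual procedure for adding a brand-new member to a running cluster is to register it with etcdctl member add first, then start the new node with --initial-cluster-state existing so it takes the code path above and adopts the remote cluster id. A rough sketch, with host names and URLs as placeholder assumptions:
# register the new member on the existing cluster
ETCDCTL_API=3 etcdctl --endpoints http://host1:2379 member add m4 --peer-urls=http://host4:2380
# start the new node; it pulls the cluster information (including the cluster id) from the existing peers
etcd --name m4 \
  --listen-peer-urls http://host4:2380 \
  --initial-advertise-peer-urls http://host4:2380 \
  --listen-client-urls http://host4:2379 \
  --advertise-client-urls http://host4:2379 \
  --initial-cluster m1=http://host1:2380,m2=http://host2:2380,m3=http://host3:2380,m4=http://host4:2380 \
  --initial-cluster-state existing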
3. Pitfall record: improper backup and recovery operations led to etcd data loss
I maintain a 3-replica etcd cluster with three nodes, e1, e2 and e3. Due to an accident, e1 and e3 went down and their data files were corrupted, so those two processes could not be restarted. The repair then began:
- At first I did not know about etcd's member id mechanism. My plan was to simply delete the data on node e1 and restart e1 as an empty raft node; e1 should then be able to talk to e2, form a two-replica raft group, and restore cluster service. After trying this, however, all communication between e1 and e2 reported "cluster member id mismatch". Investigation showed that my cluster had gone through membership changes: the original three nodes were e0, e1, e2, and later they became e1, e2, e3. In other words the cluster's cluster id was hash(e0, e1, e2), while e1, started with the current configuration after its data was deleted, computed a new cluster id, namely hash(e1, e2, e3), so the two sides did not match.
- With no better option, I then turned to the backup-and-restore scheme: take a snapshot file from the surviving node e2, run the restore on e1, and restart e1. Still, communication between e1 and e2 kept reporting cluster id mismatch, for the same reason as above: the restore tool calculates the cluster id from the current configuration, which does not match the cluster id on e2.
- No way around it: I stopped the surviving e2 as well, deleted its data files, ran it through the backup-and-restore process, and started e2 again. At this point e1 and e2 could communicate normally and formed a two-replica raft group.
- Everything so far was fine, and that is when I let my guard down, which led to the mistake in the last step: after cleaning up e3's data files I forgot to restore its data with the backup-and-restore tool and simply started e3, and at this point e3 also started normally. Because e3 started as an empty node, its cluster id was calculated from the configuration, i.e. hash(e1, e2, e3), which matches the cluster id of the restored e1 and e2, but e3's data was empty.
- After the cluster had run normally for a while, an operations task switched the etcd leader to e3, and only then did I discover that etcd contained no data. The reason is that e3's data was empty.
Question: a puzzling point arises here. Why, after e3 joined the raft cluster, did it not sync the data from e1 or e2? Isn't raft supposed to guarantee strong consistency?
Answer: raft really isn't to blame here. Because e1 and e2 were themselves nodes restored from the backup, you can see from the recovery logic above that a restored node has only a handful of log entries. When e3 started, it synced the log from the leader and finished quickly, without going through install snapshot to transfer data. Only if e1 and e2 had run for a while, so that the log got compacted, would restarting e3 trigger raft's install snapshot logic and finally give e3 the full data.