Disaster recovery series (V) -- database disaster recovery construction
2022-06-24 02:58:00 【Kaiyuan】
In an era where data is king, data security is the lifeblood of an enterprise, so protecting enterprise data is critical. This article looks at database disaster recovery: how to build a disaster recovery plan that fits the customer's current business, given the available technologies and products. It covers three topics:
- Elements of scheme design
- Cloud disaster recovery scheme
- Cloud customer cases
1. Elements of scheme design
The main design elements of a database disaster recovery scheme are data synchronization, data consistency, and data recovery.
1.1 Data synchronization
Data synchronization here means replicating data between two availability zones or between different regions. It is either one-way or two-way.
| Synchronization mode | Replication principle | Typical scenario | Advantages | Disadvantages |
|---|---|---|---|---|
| One-way synchronization | Master-slave replication | The upper-layer business uses single-write mode | 1. Few data consistency challenges 2. Minor business changes | The business depends heavily on replication latency |
| Two-way synchronization | Master-master bidirectional replication | The upper-layer business uses single-write or double-write mode | The business depends only weakly on replication latency | 1. Significant data consistency challenges 2. Major business changes |
A typical two-way synchronization scenario:
All traffic for a single game service is carried in one region. The services marked with a red box in the figure normally carry no traffic and serve as an emergency escape route for the business. The database is read and written within the same region as the traffic, and the databases in different regions synchronize in both directions.
1.2 Data consistency
Data consistency means that when the upper-layer business reads the database, the master and slave databases in the cluster hold the same data. If the data in different database clusters is inconsistent, the business reads dirty data. Take a bank balance database as an example. Xiao Wang, an office worker, has just been paid, and an SMS tells him his balance is 30,000 yuan. When he logs in to the bank app, however, the balance shows only 2,000 yuan, because the new balance has not yet been synchronized to all master and slave databases. Imagine how many customer service staff the bank would need to handle this.
1. Single-write business scenario
Single write means the business has only one database cluster, so consistency depends on the master-slave replication mode inside that cluster: asynchronous, semi-synchronous, or strong synchronous. Most businesses settle for eventual consistency, and semi-synchronous replication is the most common choice.
| Replication mode | Technical principle | Consistency | Performance |
|---|---|---|---|
| Asynchronous replication | The application sends an update request. The master node responds to the application as soon as it completes the operation, then replicates the data to the slave node asynchronously. | Weak | Strong |
| Semi-synchronous replication | The application sends an update request. After performing the update, the master node immediately replicates the data to the slave node. The slave receives the data, writes it to its relay log (it does not need to execute it), and returns a success message to the master. The master responds to the application only after receiving that success message. | Medium | Medium |
| Strong synchronization | The application sends an update request. After completing the operation, the master node replicates the data to the slave node, and the slave returns a success message once it has received the data. The master responds to the application only after it receives the slave's feedback. Replication from master to slave is synchronous. | Strong | Weak |
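The practical difference between the three modes shows up when the slave's acknowledgement does not arrive. The following Python sketch is a toy model, not TDSQL or MySQL code. It rests on two assumptions that the table does not state: semi-synchronous replication falls back to asynchronous behavior when the ACK times out (as MySQL semi-sync does), and strong synchronization never falls back, so the write stays blocked. `CommitOutcome` and `commit` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommitOutcome:
    client_acked: bool      # did the application receive a success response?
    slave_has_change: bool  # is the change guaranteed to be on a slave?


def commit(mode: str, slave_ack_arrives: bool) -> CommitOutcome:
    """Model what the application observes for one write in each mode."""
    if mode == "async":
        # The master responds immediately; replication happens later.
        return CommitOutcome(client_acked=True, slave_has_change=False)
    if mode == "semi-sync":
        # Assumption: on ACK timeout the master degrades to async and
        # still responds, so the slave copy is no longer guaranteed.
        return CommitOutcome(client_acked=True, slave_has_change=slave_ack_arrives)
    if mode == "strong-sync":
        # Assumption: without a slave ACK the write never succeeds (it hangs).
        return CommitOutcome(client_acked=slave_ack_arrives,
                             slave_has_change=slave_ack_arrives)
    raise ValueError(f"unknown replication mode: {mode}")


if __name__ == "__main__":
    for mode in ("async", "semi-sync", "strong-sync"):
        print(mode, commit(mode, slave_ack_arrives=False))
```

Under these assumptions, only strong synchronization guarantees that every acknowledged write exists on a slave, and it pays for that with blocked writes whenever the slave is unreachable.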
2. Double-write business scenario
Double write means the business system has two database clusters. Data consistency then depends on two things: the replication mode inside each cluster, and the replication between the two clusters. In practice, double-write consistency is guaranteed mainly at the business layer. The mainstream double-write database scheme in the industry works as follows:
1) Users are partitioned across IDC machine rooms based on user information, and the API gateway forwards each user to the corresponding IDC cluster.
2) The MySQL data is split into units. Both entry points are writable, but a given user's data can be accessed through only one entry point, which keeps reads consistent.
3) If data conflicts, the system overwrites the older data in timestamp order.
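The three steps above can be sketched in Python as follows. This is a minimal illustration rather than the gateway or database logic of any specific product: the IDC names, the CRC32-based partitioning, and the `Row`, `route_user`, and `resolve_conflict` names are all assumptions made for the example.

```python
import zlib
from dataclasses import dataclass

# Hypothetical IDC cluster names.
IDCS = ["idc-a", "idc-b"]


def route_user(user_id: str) -> str:
    """Map a user to exactly one IDC so that user's data has a single writer.

    CRC32 is used because it is stable across processes and restarts,
    unlike Python's built-in hash() for strings.
    """
    return IDCS[zlib.crc32(user_id.encode("utf-8")) % len(IDCS)]


@dataclass(frozen=True)
class Row:
    value: str
    updated_at: float  # timestamp of the last write, in seconds


def resolve_conflict(local: Row, remote: Row) -> Row:
    """Timestamp-ordered overwrite: the newer row replaces the older one.

    On a tie the local row is kept, so repeated merges stay stable.
    """
    return remote if remote.updated_at > local.updated_at else local


if __name__ == "__main__":
    print(route_user("user-1001"))
    older = Row("balance=2000", updated_at=100.0)
    newer = Row("balance=30000", updated_at=105.0)
    print(resolve_conflict(older, newer).value)
```

Timestamp-based overwrite assumes the clocks of the two clusters are closely synchronized; with clock skew, a stale write can win the comparison.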
1.3 Data recovery
This section covers how to keep data consistent after a database cluster fails.
1) Failure detection and switchover
Failures are usually detected by an arbitration center. When the arbitration center finds that the master node is abnormal during a detection cycle, it performs a VIP switchover. MHA is a mature high-availability failover solution in the industry. Tencent Cloud uses ZK to detect failures and switch; in testing, the switchover completes in about 30 s.
2) Consistency guarantees after a failure
Consistency after a failure builds on the mechanisms described in section 1.2 and is reinforced according to the actual business scenario. Businesses that are not very sensitive to data accuracy generally do not compare data after a master-slave switchover. Businesses that are sensitive to consistency usually run an internal full-verification tool. If it finds inconsistencies, they are either repaired automatically by timestamp-based overwrite according to predefined rules, or analyzed from local logs and handled manually to keep the data safe.
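A full-verification pass with timestamp-based repair could look like the Python sketch below. It is an illustration under stated assumptions, not the internal tool the article refers to: each table is modeled as a dict of rows keyed by primary key, every row carries an `updated_at` field, and the function names are hypothetical.

```python
import hashlib


def row_digest(row: dict) -> str:
    """Digest of a row's contents, independent of column order."""
    payload = repr(sorted(row.items())).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def find_mismatches(primary: dict, standby: dict) -> list:
    """Return the keys whose rows are missing on one side or differ."""
    keys = set(primary) | set(standby)
    return sorted(
        key for key in keys
        if (key in primary) != (key in standby)
        or row_digest(primary[key]) != row_digest(standby[key])
    )


def repair_by_timestamp(primary: dict, standby: dict) -> list:
    """Overwrite both sides with the row that has the newest updated_at.

    Simplification: a row missing on one side is treated as replication
    lag, not as a delete, so it is copied over rather than removed.
    """
    repaired = []
    for key in find_mismatches(primary, standby):
        candidates = [r for r in (primary.get(key), standby.get(key)) if r is not None]
        winner = max(candidates, key=lambda r: r["updated_at"])
        primary[key] = dict(winner)
        standby[key] = dict(winner)
        repaired.append(key)
    return repaired


if __name__ == "__main__":
    a = {1: {"balance": 30000, "updated_at": 105}, 2: {"balance": 10, "updated_at": 50}}
    b = {1: {"balance": 2000, "updated_at": 100}}
    print(repair_by_timestamp(a, b))
    print(a == b)
```

Comparing digests instead of full rows keeps the comparison cheap when the two copies live on different hosts; a production tool would typically checksum ranges of rows rather than single rows.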
2. Platform disaster recovery scheme
The Tencent Cloud data products most commonly used in customer business scenarios are Redis, CDB, MongoDB, and TDSQL.
| Data product | Cross-AZ disaster recovery | Nearby access | Cross-region disaster recovery |
|---|---|---|---|
| CDB | Supported; self-service configuration in the console | Supported; cross-AZ / cross-region RO instances | Option 1: supported through DTS; the business must switch the VIP manually. Option 2: DTS dual-write capability, between on-cloud and off-cloud sites or across multiple sites. |
| Redis | Supported; self-service configuration in the console | Supported; cross-AZ / cross-region replicas | Option 1: supported through DTS replication. Option 2: DTS global replication, with nearby reads from multiple sites. |
| TDSQL | Supported; self-service configuration in the console | Supported; automatic read/write separation across AZs / regions | Supported through DCN replication |
| MongoDB | Supported; self-service configuration in the console | Supported; cross-AZ replicas | Supported through DTS replication |
3. Cloud customer cases
A financial company on the cloud uses the cloud TDSQL product to store its order business data. Its current single-availability-zone deployment needs to be upgraded to a multi-availability-zone deployment. The upgrade introduces the following risks:
- Service latency increases by about 3 ms of network delay, because TDSQL has no proximity routing from proxy to DB
- In extreme cases, master-slave consistency problems become more likely
- Network jitter across availability zones can cause write requests to hang
Different AZs in the same region are about 3 ms apart in network delay. For the sake of disaster recovery, we recommend accepting this performance trade-off. For the second and third risks, the suggestions below are based on what the TDSQL product provides.
For a description of TDSQL strong synchronization, see https://cloud.tencent.com/document/product/557/10570
The core TDSQL database currently runs in a single availability zone with one master and two slaves, using strong synchronous replication. There are three main schemes; after weighing them, Scheme 3 is adopted:
| Scheme | Details | Advantages | Disadvantages |
|---|---|---|---|
| Scheme 1 | Dual-zone deployment: one zone hosts the master and one slave; the other zone hosts the second slave | 1. Business latency: barely affected by cross-AZ delay and nearly the same as in a single zone, because under strong synchronization most ACKs are returned to the master by Slave1. 2. Write hang: same behavior as the single-zone deployment. | 1. Data consistency: weaker. Strong synchronization relies on Slave1, so Slave2's data may not be up to date and the two zones may be inconsistent. 2. Dirty reads: if the network between AZ1 and AZ2 fails, then during the period ZK takes to evict Slave2 (20 s), read-only traffic may read stale data from Slave2. (Latency-sensitive businesses are advised not to read from slave nodes in any case.) |
| Scheme 2 | Dual-zone deployment: one zone hosts the master; the other zone hosts both slaves | 1. Data consistency: if the master's zone fails, the strong synchronization rule guarantees eventual consistency. 2. Dirty reads: in theory the two slaves return ACKs at nearly the same time, so even when the cross-AZ network is abnormal, the probability of reading dirty data from either slave is very low. | 1. Business latency: cross-AZ traffic adds about 3 ms of network delay; the business should evaluate this against its specific transactions. 2. Write hang: there is only one logical link across availability zones, so writes depend on the stability of that link and the probability of a write hang increases. |
| Scheme 3 | Three-zone deployment: three zones, one node per zone | 1. Data consistency: if the master's zone fails, the strong synchronization rule guarantees eventual consistency. 2. Dirty reads: in theory the two slaves return ACKs at nearly the same time, so even when the cross-AZ network is abnormal, the probability of reading dirty data from either slave is very low. 3. Write hang: there are two logical links across availability zones, which improves cross-AZ network stability and reduces the probability of a write hang. | Cross-AZ traffic adds about 3 ms of network delay; the business should evaluate this against its specific transactions. |
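The latency and write-hang trade-offs in the table can be checked with a small Python model. It is a sketch under assumptions, not a measurement: it assumes the master returns as soon as the first slave ACK arrives, uses the article's 3 ms figure for cross-AZ delay, and uses a hypothetical 0.5 ms for delay inside one AZ. All names are illustrative.

```python
CROSS_AZ_DELAY_MS = 3.0  # figure quoted in the article
SAME_AZ_DELAY_MS = 0.5   # hypothetical value, for illustration only

# Placement is [master AZ, slave1 AZ, slave2 AZ].
SCHEMES = {
    "scheme 1": ["az1", "az1", "az2"],
    "scheme 2": ["az1", "az2", "az2"],
    "scheme 3": ["az1", "az2", "az3"],
}


def ack_delay_ms(placement: list) -> float:
    """Replication delay added to a write, assuming the first ACK is enough."""
    master_az, *slave_azs = placement
    return min(
        SAME_AZ_DELAY_MS if az == master_az else CROSS_AZ_DELAY_MS
        for az in slave_azs
    )


def cross_az_links(placement: list) -> int:
    """Number of distinct remote AZs the master replicates to."""
    master_az, *slave_azs = placement
    return len({az for az in slave_azs if az != master_az})


def write_hangs_on_single_link_failure(placement: list) -> bool:
    """True if losing the link to any one remote AZ leaves no slave to ACK."""
    master_az, *slave_azs = placement
    remote_azs = {az for az in slave_azs if az != master_az}
    for failed_az in remote_azs:
        reachable = [az for az in slave_azs if az != failed_az]
        if not reachable:
            return True
    return False


if __name__ == "__main__":
    for name, placement in SCHEMES.items():
        print(name, ack_delay_ms(placement), cross_az_links(placement),
              write_hangs_on_single_link_failure(placement))
```

In this model, Scheme 1 keeps single-zone latency but gives up zone-level consistency, Scheme 2 pays the cross-AZ delay and hangs if its only link fails, and Scheme 3 pays the same delay but survives the loss of either link, which is consistent with the article's choice of Scheme 3.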