当前位置:网站首页>Distributed transaction principle and solution

Distributed transaction principle and solution

2022-06-24 10:27:00 Juvenile deer

Local transactions

        In most scenarios , Our applications only need to operate a single database , The transaction in this case is called a local transaction (Local Transaction). Local affairs ACID The feature is that the database provides direct support . The local transaction application architecture is as follows :

stay JDBC Programming , We go through java.sql.Connection Object to open 、 Close or commit a transaction . The code is as follows :

Connection conn = ... // Get database connection 
conn.setAutoCommit(false); // Open transaction 
try{
   //... Add, delete, modify and check sql
   conn.commit(); // Commit transaction 
}catch (Exception e) {
  conn.rollback();// Transaction rollback 
}finally{
   conn.close();// Close links 
}

Typical scenario of distributed transaction     

     At present, the development of Internet is in full swing , The vast majority of companies have split and serviced their databases (SOA). under these circumstances , Completing a business function may need to span multiple services , Working with multiple databases . This involves distributed transactions , The resource to be operated is located on multiple resource servers , The application needs to ensure the operation of data for multiple resource servers , All or nothing , All or nothing . In essence , Distributed transaction is to ensure the data consistency of different resource servers .

Typical distributed transaction scenario

Cross-database transaction

     Cross database transactions refer to , A certain function of an application needs to operate multiple libraries , Different business data are stored in different libraries . I have seen a relatively complex business , In a business at the same time 9 Databases . The following figure shows a service operating at the same time 2 The situation of a library : 

Sub database and sub table

     Generally, a database has a large amount of data, or it is expected that there will be a large amount of data in the future , Will be split horizontally , That is, sub database and sub table . Here's the picture , Will database B Split into 2 Databases : 

        For the case of sub database and sub table , General developers will use some database middleware to reduce sql Complexity of operation . Such as , about sql:insert into user(id,name) values (1," Zhang San "),(2," Li Si "). This article sql Is the syntax of the operation list Library , In case of single warehouse , Can guarantee the consistency of transactions .

      But now because of the sub database and sub table , Developers want to 1 No. record insertion branch 1,2 No. record insertion branch 2. So database middleware should rewrite it as 2 strip sql, Insert two different sub databases , At this time, we need to ensure the success of both libraries , Or they all fail , So basically all database middleware are faced with the problem of distributed transaction .

As a service

     Microservice architecture is a popular concept at present . For example, a case mentioned by the author above , An application operates at the same time 9 Databases , The application logic is very complex , It's a great challenge for developers , It should be split into separate services , To simplify business logic . After break up , Independent services through RPC Framework to make remote calls , To communicate with each other . The figure below shows a 3 The architecture that services call each other :

        Service A To complete a function, you need to operate the database directly , Also call Service B and Service C, and Service B And at the same time 2 A database ,Service C Also operated a library . We need to ensure that these cross service operations on multiple databases are successful , Or they all fail , In fact, this is probably the most typical distributed transaction scenario .

        Summary : In the distributed transaction scenario discussed above , Without exception, they operate multiple databases directly or indirectly . How to guarantee the ACID characteristic , For distributed transaction implementation , It's a very big challenge . meanwhile , The implementation of distributed transaction must also consider the problem of performance , If in order to guarantee strictly ACID characteristic , Resulting in a serious performance degradation , So for some businesses that require quick response , It's unacceptable .

X/Open DTP Model and XA standard

      X/Open, That is, the present open group, It's an independent organization , Mainly responsible for the development of various industry technical standards .  Distributed transaction processing (Distributed Transaction Processing, abbreviation DTP) for ,X/Open The following reference documents are provided :

   DTP Reference model : <<Distributed Transaction Processing: Reference Model>>

   DTP XA standard : << Distributed Transaction Processing: The XA Specification>>

DTP Model

    constitute DTP Model 5 Two basic elements :

    Applications (Application Program , abbreviation AP): Used to define transaction boundaries ( Define the beginning and end of a transaction ), And operate on resources within the transaction boundaries .

    Explorer (Resource Manager, abbreviation RM): Such as a database 、 File system, etc , And provide access to resources .

    Transaction manager (Transaction Manager , abbreviation TM): Responsible for assigning transaction unique identifier , Monitor the progress of transactions , And responsible for the submission of affairs 、 Roll back, etc .

    Communication resource manager (Communication Resource Manager, abbreviation CRM): Control one TM Domain (TM domain) Inside or across TM Communication between distributed applications in the domain .

    Communication protocol (Communication Protocol, abbreviation CP): Provide CRM Provide the underlying communication services between distributed application nodes .

XA standard

      stay DTP In the local model instance , from AP、RMs and TM form , No other elements are needed .AP、RM and TM Between , We need to interact with each other , As shown in the figure below : 

In this picture (1) Express AP-RM The interface of ,(2) Express AP-TM The interface of ,(3) Express RM-TM The interface of .

XA The main function of norms is , That's the definition RM-TM The interface of ,XA The norm is in addition to the definition of RM-TM Interface of interaction (XA Interface) outside , We also optimize the two-phase commit protocol .

Two-stage agreement (two-phase commit) Is in OSI TP In the standard ; stay DTP Reference model (<<Distributed Transaction Processing: Reference Model>>) in , Specifies that the commit of a global transaction uses two-phase commit agreement ; and XA standard (<< Distributed Transaction Processing: The XA Specification>>) It just defines the interface to be used in the two-phase commit protocol , That is to say RM-TM Interface of interaction , Because the participants in the two-phase submission process , Only TM and RMs.

stay XA There are two stages in the agreement :

  • The transaction manager requires that each database involved in a transaction be pre committed (Precommit) This operation , And reflect whether you can submit .
  • The transaction coordinator requires each database to commit data , Or roll back the data

Two phase submission agreement (2PC)

        Two phase submission agreement (Two Phase Commit) Not in XA Put forward in the specification , however XA The specification optimizes it . And literally ,Two Phase Commit, That is to submit (commit) The process is divided into 2 Stages (Phase):

Stage 1:

       TM Notifications RM Ready to commit their transaction branches . If RM Judge that your work can be submitted , Then make it persistent , Give again TM A positive reply ; If something else happens , Here TM All the answers are negative . After sending a negative reply and rolling back the work already done ,RM You can discard the transaction branch information .

        With mysql Database, for example , In the first phase , The transaction manager issues prepare" Prepare to submit " request , After receiving the request, the database performs data modification and logging , After processing is completed, only the state of the transaction is changed to " You can submit ", Then return the result to the transaction manager .

Stage 2

    TM According to the stage 1 each RM prepare Result , Decide whether to commit or roll back the transaction . If all RM all prepare success , that TM Inform all RM Submit ; If there is RM prepare Failure words , be TM Inform all RM Roll back your own transaction branch .

       With mysql Database, for example , If all the databases in the first phase are prepare success , Then the transaction manager issues " Confirm the submission " request , The database server sends the transaction to " You can submit " Status changed to " Submit completed " state , Then return to answer . If an error occurs in any database operation in the first phase , The database manager did not receive a response , Transaction failure , Rollback all database transactions . The database server can't receive the second stage confirmation submission request , Will also put " You can submit " Back of business .

XA It's a resource level distributed transaction , Strong consistency , In the whole process of two-stage submission , Always hold the lock of resources .

TCC It's a business level distributed transaction , Final consistency , Will not always hold the lock of resources .

TCC( Compensation Affairs )

       TCC It's a two-stage programming model for service , Every business service must implement Try,Confirm,Cancel Three methods , These three ways can correspond to SQL Transaction Lock,Commit,Rollback.

        Compared to the two-phase commit ,TCC Solved several problems : Synchronous blocking , The timeout mechanism is introduced , Compensation after timeout , It doesn't lock the entire resource like a two-phase commit , Convert resources to business logic , The particle size becomes smaller .

        Because of the compensation mechanism , It can be controlled by the business activity manager , Ensure data consistency .

Try Stage :Try It's just a preliminary operation , Make a preliminary confirmation , Its main responsibility is to complete all business checks , Reserve business resources .

Confirm Stage :Confirm Is in Try After the stage inspection is completed , Continue with the confirmation operation , Must satisfy idempotent operation , If Confirm Execution failed in , There will be transaction coordinators that trigger continuous execution , Until satisfied .

Cancel Cancel execution : stay Try Failed and released Try Resources reserved in the stage , It must also satisfy idempotence , Follow Confirm It's also possible to be constantly executed .

One place an order , An example of generating an order to deduct inventory :

So let's see , How to add our inventory deduction process TCC:

stay Try When , Will allow inventory services to be reserved N Stock for this order , Let the order service generate a “ Unconfirmed ” Order , These two reserved resources are generated at the same time .

stay Confirm When , Will be used in Try Reserved resources , stay TCC In the transaction mechanism, I think , If in Try Resources that can be normally reserved in the stage , So in Confirm Must be able to submit completely .

stay Try When , One side of the task failed to execute , Will perform Cancel Interface operation of , Will be in Try Release the resources reserved in the stage .

This is not the point TCC How transactions are implemented , The focus is on distributed transactions CAP+BASE The application of the theory .

tcc Transaction implementation : https://github.com/changmingxie/tcc-transaction 

Two phase submission agreement (2PC) The problem is

Two phase commit does seem to provide atomic operations , But unfortunately , There are still several shortcomings in the two-stage submission :

1、 Synchronization blocking problem ( The biggest problem ).

        In the two-phase commit scheme, the global transaction ACID characteristic , It depends on RM Of . A global transaction contains multiple independent transaction branches , Either this set of transaction branches is successful , Or they all fail . For each transaction branch ACID Together, features make up the ACID characteristic . That is to say, the support of a single transaction branch ACID The feature elevates a level to the category of distributed transactions .  Even in local transactions , If it's sensitive to operation reading , We also need to set the transaction isolation level to SERIALIZABLE. And for distributed transactions , Even more so , The level of repeatable read isolation is not enough to guarantee the consistency of distributed transactions . If we use mysql To support XA Distributed transactions , Then it's best to set the transaction isolation level to SERIALIZABLE, However SERIALIZABLE( Serialization ) Is the highest of the four transaction isolation levels , It's also the lowest level of execution efficiency .

After the resources are ready , The resources in the resource manager are always blocked , Until the submission is complete , To release resources .

2、 A single point of failure .

        Because of the importance of the coordinator , Once the coordinator TM failure , participants RM It will keep blocking . Especially in the second stage , Coordinator failed , Then all participants are still in the state of locking transaction resources , Cannot continue to complete the transaction .( If the coordinator dies , You can re elect a coordinator , But it can't solve the problem that participants are blocked due to coordinator downtime )

3、 Data inconsistency .

        In phase II of phase II submission , When the coordinator sends commit After the request , There is a local network exception or sending commit The coordinator failed during the request , This will result in only a few participants receiving commit request , And in this part of the participants received commit The request is then executed commit operation , But the rest didn't receive commit The requested machine is unable to perform a transaction commit . So the whole distributed system appears the phenomenon of data inconsistency .

4. uncertainty

      When the transaction manager sends commit after , And only one participant received commit, So when the participant and the transaction manager go down at the same time , The re elected transaction manager cannot determine whether the message was committed successfully .

There are synchronization blocks due to two-phase commit 、 Single point problem and other defects , therefore , The researchers improved on the two-phase commit , A three-phase commit is proposed . 

Three stage submission agreement (Three-phase commit)

Three stage commit (3PC), It's a two-stage submission (2PC) Improved version .

Different from the two-stage submission is , There are two changes in the three-phase submission :

    1、 Introduce timeout mechanism . At the same time, the timeout mechanism is introduced in both the coordinator and the participants .

    2、 Insert a preparation stage in the first and second stages . It ensures that the states of participating nodes are consistent before the final submission stage . in other words , In addition to introducing a timeout mechanism ,3PC hold 2PC Once again, the preparation phase of the project is divided into two parts , In this way, there are three stages of submission CanCommit、PreCommit、DoCommit Three stages .

CanCommit Stage

    3PC Of CanCommit The stage is actually the same as 2PC The preparation stage of is very similar to . The coordinator sends... To the participants commit request , Participants return if they can submit Yes Respond to , Otherwise return to No Respond to .

    1. Business inquiry The coordinator sends... To the participants CanCommit request . Ask if the transaction commit operation can be performed . Then start waiting for the response from the participants .

    2. Respond to feedback The participants received CanCommit After the request , Under normal circumstances , If it thinks it can execute the transaction smoothly , Then return to Yes Respond to , And get ready . Otherwise feedback No

PreCommit Stage

     The coordinator decides whether the transaction can be remembered according to the response of the participants PreCommit operation . According to the response , There are two possibilities .

    If the coordinator's feedback from all participants is Yes Respond to , Then the pre execution of the transaction will be executed .

    1. Send pre submit request The coordinator sends... To the participants PreCommit request , And enter Prepared Stage .    

    2. Transaction pre commit Participant received PreCommit After the request , Will perform transaction operations , And will undo and redo Information is recorded in the transaction log .

    3. Respond to feedback If the participant successfully performs the transaction operation , Then return to ACK Respond to , And start waiting for the final order .

    If any of the participants sent No Respond to , Or wait for the timeout , None of the coordinators received a response from the participants , Then the interruption of the execution of the transaction .

    1. Send interrupt request The coordinator sends... To all participants abort request .

    2. Interrupt the business The participants received... From the coordinator abort After the request ( Or after the timeout , The request of the coordinator has not yet been received ), The interruption of the execution of a transaction .

doCommit Stage

    In this phase, the real transaction commit , It can also be divided into the following two situations .

    Case 1: Execute commit

    1. Send submit request Coordinate to receive... Sent by participants ACK Respond to , Then he will go from pre submission to submission . And send it to all participants doCommit request .

    2. Transaction submission Participant received doCommit After the request , Perform formal transaction submission . And release all transaction resources after transaction commit .

    3. Respond to feedback After the transaction is committed , Send... To the coordinator Ack Respond to .

    4. Complete the business The coordinator receives... From all participants ack After responding , Complete the business .

   Case 2: Interrupt the business   The coordinator did not receive the ACK Respond to ( It may be that the recipient sent it not ACK Respond to , It's also possible that the response timed out ), Then the interrupt transaction will be executed .

    1. Send interrupt request The coordinator sends... To all participants abort request

    2. Transaction rollback Participant received abort After the request , Take advantage of the undo Information to perform the rollback operation of the transaction , And release all transaction resources after rollback .

    3. Feedback results After the participant completes the transaction rollback , Send... To the coordinator ACK news

    4. Interrupt the business The coordinator received feedback from the participants ACK After message , The interruption of the execution of a transaction . 

     stay doCommit Stage , If the participant cannot receive the... From the coordinator in time doCommit perhaps rebort When asked , After the timeout , Transaction commit will continue .( In fact, this should be based on probability , When entering the third stage , Indicate that participants have received... In the second phase PreCommit request , Then the coordinator produces PreCommit The premise of the request is that he is , Received... From all participants CanCommit The responses are all Yes.( Once the participants have received PreCommit, It means that he knows that everyone actually agrees to modify ) therefore , In a word, it is , When entering the third stage , Due to network timeout and other reasons , Although participants did not receive commit perhaps abort Respond to , But he has reason to believe : The chances of a successful submission are great . )

2PC And 3PC The difference between

     be relative to 2PC,3PC The main single point of failure to solve , And reduce congestion , Because once the participants can't receive the information from the coordinator in time , He will default to commit. Instead of holding transaction resources and blocking them all the time . But this mechanism also leads to data consistency problems , because , Because of the Internet , Sent by the coordinator abort The response was not received by the participants in time , Then the participant executes after the timeout commit operation . In this way, we will receive abort There is data inconsistency between the participants who command and perform the rollback .

     I understand 2PC and 3PC after , We can find out , No matter two-phase commit or three-phase commit, it can't completely solve the problem of distributed consistency .

Local message table

The scheme of local message table was originally eBay Proposed ,eBay The whole scheme of :

https://queue.acm.org/detail.cfm?id=1394128 


Local message table is the most widely used method in the industry , Its core idea is to split distributed transactions into local transactions for processing .


For local message queues , The core is to turn big business into small business , Let's use the above example to illustrate :

  • When we go to create an order , We add a new local message table , Write the created order and inventory deduction to the local message table , In the same transaction ( Rely on database local transactions to ensure consistency ).
  • Configure a scheduled task to poll the local transaction table , Scan this local transaction table , Send messages that have not been sent , Send to inventory service , When the inventory service receives the message , Inventory will be reduced , And write to the transaction table of the server , Update the status of the transaction table .
  • The inventory server notifies the order service through scheduled tasks or directly , The order service updates the status in the local message table .

      It should be noted here that , For some scanning tasks that fail to be sent , Will be resend , Therefore, the idempotency of the interface must be guaranteed . The local message queue is BASE theory , Is the final consistency model , It is applicable to the case that the requirements for consistency are not high .

RocketMQ Business

RocketMQ Distributed transaction is implemented in , It's actually an encapsulation of the local message table , Moved the local message table to MQ Inside .

Transaction message as an asynchronous assured transaction , Branch two transactions through MQ Asynchronous decoupling ,RocketMQ The design process of transaction message also draws on the two-stage commit theory .

The overall interaction process is shown in the figure below :

        MQ Transactions are a layer of encapsulation of local message tables , Moved the local message table to MQ Inside , So it is also based on BASE theory , Is the ultimate consistency pattern , It is applicable to transactions that do not require strong consistency , meanwhile MQ Transactions asynchronize the entire process , It is also very suitable for high concurrency . This chapter does not explain in detail RocketMQ Business and RocketMq Characteristics of .

seata

Seata The three characters of

stay Seata In the framework of , There are three characters :

TC (Transaction Coordinator) - A business coordinator

Maintain the state of global and branch transactions , Drive global transaction commit or rollback .

TM (Transaction Manager) - Transaction manager

Define the scope of the global transaction : Start global transaction 、 Commit or roll back global transactions .

RM (Resource Manager) - Explorer

Manage resources for branch transactions , And TC Talk to register branch transactions and report the status of branch transactions , And drive branch transaction commit or rollback .

among ,TC For separately deployed Server Server side ,TM and RM For embedded in the application Client client .

stay Seata in , The life cycle of a distributed transaction is as follows :

1.TM request TC Start a global transaction .TC Will generate a XID As the number of the global transaction .XID, It will propagate in the invocation link of microservices , Ensure that the subtransactions of multiple microservices are associated together .

2.RM request TC Register local transaction as branch transaction of global transaction , Through global transactions XID Association .

3.TM request TC tell XID Whether the corresponding global transaction is committed or rolled back .

4.TC drive RM We will XID Commit or roll back the corresponding local transaction .

Design thinking

AT The core of the pattern is no intrusion into the business , It's an improved two-stage submission , The design idea is shown in the figure

The first stage

        Business data and rollback logging are committed in the same local transaction , Release local locks and connection resources . The core is to the business sql To analyze , convert to undolog, And put it in storage at the same time , How is this done ? First throw out a concept DataSourceProxy Proxy data sources , Through the name, you can basically guess what operation it is , Specific analysis will be made later

Refer to official documentation : Seata AT Pattern

The second stage

Distributed transaction operation succeeded , be TC notice RM Delete asynchronously undolog

        Distributed transaction operation failed ,TM towards TC Send rollback request ,RM Roger the coordinator TC Rollback request from , adopt XID and Branch ID Find the corresponding rollback log record , Generate reverse updates by rolling back records SQL And implement , To complete the rollback of the branch .

Overall execution process

Design highlights

Compared with other distributed transaction frameworks ,Seata There are several highlights of the architecture :

The application layer is based on SQL The analysis realizes automatic compensation , So as to minimize business intrusion ;

In distributed transactions TC( A business coordinator ) Independent deployment , Responsible for the registration of affairs 、 Roll back ;

Write isolation and read isolation are realized through global lock .

The problem is

Performance loss

        One Update Of SQL, A global transaction is required xid obtain ( And TC Communications )、before image( analysis SQL, Query the database once )、after image( Query the database once )、insert undo log( Write a database )、before commit( And TC Communications , Judge lock conflict ), All these operations require a remote communication RPC, And it's synchronous . in addition undo log When writing blob The insertion performance of the field is also not high . Write each SQL It's going to cost so much , A rough estimate would increase 5 Times the response time .

Cost performance

        For automatic compensation , All transactions need to be mirrored and persisted , But in the actual business scenario , This is the success rate , Or how many percentage of distributed transaction failures need to be rolled back ? Estimate according to the principle of 28 , in order to 20% The transaction is rolled back , Need to put 80% Increased response time for successful transactions 5 times , This cost is compared to whether it's worth letting the application develop a compensation transaction ?

Global lock

Hot data

        comparison XA,Seata Although in a successful stage will release the database lock , But one stage is commit The determination of the front global lock also lengthens the occupation time of the data lock , The cost ratio is XA Of prepare How much lower needs to be tested according to the actual business scenario . The introduction of global locks enables isolation , But the problem is congestion , Reduce concurrency , Especially hot data , This problem will be more serious .

Rollback lock release time

     Seata When rolling back , You need to delete the undo log, Then we can release TC Lock in memory , So if the second phase is rollback , It takes longer to release the lock .

The deadlock problem

        Seata The introduction of global lock will increase the risk of deadlock , But if a deadlock occurs , Will keep retrying , Finally, wait for the global lock timeout , It's not elegant , It also extends the time of database lock possession .

Seata Is an open source distributed transaction solution , Committed to providing high-performance and easy-to-use distributed transaction services .Seata Will provide users with AT、TCC、SAGA and XA Transaction mode , Create a one-stop distributed solution for users .AT The mode is the first mode promoted by Ali , There are commercial versions on Alibaba cloud GTS(Global Transaction Service Global transaction services )

Official website :Seata

Source code : https://github.com/seata/seata

official Demo: https://github.com/seata/seata-samples

seata Supported distributed :Seata Introduction to minimalism

Taro Road Spring Boot Distributed transactions Seata introduction | Taro Road source code —— Pure source parsing blog

原网站

版权声明
本文为[Juvenile deer]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/175/202206240921386972.html