1. Preface

Because of its high concurrency and high performance, caching is widely used in projects. On the read side, the flow is shown in the figure below:

On the update side, however, data inconsistency can occur. After updating the database, should you update the cache or delete it? Or should you delete the cache first and then update the database? This is much debated. This article briefly compares the advantages and disadvantages of these approaches.

2. Main text

First, a clarification. In theory, setting an expiration time on cached data is a solution that guarantees eventual consistency. Under this scheme, every cached entry has an expiration time, the database is the source of truth for all writes, and cache operations are best effort. That is, if the database write succeeds but the cache update fails, then once the entry expires, subsequent read requests will read the new value from the database and backfill the cache. For that reason, the discussion that follows does not rely on setting an expiration time for the cache.
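A minimal sketch of that expiry-based safety net, assuming an in-memory stand-in for the cache and a plain dict for the database. The names TTLCache and read, and the 60-second default, are illustrative and not from any particular library.

```python
import time


class TTLCache:
    """Tiny in-memory cache with per-key expiry (a stand-in for a real cache server)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def set(self, key, value, ttl):
        # Store the value together with the moment it stops being valid.
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value


def read(key, cache, db, ttl=60):
    """Read path: try the cache, fall back to the database, then backfill.

    Simplification: a cached value of None is treated as a miss.
    """
    value = cache.get(key)
    if value is None:
        value = db[key]
        cache.set(key, value, ttl)
    return value
```

If a database write succeeds and the matching cache operation is lost, read keeps returning the stale value only until the entry expires. After that it goes back to the database and backfills the new value.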
Here we discuss three update strategies:

Update the database first, then update the cache
Delete the cache first, then update the database
Update the database first, then delete the cache

(1) Update the database first, then update the cache

This scheme is widely opposed because it is not thread-safe:

If Zhang San and Li Si update the same data at the same time, the following can happen:

Zhang San updates the database
Li Si updates the database
Li Si updates the cache
Zhang San updates the cache

Li Si's cache update lands before Zhang San's, so the cache ends up holding Zhang San's older value while the database holds Li Si's newer one. The cached data is dirty.
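The interleaving above can be replayed deterministically. This is only an illustration: the dicts db and cache, the key "x", and the values are stand-ins for the real stores.

```python
db = {}
cache = {}

# Each step is (actor, store, value), in the order listed above.
steps = [
    ("Zhang San", db, "zhang"),
    ("Li Si", db, "li"),
    ("Li Si", cache, "li"),
    ("Zhang San", cache, "zhang"),
]

for _actor, store, value in steps:
    store["x"] = value

# The database keeps the last database write (Li Si's),
# while the cache keeps the last cache write (Zhang San's).
print(db["x"], cache["x"])  # prints: li zhang
```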

(2) Delete the cache first, then update the database

This scheme can lead to inconsistency for the following reason. Suppose Zhang San issues an update request while Li Si issues a query at the same time. Then the following can happen:

Zhang San deletes the cache
Li Si finds that the cache entry does not exist
Li Si queries the database and gets the old value
Li Si writes the old value to the cache
Zhang San writes the new value to the database

This sequence also leaves the cache and the database inconsistent.
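As a sketch, the same sequence replayed step by step. The function name, the dict-based stores, and the key "x" are illustrative assumptions.

```python
def replay_delete_then_update():
    """Replay the delete-first interleaving with one writer and one reader."""
    db = {"x": "old"}
    cache = {"x": "old"}

    cache.pop("x", None)      # Zhang San deletes the cache
    missed = "x" not in cache  # Li Si finds no cache entry
    stale = db["x"]           # Li Si reads the old value from the database
    if missed:
        cache["x"] = stale    # Li Si writes the old value to the cache
    db["x"] = "new"           # Zhang San writes the new value to the database
    return db, cache
```

After the replay, the database holds "new" while the cache still holds "old", and nothing in this flow will correct the cache until the next write.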

So what is the solution? We can use the delayed double-delete strategy:

Delete the cache first
Then write to the database (these two steps are the same as before)
Sleep for a certain time, then delete the cache again

This way, any dirty data written to the cache during that window is removed by the second delete.
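A minimal sketch of delayed double delete, assuming dict-like stand-ins for the database and cache. The function name and the injectable sleep parameter are illustrative; the latter only makes the example easy to test.

```python
import time


def update_with_delayed_double_delete(key, value, db, cache, delay_seconds,
                                      sleep=time.sleep):
    """Write path: delete cache, write database, wait, delete cache again."""
    cache.pop(key, None)   # first delete
    db[key] = value        # write the new value to the database
    sleep(delay_seconds)   # give in-flight reads time to finish
    cache.pop(key, None)   # second delete removes any stale backfill
```

Note that the writer blocks for delay_seconds in this form, which lowers write throughput. Running the second delete on a separate thread is a common way around that.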

So how should the sleep time be determined?

For the case above, measure how long the read path's business logic takes in your own project. The write path's sleep time is then that read time plus a few hundred milliseconds. The goal is to make sure the read request has finished, so that the write request can delete any dirty data the read request put into the cache.

What if you use a MySQL read-write separation architecture?

In that case, the inconsistency arises as follows. Again there are two requests: request A performs an update and request B performs a query.

(1) Request A starts a write and deletes the cache
(2) Request A writes the data to the master database
(3) Request B queries the cache and finds no value
(4) Request B queries the slave database; master-slave synchronization has not completed yet, so it reads the old value
(5) Request B writes the old value to the cache
(6) The database completes master-slave synchronization, and the slave now holds the new value

This is how the data becomes inconsistent. The fix is again the delayed double-delete strategy, except that the sleep time is now based on the master-slave synchronization delay plus a few hundred milliseconds.
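The two rules of thumb for the sleep time can be written down as a small helper. This is a sketch only: the function name, its parameters, and the 300 ms default margin are assumptions standing in for "a few hundred ms", and real values should come from measurements of your own system.

```python
def double_delete_delay_ms(read_logic_ms, replication_lag_ms=None, margin_ms=300):
    """Pick the sleep time before the second cache delete.

    Without read-write separation, base it on the read path's duration.
    With read-write separation, base it on the master-slave replication lag.
    """
    base = read_logic_ms if replication_lag_ms is None else replication_lag_ms
    return base + margin_ms


print(double_delete_delay_ms(200))                           # prints: 500
print(double_delete_delay_ms(200, replication_lag_ms=1000))  # prints: 1300
```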

At this point some readers will ask: what if the second delete fails? If it does, the cache and the database remain inconsistent.

Let's first look at the third scheme:

(3) Update the database first, then delete the cache

Is there a concurrency problem in this case? There can be. Consider the following steps:

Suppose there are two requests: request A performs a query and request B performs an update. Then the following can happen:

(1) The cache entry has just expired
(2) Request A queries the database and gets the old value
(3) Request B writes the new value to the database
(4) Request B deletes the cache
(5) Request A writes the old value it read to the cache

If this happens, dirty data does occur.
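The five steps above can be replayed in order to confirm the outcome. As before, the function name, the dict-based stores, and the key "x" are illustrative assumptions.

```python
def replay_update_then_delete():
    """Replay the update-then-delete interleaving with a slow reader."""
    db = {"x": "old"}
    cache = {}                 # (1) the cache entry has just expired

    a_value = db["x"]          # (2) request A reads the old value
    db["x"] = "new"            # (3) request B writes the new value
    cache.pop("x", None)       # (4) request B deletes the cache (already empty)
    cache["x"] = a_value       # (5) request A backfills the old value
    return db, cache
```

The replay only produces dirty data because step (5) runs after step (4), which means request A's read-and-backfill has to span request B's entire write.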

How likely is this, though? It requires a precondition: the database write in step (3) must take less time than the database read in step (2), so that step (4) can happen before step (5). But database reads are generally much faster than writes (otherwise why bother with read-write separation? The point of it is that reads are faster and consume fewer resources). So it is very unlikely that step (3) finishes sooner than step (2), and this scenario rarely occurs.

Suppose someone insists on handling this case anyway. What should we do?

First, setting an expiration time on the cache is one option. Second, use the asynchronous delayed delete from strategy (2): delete the cache again once the read request is sure to have finished.
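A sketch of the asynchronous variant, using threading.Timer from the Python standard library so the writer does not block. The function name and the dict-based stores are assumptions. A plain dict is enough for illustration here, but a real cache client would replace it.

```python
import threading


def update_then_delete_async(key, value, db, cache, delay_seconds):
    """Update the database, delete the cache, and schedule a second delete."""
    db[key] = value
    cache.pop(key, None)

    # The second delete runs on a timer thread after delay_seconds.
    timer = threading.Timer(delay_seconds, cache.pop, args=(key, None))
    timer.daemon = True  # do not keep the process alive just for this
    timer.start()
    return timer
```

Returning the timer lets the caller cancel it or wait on it. In production the scheduled delete would more likely go through a task queue so that it survives a process restart.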

Are there other causes of inconsistency? Yes. This problem affects both strategy (2) and strategy (3): if deleting the cache fails, the cache and the database become inconsistent. For example, a write request updates the database successfully but the cache delete fails, so the cache keeps serving the old value. This is also the question left open at the end of strategy (2).

How do we solve it?

Provide a retry mechanism that guarantees the delete eventually succeeds. Here are two options.

First option:
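One plausible shape for a retry mechanism that lives in the business code, assuming failed keys are handed to a queue and retried later. Everything here is hypothetical: FlakyCache simulates a cache whose delete fails a fixed number of times, and an in-process queue.Queue stands in for whatever message queue a real system would use.

```python
import queue


class FlakyCache:
    """Hypothetical cache whose delete fails a fixed number of times."""

    def __init__(self, failures):
        self.data = {}
        self.failures = failures

    def delete(self, key):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("cache unavailable")
        self.data.pop(key, None)


retry_queue = queue.Queue()


def update(key, value, db, cache):
    """Business write path: update the database, then delete the cache."""
    db[key] = value
    try:
        cache.delete(key)
    except ConnectionError:
        retry_queue.put(key)  # hand the key to the retry path


def drain_retries(cache, max_attempts=5):
    """Retry queued deletes; re-queue a key that still fails and stop."""
    while not retry_queue.empty():
        key = retry_queue.get()
        for _ in range(max_attempts):
            try:
                cache.delete(key)
                break
            except ConnectionError:
                continue
        else:
            retry_queue.put(key)
            return False
    return True
```

The write path has to know about the retry queue, so every place that writes data needs this extra handling.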

However, this scheme has a drawback: it is heavily intrusive to business code. Hence the second option: start a subscriber program that subscribes to the database binlog and extracts the data that needs to be acted on. Then, in the application, run a separate program that receives this information from the subscriber and performs the cache delete.

Second option:
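A rough sketch of the consuming side of this option, assuming the binlog subscriber delivers change events as plain dicts. The event shape ("type", "table", "pk"), the "table:pk" cache key format, and both function names are hypothetical. A real subscriber would define its own event format.

```python
def cache_keys_for(event):
    """Map a change event to the cache keys it invalidates (hypothetical shape)."""
    if event["type"] in {"update", "delete"}:
        return [f'{event["table"]}:{event["pk"]}']
    return []  # inserts have no cached entry to invalidate


def apply_events(events, cache_delete, max_attempts=3):
    """Delete the cache entry for each change event.

    Returns the keys whose delete still failed after max_attempts,
    so the caller can schedule them for another round.
    """
    failed = []
    for event in events:
        for key in cache_keys_for(event):
            for _ in range(max_attempts):
                try:
                    cache_delete(key)
                    break
                except ConnectionError:
                    continue
            else:
                failed.append(key)
    return failed
```

Because the deletes are driven by the database's own change log, the business write path only needs to update the database. Invalidation and retries live entirely in this separate consumer.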