
Are You Doing It Right? A Detailed Discussion of Horizontal Database and Table Sharding


1. Background

Sharding—splitting databases and tables—is not a new term for most server-side developers. As the business grows, the amount of data in a table keeps increasing, and fields may also multiply as business complexity rises. To solve single-table query performance problems, we usually split the table.

At the same time, user activity and concurrency keep climbing, and a single database may eventually hit its processing limit. To break through that bottleneck, we usually split the database as well. Whether splitting databases or tables, there are generally two approaches: vertical splitting and horizontal splitting.

The differences and characteristics of the two approaches are well covered online and won't be repeated here; interested readers can search for themselves.

This article focuses on some easily overlooked details of the most practical and common approach, horizontal sharding, hoping to help you avoid detours and find the sharding design that best fits your business.

[Note 1] The cases in this article are based on the MySQL database, and "sharding" below refers to horizontal database and table splitting. [Note 2] When "M databases, N tables" is mentioned later, it means M databases, each containing N tables, i.e., M*N tables in total.

2. What Makes a Good Sharding Scheme?

2.1 Scheme sustainability

In the early stage, when data volume is small and traffic is low, we don't need sharding, and it is not recommended. But once we do decide to design the business around sharding, we must consider the sustainability of the sharding scheme.

What does sustainability mean? Simply this: when the data volume and traffic later grow to a new level, the sharding scheme can keep working.

Take a simple example: suppose our current scheme is 10 databases with 100 tables each. If at some point in the future the 10 databases can no longer handle the user traffic, or their disks are about to reach the physical limit, the scheme should allow smooth capacity expansion.

Later in this article we will introduce the doubling expansion method and the consistent-hash expansion method commonly used in the industry.

2.2 Data skew problem

In a good sharding scheme, data should be evenly distributed across all databases and tables. If we design the sharding off the top of our heads, we can easily run into problems like the following:

a. Within a single database instance, some tables hold a huge amount of data while others hold very little, so business latency swings between high and low and performance is erratic.
b. Across the database cluster, disk usage grows very fast on some nodes and very slowly on others. The inconsistent growth pace makes subsequent expansions impossible to schedule and operate uniformly.

Here we define the maximum data skew rate of a sharding scheme as: (row count of the largest shard - row count of the smallest shard) / row count of the smallest shard. Generally speaking, a maximum data skew rate within 5% is acceptable.
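For clarity, here is a minimal sketch (not from the original article) of how the maximum data skew rate defined above can be computed from per-shard row counts; the sample counts in main are made up for illustration.

public class SkewRateCalc {

    /**
     * rowCounts holds the row count of every physical shard (table or database).
     * Returns (largest - smallest) / smallest, as defined above.
     */
    public static double maxSkewRate(long[] rowCounts) {
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (long c : rowCounts) {
            min = Math.min(min, c);
            max = Math.max(max, c);
        }
        return (max - min) / (double) min;
    }

    public static void main(String[] args) {
        long[] counts = {248305, 250117, 251419}; // hypothetical per-shard counts
        // prints "max skew rate = 1.25%"
        System.out.printf("max skew rate = %.2f%%%n", maxSkewRate(counts) * 100);
    }
}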

3. Common Sharding Schemes

3.1 Range sharding

As the name implies, this scheme decides where data is stored according to its value range.

Take the simplest example: we can split the order table by year and store each year's data in a separate database (or table), as shown below:

/**
 * Route an order to its yearly table
 * (the first four characters of orderId are the year).
 *
 * @param orderId the order id, prefixed with the four-digit year
 * @return the name of the yearly table
 */
public static String rangeShardByYear(String orderId) {
    int year = Integer.parseInt(orderId.substring(0, 4));
    return "t_order_" + year;
}

Range-based splitting is one of the simplest sharding schemes and can also be combined flexibly with other schemes. The currently popular distributed database TiDB scatters TiKV data in exactly this way: it partitions by range, assigning different [StartKey, EndKey) ranges to different Regions.

Let's look at the drawbacks of this scheme:

  • a. The most obvious problem is data hot spots. In the order-table example above, the current year's tables are clearly hot data and have to carry most of the IO and computing resources.
  • b. New databases and tables have to be added. Online applications usually do not have permission to create databases or tables, so the ones for the next range must be created in advance to prevent online failures.

This is very easy to forget, especially for modules that have run stably for years without new development, or where the team changes frequently.

  • c. Business operations that span ranges. For example, the order module inevitably needs compensation logic for intermediate states, i.e., a scheduled task that scans the order table for orders that have stayed in the pending-payment-confirmation state for too long.

Note that because the tables are split by year, on New Year's Day such a scheduled task may well miss the data of the last day of the previous year.
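One possible mitigation, shown below as a minimal sketch (not from the original article), is to make the boundary-sensitive scheduled task cover both the current year's table and the previous year's table when it runs in the first days of a new year. It assumes the t_order_<year> naming used by rangeShardByYear above.

import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class BoundarySafeScan {

    /** Order tables a year-boundary-safe compensation scan should cover. */
    public static List<String> tablesToScan(LocalDate today) {
        List<String> tables = new ArrayList<>();
        tables.add("t_order_" + today.getYear());
        // In the first few days of a new year, also scan last year's table so that
        // orders created on Dec 31 and still pending are not missed.
        if (today.getDayOfYear() <= 3) {
            tables.add("t_order_" + (today.getYear() - 1));
        }
        return tables;
    }
}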

3.2 Hash sharding

Although there are many sharding schemes, hash sharding is the most popular and widely used, and it is also the longest part of this article.

There is surprisingly little in-depth material on the details of hash sharding; most articles explain the concept and give a few examples without going particularly deep. If you copy them without adapting to your own business, all sorts of problems are likely to surface later.

Before formally introducing this sharding method, let's first look at some common mistakes.

Common mistake 1: data skew caused by non-coprime database and table counts

public static ShardCfg shard(String userId) {
    int hash = userId.hashCode();
    // database index = hash modulo the database count
    int dbIdx = Math.abs(hash % DB_CNT);
    // table index = hash modulo the table count
    int tblIdx = Math.abs(hash % TBL_CNT);
 
    return new ShardCfg(dbIdx, tblIdx);
}

The scheme above is a trap that first-timers fall into particularly easily: take the hash value modulo the database count and modulo the table count to get the database index and table index separately. Think about it for a moment with 10 databases and 100 tables: if a hash value is 0 modulo 100, it must also be 0 modulo 10.

That means only table 0 of database 0 can ever hold data, while table 0 of every other database stays empty forever!

By the same reasoning, of the 100 tables in database 0, only the 10 tables whose index ends in 0 can ever hold data. This causes very serious data skew: since some tables can never receive data, the maximum data skew rate is infinite.

So obviously, this scheme is wrong and does not achieve the intended effect. The rough data-distribution diagram is as follows:

In fact, as long as the database count and the table count are not coprime, some tables will end up with no data at all.

The proof is straightforward. Let M be the database count, N the table count and d = gcd(M, N). Since d divides both M and N, (hash % M) and (hash % N) are congruent modulo d, i.e., dbIdx ≡ tblIdx (mod d). So in database i, only the tables whose index is congruent to i modulo d can ever receive data; whenever d > 1, all other tables stay empty forever.
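The congruence above is easy to check empirically. The following minimal sketch (not from the original article) counts how many of the 1,000 physical tables in a 10-database / 100-table layout ever receive data under the mistaken routing; it prints roughly 100.

import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class NonCoprimeSkewDemo {
    static final int DB_CNT = 10, TBL_CNT = 100; // gcd = 10, i.e. not coprime

    public static void main(String[] args) {
        Random rnd = new Random(42);
        Set<String> usedTables = new HashSet<>();
        for (int i = 0; i < 1_000_000; i++) {
            int hash = rnd.nextInt(Integer.MAX_VALUE); // non-negative sample hash
            int dbIdx = hash % DB_CNT;
            int tblIdx = hash % TBL_CNT;
            usedTables.add(dbIdx + "-" + tblIdx);
        }
        // Only 100 distinct (db, table) pairs are ever hit, out of 1,000 tables.
        System.out.println("tables that ever receive data: " + usedTables.size());
    }
}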

So can we use this scheme as long as the database count and the table count are coprime? For example, is 11 databases with 100 tables each reasonable?

The answer is no. Besides data skew, we also have to consider sustainable expansion. The usual later expansion method for this kind of hash sharding is doubling; once 11 databases are doubled to 22, 22 and 100 are no longer coprime.

Of course, if the database count and the table count are not only coprime but the table count is also odd (for example 10 databases and 101 tables), the scheme works in theory, though most people would probably find an odd number of tables strange.

Common mistake 2: expansion is not sustainable

Even if you avoid the trap in mistake 1, it is easy to fall into another one. The general idea goes like this:

Treat the 10 databases × 100 tables as 1,000 logical slots, take the hash value modulo 1,000 to get a number in [0, 999], and then map that number to a database and a table in two steps. The rough logic is:

public static ShardCfg shard(String userId) {
    // ① compute the hash
    int hash = userId.hashCode();
    // ② total number of slots
    int sumSlot = DB_CNT * TBL_CNT;
    // ③ slot number
    int slot = Math.abs(hash % sumSlot);
    // ④ WRONG way to derive the database index and table index
    int dbIdx = slot % DB_CNT;
    int tblIdx = slot / DB_CNT;

    return new ShardCfg(dbIdx, tblIdx);
}

This scheme cleverly solves data skew: as long as the hash values are uniform enough, the slot numbers are distributed evenly, so the data volume of each database and table stays roughly balanced.

But it has a big problem: the table index depends on the total number of databases. When the cluster is later expanded by doubling, data ends up in a different table before and after the expansion, so the expansion cannot actually be carried out.

As in the figure above: before expansion, the row with hash 1986 is stored in database 6, table 98; after doubling to 20 databases × 100 tables, it maps to database 6, table 99—the table index has shifted. In that case a later expansion would have to migrate data not only between databases but also between tables, which is troublesome and error-prone.
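The shift is easy to reproduce. A minimal sketch (not from the original article), using the hash value 1986 mentioned above:

public class ExpansionShiftDemo {

    /** The mistaken routing: database index from slot % dbCnt, table index from slot / dbCnt. */
    static int[] route(int hash, int dbCnt, int tblCnt) {
        int slot = Math.abs(hash % (dbCnt * tblCnt));
        return new int[]{slot % dbCnt, slot / dbCnt};
    }

    public static void main(String[] args) {
        int hash = 1986;
        int[] before = route(hash, 10, 100); // db 6, table 98
        int[] after  = route(hash, 20, 100); // db 6, table 99 -- the table index shifted
        System.out.printf("before: db %d, table %d%n", before[0], before[1]);
        System.out.printf("after : db %d, table %d%n", after[0], after[1]);
    }
}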

Having seen these typical mistakes, what do the correct approaches look like? Below, combined with some real scenarios and cases, we introduce several styles of hash sharding.

Common approach 1: standard two-level sharding

The overall idea in mistake 2 above is completely right; the only problem is that the database count is used as a factor when deriving the table index, so the table index shifts during expansion and expansion becomes impossible.

In fact, by just writing the last step differently, we get the most widely used sharding scheme.

public static ShardCfg shard2(String userId) {
    // ① compute the hash
    int hash = userId.hashCode();
    // ② total number of slots
    int sumSlot = DB_CNT * TBL_CNT;
    // ③ slot number
    int slot = Math.abs(hash % sumSlot);
    // ④ corrected derivation of the database index and table index
    int dbIdx = slot / TBL_CNT;
    int tblIdx = slot % TBL_CNT;

    return new ShardCfg(dbIdx, tblIdx);
}

Notice that the only difference from mistake 2 is how the database index and table index are derived from the slot number. The resulting distribution looks like this:

Why does this scheme expand so well? A brief proof: the table index is tblIdx = slot % TBL_CNT, and because TBL_CNT divides the total slot count DB_CNT * TBL_CNT, this equals |hash| % TBL_CNT and does not depend on the database count at all. After a doubling expansion the new slot is |hash| % (2 * DB_CNT * TBL_CNT), which is either the old slot or the old slot plus DB_CNT * TBL_CNT; dividing by TBL_CNT, the new database index is either the old one or the old one plus DB_CNT.

In other words, after a doubling expansion the table index is guaranteed to stay the same, and a row either stays in its original database or moves into exactly one new database (original index plus the original database count)—precisely the kind of sustainable expansion we need.
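The invariant can also be checked with a quick brute-force test. A minimal sketch (not from the original article), assuming an initial 10-database / 100-table layout doubled to 20 databases:

import java.util.Random;

public class DoublingInvariantCheck {

    /** Standard two-level routing: database index = slot / tblCnt, table index = slot % tblCnt. */
    static int[] route(int hash, int dbCnt, int tblCnt) {
        int slot = Math.abs(hash % (dbCnt * tblCnt));
        return new int[]{slot / tblCnt, slot % tblCnt};
    }

    public static void main(String[] args) {
        Random rnd = new Random(7);
        for (int i = 0; i < 1_000_000; i++) {
            int hash = rnd.nextInt(Integer.MAX_VALUE);
            int[] before = route(hash, 10, 100);
            int[] after  = route(hash, 20, 100);
            boolean tableUnchanged = before[1] == after[1];
            boolean dbStaysOrShiftsByTen = after[0] == before[0] || after[0] == before[0] + 10;
            if (!tableUnchanged || !dbStaysOrShiftsByTen) {
                throw new AssertionError("invariant violated for hash " + hash);
            }
        }
        System.out.println("invariant holds for all sampled keys");
    }
}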

[Drawbacks of this scheme]

1. Doubling expansion is easy to operate in the early stage, but once the database count has grown to several dozen, every further expansion consumes a lot of resources.

2. Shard keys with adjacent hash values are very likely to land in the same database, which may create database hot spots for some businesses (for example, if newly registered users get adjacent, increasing hash values and new users are, with high probability, the active ones, then the users created within a period of time concentrate in a few adjacent databases).

Common approach 2: redundant routing relation table

We can record the mapping between the shard key and its database in a separate table, which we call the "routing relation table".

public static ShardCfg shard(String userId) {
    int tblIdx = Math.abs(userId.hashCode() % TBL_CNT);
    // read from the cache first
    Integer dbIdx = loadFromCache(userId);
    if (null == dbIdx) {
        // fall back to the routing table
        dbIdx = loadFromRouteTable(userId);
        if (null != dbIdx) {
            // store it in the cache
            saveRouteCache(userId, dbIdx);
        }
    }
    if (null == dbIdx) {
        // the database-selection logic here can be implemented freely
        dbIdx = selectRandomDbIdx();
        saveToRouteTable(userId, dbIdx);
        saveRouteCache(userId, dbIdx);
    }

    return new ShardCfg(dbIdx, tblIdx);
}

The table index is still computed with the ordinary hash algorithm, but the database index is read from the routing table. Since the routing table has to be consulted on every data query, the mapping between shard key and database index should also be kept in a cache to protect performance.

In the example, selectRandomDbIdx generates the database index for a shard key, and it can be made very flexible and dynamically configurable. For instance, each database can be given a weight—the heavier the weight, the more likely it is to be chosen—and setting a weight to 0 stops assigning new keys to that database. When data skew is detected, the weights can be adjusted so that usage across databases converges.
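The article does not spell out selectRandomDbIdx; a minimal weighted-random sketch, assuming the weights come from some dynamic configuration source, might look like this:

import java.util.concurrent.ThreadLocalRandom;

public class WeightedDbSelector {
    private final int[] weights; // weights[i] = relative weight of database i

    public WeightedDbSelector(int[] weights) {
        this.weights = weights.clone();
    }

    /** Pick a database index with probability proportional to its weight. */
    public int selectRandomDbIdx() {
        int total = 0;
        for (int w : weights) total += w;
        int r = ThreadLocalRandom.current().nextInt(total);
        for (int i = 0; i < weights.length; i++) {
            r -= weights[i];
            if (r < 0) return i;
        }
        throw new IllegalStateException("weights must contain at least one positive value");
    }
}

For example, new WeightedDbSelector(new int[]{10, 10, 10, 0}) never assigns new keys to database 3, while a newly mounted database can simply be given a larger weight.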

The scheme has another advantage: in theory, later expansion only requires mounting a new database node and giving it a larger weight; no data migration is needed at all.

As shown below: at first the 4 databases have the same weight, so in theory data falls into each with equal probability. But because some users are high-frequency and some low-frequency, some databases may grow faster. When a new database node is mounted, we simply adjust the weights of each database.

This scheme seems to solve a lot of problems—are there scenarios where it does not fit? Certainly; here are a few examples.

a. Every read needs the routing table. Even with a cache, there is still some performance loss.

b. Storing the routing relation table itself may be impractical in some scenarios. In the case above the user-id scale is bounded (tens of millions of effective users at most), so the relation table can be held in a single database with a hundred tables. But if, for example, the shard key is a file's MD5 digest, the sample space is far too large to record a mapping for every MD5 value (although we could store the mapping keyed by the first N characters of the MD5).

c. The "hungry occupation" problem, described in detail below.

We said this scheme's selling point is that no future expansion is needed—just adjust the weights to steer each database's growth. But that vision is somewhat ethereal and hard to realize in practice. Consider the following simple business scenario.

[Business scenario]: take a cloud-drive service where users store files in the cloud. The users' file metadata needs to be sharded, under the following assumptions:

  • ① There are 200 million theoretical users, of which about 30 million are effective users.
  • ② Each user stores at most about 2,000 files on average.
  • ③ The user id is a random 16-character string.
  • ④ The initial deployment is 10 databases, each with 100 tables.

We use the routing table to record which database each user lives in. The scheme then runs into the following problems:

First: of the 200 million users, only 30 million have ever generated transactions. If the code does not handle this, a routing-table entry gets created as soon as a user makes any request, so routes are created in advance for a huge number of users who actually have no transaction data.

I ran into exactly this when first storing cloud-drive user data: the client app queries the user's space usage on its home page, so almost every user got a route from the very beginning. As time goes by, these "silent" users with no data may "wake up" and start using the drive at any moment, making their database grow rapidly past the single-database capacity limit and forcing a split-and-migrate expansion.

The fix is to allocate a route only on transactional operations (buying space, uploading data, creating folders, and so on), which requires some investment at the code level.

Second: following the scenario above, the average end user has 2,000 rows of data and each row is about 1 KB. To keep the B+ tree at 3 levels we cap each table at 20 million rows; with 100 tables per database, each database can therefore hold at most about 1 million users in theory.

That means the 30 million transacting users need 30 databases, which wastes an enormous amount of database resources in the early stage of the business, when the average per-user data volume is still small.
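Written out, the capacity arithmetic above is just a few multiplications; the 20-million-row cap, 100 tables per database and 2,000 rows per user are the assumptions stated in the scenario.

public class CapacityEstimate {
    public static void main(String[] args) {
        long rowsPerTableCap = 20_000_000L; // cap chosen to keep the B+ tree at 3 levels
        long tablesPerDb     = 100;
        long rowsPerUser     = 2_000;
        long effectiveUsers  = 30_000_000L;

        long usersPerDb = rowsPerTableCap * tablesPerDb / rowsPerUser;    // 1,000,000
        long dbsNeeded  = (effectiveUsers + usersPerDb - 1) / usersPerDb; // 30
        System.out.println("users per database: " + usersPerDb);
        System.out.println("databases needed  : " + dbsNeeded);
    }
}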

To mitigate the second problem, we usually co-locate many logical databases on one instance and split them out as data grows, or split and migrate the databases that are approaching capacity using conventional methods.

Common approach 3: the gene method

Again inspired by the mistaken case: the main reason mistake 1 fails is that the database-index and table-index calculations share a common factor, which breaks the independence of the two.

So can we change our thinking and compute the database index and the table index from relatively independent hash values?

public static ShardCfg shard(String userId) {
    int dbIdx = Math.abs(userId.substring(0, 4).hashCode() % DB_CNT );
    int tblIdx = Math.abs(userId.hashCode() % TBL_CNT);
    return new ShardCfg(dbIdx, tblIdx);
}

As shown above, when computing the database index we hash only the first four characters of the shard key.

This is also a common scheme, often called the gene method: some "genes" of the original shard key (for example its first four characters) drive the database index, and the rest drive the table index. Many schemes and variants found online do this; it looks like a clever solution, but in real production you still need to be careful.

I used this scheme when sharding the cloud-drive space module, splitting the data across 16 databases × 100 tables. Everything looked normal at launch, but as the data grew, the number of users per database became seriously uneven, suggesting the scheme had a degree of data skew.

To verify this, I generated 200 million random user ids (16-character random strings) and measured several M-database / N-table layouts; averaging over repeated runs gave the following results:

8 databases, 100 tables:
min=248305 (dbIdx=2, tblIdx=64), max=251419 (dbIdx=7, tblIdx=8), skew rate = 1.25%    √
16 databases, 100 tables:
min=95560 (dbIdx=8, tblIdx=42), max=154476 (dbIdx=0, tblIdx=87), skew rate = 61.65%   ×
20 databases, 100 tables:
min=98351 (dbIdx=14, tblIdx=78), max=101228 (dbIdx=6, tblIdx=71), skew rate = 2.93%

With 16 databases and 100 tables, the least-loaded table was assigned fewer than 100 K users while the most-loaded got more than 150 K—a maximum data skew rate of over 61%. At that rate, one database would eventually fill up while another still had 30%+ of its capacity left.

The scheme is not necessarily bad, but when adopting it you must jointly consider the shard key's sample distribution, the number of prefix characters chosen, the database count and the table count—all four variables affect the final skew rate.

In the example above, using 8 databases × 100 tables or 20 databases × 100 tables instead of 16 × 100 brings the skew rate back under the acceptable 5%. So this scheme hides quite a few "pits": you should estimate the skew rate not only for the initial layout, but also after several rounds of doubling expansion.

For instance, if you start with a perfectly fine 8-database × 100-table layout and later expand to 16 × 100, trouble follows.

Common approach 4: eliminating the common factor

Again inspired by the mistaken case: in many scenarios we still want adjacent hash values to land in different databases, just as with N databases and a single table each, where the database index is usually computed directly as the hash modulo the database count.

Is there a way to remove the influence of the common factor? Here is one implementation worth considering:

public static ShardCfg shard(String userId) {
    int dbIdx = Math.abs(userId.hashCode() % DB_CNT);
    // eliminate the influence of the common factor first when computing the table index
    int tblIdx = Math.abs((userId.hashCode() / TBL_CNT) % TBL_CNT);
    return new ShardCfg(dbIdx, tblIdx);
}

Measurement shows that the maximum data skew rate of this scheme is also fairly small. It is worth considering for businesses upgrading from N databases × 1 table to N databases × M tables while keeping the database index unchanged.

Common approach 5: consistent hashing

Consistent hashing is also a popular data-partitioning algorithm for clusters; Redis Cluster, for example, manages its data through 16,384 virtual slots. Consistent hashing itself is not re-explained here; readers can look it up on their own.

Here is how consistent hashing can be used for sharding design in detail.

We usually persist the configuration of every actual node in a configuration item or a database and load it when the application starts or switches over. Each configuration entry contains a left-closed, right-open interval [StartKey, EndKey) and the corresponding database node, for example:
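The concrete configuration values are not reproduced here; a hypothetical four-node layout consistent with the sample code below might look like this (all boundary values are illustrative only):

[Integer.MIN_VALUE, -1000000000)  ->  db node 0
[-1000000000, 0)                  ->  db node 1
[0, 1000000000)                   ->  db node 2
[1000000000, Long.MAX_VALUE)      ->  db node 3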

Sample code :

private TreeMap<Long, Integer> nodeTreeMap = new TreeMap<>();
 
@Override
public void afterPropertiesSet() {
    //  Load partition configuration at startup 
    List<HashCfg> cfgList = fetchCfgFromDb();
    for (HashCfg cfg : cfgList) {
        nodeTreeMap.put(cfg.endKey, cfg.nodeIdx);
    }
}
 
public ShardCfg shard(String userId) {
    int hash = userId.hashCode();
    int dbIdx = nodeTreeMap.tailMap((long) hash, false).firstEntry().getValue();
    int tblIdx = Math.abs(hash % 100);
    return new ShardCfg(dbIdx, tblIdx);
}

As you can see, this form is very similar to the Range sharding described earlier: Range sharding partitions on the shard key itself, while consistent hashing partitions on ranges of the shard key's hash value.

The formal consistent-hashing algorithm introduces virtual nodes, each pointing to a real physical node, mainly so that when new nodes are added, data volume and pressure stay roughly balanced across nodes after migration.

But when consistent hashing is applied to sharding, most implementations use only real nodes and rarely introduce virtual nodes, mainly for the following reasons:

a. The application needs extra time and memory to load the virtual-node configuration; with many virtual nodes, the memory footprint is also not so optimistic.
b. MySQL already has a very mature master-slave replication mechanism, so promoting a replica and deleting the redundant data afterwards is simpler and more controllable than filtering out and migrating the range of data belonging to each virtual node.
c. The main pain point virtual nodes solve is load imbalance while a node's data is being relocated, by scattering that data across all the other nodes so the pressure is shared evenly.

And as an OLTP database, we rarely need to take a database offline suddenly, and a newly added node does not start pulling data from the other nodes from scratch—most of its data is prepared in advance—so introducing virtual nodes and their extra complexity is usually unnecessary.

4. Common Expansion Schemes

4.1 Doubling expansion

The main idea of doubling expansion is that every expansion doubles the database count. The new databases usually get their data from the original ones through master-slave replication, and the replicas are then promoted to masters to serve traffic, which is why some documents call this approach "replica promotion".

In theory, after a doubling expansion we have twice as many databases to store data and absorb traffic, and roughly half of each original database's disk usage can eventually be freed, as shown below:

The specific process is as follows :

① Time t1: add a replica for every node and enable master-slave synchronization.

② Time t2: once master-slave synchronization has caught up, disable writes on the masters.

Writes are disabled here to guarantee data correctness. Without the write ban, inconsistencies can arise in two time windows:
a. After the master-slave link is broken, if the master keeps accepting writes, that data is never synchronized to the replica.
b. The application cluster cannot all recognize the doubled database count at exactly the same moment; at some instant two application nodes may use different database counts and route the same key to different database indexes, producing incorrect writes.

③ Time t3: when synchronization is complete, break the master-slave relationship; in theory the replica now has exactly the same data set as the master.

④ Time t4: promote the replicas to cluster nodes; once the business application recognizes the new database count, the new routing algorithm takes effect.

In general, the database count lives in a configuration center. After the previous three steps are done, we change it to double the original; once it takes effect, the application services use the new configuration. Note that the applications will not all receive the new value at the same time, so there is inevitably a window in which some machines still route with the old database count while others use the new one. This is why the write ban can only be lifted after this step has fully completed.

⑤ Time t5: after confirming that every application has picked up the new database count, lift the write ban on the original masters; the service is now fully restored.

⑥ Start offline scheduled tasks to clear the roughly half of each database's data that is now redundant.

To save disk space, we can either clear the redundant data with offline scheduled tasks, or design the table structure at the very beginning so that the hash value of the shard key is stored as a column. Taking common approach 4 above (database index = hash modulo the database count) as an example, the offline cleanup then reduces to one simple SQL statement per database (to avoid locking the whole table, it can be split into several smaller statements by id range):
delete from db0.tbl0 where hash_val mod 4 <> 0;
delete from db1.tbl0 where hash_val mod 4 <> 1;
delete from db2.tbl0 where hash_val mod 4 <> 2;
delete from db3.tbl0 where hash_val mod 4 <> 3;

Refer to the following figure for specific expansion steps :

Summary: as the migration plan shows, writes must be disabled from t2 to t5, so the service is partially degraded in that window; the whole phase usually takes on the order of minutes. If the business can accept it, perform the operation during off-peak hours.

Of course, many applications cannot tolerate even minute-level write unavailability—for example when writes far outnumber reads. In that case an open-source framework such as Canal can be combined with double writes during the window to keep the data consistent.

The scheme mainly relies on MySQL's powerful and mature master-slave synchronization: most of the data the new nodes need is prepared in advance, saving a great deal of manual data migration.

But the drawbacks are obvious: first, the whole process may come at the cost of degraded service; second, every expansion doubles the database count, wasting a lot of database resources ahead of time.

4.2 Consistent hash expansion

Let's look mainly at consistent hash expansion without virtual slots. If the current database node DB0's load or disk usage is too high and needs expanding, we can achieve the effect shown in the figure below.

In the figure, three hash ranges are configured before expansion. When the data volume or pressure of the [-Inf, -10000) range is found to be too large, that range needs to be expanded.

The main steps are as follows :

① Time t1: add a replica for the database node that needs expansion and enable master-slave synchronization.

② Time t2: once master-slave synchronization has caught up, disable writes on the original master.

The reason is the same as in the doubling method: the new replica and the original master must end up with consistent data.

③ Time t3: when synchronization is complete, break the master-slave relationship; in theory the replica and the master now hold exactly the same data set.

④ Time t4: modify the consistent-hash range configuration and have the application services reload it (see the configuration sketch after these steps).

⑤ Time t5: after making sure every application has received the new consistent-hash range configuration, lift the write ban on the original master; the service is fully restored.

⑥ Start offline scheduled tasks to clear the redundant data.
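To make step ④ concrete, here is a hedged illustration of what the range-configuration change could look like for the figure's scenario, where the overloaded [-Inf, -10000) range originally served by DB0 is split at a hypothetical midpoint and its lower half is handed to the newly promoted replica DB3 (the split point is an assumption, not a value from the article):

Before:  [-Inf, -10000)         -> DB0
After :  [-Inf, -1000000000)    -> DB3 (newly promoted replica)
         [-1000000000, -10000)  -> DB0

After the configuration takes effect, the step-⑥ cleanup task deletes from DB0 the rows whose hash now belongs to DB3's range, and from DB3 the rows that remain with DB0.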

As you can see, this scheme resembles doubling expansion but is more flexible: you can selectively expand according to each node's pressure, without doubling the capacity of the whole cluster at once.

5. Summary

This article described some common schemes for horizontal sharding design.

When designing a sharding scheme, you can choose among range sharding, hash sharding, routing tables, consistent hashing and other schemes. The choice must fully account for factors such as the sustainability of later expansion and the maximum data skew rate.

It also listed some common mistakes, such as the effect of a common factor in the database/table index calculation, and the data-skew factors that appear when the database index is derived from the first few characters of the key.

When making the actual choice, be sure to consider your own business characteristics, fully verify the skew of the shard key under the various parameters, and plan the subsequent expansion scheme in advance.

Author: vivo Platform Product Development Team - Han Lei
