MySQL Database/Table Sharding and Smooth Capacity Expansion
2022-07-24 03:01:00 【Young】
As everyone knows, the database easily becomes the bottleneck of an application system. A single database instance has limited resources and processing capacity; in a highly concurrent distributed system, sharding across databases and tables lets us break through single-machine limits. This article summarizes the core concepts of sharding, global ID generation strategies, sharding strategies, smooth expansion schemes, and popular solutions.
**Author:** Kefeng Wang
**Source:** https://kefeng.wang/2018/07/22/mysql-sharding/
Copyright: free to reprint, non-commercial, no derivatives, keep attribution; please credit the author and source when reprinting.
1 Overview of Database/Table Sharding
When business volume is small, a single database with single tables is enough.
When the data volume grows too large to store, or the concurrency grows too high for one machine to handle, it is time to consider sharding across databases and tables.
1.1 Related Terms
- Read/write separation: different database instances synchronize the same data; each is responsible only for reads or only for writes;
- Partitioning: a partition expression splits a table's records into different areas (which must be on the same server, though possibly on different disks); to the application it still looks like one table, and nothing changes;
- Database sharding (分库): the tables of one system are stored across multiple database instances;
- Table sharding (分表): for a two-dimensional table of rows (records) and columns (fields), there are two cases:
(1) Vertical sharding: split by column, so different sub-tables store different fields; rarely used, large-capacity, or business-specific fields can be split out;
(2) Horizontal sharding (the most complex): split by row according to a sharding algorithm, so different sub-tables store different records.
1.2 Do We Really Need Sharding?
Note that sharding brings considerable complexity and performance cost to database maintenance and business logic. Unless the projected business volume truly demands it as a last resort, do not over-design or optimize prematurely.
During the planning stage, try to solve data-volume and performance problems with the following approaches first:
- Current data volume: if it has not reached millions of rows, there is usually no need to shard;
- Data-volume problems: add disks; split by database (move the whole tables of different business modules into different databases);
- Performance problems: upgrade CPU/memory, separate reads and writes, tune database configuration, optimize tables/indexes, optimize SQL, partition, split tables vertically;
- If none of that is enough, only then consider the most complex solution: horizontal table sharding.
2 Global ID Generation Strategies
2.1 Auto-Increment Columns
Advantages: built into the database, ordered, good performance.
Shortcomings: designed for a single database and table; without planning for sharding, IDs may collide across shards. Solutions:
2.1.1 Set the Auto-Increment Offset and Step
## Suppose the table is split into 10 shards
## Scope options: SESSION (session level), GLOBAL (global)
SET @@SESSION.auto_increment_offset = 1;     ## starting value; the 10 shards use 1 through 10 respectively
SET @@SESSION.auto_increment_increment = 10; ## step size
If this scheme is adopted, existing data must be migrated to the new shards when expanding capacity.
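The offset/step scheme above can be sketched in a few lines of Python. The helper `shard_ids` is hypothetical (not part of any library); it only illustrates why shards configured with distinct offsets and a shared step never produce colliding IDs:

```python
# Sketch of the offset/step scheme: step = number of shards, offsets 1..10.
# Each shard then generates a disjoint arithmetic sequence of IDs.

def shard_ids(offset, step, count):
    """IDs produced by one shard configured with the given offset and step."""
    return [offset + step * i for i in range(count)]

STEP = 10  # 10 shard tables, matching the SQL example above
sequences = [shard_ids(offset, STEP, 100) for offset in range(1, STEP + 1)]

# No two shards ever produce the same ID.
all_ids = [i for seq in sequences for i in seq]
assert len(all_ids) == len(set(all_ids))
```

The trade-off is visible here as well: the step is baked into every shard, so adding an 11th shard changes the arithmetic and forces a data migration.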
2.1.2 Global ID Mapping Table
- In a global Redis, create one key per data table to record that table's current maximum ID;
- On every ID request, increment the key by 1 and return the new value to the application;
- Redis is persisted to a global database periodically.
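A minimal in-memory stand-in for this scheme is sketched below. In a real deployment the counter would be a Redis key incremented with the atomic `INCR` command; the dict-based class here is purely illustrative and not concurrency-safe:

```python
# In-memory stand-in for the Redis global-ID mapping table above.
# A production version would call Redis INCR (atomic); this only shows the flow.

class GlobalIdAllocator:
    def __init__(self):
        self._max_ids = {}  # one counter per table, like one Redis key per table

    def next_id(self, table):
        # Equivalent in spirit to: INCR id:<table>
        new_id = self._max_ids.get(table, 0) + 1
        self._max_ids[table] = new_id
        return new_id

alloc = GlobalIdAllocator()
assert [alloc.next_id("orders") for _ in range(3)] == [1, 2, 3]
assert alloc.next_id("users") == 1  # each table keeps its own counter
```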
2.2 UUID (128 bits)
A number generated on one machine that is guaranteed to be unique across all machines in the same space and time. Platforms usually provide an API for generating UUIDs.
A UUID is a 32-character hexadecimal string separated by 4 hyphens (-) into 36 characters in total, e.g. 550e8400-e29b-41d4-a716-446655440000.
The inputs to a UUID include the Ethernet card (MAC) address, nanosecond-level time, the chip ID, and random numbers.
UUID is a standard with several implementations; the most common is Microsoft's GUID (Globally Unique Identifier).
Advantages: simple, globally unique;
Shortcomings: large storage and transmission cost, unordered, poor index performance.
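Python's standard library generates UUIDs directly, which makes the 36-character format easy to verify:

```python
import uuid

u = str(uuid.uuid4())  # random (version 4) UUID
assert len(u) == 36                                  # 32 hex chars + 4 hyphens
assert u.count("-") == 4
assert [len(p) for p in u.split("-")] == [8, 4, 4, 4, 12]
```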
2.3 COMB (Combined GUID)
Reference: "The Cost of GUIDs as Primary Keys"
Combine a GUID part (10 bytes) with a timestamp (6 bytes) so that IDs are roughly ordered, which improves index performance.
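A rough sketch of the idea, assuming a simplified layout: the original COMB technique patches the last 6 bytes of a GUID with a timestamp to suit SQL Server's sort order, whereas the version below puts the 6-byte millisecond timestamp first so that plain byte-string comparison is time-ordered. The function name and layout are illustrative, not the canonical COMB format:

```python
import os
import time

def comb_id():
    """COMB-style 16-byte ID: 6-byte millisecond timestamp + 10 random bytes.
    Timestamp-first keeps IDs roughly time-ordered under byte comparison
    (a simplification of the original COMB layout)."""
    millis = int(time.time() * 1000)
    return millis.to_bytes(6, "big") + os.urandom(10)

a = comb_id()
time.sleep(0.005)          # ensure a later millisecond
b = comb_id()
assert len(a) == 16
assert a < b               # later IDs sort after earlier ones
```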
2.4 The Snowflake Algorithm
References: twitter/snowflake; detailed explanations of the Snowflake algorithm
Snowflake is Twitter's open-source distributed ID generation algorithm; the result is a long (64-bit) number.
Its characteristics: nodes need no coordination, IDs are roughly ordered by time, and IDs are unique across the whole cluster.
The default layout of the value is as follows (the three parts besides the sign bit can be customized):
- 1 bit: sign bit, always 0 (to keep the value positive);
- 41 bits: millisecond timestamp (enough for about 69 years);
- 10 bits: node ID (5-bit data center ID + 5-bit worker ID, supporting 32 * 32 = 1024 nodes);
- 12 bits: sequence number (each node can issue 4096 IDs per millisecond, about 4.09 million QPS; if the sequence overflows within one millisecond, wait for the next millisecond).
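The layout above can be sketched as a small generator. This is a simplified illustration, not Twitter's implementation: the custom epoch constant is an assumption, and clock rollback is not handled:

```python
import threading
import time

class Snowflake:
    """Minimal sketch of the 64-bit layout described above:
    1 sign bit | 41-bit milliseconds | 10-bit node ID | 12-bit sequence."""
    EPOCH = 1288834974657  # Twitter-style custom epoch (assumption)

    def __init__(self, node_id):
        assert 0 <= node_id < 1024          # 10-bit node ID
        self.node_id = node_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            ms = int(time.time() * 1000)
            if ms == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit sequence
                if self.sequence == 0:       # overflow: spin until next millisecond
                    while ms <= self.last_ms:
                        ms = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = ms
            return ((ms - self.EPOCH) << 22) | (self.node_id << 12) | self.sequence

sf = Snowflake(node_id=7)
ids = [sf.next_id() for _ in range(1000)]
assert ids == sorted(ids) and len(set(ids)) == 1000  # ordered, no duplicates
assert (ids[0] >> 12) & 0x3FF == 7                   # node ID bits recoverable
```

The last assertion shows a useful side effect: the generating node can be read back out of any ID, which section 3.4 relies on.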
3 Sharding Strategies
3.1 Range (Continuous) Sharding
Route records to nodes by the range of a chosen field (such as user ID or order time): values within a given interval go to a given node.
Advantages: when the cluster expands, the new range is simply assigned to the new node, so no data migration is needed.
Shortcomings: if sharded by time, data hotspots are unevenly distributed (historical data is cold, current data is hot), so node load is unbalanced.
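A routing function for range sharding is little more than an interval lookup. The boundaries and node names below are hypothetical:

```python
# Range (continuous) sharding by order date: each node owns a half-open
# interval, so a new node can simply be appended for the next range.

RANGES = [  # (upper bound, exclusive; node) -- boundaries are hypothetical
    ("2021-01-01", "node-1"),
    ("2022-01-01", "node-2"),
    ("9999-12-31", "node-3"),  # current, hot data
]

def route_by_range(order_date):
    for upper, node in RANGES:
        if order_date < upper:   # ISO date strings compare chronologically
            return node
    raise ValueError("date out of range")

assert route_by_range("2020-06-15") == "node-1"
assert route_by_range("2021-07-01") == "node-2"
assert route_by_range("2022-03-09") == "node-3"
```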
3.2 ID Modulo Sharding
Shortcoming: data must be migrated after capacity expansion.
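The migration cost is easy to demonstrate: changing the node count remaps most IDs, as this sketch shows:

```python
# ID-modulo sharding: node = id % N. Changing N remaps most keys, which is
# why this scheme requires a data migration when nodes are added.

def route_mod(record_id, node_count):
    return record_id % node_count

# Going from 4 nodes to 5 relocates 80% of IDs (only IDs where
# id % 4 == id % 5, i.e. id % 20 < 4, stay put).
moved = sum(1 for i in range(10_000) if route_mod(i, 4) != route_mod(i, 5))
assert moved == 8000
```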
3.3 Consistent Hashing
Advantage: after capacity expansion, only the keys that fall into the new node's arc of the hash ring need to move; there is no full remap of the data.
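A minimal consistent-hash ring with virtual nodes illustrates this. The class and vnode count are illustrative choices, not a production implementation:

```python
import bisect
import hashlib

def _h(key):
    """Hash a key to a point on the ring (MD5 used only for uniform spread)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # each node contributes `vnodes` points; the ring is the sorted union
        self.ring = sorted((_h(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))

    def route(self, key):
        points = [p for p, _ in self.ring]
        i = bisect.bisect(points, _h(key)) % len(self.ring)  # clockwise successor
        return self.ring[i][1]

before = HashRing(["A", "B"])
after = HashRing(["A", "B", "C"])          # expansion: node C joins
keys = [f"user:{i}" for i in range(2000)]
moved = sum(1 for k in keys if before.route(k) != after.route(k))
assert 0 < moved < len(keys)               # only a fraction of keys relocate
```

Note that every relocated key moves to the new node C; the existing nodes never exchange data with each other, which is exactly the migration-saving property claimed above.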
3.4 Snowflake Sharding
Advantage: no data migration is needed after capacity expansion.
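One way to read this (an interpretation, using the bit layout from section 2.4): route each record by the node-ID bits embedded in its Snowflake ID at generation time. New nodes generate IDs carrying new node bits, while existing IDs keep routing to their original node, so nothing moves:

```python
def shard_of(snowflake_id):
    # bits 12..21 hold the 10-bit node ID in the layout from section 2.4
    return (snowflake_id >> 12) & 0x3FF

# hand-built ID: timestamp part 123456789, node 37, sequence 5
sample = (123456789 << 22) | (37 << 12) | 5
assert shard_of(sample) == 37
```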
4 Problems Introduced by Sharding
4.1 Distributed Transactions
See "Solutions for Distributed Transactions".
Because two-phase/three-phase commit carries a heavy performance penalty, a transaction compensation mechanism can be used instead.
4.2 Cross-Node JOIN
For a JOIN within a single database, MySQL supports it natively.
Across databases, MySQL's native JOIN is not recommended for performance reasons; the following approaches avoid cross-node JOINs:
- Global tables: keep a copy of certain stable, commonly used tables in every database;
- Field redundancy: keep a copy of certain commonly used fields in each table;
- Application-side assembly: the application fetches the pieces of data and assembles them itself.
In addition, place a user's related data (such as their orders) on the same node that holds the user record itself; this avoids distributed queries.
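Application-side assembly amounts to a join done in application code. The sample rows below are made up for illustration:

```python
# Application-side assembly (sketch): fetch rows from two shards separately,
# then join them in application code instead of issuing a cross-node SQL JOIN.

users = {1: {"id": 1, "name": "alice"},          # result of query on shard 1
         2: {"id": 2, "name": "bob"}}
orders = [{"order_id": 10, "user_id": 1},        # result of query on shard 2
          {"order_id": 11, "user_id": 2}]

joined = [{**o, "user_name": users[o["user_id"]]["name"]} for o in orders]
assert joined[0]["user_name"] == "alice"
```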
4.3 Cross-Node Aggregation
This can only be done on the application side.
Paged queries are especially painful: every node must first return a large aggregate result before pagination can happen, so performance is poor.
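The cost is visible in a small merge sketch: to serve page N of size k in the global order, every node must ship its own first N*k rows, which the application then merge-sorts before discarding all but one page:

```python
import heapq

node1 = [1, 4, 7, 10, 13]   # each node's rows, already sorted locally
node2 = [2, 5, 8, 11, 14]
node3 = [3, 6, 9, 12, 15]

page, size = 2, 3            # want rows 4..6 of the global ordering
per_node = page * size       # each node must return up to 6 rows for page 2
merged = list(heapq.merge(node1[:per_node], node2[:per_node], node3[:per_node]))
assert merged[(page - 1) * size : page * size] == [4, 5, 6]
```

For deep pages, `per_node` grows linearly with the page number on every node at once, which is why deep pagination over shards degrades so badly.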
4.4 Node Expansion
After nodes are added, the new sharding rule changes which shard each record belongs to, so data must be migrated.
5 Node Expansion Schemes
Related reading: second-level smooth database expansion schemes
5.1 Conventional Scheme
If the number of new nodes and the expansion were not planned for in advance, most data will change shards and must be migrated between them:
- Estimate the migration time and announce a service outage;
- Stop serving traffic (users cannot use the service) and run the prepared migration scripts to move the data;
- Switch to the new sharding rule;
- Restart the servers.
5.2 Migration-Free Expansion
Use a doubling strategy to avoid data migration: half of each existing node's data effectively moves to a new node, and the mapping stays simple.
The concrete steps are as follows (assume 2 existing nodes A/B, doubling to the 4 nodes A/A2/B/B2):
- No application servers need to be stopped;
- Add two new databases A2/B2 as slaves, with master-slave replication A => A2 and B => B2, and wait until replication catches up (early historical data can be synchronized manually);
- Adjust the sharding rules and make them take effect:
  change ID%2=0 => A into ID%4=0 => A, ID%4=2 => A2;
  change ID%2=1 => B into ID%4=1 => B, ID%4=3 => B2;
- Break the master-slave replication between the instances and let each serve independently;
- At this point all four nodes hold complete data, just with redundancy (each node also holds the rows belonging to its paired node); the redundant rows can be deleted at any later time without affecting the business.
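The reason no migration is needed can be checked mechanically: the new modulo-4 rule never routes an ID to a node that does not already hold its data, because ID%4 in {0, 2} implies ID%2 == 0 (and likewise for the odd residues):

```python
# Verify the doubling invariant: every ID's new node is either its old
# master or that master's freshly promoted replica.

OLD = {0: "A", 1: "B"}                       # rule before expansion: ID % 2
NEW = {0: "A", 2: "A2", 1: "B", 3: "B2"}     # rule after expansion:  ID % 4
MASTER_OF = {"A": "A", "A2": "A", "B": "B", "B2": "B"}

for i in range(10_000):
    old_node = OLD[i % 2]
    new_node = NEW[i % 4]
    # the new node already holds the row (it was the master or its slave)
    assert MASTER_OF[new_node] == old_node
```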

5.3 Sharding and Ten-Billion-Row Data Migration
6 Popular Sharding Solutions
6.1 Proxy Layer
Deploy a proxy server that masquerades as a MySQL server: the proxy talks to the real MySQL nodes, and the application talks only to the proxy. This is transparent to the application.
An example is MyCAT (official site, source code); reference: "MyCAT + MySQL read/write separation deployment".
On the back end, MyCAT supports MySQL, SQL Server, Oracle, DB2, PostgreSQL, and other mainstream databases, as well as NoSQL stores such as MongoDB, with more storage types planned.
MyCAT provides not just read/write separation but also sharding and disaster-recovery management; it can serve multi-tenant application development and cloud-platform infrastructure, giving an architecture strong adaptability and flexibility.
6.2 Application Layer
A layer between the business code and JDBC, delivered to the application as a JAR package; this approach is intrusive to the code. The main options:
(1) Taobao's TDDL: maintenance stopped in 2012; not recommended.
(2) Dangdang's Sharding-JDBC: still actively maintained:
It is the horizontal sharding framework extracted from dd-rdb, the relational-database module of Dangdang's application framework ddframe; database access is transparent, and it implements the Snowflake sharding algorithm.
Sharding-JDBC is positioned as a lightweight Java framework: the client connects to the database directly, no extra deployment is needed, there are no other dependencies, and DBAs do not need to change their existing operation and maintenance practices.
Its sharding strategies are flexible: it supports equality, BETWEEN, and IN conditions, as well as multiple sharding keys.
Its SQL parsing is mature: it supports aggregation, grouping, sorting, LIMIT, OR, and so on, as well as binding-table and Cartesian-product table queries.
Sharding-JDBC directly wraps the JDBC API and can be understood as an enhanced JDBC driver, so the cost of migrating old code is almost zero:
- It works with any Java-based ORM framework, such as JPA, Hibernate, MyBatis, Spring JDBC Template, or raw JDBC;
- It works on top of any third-party connection pool, such as DBCP, C3P0, BoneCP, or Druid;
- In theory it can support any database that implements the JDBC standard. Although it currently supports only MySQL, there are plans to support Oracle, SQL Server, and other databases.