MySQL Database/Table Sharding and Smooth Capacity Expansion
2022-07-24 03:01:00 【Young】
As everyone knows, the database easily becomes the bottleneck of an application system. A single database instance has limited resources and processing capacity; in a highly concurrent distributed system, sharding across databases and tables lets us break through single-machine limits. This article summarizes the core concepts of sharding, global ID generation strategies, sharding strategies, smooth expansion schemes, and popular solutions.
**Author:** Kefeng Wang
**Source:** https://kefeng.wang/2018/07/22/mysql-sharding/
Copyright: free to reprint, non-commercial, no derivatives, keep attribution; please credit the author and source when reprinting.
1 Overview of Database/Table Sharding
When business volume is small, a single database with single tables is enough.
When the data volume grows too large to store, or the concurrency grows too high for one machine to handle, it is time to consider sharding across databases and tables.
1.1 Related Terms
- Read/write separation: different database instances synchronize the same data; each is responsible only for reads or only for writes;
- Partitioning: a partition expression splits a table's records into different areas (which must be on the same server, though possibly on different disks); to the application it still looks like one table, and nothing changes;
- Database sharding (分库): the tables of one system are stored across multiple database instances;
- Table sharding (分表): for a two-dimensional table of rows (records) and columns (fields), there are two cases:
(1) Vertical sharding: split by column, so different sub-tables store different fields; rarely used, large-capacity, or business-specific fields can be split out;
(2) Horizontal sharding (the most complex): split by row according to a sharding algorithm, so different sub-tables store different records.
1.2 Do We Really Need Sharding?
Note that sharding brings considerable complexity and performance cost to database maintenance and business logic. Unless the projected business volume truly demands it as a last resort, do not over-design or optimize prematurely.
During the planning stage, try to solve data-volume and performance problems with the following approaches first:
- Current data volume: if it has not reached millions of rows, there is usually no need to shard;
- Data-volume problems: add disks; split by database (move the whole tables of different business modules into different databases);
- Performance problems: upgrade CPU/memory, separate reads and writes, tune database configuration, optimize tables/indexes, optimize SQL, partition, split tables vertically;
- If none of that is enough, only then consider the most complex solution: horizontal table sharding.
2 Global ID Generation Strategies
2.1 Auto-Increment Columns
Advantages: built into the database, ordered, good performance.
Shortcomings: designed for a single database and table; without planning for sharding, IDs may collide across shards. Solutions:
2.1.1 Set the Auto-Increment Offset and Step
## Suppose the table is split into 10 shards
## Scope options: SESSION (session level), GLOBAL (global)
SET @@SESSION.auto_increment_offset = 1;     ## starting value; the 10 shards use 1 through 10 respectively
SET @@SESSION.auto_increment_increment = 10; ## step size
If this scheme is adopted, existing data must be migrated to the new shards when expanding capacity.
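The offset/step scheme above can be sketched in a few lines of Python. The helper `shard_ids` is hypothetical (not part of any library); it only illustrates why shards configured with distinct offsets and a shared step never produce colliding IDs:

```python
# Sketch of the offset/step scheme: step = number of shards, offsets 1..10.
# Each shard then generates a disjoint arithmetic sequence of IDs.

def shard_ids(offset, step, count):
    """IDs produced by one shard configured with the given offset and step."""
    return [offset + step * i for i in range(count)]

STEP = 10  # 10 shard tables, matching the SQL example above
sequences = [shard_ids(offset, STEP, 100) for offset in range(1, STEP + 1)]

# No two shards ever produce the same ID.
all_ids = [i for seq in sequences for i in seq]
assert len(all_ids) == len(set(all_ids))
```

The trade-off is visible here as well: the step is baked into every shard, so adding an 11th shard changes the arithmetic and forces a data migration.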
2.1.2 Global ID Mapping Table
- In a global Redis, create one key per data table to record that table's current maximum ID;
- On every ID request, increment the key by 1 and return the new value to the application;
- Redis is persisted to a global database periodically.
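A minimal in-memory stand-in for this scheme is sketched below. In a real deployment the counter would be a Redis key incremented with the atomic `INCR` command; the dict-based class here is purely illustrative and not concurrency-safe:

```python
# In-memory stand-in for the Redis global-ID mapping table above.
# A production version would call Redis INCR (atomic); this only shows the flow.

class GlobalIdAllocator:
    def __init__(self):
        self._max_ids = {}  # one counter per table, like one Redis key per table

    def next_id(self, table):
        # Equivalent in spirit to: INCR id:<table>
        new_id = self._max_ids.get(table, 0) + 1
        self._max_ids[table] = new_id
        return new_id

alloc = GlobalIdAllocator()
assert [alloc.next_id("orders") for _ in range(3)] == [1, 2, 3]
assert alloc.next_id("users") == 1  # each table keeps its own counter
```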
2.2 UUID (128 bits)
A number generated on one machine that is guaranteed to be unique across all machines in the same space and time. Platforms usually provide an API for generating UUIDs.
A UUID is a 32-character hexadecimal string separated by 4 hyphens (-) into 36 characters in total, e.g. 550e8400-e29b-41d4-a716-446655440000.
The inputs to a UUID include the Ethernet card (MAC) address, nanosecond-level time, the chip ID, and random numbers.
UUID is a standard with several implementations; the most common is Microsoft's GUID (Globally Unique Identifier).
Advantages: simple, globally unique;
Shortcomings: large storage and transmission cost, unordered, poor index performance.
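Python's standard library generates UUIDs directly, which makes the 36-character format easy to verify:

```python
import uuid

u = str(uuid.uuid4())  # random (version 4) UUID
assert len(u) == 36                                  # 32 hex chars + 4 hyphens
assert u.count("-") == 4
assert [len(p) for p in u.split("-")] == [8, 4, 4, 4, 12]
```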
2.3 COMB (Combined GUID)
Reference: "The Cost of GUIDs as Primary Keys"
Combine a GUID part (10 bytes) with a timestamp (6 bytes) so that IDs are roughly ordered, which improves index performance.
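A rough sketch of the idea, assuming a simplified layout: the original COMB technique patches the last 6 bytes of a GUID with a timestamp to suit SQL Server's sort order, whereas the version below puts the 6-byte millisecond timestamp first so that plain byte-string comparison is time-ordered. The function name and layout are illustrative, not the canonical COMB format:

```python
import os
import time

def comb_id():
    """COMB-style 16-byte ID: 6-byte millisecond timestamp + 10 random bytes.
    Timestamp-first keeps IDs roughly time-ordered under byte comparison
    (a simplification of the original COMB layout)."""
    millis = int(time.time() * 1000)
    return millis.to_bytes(6, "big") + os.urandom(10)

a = comb_id()
time.sleep(0.005)          # ensure a later millisecond
b = comb_id()
assert len(a) == 16
assert a < b               # later IDs sort after earlier ones
```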
2.4 The Snowflake Algorithm
References: twitter/snowflake; detailed explanations of the Snowflake algorithm
Snowflake is Twitter's open-source distributed ID generation algorithm; the result is a long (64-bit) number.
Its characteristics: nodes need no coordination, IDs are roughly ordered by time, and IDs are unique across the whole cluster.
The default layout of the value is as follows (the three parts besides the sign bit can be customized):
- 1 bit: sign bit, always 0 (to keep the value positive);
- 41 bits: millisecond timestamp (enough for about 69 years);
- 10 bits: node ID (5-bit data center ID + 5-bit worker ID, supporting 32 * 32 = 1024 nodes);
- 12 bits: sequence number (each node can issue 4096 IDs per millisecond, about 4.09 million QPS; if the sequence overflows within one millisecond, wait for the next millisecond).
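The layout above can be sketched as a small generator. This is a simplified illustration, not Twitter's implementation: the custom epoch constant is an assumption, and clock rollback is not handled:

```python
import threading
import time

class Snowflake:
    """Minimal sketch of the 64-bit layout described above:
    1 sign bit | 41-bit milliseconds | 10-bit node ID | 12-bit sequence."""
    EPOCH = 1288834974657  # Twitter-style custom epoch (assumption)

    def __init__(self, node_id):
        assert 0 <= node_id < 1024          # 10-bit node ID
        self.node_id = node_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            ms = int(time.time() * 1000)
            if ms == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit sequence
                if self.sequence == 0:       # overflow: spin until next millisecond
                    while ms <= self.last_ms:
                        ms = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = ms
            return ((ms - self.EPOCH) << 22) | (self.node_id << 12) | self.sequence

sf = Snowflake(node_id=7)
ids = [sf.next_id() for _ in range(1000)]
assert ids == sorted(ids) and len(set(ids)) == 1000  # ordered, no duplicates
assert (ids[0] >> 12) & 0x3FF == 7                   # node ID bits recoverable
```

The last assertion shows a useful side effect: the generating node can be read back out of any ID, which section 3.4 relies on.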
3 Sharding Strategies
3.1 Range (Continuous) Sharding
Route records to nodes by the range of a chosen field (such as user ID or order time): values within a given interval go to a given node.
Advantages: when the cluster expands, the new range is simply assigned to the new node, so no data migration is needed.
Shortcomings: if sharded by time, data hotspots are unevenly distributed (historical data is cold, current data is hot), so node load is unbalanced.
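A routing function for range sharding is little more than an interval lookup. The boundaries and node names below are hypothetical:

```python
# Range (continuous) sharding by order date: each node owns a half-open
# interval, so a new node can simply be appended for the next range.

RANGES = [  # (upper bound, exclusive; node) -- boundaries are hypothetical
    ("2021-01-01", "node-1"),
    ("2022-01-01", "node-2"),
    ("9999-12-31", "node-3"),  # current, hot data
]

def route_by_range(order_date):
    for upper, node in RANGES:
        if order_date < upper:   # ISO date strings compare chronologically
            return node
    raise ValueError("date out of range")

assert route_by_range("2020-06-15") == "node-1"
assert route_by_range("2021-07-01") == "node-2"
assert route_by_range("2022-03-09") == "node-3"
```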
3.2 ID Modulo Sharding
Shortcoming: data must be migrated after capacity expansion.
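The migration cost is easy to demonstrate: changing the node count remaps most IDs, as this sketch shows:

```python
# ID-modulo sharding: node = id % N. Changing N remaps most keys, which is
# why this scheme requires a data migration when nodes are added.

def route_mod(record_id, node_count):
    return record_id % node_count

# Going from 4 nodes to 5 relocates 80% of IDs (only IDs where
# id % 4 == id % 5, i.e. id % 20 < 4, stay put).
moved = sum(1 for i in range(10_000) if route_mod(i, 4) != route_mod(i, 5))
assert moved == 8000
```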
3.3 Consistent Hashing
Advantage: after capacity expansion, only the keys that fall into the new node's arc of the hash ring need to move; there is no full remap of the data.
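A minimal consistent-hash ring with virtual nodes illustrates this. The class and vnode count are illustrative choices, not a production implementation:

```python
import bisect
import hashlib

def _h(key):
    """Hash a key to a point on the ring (MD5 used only for uniform spread)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # each node contributes `vnodes` points; the ring is the sorted union
        self.ring = sorted((_h(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))

    def route(self, key):
        points = [p for p, _ in self.ring]
        i = bisect.bisect(points, _h(key)) % len(self.ring)  # clockwise successor
        return self.ring[i][1]

before = HashRing(["A", "B"])
after = HashRing(["A", "B", "C"])          # expansion: node C joins
keys = [f"user:{i}" for i in range(2000)]
moved = sum(1 for k in keys if before.route(k) != after.route(k))
assert 0 < moved < len(keys)               # only a fraction of keys relocate
```

Note that every relocated key moves to the new node C; the existing nodes never exchange data with each other, which is exactly the migration-saving property claimed above.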
3.4 Snowflake Sharding
Advantage: no data migration is needed after capacity expansion.
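One way to read this (an interpretation, using the bit layout from section 2.4): route each record by the node-ID bits embedded in its Snowflake ID at generation time. New nodes generate IDs carrying new node bits, while existing IDs keep routing to their original node, so nothing moves:

```python
def shard_of(snowflake_id):
    # bits 12..21 hold the 10-bit node ID in the layout from section 2.4
    return (snowflake_id >> 12) & 0x3FF

# hand-built ID: timestamp part 123456789, node 37, sequence 5
sample = (123456789 << 22) | (37 << 12) | 5
assert shard_of(sample) == 37
```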
4 Problems Introduced by Sharding
4.1 Distributed Transactions
See "Solutions for Distributed Transactions".
Because two-phase/three-phase commit carries a heavy performance penalty, a transaction compensation mechanism can be used instead.
4.2 Cross-Node JOIN
For a JOIN within a single database, MySQL supports it natively.
Across databases, MySQL's native JOIN is not recommended for performance reasons; the following approaches avoid cross-node JOINs:
- Global tables: keep a copy of certain stable, commonly used tables in every database;
- Field redundancy: keep a copy of certain commonly used fields in each table;
- Application-side assembly: the application fetches the pieces of data and assembles them itself.
In addition, place a user's related data (such as their orders) on the same node that holds the user record itself; this avoids distributed queries.
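Application-side assembly amounts to a join done in application code. The sample rows below are made up for illustration:

```python
# Application-side assembly (sketch): fetch rows from two shards separately,
# then join them in application code instead of issuing a cross-node SQL JOIN.

users = {1: {"id": 1, "name": "alice"},          # result of query on shard 1
         2: {"id": 2, "name": "bob"}}
orders = [{"order_id": 10, "user_id": 1},        # result of query on shard 2
          {"order_id": 11, "user_id": 2}]

joined = [{**o, "user_name": users[o["user_id"]]["name"]} for o in orders]
assert joined[0]["user_name"] == "alice"
```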
4.3 Cross-Node Aggregation
This can only be done on the application side.
Paged queries are especially painful: every node must first return a large aggregate result before pagination can happen, so performance is poor.
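The cost is visible in a small merge sketch: to serve page N of size k in the global order, every node must ship its own first N*k rows, which the application then merge-sorts before discarding all but one page:

```python
import heapq

node1 = [1, 4, 7, 10, 13]   # each node's rows, already sorted locally
node2 = [2, 5, 8, 11, 14]
node3 = [3, 6, 9, 12, 15]

page, size = 2, 3            # want rows 4..6 of the global ordering
per_node = page * size       # each node must return up to 6 rows for page 2
merged = list(heapq.merge(node1[:per_node], node2[:per_node], node3[:per_node]))
assert merged[(page - 1) * size : page * size] == [4, 5, 6]
```

For deep pages, `per_node` grows linearly with the page number on every node at once, which is why deep pagination over shards degrades so badly.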
4.4 Node Expansion
After nodes are added, the new sharding rule changes which shard each record belongs to, so data must be migrated.
5 Node Expansion Schemes
Related reading: second-level smooth database expansion schemes
5.1 Conventional Scheme
If the number of new nodes and the expansion were not planned for in advance, most data will change shards and must be migrated between them:
- Estimate the migration time and announce a service outage;
- Stop serving traffic (users cannot use the service) and run the prepared migration scripts to move the data;
- Switch to the new sharding rule;
- Restart the servers.
5.2 Migration-Free Expansion
Use a doubling strategy to avoid data migration: half of each existing node's data effectively moves to a new node, and the mapping stays simple.
The concrete steps are as follows (assume 2 existing nodes A/B, doubling to the 4 nodes A/A2/B/B2):
- No application servers need to be stopped;
- Add two new databases A2/B2 as slaves, with master-slave replication A => A2 and B => B2, and wait until replication catches up (early historical data can be synchronized manually);
- Adjust the sharding rules and make them take effect:
  change ID%2=0 => A into ID%4=0 => A, ID%4=2 => A2;
  change ID%2=1 => B into ID%4=1 => B, ID%4=3 => B2;
- Break the master-slave replication between the instances and let each serve independently;
- At this point all four nodes hold complete data, just with redundancy (each node also holds the rows belonging to its paired node); the redundant rows can be deleted at any later time without affecting the business.
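The reason no migration is needed can be checked mechanically: the new modulo-4 rule never routes an ID to a node that does not already hold its data, because ID%4 in {0, 2} implies ID%2 == 0 (and likewise for the odd residues):

```python
# Verify the doubling invariant: every ID's new node is either its old
# master or that master's freshly promoted replica.

OLD = {0: "A", 1: "B"}                       # rule before expansion: ID % 2
NEW = {0: "A", 2: "A2", 1: "B", 3: "B2"}     # rule after expansion:  ID % 4
MASTER_OF = {"A": "A", "A2": "A", "B": "B", "B2": "B"}

for i in range(10_000):
    old_node = OLD[i % 2]
    new_node = NEW[i % 4]
    # the new node already holds the row (it was the master or its slave)
    assert MASTER_OF[new_node] == old_node
```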

5.3 Sharding and Ten-Billion-Row Data Migration
6 Popular Sharding Solutions
6.1 Proxy Layer
Deploy a proxy server that masquerades as a MySQL server: the proxy talks to the real MySQL nodes, and the application talks only to the proxy. This is transparent to the application.
An example is MyCAT (official site, source code); reference: "MyCAT + MySQL read/write separation deployment".
On the back end, MyCAT supports MySQL, SQL Server, Oracle, DB2, PostgreSQL, and other mainstream databases, as well as NoSQL stores such as MongoDB, with more storage types planned.
MyCAT provides not just read/write separation but also sharding and disaster-recovery management; it can serve multi-tenant application development and cloud-platform infrastructure, giving an architecture strong adaptability and flexibility.
6.2 Application Layer
A layer between the business code and JDBC, delivered to the application as a JAR package; this approach is intrusive to the code. The main options:
(1) Taobao's TDDL: maintenance stopped in 2012; not recommended.
(2) Dangdang's Sharding-JDBC: still actively maintained:
It is the horizontal sharding framework extracted from dd-rdb, the relational-database module of Dangdang's application framework ddframe; database access is transparent, and it implements the Snowflake sharding algorithm.
Sharding-JDBC is positioned as a lightweight Java framework: the client connects to the database directly, no extra deployment is needed, there are no other dependencies, and DBAs do not need to change their existing operation and maintenance practices.
Its sharding strategies are flexible: it supports equality, BETWEEN, and IN conditions, as well as multiple sharding keys.
Its SQL parsing is mature: it supports aggregation, grouping, sorting, LIMIT, OR, and so on, as well as binding-table and Cartesian-product table queries.
Sharding-JDBC directly wraps the JDBC API and can be understood as an enhanced JDBC driver, so the cost of migrating old code is almost zero:
- It works with any Java-based ORM framework, such as JPA, Hibernate, MyBatis, Spring JDBC Template, or raw JDBC;
- It works on top of any third-party connection pool, such as DBCP, C3P0, BoneCP, or Druid;
- In theory it can support any database that implements the JDBC standard. Although it currently supports only MySQL, there are plans to support Oracle, SQL Server, and other databases.