What needs special attention when a single DML statement inserts, updates, deletes, or reads millions of rows?
2022-06-22 23:49:00 【Kunlunbase Kunlun database】
Preface
Any node in a distributed computing and storage system can become overloaded or run short of compute or storage resources, and network delays or temporary unreachability can cause operations to time out.
While an operation in a distributed system waits for a remote node to respond, it usually holds various resources, so it cannot wait indefinitely; otherwise the whole system gradually blocks and stalls.
Timeout control is therefore a problem that every distributed system has to solve, and solving it poorly can stall the system and stop it from working properly.
A brief introduction to the timeout control mechanism of the Kunlun distributed database
The Kunlun distributed database has two groups of timeout control variables:
One group is on the compute nodes. The compute-node timeout variables are all in the configuration file of the compute node instance; they can be modified as needed, and the parameters of the running instance refreshed after the change.
The other group is on the storage nodes. The storage-node timeout variables are in the storage node configuration file; you can edit that file, or run a SET statement on a compute node or a storage node to change the corresponding variable.
In general, users do not need to modify these variables, because we have tuned the configuration parameters of the compute nodes and storage nodes for typical workloads.
In some special scenarios, however, you still need to modify these timeout variables.
A typical scenario is inserting, updating, or deleting millions of rows or more in a single DML statement, or a single SELECT statement returning millions of rows or more.
Examples include logically importing a large table or a full data set, updating every row of a very large table, running an analytical (OLAP) query that scans a very large table, and a programmer or DBA planning to delete the database and run away.
In these scenarios it is best to estimate the amount of data to be inserted, updated, deleted, or read, and to increase the timeout values below in advance, so that the statements can run to completion instead of being mistaken by the timeout mechanism for stuck statements and terminated early.
Alternatively, the user can try the operation first and increase these timeout values after it fails with a timeout error.
Let's take a look at all the timeout control variables of the Kunlun distributed database.
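The advance estimate described above can be sketched in Python. This is only an illustration under stated assumptions: the function name, the throughput figure, and the safety factor are hypothetical and do not come from Kunlun, so measure throughput on your own cluster before relying on the result.

```python
import math


def estimate_timeout_seconds(rows, rows_per_second, safety_factor=2.0, floor_seconds=100):
    """Estimate a timeout, in seconds, for one bulk statement.

    rows_per_second is an assumed, locally measured throughput.
    safety_factor leaves headroom for load spikes and network delay.
    floor_seconds keeps the result from dropping below an existing setting;
    100 is the statement_timeout default quoted in this article.
    """
    if rows < 0 or rows_per_second <= 0 or safety_factor < 1:
        raise ValueError("need rows >= 0, rows_per_second > 0, safety_factor >= 1")
    estimated = math.ceil(rows / rows_per_second * safety_factor)
    return max(floor_seconds, estimated)


if __name__ == "__main__":
    # Hypothetical workload: 5 million rows at roughly 5,000 rows per second.
    print(estimate_timeout_seconds(5_000_000, 5_000))
```

A small statement keeps the floor value, so the helper only ever raises a timeout and never lowers it.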
Timeout variables of the compute node and what they do
1. statement_timeout: statement timeout.
If the total execution time of a query on the compute node exceeds this limit, the statement is rolled back.
For example, if the compute node takes too long joining tables over part of the data returned by the storage cluster, the statement is stopped once the limit is reached (default: 100 seconds).
2. mysql_read_timeout and mysql_write_timeout: read and write timeouts for communication between the compute node and the storage nodes or metadata nodes.
If a read waits longer than mysql_read_timeout, or a write waits longer than mysql_write_timeout, the MySQL client library used by the compute node reports an error and returns from the read or write wait, which terminates the statement early.
If an INSERT statement sent to the compute node inserts 1 million rows, or a SELECT statement returns millions of rows from the storage nodes, it is best to increase both variables; each defaults to 50 seconds.
In this case the mysql_max_packet_size variable also needs to be increased, so that such large packets can be sent to the storage node correctly.
3. lock_timeout: how long the compute node waits for a table lock.
Concurrently executed INSERT, UPDATE, DELETE, and SELECT statements take mutually compatible table locks, so they do not wait for one another.
But while an ALTER TABLE statement is executing, other connections on the same compute node cannot run DML statements on that table. They cannot wait indefinitely; if the lock is not obtained in time, they return an error (default: 100 seconds).
4. log_min_duration_statement: statements that run longer than this are recorded in the log file as slow queries.
If you want to insert tens of thousands of rows or more in each INSERT statement, be sure to increase this variable; otherwise a large amount of data is written to the log file, which can cause the compute node to run out of disk space (default: 10 seconds).
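The compute-node variables above can be combined into one planning step. The following is a minimal Python sketch, not Kunlun tooling: the function and constant names are hypothetical, the defaults are the ones quoted in this article, and all values are kept in seconds, which may not match the unit your configuration file expects. mysql_max_packet_size is left out because the article gives no default for it.

```python
# Defaults in seconds, as quoted in this article for the compute node.
COMPUTE_NODE_DEFAULTS = {
    "statement_timeout": 100,
    "mysql_read_timeout": 50,
    "mysql_write_timeout": 50,
    "lock_timeout": 100,
    "log_min_duration_statement": 10,
}

# Variables the article suggests raising for bulk statements.
# lock_timeout is about waiting for table locks, so it is left unchanged.
BULK_SENSITIVE = (
    "statement_timeout",
    "mysql_read_timeout",
    "mysql_write_timeout",
    "log_min_duration_statement",
)


def plan_compute_node_timeouts(estimated_seconds, defaults=COMPUTE_NODE_DEFAULTS):
    """Return a copy of the defaults with bulk-sensitive timeouts raised.

    A value is only ever raised to estimated_seconds, never lowered.
    """
    planned = dict(defaults)
    for name in BULK_SENSITIVE:
        planned[name] = max(defaults[name], estimated_seconds)
    return planned


if __name__ == "__main__":
    # Hypothetical estimate: the bulk statement may run for up to 600 seconds.
    for name, seconds in plan_compute_node_timeouts(600).items():
        print(f"{name}: {seconds} seconds")
```

The planned values still have to be written into the compute node configuration file and the running instance refreshed, as described earlier.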
Timeout variables of the storage node and what they do
1. lock_wait_timeout: the lock timeout variable of the MySQL server layer.
It is the maximum time to wait for a server-layer table lock. If a DDL statement such as ALTER TABLE is running on a table, every DML statement against that table blocks for at most this long waiting for the table lock, and returns an error if the lock is not obtained.
In MySQL 8.0, the most common operations that used to lock the whole table, such as adding columns and adding indexes, no longer hold a long table lock and have become online DDL, so the default of 5 seconds is generally enough.
2. innodb_lock_wait_timeout: the InnoDB lock timeout variable, the maximum time to wait for an InnoDB row lock.
A DML statement that waits longer than this returns an error.
If you want to update a whole table and the table is very large, for example hundreds of GB or more, the UPDATE statement locks a large number of rows for a long time. Other transactions then usually hit lock timeouts unless their innodb_lock_wait_timeout is increased (default: 20 seconds).
3. If the storage cluster uses MySQL Group Replication (MGR) for high availability, you also need to increase the MGR timeout control variables group_replication_member_expel_timeout, group_replication_components_stop_timeout, and group_replication_unreachable_majority_timeout. Otherwise the MGR standby nodes may mistakenly conclude that the primary node is down and initiate a primary/standby switchover, or the primary node may lose contact with the standbys and become unable to accept writes.
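Since the storage-node variables can be changed with SET statements, the statements for a planned change can be generated rather than typed by hand. Below is a minimal Python sketch under stated assumptions: the helper name and the example values are hypothetical, and it only renders standard MySQL `SET GLOBAL name = value;` text without connecting to any server.

```python
import re

# Accept only plain lowercase variable names so that no arbitrary SQL can be injected.
_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


def render_set_global(settings):
    """Render one 'SET GLOBAL name = value;' statement per entry.

    settings maps a variable name to a timeout in whole seconds.
    """
    statements = []
    for name, seconds in settings.items():
        if not _IDENTIFIER.match(name):
            raise ValueError(f"invalid variable name: {name!r}")
        if isinstance(seconds, bool) or not isinstance(seconds, int) or seconds < 1:
            raise ValueError(f"timeout for {name} must be a positive integer")
        statements.append(f"SET GLOBAL {name} = {seconds};")
    return statements


if __name__ == "__main__":
    # Hypothetical values for a whole-table update on a very large table.
    planned = {"lock_wait_timeout": 60, "innodb_lock_wait_timeout": 600}
    for statement in render_set_global(planned):
        print(statement)
```

In MySQL, a global value set this way applies to sessions opened afterward, so existing connections keep their current session values until they reconnect or set the variable themselves.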
Conclusion
The Kunlun distributed database has a comprehensive timeout control mechanism: every inter-node communication path has a timeout, so every operation has an upper bound on how long it can take, the system state can keep advancing, and system resources remain available to serve further requests.
The project is open source
【GitHub:】
https://github.com/zettadb
【Gitee:】
https://gitee.com/zettadb
THE END