What should you pay special attention to when a single DML statement inserts, updates, deletes, or fetches millions of rows?

2022-06-22 23:49:00 【Kunlunbase Kunlun database】

Preface

Any node in a distributed computing and storage system can become overloaded or run short of compute or storage resources, and the network can be slow or temporarily unreachable. Any of these can cause an operation to time out.
While an operation in a distributed system waits for a remote node to respond, it usually holds various resources, so it cannot wait indefinitely. Otherwise operations across the system block one another and the system gradually stalls.

Timeout control is therefore a problem every distributed system has to solve. If it is solved poorly, the system stalls and stops working properly.

A brief introduction to the timeout control mechanism of the Kunlun distributed database

The Kunlun distributed database has the following timeout control variables:

Some are on the compute node. All compute node timeout variables are in the configuration file of the compute node instance. They can be modified as needed, after which the parameters of the running instance are refreshed.
Some are on the storage node. The storage node timeout variables are in the storage node configuration file. You can edit the configuration file, or run a set statement on a compute node or a storage node to change the corresponding variable.

In general, users do not need to modify these variables , Because we have optimized the configuration parameters of the computing node and the storage node for the general situation .

But in the Special scenes You still need to modify these timeout variables .

A typical scenario is to be in a DML Insert... In the statement / to update / Delete millions of rows or more of data , Or a select Statements return millions of rows or more of data .

for example , Logical import of large data tables or full data , Update the entire table for the super large table , Data analysis （OLAP） The query needs to scan a very large table , And programmers or DBA I plan to delete the database and run away .

In these scenarios, it is best for the user to insert according to the estimation / to update / Delete / Amount of data read , Increase the following timeout values in advance , To ensure that relevant statements and operations can work normally until they are completed , It will not be mistaken by the timeout mechanism as a statement that has timed out and cannot be executed correctly and terminated in advance .

Or the user can try these operations and get errors , Increase these timeout values .
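The advance estimate described above can be sketched as a small calculation. This is only an illustration, not part of Kunlun: the function name `estimate_timeout_seconds`, the throughput input, and the safety factor are all assumptions, and the throughput has to be measured on your own cluster.

```python
import math


def estimate_timeout_seconds(row_count, rows_per_second,
                             safety_factor=3.0, floor_seconds=100):
    """Return a timeout in seconds for a statement touching row_count rows.

    rows_per_second is a throughput you measured yourself (hypothetical input).
    safety_factor adds headroom for load spikes (hypothetical default).
    floor_seconds keeps the result from dropping below the current setting.
    """
    if row_count < 0 or rows_per_second <= 0:
        raise ValueError("row_count must be >= 0 and rows_per_second > 0")
    expected_seconds = row_count / rows_per_second
    return max(floor_seconds, math.ceil(expected_seconds * safety_factor))


# 5 million rows at a measured 20,000 rows/s: 250 s expected, 750 s with headroom.
print(estimate_timeout_seconds(5_000_000, 20_000))
```

The result is a starting point for the variables described below. If the statement still times out, raise the value further as the previous paragraph suggests.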

Let's go through all the timeout control variables of the Kunlun distributed database.

Timeout variables on the compute node

1. statement_timeout: statement timeout.

If the total execution time of a query on the compute node exceeds this limit, the statement is rolled back.

For example, if the compute node takes too long to join tables using partial data returned by the storage cluster, the statement is stopped once the timeout limit is reached (default: 100 seconds).

2. mysql_read_timeout and mysql_write_timeout: read and write timeouts for communication between the compute node and the storage nodes or metadata nodes.

If a read takes longer than mysql_read_timeout or a write takes longer than mysql_write_timeout, the MySQL client library used by the compute node reports an error and returns from the read or write wait, which terminates the statement early.

If an insert statement sent to the compute node inserts one million rows, or a select statement returns millions of rows from the storage nodes, it is best to increase both variables. Both default to 50 seconds.

In this case you also need to increase the mysql_max_packet_size variable so that such large packets can be sent to the storage node correctly.

3. lock_timeout: how long the compute node waits for a table lock.

Concurrent insert, delete, update, and select statements take mutually compatible locks on a table, so they do not wait for each other.

But while an alter table statement is executing, other connections on the same compute node cannot execute DML statements on that table. They cannot wait indefinitely: if the lock is not obtained within this time, they return an error (default: 100 seconds).

4. log_min_duration_statement: statements that run longer than this are recorded in the log file as slow queries.

If each insert statement inserts tens of thousands of rows or more, be sure to increase this variable. Otherwise a large amount of data is written to the log file, which can exhaust the compute node's disk space (default: 10 seconds).
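To see which compute node variables a long statement would run into, the defaults quoted above can be put in a table and compared against an expected duration. This is a minimal sketch: the function name `variables_to_raise` and the 2x headroom are assumptions, and lock_timeout is left out on purpose because it bounds lock waiting rather than the statement's own running time.

```python
# Default values in seconds, as quoted in this article.
DEFAULT_TIMEOUTS = {
    "statement_timeout": 100,
    "mysql_read_timeout": 50,
    "mysql_write_timeout": 50,
    "log_min_duration_statement": 10,
}


def variables_to_raise(expected_seconds, current=None):
    """Return {name: suggested_seconds} for each limit the statement would hit.

    expected_seconds is your own estimate of how long the statement runs.
    The suggested value is twice the estimate (hypothetical headroom).
    """
    limits = DEFAULT_TIMEOUTS if current is None else current
    suggested = {}
    for name, limit in limits.items():
        if expected_seconds >= limit:
            suggested[name] = expected_seconds * 2
    return suggested


# A statement expected to run 60 s stays under statement_timeout (100 s)
# but exceeds the 50 s read/write timeouts and the 10 s slow-query threshold.
print(variables_to_raise(60))
```

Pass your instance's actual settings as `current` if they differ from the defaults listed here.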

Timeout variables on the storage node

1. lock_wait_timeout: the lock timeout variable of the MySQL server layer.

This is the maximum time to wait for a server-layer table lock. If a DDL statement such as alter table is running on a table, every DML statement on that table blocks for at most this long. If the table lock is still not obtained, an error is returned.

As of MySQL 8.0, the most common operations that used to lock the whole table, such as adding columns and adding indexes, no longer hold a long table lock and run as online DDL, so the default of 5 seconds is generally enough.

2. innodb_lock_wait_timeout: the InnoDB lock timeout variable, the maximum time to wait for an InnoDB row lock.

A DML statement that waits longer than this returns an error.

If you update an entire table and the table is very large, for example hundreds of GB or more, the update statement locks a large number of rows for a long time. Other transactions that need those rows will then usually hit a lock wait timeout unless their innodb_lock_wait_timeout is increased (default: 20 seconds).

3. If the storage cluster uses MySQL Group Replication (MGR) for high availability, you need to increase the MGR timeout control variables group_replication_member_expel_timeout, group_replication_components_stop_timeout, and group_replication_unreachable_majority_timeout. Otherwise the MGR standby nodes may wrongly conclude that the primary is down and initiate a primary/standby switchover, or the primary node may lose contact with the standbys and become unable to accept writes.
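Since the storage node variables can also be changed with a set statement, one way to prepare such statements is to render them from a table of values. This is a hedged sketch: the helper `render_set_statements` and the numbers passed to it are illustrative assumptions, not recommended settings. It only builds MySQL-style `SET GLOBAL` / `SET SESSION` text for you to review and run yourself.

```python
import re

# Accept only plain lowercase variable names, so nothing else ends up in the SQL.
_VARIABLE_NAME = re.compile(r"^[a-z_][a-z0-9_]*$")


def render_set_statements(values, scope="GLOBAL"):
    """Render one 'SET <scope> <name> = <seconds>;' line per variable."""
    if scope not in ("GLOBAL", "SESSION"):
        raise ValueError("scope must be GLOBAL or SESSION")
    statements = []
    for name, seconds in values.items():
        if not _VARIABLE_NAME.match(name):
            raise ValueError(f"unexpected variable name: {name!r}")
        if isinstance(seconds, bool) or not isinstance(seconds, int) or seconds < 0:
            raise ValueError(f"timeout for {name} must be a non-negative integer")
        statements.append(f"SET {scope} {name} = {seconds};")
    return statements


# Hypothetical values for a session that is about to run a whole-table update.
for line in render_set_statements({"innodb_lock_wait_timeout": 600}, scope="SESSION"):
    print(line)

# Hypothetical values for the MGR variables, which are set at global scope.
for line in render_set_statements({
    "group_replication_member_expel_timeout": 60,
    "group_replication_unreachable_majority_timeout": 120,
}):
    print(line)
```

Values set this way do not survive a restart unless they are also written to the storage node configuration file.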

Conclusion

The Kunlun distributed database has a complete timeout control mechanism: every communication between nodes is subject to a timeout, so every operation has an upper bound on the time it can take. This ensures that the system state keeps advancing and that system resources remain available to serve further requests.

The project is open source

【GitHub：】
https://github.com/zettadb

【Gitee：】
https://gitee.com/zettadb

THE END

Original site

Copyright notice
This article was written by [Kunlunbase Kunlun database]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/173/202206222122590111.html
