ClickHouse OPTIMIZE TABLE: A Comprehensive Analysis
2022-06-24 03:14:00 【2011aad】
Recently, while using ClickHouse for business development, I ran into a number of problems because I did not understand the exact behavior of the OPTIMIZE TABLE command. While investigating those problems I also found that there is very little material online about OPTIMIZE TABLE, so I decided to analyze the command comprehensively, together with the source code.
OPTIMIZE TABLE command functionality
As an OLAP database, ClickHouse has weak support for data updates and does not support the standard SQL UPDATE/DELETE syntax. The ALTER TABLE ... UPDATE/DELETE syntax it does provide is asynchronous: the server returns success to the client as soon as the command is accepted, and it is uncertain when the data will actually be updated.
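For reference, a minimal sketch of this asynchronous mutation syntax (the database, table, and column names here are made up for illustration):
-- both statements return as soon as the mutation is registered;
-- the actual rewrite of the affected parts happens asynchronously in the background
ALTER TABLE mydb.events DELETE WHERE user_id = 42;
ALTER TABLE mydb.events UPDATE status = 'done' WHERE user_id = 42;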
Therefore, when the business needs data updates (for example, syncing MySQL data into ClickHouse), the usual workaround is to implement asynchronous updates through the merge logic of ReplacingMergeTree or CollapsingMergeTree. On the one hand this guarantees eventual consistency of the data; on the other hand the performance overhead on ClickHouse is smaller than with ALTER TABLE. The drawback of this approach is that the merge process of the MergeTree engines is driven by ClickHouse's own policies and runs at unpredictable times, so there is no time bound on data consistency; in extreme cases the data may still not be fully merged after an entire day.
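A minimal sketch of this pattern with ReplacingMergeTree (the table definition, column names, and version column below are assumptions for illustration): an update is written as a new row, and the engine keeps only the row with the highest version per sorting key once the parts are merged.
CREATE TABLE mydb.events
(
    user_id    UInt64,
    event_date Date,
    status     String,
    version    UInt64
)
ENGINE = ReplacingMergeTree(version)
PARTITION BY toYYYYMMDD(event_date)
ORDER BY (user_id, event_date);

-- an "update" is just another insert with a higher version;
-- old and new rows coexist until a merge (or OPTIMIZE ... FINAL) collapses them
INSERT INTO mydb.events VALUES (42, '2021-10-13', 'done', 2);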
The OPTIMIZE TABLE command forces a MergeTree merge to run, so it can be used to work around the uncertainty of merge timing.
OPTIMIZE TABLE execution flow: source code analysis
After ClickHouse receives a SQL statement, it executes it through the following pipeline: Parser (parse the SQL into an AST) -> Interpreter (build the execution plan and apply rule-based optimization, RBO) -> Interpreter::executeImpl (read or write data through block streams) [1]. OPTIMIZE TABLE statements are no exception, except that an OPTIMIZE statement has no complex execution plan.
When ClickHouse receives an OPTIMIZE TABLE command, ParserOptimizeQuery::parseImpl() is called to parse it.
bool ParserOptimizeQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
{
ParserKeyword s_optimize_table("OPTIMIZE TABLE");
ParserKeyword s_partition("PARTITION");
ParserKeyword s_final("FINAL");
ParserKeyword s_deduplicate("DEDUPLICATE");
ParserKeyword s_by("BY");
......
}
As you can see, the OPTIMIZE TABLE statement mainly parses the following keywords: “OPTIMIZE TABLE”, “PARTITION”, “FINAL”, “DEDUPLICATE”, and “BY”. The official documentation describes their roles as follows [2] (usage examples follow the list):
1. “OPTIMIZE TABLE”: specifies the table to optimize; only MergeTree-family engines are supported.
2. “PARTITION”: if a partition is specified, the merge is triggered only for that partition.
3. “FINAL”: forces the merge even when the data is already in a single part, and even when another merge is already in progress.
4. “DEDUPLICATE”: removes duplicates; without a following “BY” clause, rows are deduplicated only when they are completely identical (all column values equal).
5. “BY”: used together with “DEDUPLICATE” to specify which columns are used for deduplication.
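Putting these keywords together, typical invocations look like the following (database, table, and partition values are placeholders, reusing the illustrative table above):
-- merge only one partition
OPTIMIZE TABLE mydb.events PARTITION ID '20210209';

-- force a merge of every partition, even partitions already consisting of a single part
OPTIMIZE TABLE mydb.events FINAL;

-- merge and deduplicate; the BY list must cover all ORDER BY / PRIMARY KEY / PARTITION BY columns
OPTIMIZE TABLE mydb.events FINAL DEDUPLICATE BY user_id, event_date;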
Next, let's look at the source code to see how these keywords control how the merge is executed.
Entering InterpreterOptimizeQuery::execute(): it first checks whether the “DEDUPLICATE BY” columns include the table's sorting key (ORDER BY / PRIMARY KEY) and partition key columns; if not, an exception is thrown immediately. ClickHouse splits stored data into parts by partition key, and the data inside each part is sorted by the sorting key, so as long as those key columns take part in deduplication, ClickHouse only needs to compare adjacent rows and does not have to build a hash table, which greatly improves execution efficiency.
BlockIO InterpreterOptimizeQuery::execute()
{
......
// Empty list of names means we deduplicate by all columns, but user can explicitly state which columns to use.
Names column_names;
if (ast.deduplicate_by_columns)
{
......
metadata_snapshot->check(column_names, NamesAndTypesList{}, table_id);
Names required_columns;
{
required_columns = metadata_snapshot->getColumnsRequiredForSortingKey();
const auto partitioning_cols = metadata_snapshot->getColumnsRequiredForPartitionKey();
required_columns.reserve(required_columns.size() + partitioning_cols.size());
required_columns.insert(required_columns.end(), partitioning_cols.begin(), partitioning_cols.end());
}
for (const auto & required_col : required_columns)
{
// Deduplication is performed only for adjacent rows in a block,
// and all rows in block are in the sorting key order within a single partition,
// hence deduplication always implicitly takes sorting keys and partition keys in account.
// So we just explicitly state that limitation in order to avoid confusion.
if (std::find(column_names.begin(), column_names.end(), required_col) == column_names.end())
throw Exception(ErrorCodes::THERE_IS_NO_COLUMN,
"DEDUPLICATE BY expression must include all columns used in table's"
" ORDER BY, PRIMARY KEY, or PARTITION BY but '{}' is missing."
" Expanded DEDUPLICATE BY columns expression: ['{}']",
required_col, fmt::join(column_names, "', '"));
}
}
table->optimize(query_ptr, metadata_snapshot, ast.partition, ast.final, ast.deduplicate, column_names, getContext());
return {};
}
After the deduplication columns are verified, the table's optimize() method is called. In fact, only MergeTree and ReplicatedMergeTree implement optimize(); calling it on any other storage engine throws an exception directly.
Entering StorageMergeTree::optimize(): when no “PARTITION” is specified and “FINAL” is used, it iterates over all partitions of the table and runs the merge logic for each of them; when a partition is specified, that partition is merged regardless of the “FINAL” keyword; when neither a partition is specified nor “FINAL” is used, partition_id in the code stays empty, and the merge() method handles this case specially.
bool StorageMergeTree::optimize(
const ASTPtr & /*query*/,
const StorageMetadataPtr & /*metadata_snapshot*/,
const ASTPtr & partition,
bool final,
bool deduplicate,
const Names & deduplicate_by_columns,
ContextPtr local_context)
{
......
String disable_reason;
if (!partition && final)
{
DataPartsVector data_parts = getDataPartsVector();
std::unordered_set<String> partition_ids;
for (const DataPartPtr & part : data_parts)
partition_ids.emplace(part->info.partition_id);
for (const String & partition_id : partition_ids)
{
if (!merge(
true,
partition_id,
true,
deduplicate,
deduplicate_by_columns,
&disable_reason,
local_context->getSettingsRef().optimize_skip_merged_partitions))
{......}
}
}
else
{
String partition_id;
if (partition)
partition_id = getPartitionIDFromQuery(partition, local_context);
if (!merge(
true,
partition_id,
final,
deduplicate,
deduplicate_by_columns,
&disable_reason,
local_context->getSettingsRef().optimize_skip_merged_partitions))
{......}
}
return true;
}
The logic of StorageMergeTree::merge() is simple: select the parts to merge, then merge the selected parts.
bool StorageMergeTree::merge(
bool aggressive,
const String & partition_id,
bool final,
bool deduplicate,
const Names & deduplicate_by_columns,
String * out_disable_reason,
bool optimize_skip_merged_partitions)
{
......
{
merge_mutate_entry = selectPartsToMerge(
metadata_snapshot,
aggressive,
partition_id,
final,
out_disable_reason,
table_lock_holder,
lock,
optimize_skip_merged_partitions,
&select_decision);
}
......
return mergeSelectedParts(metadata_snapshot, deduplicate, deduplicate_by_columns, *merge_mutate_entry, table_lock_holder);
}
Entering StorageMergeTree::selectPartsToMerge(): when partition_id is empty (which only happens when no partition is specified and “FINAL” is not used), merger_mutator.selectPartsToMerge() is called to pick some parts to merge according to the merge policy; when partition_id is not empty, selectAllPartsToMergeWithinPartition() is called to merge all parts under that partition. Therefore, when no partition is specified and the “FINAL” keyword is not used, OPTIMIZE TABLE does not guarantee that the data ends up fully merged.
std::shared_ptr<StorageMergeTree::MergeMutateSelectedEntry> StorageMergeTree::selectPartsToMerge(
const StorageMetadataPtr & metadata_snapshot,
bool aggressive,
const String & partition_id,
bool final,
String * out_disable_reason,
TableLockHolder & /* table_lock_holder */,
std::unique_lock<std::mutex> & lock,
bool optimize_skip_merged_partitions,
SelectPartsDecision * select_decision_out)
{
......
if (partition_id.empty())
{
......
if (max_source_parts_size > 0)
{
select_decision = merger_mutator.selectPartsToMerge(
future_part,
aggressive,
max_source_parts_size,
can_merge,
merge_with_ttl_allowed,
out_disable_reason);
}
else if (out_disable_reason)
*out_disable_reason = "Current value of max_source_parts_size is zero";
}
else
{
while (true)
{
UInt64 disk_space = getStoragePolicy()->getMaxUnreservedFreeSpace();
select_decision = merger_mutator.selectAllPartsToMergeWithinPartition(
future_part, disk_space, can_merge, partition_id, final, metadata_snapshot, out_disable_reason, optimize_skip_merged_partitions);
auto timeout_ms = getSettings()->lock_acquire_timeout_for_background_operations.totalMilliseconds();
auto timeout = std::chrono::milliseconds(timeout_ms);
/// If final - we will wait for currently processing merges to finish and continue.
if (final
&& select_decision != SelectPartsDecision::SELECTED
&& !currently_merging_mutating_parts.empty()
&& out_disable_reason
&& out_disable_reason->empty())
{
LOG_DEBUG(log, "Waiting for currently running merges ({} parts are merging right now) to perform OPTIMIZE FINAL",
currently_merging_mutating_parts.size());
if (std::cv_status::timeout == currently_processing_in_background_condition.wait_for(lock, timeout))
{
*out_disable_reason = fmt::format("Timeout ({} ms) while waiting for already running merges before running OPTIMIZE with FINAL", timeout_ms);
break;
}
}
else
break;
}
}
......
}
In addition, when the “FINAL” keyword is used, OPTIMIZE TABLE waits for any currently running merge to finish before performing its own merge, so when a partition is specified, using the “FINAL” keyword makes the command respond more slowly.
The logic of StorageMergeTree::mergeSelectedParts() is more involved and is not covered in detail here, but the overall flow is: read all selected parts, merge their data, and write the result to disk as a new part. With a large amount of data this is therefore a very heavy operation, because whether or not there are rows that actually need merging, the full data is read and a new copy is written to disk. After OPTIMIZE runs, a new part is generated, but the old parts do not disappear immediately; they are deleted asynchronously, so right after executing OPTIMIZE you will see a short-lived increase in storage usage.
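This temporary duplication can be observed right after the OPTIMIZE finishes: the old parts are still on disk but are marked inactive. A sketch (database and table names are placeholders):
SELECT name, active, formatReadableSize(bytes_on_disk) AS size
FROM system.parts
WHERE database = 'mydb' AND table = 'events' AND active = 0;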
For partitions that do not need to be merged at all, ClickHouse 21.1 added an optimization: a new setting, optimize_skip_merged_partitions, was added to the system settings (the system.settings table). When it is enabled, selectAllPartsToMergeWithinPartition() skips partitions that contain only a single part with level > 0 (such a partition has already been merged before).
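If your ClickHouse version has this setting, it can be enabled for the session before running the command (a sketch; whether it helps depends on your partition layout):
SET optimize_skip_merged_partitions = 1;
OPTIMIZE TABLE mydb.events FINAL;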
Experimental verification
To verify the above code logic, I ran some experiments on ClickHouse 20.3 (a version without the optimize_skip_merged_partitions setting).
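The part layout discussed in the figures below can also be inspected by querying system.parts (a sketch; substitute your own database and table names):
SELECT partition, name, level, rows, active
FROM system.parts
WHERE database = 'mydb' AND table = 'events' AND active
ORDER BY partition, name;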
1. OPTIMIZE + PARTITION
Figure 2 shows the effect of executing OPTIMIZE TABLE ... PARTITION 20210209: after execution, the 2 parts in partition 20210209 are merged into a single part whose level is 3, while other partitions are left untouched. Note that the figure shows the final state: right after the command, the original 20210209_84_94_2 and 20210209_95_95_0 directories did not disappear immediately and were only deleted a few minutes later.
2. OPTIMIZE + FINAL
Figure 3 shows the effect of OPTIMIZE TABLE ... FINAL. After the command, the multiple parts of partition 20211013 are merged into one part; at the same time, partitions that had already been fully merged (such as 20210729) are rewritten as well, their level going from 5 to 7 (because OPTIMIZE FINAL was executed twice in between).
3. OPTIMIZE
Finally, let's look at a plain OPTIMIZE, shown in Figure 4. ClickHouse selects only some parts of one partition to merge according to its policy (the three parts 20211013_0_231_28, 20211013_232_410_30, and 20211013_411_432_10 are merged into 20211013_0_432_31); this does not guarantee that the data ends up fully merged.
Usage summary
When building a data warehouse on ClickHouse, because ClickHouse does not fully support data updates, there is a trade-off between data freshness and consistency. If the application requires strong consistency and the data is subject to updates, importing in real time is almost impossible; you can only run periodic offline imports, so that the data in ClickHouse is a complete snapshot as of some point in time. Offline jobs have scheduling delays, and in general the minimum period is at the hour level; minute-level freshness is hard to reach. If the application cares more about freshness, you can import in real time, but because ClickHouse's merge process is policy-driven, consistency will be worse (you will see rows that should already have been deleted).
With real-time writes plus periodic OPTIMIZE, you can balance performance against consistency by tuning the OPTIMIZE interval. When consistency requirements are high, shorten the interval; in the extreme you can even run OPTIMIZE after every write, which reduces the inconsistency window to minutes (at a significant performance cost to ClickHouse). When the data volume is large, running OPTIMIZE roughly every half hour both protects cluster performance and still bounds the window of inconsistency. In my own setup, with a ClickHouse cluster on 32-core / 64 GB machines and a single table's raw data within 1 TB, an OPTIMIZE interval of 5-10 minutes causes no pressure.
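As a sketch of such a periodic job (the scheduling itself lives outside ClickHouse, for example in cron or an orchestrator; the table and day-based partitioning are the illustrative assumptions used above), restricting each run to the partition that is actively receiving writes keeps the cost of a single OPTIMIZE low:
-- run every 5-10 minutes; the partition id is a placeholder the scheduler fills in with the current day
OPTIMIZE TABLE mydb.events PARTITION ID '20211013' FINAL;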
References
[1] ClickHouse source code reading: a detailed look at how a SQL statement is executed. https://nowjava.com/article/43828
[2] ClickHouse docs: OPTIMIZE statement. https://clickhouse.com/docs/en/sql-reference/statements/optimize/