当前位置:网站首页>Clickhouse source code note 6: exploring the sorting of columnar storage systems
Clickhouse source code note 6: exploring the sorting of columnar storage systems
2022-06-24 07:06:00 【HappenLee】
The analysis completes the aggregation and vectorization filtering , After vectorized function calculation . This article , The author will analyze an important operator of database : Sort . Let's analyze it from the perspective of source code ClickHouse As a column storage system, how to achieve sorting .
The source code analysis of this series of articles is based on ClickHouse v19.16.2.2 Version of .
1. Implementation plan
Old rules , Let's start with a simple query , By implementing the plan step by step ClickHouse Execution logic .
select * from test order by k1;
Let's try to open it first ClickHouse Of Debug Log to see the specific implementation of pipeline.
image.png
It's divided into 5 A flow , And the flow that we need to focus on is coming out MergeSorting And PartialSorting,ClickHouse Read data from the storage engine first , And perform function operations , And sort the data first , And then for the data that is already in order MergeSort, Get the final, orderly data .
2. Sorting out the implementation process
Then the code we are going to comb next is also very clear , Namely PartialSortingBlockInputStream And MergingSortedBlockInputStream.
- PartialSortingBlockInputStream The implementation of the PartialSortingBlockInputStream Is very simple to implement , Let's just look at the code :
Block PartialSortingBlockInputStream::readImpl()
{
Block res = children.back()->read();
sortBlock(res, description, limit);
return res;
} It reads data from the underlying stream Block,Block It can be understood as Doris Among them Batch, A lot of rows of data , And then according to its own member variables SortDescription To the individual Block Sort , And according to limit Do length truncation .
SortDescription It's a vector, Each member describes the collation of a single permutation . such as : null Collation of values , Whether to sort in reverse order, etc .
/// Description of the sorting rule for several columns. using SortDescription = std::vector<SortColumnDescription>;
- sortBlock Function implementation of
Next , Let's see sortBlock Implementation of function , Take a look at how the column execution system uses the above information to sort data .
void sortBlock(Block & block, const SortDescription & description, UInt64 limit)
{
/// If only one column to sort by
if (description.size() == 1)
{
bool reverse = description[0].direction == -1;
const IColumn * column = !description[0].column_name.empty()
? block.getByName(description[0].column_name).column.get()
: block.safeGetByPosition(description[0].column_number).column.get();
IColumn::Permutation perm;
if (needCollation(column, description[0]))
{
const ColumnString & column_string = typeid_cast<const ColumnString &>(*column);
column_string.getPermutationWithCollation(*description[0].collator, reverse, limit, perm);
}
else
column->getPermutation(reverse, limit, description[0].nulls_direction, perm);
size_t columns = block.columns();
for (size_t i = 0; i < columns; ++i)
block.getByPosition(i).column = block.getByPosition(i).column->permute(perm, limit);
}There are two situations that need to be discussed here :1. Single column sort .2. Multi column sorting . Multi column sorting is similar to single column sorting , So let's start with the single column sorting code . Its core code is the following four lines :
column->getPermutation(reverse, limit, description[0].nulls_direction, perm);
size_t columns = block.columns();
for (size_t i = 0; i < columns; ++i)
block.getByPosition(i).column = block.getByPosition(i).column->permute(perm, limit); First, sort by a single column , Get each column after the sort IColumn::Permutation perm;. then Block Each of these columns uses this perm, Generate a new sort column , After replacing the old Columns , It's done Block It's sort of .
Generate Perm
As shown in the figure above ,Permutation Is a length of limit Of PodArray, It identifies the sort position after sorting according to the sort sequence . Follow this up perm Rules use functions permute Generate a new column , Sort the columns that have been completed .
ColumnPtr ColumnVector<T>::permute(const IColumn::Permutation & perm, size_t limit) const
{
typename Self::Container & res_data = res->getData();
for (size_t i = 0; i < limit; ++i)
res_data[i] = data[perm[i]];
return res;
} Careful friends here will find that ,String Listed in sortBlock There's some extra judgment in the function
if (needCollation(column, description[0])) {
const ColumnString & column_string = typeid_cast<const ColumnString &>(*column);
column_string.getPermutationWithCollation(*description[0].collator, reverse, limit, perm);
} This part is a special string generation perm The logic of ,ClickHouse It supports sorting string columns with different codes . Such as through GBK If you sort by coding , Then the order of Chinese will be based on the order of Pinyin .
- getPermutation The implementation of the therefore , stay ClickHouse In the process of sorting .
getPermutationIt's the most important of all the sorting operators , It isColumnA virtual function of class , In other words, each column of different data types can implement its own sorting logic . We go throughColumnVectorThe implementation of the , To manage Zhonggui leopard .
template <typename T>
void ColumnVector<T>::getPermutation(bool reverse, size_t limit, int nan_direction_hint, IColumn::Permutation & res) const
{
if (reverse)
std::partial_sort(res.begin(), res.begin() + limit, res.end(), greater(*this, nan_direction_hint));
else
std::partial_sort(res.begin(), res.begin() + limit, res.end(), less(*this, nan_direction_hint));
}
else
{
/// A case for radix sort
if constexpr (std::is_arithmetic_v<T> && !std::is_same_v<T, UInt128>)
{
return;
}
}
/// Default sorting algorithm.
for (size_t i = 0; i < s; ++i)
res[i] = i;
pdqsort(res.begin(), res.end(), less(*this, nan_direction_hint));
}
}This part of the code is more , The author simplifies the logic of this part .
- If there is
limitConditions , And the length of the column is greater thanlimit, usestd::partial_sortConductpermSort . - If it's a number type , And not for
UInt128Type , ThenRadix SortCount and sortpermSort . - If the first two conditions are not met , Quicksort is used as the final default implementation .
well , See here . It's completely sorted out PartialSortingBlockInputStream, We get the... Of each output Block It's sorted according to our sorting rules . Next, please come out MergeSortingBlockInputStream To do the final sorting .
- MergeSortingBlockInputStream The implementation of the It can be seen from the name , It needs to be done once Merge sort , To get the final ordered sorting result . As for the sort objects , Naturally, it goes through PartialSortingBlockInputStream Output
Block了 .
Go straight to readImpl() The implementation of the ,ClickHouse Here comes Spill to disk The external sort logic of , To simplify , I'll take away this part of external sorting logic for the time being .
Block MergeSortingBlockInputStream::readImpl()
{
/** Algorithm:
* - read to memory blocks from source stream;
*/
/// If has not read source blocks.
if (!impl)
{
while (Block block = children.back()->read())
{
blocks.push_back(block);
sum_rows_in_blocks += block.rows();
sum_bytes_in_blocks += block.allocatedBytes();
/** If significant amount of data was accumulated, perform preliminary merging step.
*/
if (blocks.size() > 1
&& limit
&& limit * 2 < sum_rows_in_blocks /// 2 is just a guess.
&& remerge_is_useful
&& max_bytes_before_remerge
&& sum_bytes_in_blocks > max_bytes_before_remerge)
{
remerge();
}
if ((blocks.empty() && temporary_files.empty()) || isCancelledOrThrowIfKilled())
return Block();
if (temporary_files.empty())
{
impl = std::make_unique<MergeSortingBlocksBlockInputStream>(blocks, description, max_merged_block_size, limit);
}
Block res = impl->read();
return res;
}� As you can see from the code above ,MergeSortingBlockInputStream This part is constantly from the bottom PartialSortingBlockInputStream Read out , And store it all . After the final read is complete , utilize MergeSortingBlocksBlockInputStream class , To complete all Blocks The merging and sorting work of . and MergeSortingBlocksBlockInputStream Class is simply to complete the process code of multi-channel merge and sort using heap , I will not start here , Interested students can refer to MergeSortingBlockInputStream.cpp Partial implementation .
3. The point to comb
The second section is finished ClickHouse The implementation process of sorting operator of , Here's a brief summary of the main points :
- ClickHouse The implementation of sorting needs to take advantage of Sort columns Generate corresponding
perm, In the end usepermComplete every one of them Block Sort . - So every column of a different data type , Both need to be implemented
getPermutationAndpermuteTo achieve sorting . And according to the data type , Choose a different sort implementation . such asradix sortThe time complexity of is O(n), Compared with the time complexity of quick sort, there are obvious advantages . - There is a lot of data dependence in sorting algorithm , So it's hard to play
SIMDAdvantage of . Only inradix sortThere are only a few parts that can be vectorized , So relative to the implementation of non vectorization , There are not many performance advantages .
4. Summary
OK, Only this and nothing more , We can start from Clickhouse In the implementation of the source code, sort out how to complete the sorting of the column storage system . Of course , This section skips over some important implementations :Spill to disk. This is to ensure that under a certain memory limit , When sorting massive data , Disk can be used to cache the intermediate results of sorting . The implementation of this part is also very interesting , Interested friends , We can further expand to see the implementation of this part . The author is a ClickHouse Beginners , Yes ClickHouse Interested students , Welcome to give me more advice , communication .
5. Reference material
边栏推荐
- Multi sensor fusion track fusion
- What are the audio formats? Can the audio format be converted
- Use of SQL join
- On BOM and DOM (6): bit value calculation of DOM objects and event objects, such as offsetx/top and clearx
- 虚拟文件系统
- Application configuration management, basic principle analysis
- [problem solving] the connection to the server localhost:8080 was referred
- mysql中的 ON UPDATE CURRENT_TIMESTAMP
- Command ‘[‘where‘, ‘cl‘]‘ returned non-zero exit status 1.
- You have a chance, here is a stage
猜你喜欢

数据库 存储过程 begin end

Mysql开启BINLOG

你有一个机会,这里有一个舞台

. Net7 miniapi (special part):preview5 optimizes JWT verification (Part 1)

With a goal of 50million days' living, pwnk wants to build a "Disneyland" for the next generation of young people

Leetcode概率题面试突击系列11~15

智能视觉组A4纸识别样例

Rockscache schematic diagram of cache operation

Typora收费?搭建VS Code MarkDown写作环境
![Command ‘[‘where‘, ‘cl‘]‘ returned non-zero exit status 1.](/img/2c/d04f5dfbacb62de9cf673359791aa9.png)
Command ‘[‘where‘, ‘cl‘]‘ returned non-zero exit status 1.
随机推荐
Why does the remote end receive a check-out notice when the TRTC applet turns off audio and video locally
JVM debugging tool -arthas
NVIDIA control panel does not open what is NVIDIA control panel
Leetcode概率题面试突击系列11~15
开源与创新
Localized operation on cloud, the sea going experience of kilimall, the largest e-commerce platform in East Africa
记录--关于virtual studio2017添加报表控件的方法--Reportview控件
Actual combat | how to deploy flask project using wechat cloud hosting
How to send SMS in groups? What are the reasons for the poor effect of SMS in groups?
Challenges brought by maker education to teacher development
JSON formatting method advantages of JSON over XML
How to build an app at low cost
成为 TD Hero,做用技术改变世界的超级英雄 | 来自 TDengine 社区的邀请函
What are the audio formats? Can the audio format be converted
Go operation SQLite code error
What is the role of domain name websites? How to query domain name websites
setInterval里面的函数不能有括号
EasyDSS_ The dash version solves the problem that the RTSP source address cannot play the video stream
Spark accumulators and broadcast variables
Laravel document reading notes -laravel str slug() function example