
How is a ClickHouse query completed?

2022-06-24 05:58:00 felixxdu

An Introduction to ClickHouse SQL Functions

Functions in ClickHouse can be roughly divided into three categories:

  • Ordinary functions, also called one-row or scalar functions, are defined by the IFunction interface. They return one result value for each row of the queried table or view. Common examples are numeric operation functions, type conversion functions, conditional functions, and comparison functions. ClickHouse supports more than 600 scalar functions, and the number keeps growing with each release. At present, the only way to add support for a new function is to hard-code it in the source; there is not yet a CREATE UDF facility of the kind found in other databases. You can list the supported functions with the following SQL:
    select * from system.functions where is_aggregate = 0;
  • Aggregate functions are defined by the IAggregateFunction interface. They perform aggregate calculations over a rowset (a collection of rows) and can return only one value per group. Common examples are sum and avg. The state of an aggregate function supports serialization and deserialization, so it can be transferred between distributed nodes to implement incremental computation. List the supported aggregate functions with:
    select * from system.functions where is_aggregate = 1;
  • Table functions. Common ones are mysql, url, numbers, remote, etc. They are used as a data source (storage) and follow the FROM clause. Common usage:
    select * from mysql('host:port', 'database', 'table', 'user', 'password'); -- read data from a MySQL database
    select * from numbers(10, 1000000);    -- single-threaded generation of the numbers between 10 and 1000010
    select * from numbers_mt(10, 1000000); -- multi-threaded generation of the numbers between 10 and 1000010
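To make the per-row contract of a scalar function concrete, here is a minimal C++ sketch. It is not the real IFunction interface (the actual ClickHouse API is column-oriented and much richer); `IScalarFunction`, `PlusOneFunction`, and `applyToColumn` are hypothetical names used only for illustration.

```cpp
#include <cstdint>
#include <vector>

// Toy stand-in for ClickHouse's IFunction: one result value per input row.
// The real interface works on whole columns at a time; this simplified
// per-row form only illustrates the "one row in, one value out" contract.
struct IScalarFunction {
    virtual ~IScalarFunction() = default;
    virtual int64_t executeRow(int64_t input) const = 0;
};

// Hypothetical scalar function, analogous to evaluating the expression `x + 1`.
struct PlusOneFunction : IScalarFunction {
    int64_t executeRow(int64_t input) const override { return input + 1; }
};

// Apply the function to every row of a "column", as the engine conceptually does.
std::vector<int64_t> applyToColumn(const IScalarFunction & f, const std::vector<int64_t> & column) {
    std::vector<int64_t> result;
    result.reserve(column.size());
    for (int64_t v : column)
        result.push_back(f.executeRow(v));
    return result;
}
```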

For an introduction to all functions, see the official documentation.

The Structure of the AST Tree

Parser and Interpreter are two very important sets of interfaces: Parser is responsible for creating the AST object, while Interpreter is responsible for interpreting the AST and further creating the query's execution pipeline. Together with IStorage, they string the whole data query process together.

Parser recursively parses a SQL statement into the form of an AST syntax tree. Different SQL statements are parsed by different Parser implementation classes. As of the current community master branch, Parser has more than 170 subclasses. The main ones live under src/Parsers and are responsible for parsing ClickHouse-style SQL; the parsers under the MySQL subdirectory mainly handle MySQL-flavoured syntax, used where ClickHouse interoperates with MySQL clients.

Each of them implements the two main interfaces according to its own responsibility: getName() and parseImpl(). There are parsers responsible for DDL statements, such as ParserRenameQuery, ParserDropQuery, and ParserAlterQuery; a ParserInsertQuery parser responsible for INSERT statements; a ParserSelectWithUnionQuery parser responsible for SELECT statements; and so on.

This family of parsers works by expanding hierarchically. When a SQL statement arrives, a root parser, ParserQuery, is constructed first. The root parser determines which top-level category the statement belongs to; that category's parseImpl then calls multiple second-level parsers, and so on recursively.
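The dispatch described above can be sketched as a toy C++ program. This is not the actual ClickHouse code: real parsers consume token streams and build AST nodes, and `InsertParser`, `SelectParser`, and `dispatch` are made-up names used only to illustrate the try-each-subparser pattern behind getName() and parseImpl().

```cpp
#include <memory>
#include <string>
#include <vector>

// Toy sketch of the tiered dispatch described above. Every parser exposes
// the same two interfaces: getName() and parseImpl().
struct IParserSketch {
    virtual ~IParserSketch() = default;
    virtual std::string getName() const = 0;
    // Returns true if this parser recognises the statement.
    virtual bool parseImpl(const std::string & sql) const = 0;
};

struct InsertParser : IParserSketch {
    std::string getName() const override { return "ParserInsertQuery"; }
    bool parseImpl(const std::string & sql) const override { return sql.rfind("INSERT", 0) == 0; }
};

struct SelectParser : IParserSketch {
    std::string getName() const override { return "ParserSelectWithUnionQuery"; }
    bool parseImpl(const std::string & sql) const override { return sql.rfind("SELECT", 0) == 0; }
};

// Root parser: try each second-level parser in turn, as ParserQuery does.
std::string dispatch(const std::string & sql) {
    std::vector<std::unique_ptr<IParserSketch>> parsers;
    parsers.push_back(std::make_unique<InsertParser>());
    parsers.push_back(std::make_unique<SelectParser>());
    for (const auto & p : parsers)
        if (p->parseImpl(sql))
            return p->getName();
    return "unrecognised";
}
```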

The root (top-level) parser, ParserQuery, contains the following second-level parsers (each with an explanatory comment; see ClickHouse/src/Parsers/ParserQuery.cpp):

ParserQueryWithOutput query_with_output_p; // the most common SQL statements match this parser
ParserInsertQuery insert_p(end); // INSERT statements
ParserUseQuery use_p; // USE db statements
ParserSetQuery set_p; // SET key1 = value1 statements
ParserSystemQuery system_p; // statements starting with SYSTEM https://clickhouse.tech/docs/en/sql-reference/statements/grant/#grant-system
ParserCreateUserQuery create_user_p; // CREATE USER or ALTER USER
ParserCreateRoleQuery create_role_p; // CREATE ROLE or ALTER ROLE
ParserCreateQuotaQuery create_quota_p; // CREATE QUOTA or ALTER QUOTA
ParserCreateRowPolicyQuery create_row_policy_p; // implements row-level permission control
ParserCreateSettingsProfileQuery create_settings_profile_p; // CREATE SETTINGS PROFILE or ALTER SETTINGS PROFILE
ParserDropAccessEntityQuery drop_access_entity_p; // DROP USER|ROLE|QUOTA
ParserGrantQuery grant_p; // GRANT or REVOKE, table- and column-level permission control
ParserSetRoleQuery set_role_p; // SET ROLE
ParserExternalDDLQuery external_ddl_p; //EXTERNAL DDL FROM external_source(...) DROP|CREATE|RENAME

The most important second-level parser, ParserQueryWithOutput, in turn contains the following parsers:

ParserShowTablesQuery show_tables_p; // responsible for parsing SHOW [TABLES|DATABASES|...]
ParserSelectWithUnionQuery select_p; // entry point for SELECT query parsing; contains many more parsers inside
ParserTablePropertiesQuery table_p; // (EXISTS | SHOW CREATE) [TABLE|DICTIONARY] [db.]name [FORMAT format]
ParserDescribeTableQuery describe_table_p; // (DESCRIBE | DESC) ([TABLE] [db.]name | tableFunction) [FORMAT format]
ParserShowProcesslistQuery show_processlist_p; // SHOW PROCESSLIST
ParserCreateQuery create_p; // CREATE|ATTACH TABLE ...
ParserAlterQuery alter_p; // ALTER TABLE [db.]name
ParserRenameQuery rename_p; // RENAME TABLE [db.]name TO [db.]name, [db.]name TO [db.]name
ParserDropQuery drop_p; // DROP|DETACH|TRUNCATE TABLE [IF EXISTS] [db.]name
ParserCheckQuery check_p; // CHECK [TABLE] [database.]table
ParserOptimizeQuery optimize_p; // OPTIMIZE TABLE [db.]name [PARTITION partition] [FINAL] [DEDUPLICATE]
ParserKillQueryQuery kill_query_p; // KILL QUERY WHERE ... [SYNC|ASYNC|TEST]
ParserWatchQuery watch_p; // WATCH [db.]table EVENTS, introduced at https://clickhouse.tech/docs/en/sql-reference/statements/watch/
ParserShowAccessQuery show_access_p; // SHOW ACCESS
ParserShowAccessEntitiesQuery show_access_entities_p; // SHOW USERS; SHOW [CURRENT|ENABLED] ROLES; SHOW [SETTINGS] PROFILES; etc.
ParserShowCreateAccessEntityQuery show_create_access_entity_p; // SHOW CREATE USER [name | CURRENT_USER]
ParserShowGrantsQuery show_grants_p; // SHOW GRANTS [FOR user_name]
ParserShowPrivilegesQuery show_privileges_p; // SHOW PRIVILEGES
ParserExplainQuery explain_p; // EXPLAIN AST|PLAN|SYNTAX|PIPELINE SELECT...

And so on. The parser ultimately produces an AST syntax tree. All AST nodes share a common interface, IAST, and their inheritance hierarchy is very similar to that of the parsers.

Lexical and grammatical analysis

Two concepts are introduced :

Token: a meaningful "word" composed of several characters. Tokens come in many types; see the macro definitions in src/Parsers/Lexer.h.

Lexer: the lexical analyzer. Given an input SQL statement, it emits tokens one by one. These tokens are then enriched with meaningful information and organized into an AST tree according to the grammar rules.
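As a rough illustration of what a lexer does, the toy sketch below splits a SQL fragment into three token kinds. The real Lexer in src/Parsers/Lexer.h recognises far more token types (keywords, string literals, punctuation, and so on); `TokenKind` and `lex` here are invented for this sketch.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Simplified token taxonomy: words, numbers, and single-character operators.
enum class TokenKind { Word, Number, Operator };

struct Token {
    TokenKind kind;
    std::string text;
};

// Scan the input left to right, emitting one token per lexeme and skipping whitespace.
std::vector<Token> lex(const std::string & sql) {
    std::vector<Token> tokens;
    size_t i = 0;
    while (i < sql.size()) {
        char c = sql[i];
        if (std::isspace(static_cast<unsigned char>(c))) { ++i; continue; }
        if (std::isalpha(static_cast<unsigned char>(c))) {
            size_t start = i;
            while (i < sql.size() && std::isalnum(static_cast<unsigned char>(sql[i]))) ++i;
            tokens.push_back({TokenKind::Word, sql.substr(start, i - start)});
        } else if (std::isdigit(static_cast<unsigned char>(c))) {
            size_t start = i;
            while (i < sql.size() && std::isdigit(static_cast<unsigned char>(sql[i]))) ++i;
            tokens.push_back({TokenKind::Number, sql.substr(start, i - start)});
        } else {
            tokens.push_back({TokenKind::Operator, std::string(1, c)});
            ++i;
        }
    }
    return tokens;
}
```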

The Process of Parsing Functions into the AST Tree

The parser entry point most relevant to functions is ParserExpressionList; the final parsing is implemented in ParserLambdaExpression's parseImpl. At the parser stage there is no way to check whether a function exists. An ASTIdentifier is built first, and then, together with the arguments, an ASTFunction is built; the function's existence is verified only when the pipeline actually executes.
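A toy model of that construction step, with deliberately simplified node types (the real ASTIdentifier and ASTFunction carry much more state): the function node is assembled from an identifier and its arguments without checking whether the function exists, mirroring how that check is deferred to pipeline execution.

```cpp
#include <memory>
#include <string>
#include <vector>

// Deliberately simplified AST node: just a name plus children, the shape
// that IAST also has (real nodes carry aliases, positions, and more).
struct Node {
    std::string name;
    std::vector<std::shared_ptr<Node>> children;
};

// Build a function node from already-built argument nodes, the way the parser
// wraps an ASTIdentifier and arguments into an ASTFunction. Note that the
// function name is NOT validated here; existence is checked only at execution.
std::shared_ptr<Node> makeFunction(const std::string & funcName,
                                   std::vector<std::shared_ptr<Node>> args) {
    auto fn = std::make_shared<Node>();
    fn->name = funcName;
    fn->children = std::move(args);
    return fn;
}
```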

From Interpreter to Pipeline

The Interpreter works like a service layer: it aggregates the resources required by each operator and strings the whole query process together. It first parses the AST object, then executes the "business logic" (for example, branch decisions, setting parameters, and calling interfaces), and eventually returns an IBlock object, establishing the query execution pipeline in the form of threads.

The processing flow of a query is, in general, as follows:

In ClickHouse, the transformer is the operator concept. All transformers are arranged into a pipeline, which is then handed to the PipelineExecutor for streaming execution. Each time a transformer executes, it processes one batch of the data set and sends the output downstream, all the way to the sinker.
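The batch-at-a-time flow can be sketched as follows. This is not the real Processors API (which is port-based and push/pull driven); `Batch`, `Transformer`, and `runPipeline` are hypothetical simplifications that only show how each stage consumes one batch and hands the result downstream.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// A "batch" of rows and a stage that maps one batch to the next.
using Batch = std::vector<int64_t>;
using Transformer = std::function<Batch(const Batch &)>;

// Loosely analogous to FilterTransform: keep only rows matching a predicate.
Transformer makeFilter() {
    return [](const Batch & in) {
        Batch out;
        for (int64_t v : in)
            if (v > 0)
                out.push_back(v);
        return out;
    };
}

// Loosely analogous to ExpressionTransform: compute `v + 1` for every row.
Transformer makeExpression() {
    return [](const Batch & in) {
        Batch out;
        for (int64_t v : in)
            out.push_back(v + 1);
        return out;
    };
}

// Minimal stand-in for the executor: push each batch through every stage in order.
Batch runPipeline(const std::vector<Transformer> & stages, Batch batch) {
    for (const auto & stage : stages)
        batch = stage(batch);
    return batch;
}
```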

ClickHouse provides a series of basic transformer modules (see src/Processors/Transforms), such as:

  • FilterTransform – WHERE filter
  • SortingTransform – ORDER BY Sort
  • LimitByTransform – LIMIT truncation
  • ExpressionTransform – expression execution

When we execute :

SELECT age + 1 FROM t1 WHERE id=1 ORDER BY time DESC LIMIT 10

ClickHouse's QueryPipeline arranges and assembles it in the following way:

QueryPipeline::addSimpleTransform(Source)
QueryPipeline::addSimpleTransform(FilterTransform)
QueryPipeline::addSimpleTransform(SortingTransform)
QueryPipeline::addSimpleTransform(LimitByTransform)
QueryPipeline::addSimpleTransform(ExpressionTransform)
QueryPipeline::addSimpleTransform(Sinker)

When QueryPipeline performs this transformer arrangement, it also needs to construct the lower-level DAG connections:

connect(Source.OutPort, FilterTransform.InPort)
connect(FilterTransform.OutPort, SortingTransform.InPort)
connect(SortingTransform.OutPort, LimitByTransform.InPort)
connect(LimitByTransform.OutPort, ExpressionTransform.InPort)
connect(ExpressionTransform.OutPort, Sinker.InPort)

This establishes the data-flow relationships: one transformer's OutPort is connected to another's InPort. Meanwhile, when operators can execute in parallel (for example, filter and expression), the corresponding transformers are fissioned into multiple instances to achieve a parallel speedup.
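A minimal sketch of the OutPort-to-InPort wiring, assuming a simplified model in which each transformer has at most one downstream consumer (`Dag`, `connect`, and `chainFrom` are made-up names; the real pipeline is a general DAG with multiple ports per processor):

```cpp
#include <map>
#include <string>
#include <vector>

// Toy sketch of DAG wiring: connect(a, b) records that a's OutPort feeds
// b's InPort, echoing the connect() calls in the pipeline construction above.
struct Dag {
    std::map<std::string, std::string> edges; // OutPort owner -> InPort owner

    void connect(const std::string & from, const std::string & to) {
        edges[from] = to;
    }

    // Follow the chain of connections from a source down to the final sinker.
    std::vector<std::string> chainFrom(const std::string & source) const {
        std::vector<std::string> chain{source};
        auto it = edges.find(source);
        while (it != edges.end()) {
            chain.push_back(it->second);
            it = edges.find(it->second);
        }
        return chain;
    }
};
```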


Copyright notice
This article was written by [felixxdu]. Please keep the original link when reposting. Thanks.
https://yzsam.com/2021/07/20210730033714139C.html
