SQL rewriting Series 6: predicate derivation
2022-07-24 23:36:00 [Official blog of OceanBase Database]
About this series
OceanBase is a 100% independently developed database. It has reliably supported the Double 11 shopping festival for 9 consecutive years, pioneered the "three regions, five data centers" standard for city-level disaster recovery, and is the only Chinese native distributed database to have set world records in both the TPC-C and TPC-H benchmarks. Its source code was officially opened in June 2021. The query optimizer is a core module of a relational database system. It is one of the hardest parts of database kernel development and a touchstone for the maturity of the whole system. To help you understand the OceanBase query optimizer, we are writing a series of articles on query rewriting that explain its essentials, show which complex SQL forms are equivalent, and help you write efficient SQL. This is the sixth article in the series, and it focuses on predicate derivation.
About the authors
The OceanBase optimizer team, led by OceanBase senior technical expert Xifeng, technical expert Shan Wen, and others, is committed to building a world-leading distributed query optimizer.
Series contents
This query rewriting series covers four modules: subquery optimization, aggregate function optimization, window function optimization, and complex expression optimization. This article covers predicate derivation, and more modules are coming soon.
You are welcome to join the OceanBase open source user group (DingTalk group number: 33254054) to talk with the OceanBase query optimizer team.
1. Why predicate derivation is needed
Applications usually read only part of the data when they access a database, so they specify predicates to filter out the rows they do not need. The same query semantics can often be expressed with many different combinations of predicates.
For example, Q1 and Q2 both read the remaining-ticket information for the screening with ID 1024. The two queries use different sets of predicates but return the same result. In terms of performance, the filter predicates in Q2 are better written. T.play_id = 1024 in Q2 is a base-table filter predicate: it filters out rows early and reduces the amount of data that takes part in the join. Furthermore, when the TICKETS table has an index on (play_id, sale_date, seat), the query optimizer can determine a very tight scan range, and it can also use the index order to eliminate the sort that ORDER BY would otherwise require. In the end, the whole query needs to read only 10 rows of table T.
Q1:
SELECT P.show_time, T.ticket_id, T.seat
FROM PLAY P, TICKETS T
WHERE P.play_id = T.play_id AND P.play_id = 1024 AND T.sale_date is NULL
ORDER BY T.seat LIMIT 10;
Q2:
SELECT P.show_time, T.ticket_id, T.seat
FROM PLAY P, TICKETS T
WHERE T.play_id = 1024 and P.play_id = 1024 AND T.sale_date is NULL
ORDER BY T.seat LIMIT 10;
To guarantee good query performance, the database kernel must be able to derive a predicate such as T.play_id = 1024 from Q1. We call this ability "predicate derivation". OceanBase designs and implements several derivation strategies for different predicate usage scenarios. The sections below introduce them.
2. Predicate derivation
Predicate derivation produces new predicates from a set of existing ones. For example, from the two predicates P.play_id = T.play_id and P.play_id = 1024 in Q1, we can derive the new predicate T.play_id = 1024. This is a single-table filter predicate on T: it filters the rows of T early and reduces the amount of data that takes part in the multi-table join. Deriving new predicates pays off in many optimization scenarios.
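The equality case can be sketched with equivalence classes. The following Python snippet is only an illustration, not OceanBase's implementation: the function name derive_equalities and the input shapes are hypothetical, and it assumes every comparison uses the same data type and collation.

```python
def derive_equalities(column_equalities, constant_equalities):
    """Derive `column = constant` predicates implied by column equalities.

    column_equalities:   list of (col_a, col_b) pairs, meaning col_a = col_b
    constant_equalities: dict of col -> constant, meaning col = constant
    Returns the newly derived predicates as a dict of col -> constant.
    """
    parent = {}

    def find(col):
        # Union-find lookup with path halving.
        parent.setdefault(col, col)
        while parent[col] != col:
            parent[col] = parent[parent[col]]
            col = parent[col]
        return col

    # Merge columns known to be equal into one equivalence class.
    for a, b in column_equalities:
        parent[find(a)] = find(b)

    # Record the constant known for each equivalence class.
    class_constant = {}
    for col, value in constant_equalities.items():
        class_constant[find(col)] = value

    # Every other member of the class inherits the constant.
    derived = {}
    for col in list(parent):
        root = find(col)
        if root in class_constant and col not in constant_equalities:
            derived[col] = class_constant[root]
    return derived


derived = derive_equalities([("P.play_id", "T.play_id")], {"P.play_id": 1024})
print(derived)  # {'T.play_id': 1024}
```

Grouping columns into classes first means a chain such as A = B, B = C, A = 1 yields predicates on both B and C in one pass.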
Ordering comparison derivation
Given several ordering comparison predicates, we can chain the ordering relationships among the expressions they reference. For example, the query below contains the two predicates T1.C1 > T2.C1 and T1.C1 < 10, which chain into T2.C1 < T1.C1 < 10. From this chain we can derive the new predicate T2.C1 < 10, which filters the rows of T2 early and reduces the amount of data that takes part in the join.
SELECT * FROM T1, T2 WHERE T1.C1 > T2.C1 AND T1.C1 < 10;
=>
SELECT * FROM T1, T2 WHERE T1.C1 > T2.C1 AND T1.C1 < 10 AND T2.C1 < 10;
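A minimal sketch of how such a bound could be propagated, assuming only strict "less than" predicates and comparable constants. The function derive_upper_bounds and its input format are hypothetical and chosen for illustration.

```python
def derive_upper_bounds(less_than, upper_bounds):
    """Propagate `col < constant` bounds through `col_a < col_b` predicates.

    less_than:    list of (smaller_col, larger_col), meaning smaller_col < larger_col
    upper_bounds: dict of col -> constant, meaning col < constant
    Returns only the bounds that were newly derived or tightened.
    """
    bounds = dict(upper_bounds)
    changed = True
    while changed:
        changed = False
        for small, large in less_than:
            # small < large and large < c together imply small < c.
            if large in bounds and (small not in bounds or bounds[large] < bounds[small]):
                bounds[small] = bounds[large]
                changed = True
    return {col: c for col, c in bounds.items() if upper_bounds.get(col) != c}


# T1.C1 > T2.C1 is written as T2.C1 < T1.C1.
derived = derive_upper_bounds([("T2.C1", "T1.C1")], {"T1.C1": 10})
print(derived)  # {'T2.C1': 10}
```

The loop repeats until no bound changes, so longer chains such as A < B < C < 10 are handled as well.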
For Q1, we can likewise use the equality chain given by its predicates (T.play_id = P.play_id = 1024) to derive the new predicate T.play_id = 1024. Once the new predicate exists, the join predicate P.play_id = T.play_id becomes redundant and can be removed, which yields Q2.
Complex predicate derivation
Besides ordering and equality comparisons, queries often use more complex predicates, for example LIKE for prefix matching on strings. Given a complex predicate and some equality relationships, we can also derive new predicates. For example, the query below contains the two predicates T1.C1 = T2.C1 and T1.C1 LIKE 'ABC%'. Because T1.C1 and T2.C1 are equal, T2.C1 LIKE 'ABC%' must also hold. This predicate, too, filters the rows of T2 early and reduces the amount of data that takes part in the join.
SELECT *
FROM T1, T2 WHERE T1.C1 = T2.C1 AND T1.C1 LIKE 'ABC%';
=>
SELECT *
FROM T1, T2 WHERE T1.C1 = T2.C1 AND T1.C1 LIKE 'ABC%' AND T2.C1 LIKE 'ABC%';
Given an equality between two columns and any predicate on one of them, we can almost always derive a corresponding predicate on the other. That does not mean we always should. Some complex predicates are expensive to evaluate and filter poorly, so deriving another copy can actually degrade query performance. When deciding whether to derive, we should first judge whether the new predicate would filter out a large amount of data.
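The decision can be pictured as a simple gate on estimated selectivity. The sketch below is hypothetical: the Predicate class, the 0.3 threshold, and the reuse of one column's selectivity estimate for the other column are all assumptions made for illustration, not OceanBase's actual cost model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    column: str         # column the predicate applies to, e.g. "T1.C1"
    template: str       # predicate text with "{col}" as the column placeholder
    selectivity: float  # estimated fraction of rows that pass (0.0 to 1.0)

    def render(self):
        return self.template.format(col=self.column)


def derive_for_equal_column(pred, equal_column, max_selectivity=0.3):
    """Copy `pred` onto `equal_column` only if it is expected to filter well."""
    if pred.selectivity > max_selectivity:
        return None  # weak filter: the extra evaluation cost is not worth it
    # Assumption: the estimate for the original column carries over.
    return Predicate(equal_column, pred.template, pred.selectivity)


like_pred = Predicate("T1.C1", "{col} LIKE 'ABC%'", selectivity=0.01)
derived = derive_for_equal_column(like_pred, "T2.C1")
print(derived.render())  # T2.C1 LIKE 'ABC%'
```

A predicate that passes most rows, such as one with an estimated selectivity of 0.9, returns None here and is not derived.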
OR predicate derivation
OR predicates are also common in application queries. The query below contains an interesting one. The predicate references columns from more than one table, so it can only filter the result after the tables are joined. The interesting part is that every branch of the OR contains a predicate on T1. From those we can construct a filter predicate on T1: T1.C2 = 1 OR T1.C2 = 2. This is a single-table filter predicate, so it filters the rows of T1 early and reduces the number of rows that take part in the join.
SELECT * FROM T1, T2
WHERE T1.C1 = T2.C1 AND
((T1.C2 = 1) OR (T1.C2 = 2 AND T2.C2 = 2));
=>
SELECT * FROM T1, T2
WHERE T1.C1 = T2.C1 AND
(T1.C2 = 1 OR T1.C2 = 2) AND
((T1.C2 = 1) OR (T1.C2 = 2 AND T2.C2 = 2));
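The extraction rule can be sketched as follows: a single-table filter exists only when every OR branch constrains that table. The function extract_single_table_filter and the (table, text) representation of predicates are hypothetical and chosen for illustration.

```python
def extract_single_table_filter(branches, table):
    """Build a filter on `table` that is implied by an OR predicate.

    branches: list of OR branches; each branch is a list of (table, sql_text)
              atoms that are ANDed together, where each atom references
              exactly one table.
    Returns the filter text, or None if some branch does not constrain `table`.
    """
    parts = []
    for branch in branches:
        atoms = [text for tbl, text in branch if tbl == table]
        if not atoms:
            return None  # this branch accepts any row of `table`
        parts.append(" AND ".join(atoms))
    return " OR ".join(f"({part})" for part in parts)


or_pred = [
    [("T1", "T1.C2 = 1")],
    [("T1", "T1.C2 = 2"), ("T2", "T2.C2 = 2")],
]
print(extract_single_table_filter(or_pred, "T1"))  # (T1.C2 = 1) OR (T1.C2 = 2)
print(extract_single_table_filter(or_pred, "T2"))  # None
```

No filter can be built for T2 because the first branch places no condition on it, so any row of T2 may still qualify.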
MIN/MAX predicate derivation
The derivations in the scenarios above are fairly intuitive. We now introduce a subtler one.
The query below has the HAVING predicate MAX(C2) > 10. From it we can derive the filter predicate C2 > 10. The reasoning is as follows. The original query keeps only the groups whose aggregate satisfies MAX(C2) > 10. If a given row does not satisfy C2 > 10, there are two cases:
1. The row does not hold the maximum C2 of its group. It has no effect on the aggregate, so it can be filtered out.
2. The row holds the maximum C2 of its group. The whole group is then removed by the HAVING predicate.
In both cases, rows that do not satisfy C2 > 10 can be filtered out early. We can therefore derive the new predicate C2 > 10.
SELECT C1, MAX(C2)
FROM T1
GROUP BY C1 HAVING MAX(C2) > 10;
=>
SELECT C1, MAX(C2)
FROM T1
WHERE C2 > 10
GROUP BY C1 HAVING MAX(C2) > 10;
Similarly, for the following query with the MIN aggregate function, we can also derive a new predicate. Such predicates filter out rows early, reduce the work done by grouping and aggregation, and improve query performance.
SELECT C1, MIN(C2)
FROM T1
GROUP BY C1 HAVING MIN(C2) < 10;
=>
SELECT C1, MIN(C2)
FROM T1
WHERE C2 < 10
GROUP BY C1 HAVING MIN(C2) < 10;
This derivation places a number of requirements on the form of the query. As an exercise, consider: if the query contains other aggregate functions, can the derivation above still be applied?
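One way to explore this question is to run both forms on a small data set. The sketch below uses Python's built-in sqlite3 module rather than OceanBase, and the sample rows are made up for illustration. It compares the MAX rewrite with the same rewrite applied to a query that also computes COUNT(*).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (C1 INTEGER, C2 INTEGER)")
conn.executemany(
    "INSERT INTO T1 VALUES (?, ?)",
    [(1, 5), (1, 20), (2, 3), (2, 8), (3, 11), (3, 12)],
)


def rows(sql):
    # Sort so that the comparison does not depend on output order.
    return sorted(conn.execute(sql).fetchall())


# MAX only: pushing C2 > 10 into WHERE does not change the result.
max_original = rows("SELECT C1, MAX(C2) FROM T1 GROUP BY C1 HAVING MAX(C2) > 10")
max_rewritten = rows(
    "SELECT C1, MAX(C2) FROM T1 WHERE C2 > 10 GROUP BY C1 HAVING MAX(C2) > 10"
)
print(max_original == max_rewritten)  # True

# MAX together with COUNT(*): the same rewrite changes the counts.
count_original = rows(
    "SELECT C1, MAX(C2), COUNT(*) FROM T1 GROUP BY C1 HAVING MAX(C2) > 10"
)
count_rewritten = rows(
    "SELECT C1, MAX(C2), COUNT(*) FROM T1 WHERE C2 > 10 "
    "GROUP BY C1 HAVING MAX(C2) > 10"
)
print(count_original == count_rewritten)  # False
```

In this data set, group 1 contains the rows (1, 5) and (1, 20). The early filter drops (1, 5), which leaves MAX(C2) unchanged but reduces COUNT(*) from 2 to 1.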
Derivation traps
Deriving new predicates also has pitfalls. For example, consider query Q3 below: can we derive the new predicate T2.C_BIN = 'ABC' from T1.C_CI = 'ABC' and T1.C_CI = T2.C_BIN?
No, this derivation is wrong.
The reason is that the predicates are compared in different ways. In T1.C_CI = 'ABC', the string comparison is case insensitive, so values such as 'abc' and 'ABC' both satisfy the filter. T1.C_CI = T2.C_BIN, however, compares strings in a case-sensitive way. Combining the two predicates, we can only conclude that T2.C_BIN holds some case variant of the string, such as 'abc' or 'ABC'. But T2.C_BIN = 'ABC' is a case-sensitive comparison, so it would filter out rows whose value is 'abc'. Deriving this new predicate, as Q4 does, is therefore incorrect.
CREATE TABLE T1 (C_CI VARCHAR(10) COLLATE UTF8_GENERAL_CI);
CREATE TABLE T2 (C_BIN VARCHAR(10) COLLATE UTF8_BIN);
Q3: SELECT * FROM T1, T2
WHERE T1.C_CI = 'ABC' AND T1.C_CI = T2.C_BIN;
=> (incorrect rewrite)
Q4: SELECT * FROM T1, T2
WHERE T1.C_CI = 'ABC' AND T1.C_CI = T2.C_BIN AND T2.C_BIN = 'ABC';
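The difference between Q3 and Q4 can be reproduced on a small data set. This sketch uses Python's built-in sqlite3 module as a stand-in for OceanBase, with SQLite's NOCASE and BINARY collations standing in for UTF8_GENERAL_CI and UTF8_BIN. SQLite resolves mixed-collation comparisons differently from MySQL-compatible systems, so the join comparison is given an explicit COLLATE BINARY to match the case-sensitive behavior described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (C_CI TEXT COLLATE NOCASE)")
conn.execute("CREATE TABLE T2 (C_BIN TEXT COLLATE BINARY)")
conn.executemany("INSERT INTO T1 VALUES (?)", [("abc",), ("ABC",)])
conn.executemany("INSERT INTO T2 VALUES (?)", [("abc",), ("ABC",)])

# Q3: case-insensitive filter on T1, case-sensitive join comparison.
q3 = conn.execute(
    "SELECT * FROM T1, T2 "
    "WHERE T1.C_CI = 'ABC' AND T1.C_CI = T2.C_BIN COLLATE BINARY"
).fetchall()

# Q4: Q3 plus the wrongly derived case-sensitive filter on T2.
q4 = conn.execute(
    "SELECT * FROM T1, T2 "
    "WHERE T1.C_CI = 'ABC' AND T1.C_CI = T2.C_BIN COLLATE BINARY "
    "AND T2.C_BIN = 'ABC'"
).fetchall()

print(len(q3), len(q4))  # 2 1
```

Q3 returns both the ('abc', 'abc') and ('ABC', 'ABC') pairs, while Q4 loses the lowercase pair, so the two queries are not equivalent.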
3. Summary
This article introduced several kinds of predicate derivation. Deriving new predicates matters a great deal for query optimization: with the new predicates, the query optimizer can choose better indexes and generate better base-table access paths. There are many other predicate-related optimizations. In the next article we will introduce predicate movement, which adjusts where predicates sit in a query and moves them to more suitable positions to improve the performance of the whole query.