An analysis of lazyagg query rewrite optimization in the data warehouse
2022-06-25 17:58:00 [Huawei Cloud Developer Alliance]
Abstract: This article introduces the Lazy Agg query rewrite optimization and the Lazy Agg rewrite rule provided by GaussDB(DWS).
This article is shared from the Huawei Cloud community post 《GaussDB(DWS) lazyagg query rewrite optimization explained [Gauss is not a mathematician this time]》, author: OreoreO.
An aggregation operation groups query results by the values of one or more columns, so that rows with equal values form one group. Aggregation is a common operation and is widely used in the workloads of financial customers. For example:
SELECT a, count(a) FROM t1 GROUP BY a; -- group by a and count the rows in each group
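As a small illustration of what this statement returns, the sketch below runs it against an in-memory SQLite database from Python. SQLite stands in for GaussDB(DWS) here, and the sample rows are made up for the example.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (a INTEGER)")
# Hypothetical sample data: the value 1 appears twice, 2 once, 3 three times.
conn.executemany("INSERT INTO t1 VALUES (?)", [(1,), (1,), (2,), (3,), (3,), (3,)])

# ORDER BY is added only to make the output order deterministic.
rows = conn.execute("SELECT a, count(a) FROM t1 GROUP BY a ORDER BY a").fetchall()
print(rows)  # [(1, 2), (2, 1), (3, 3)]
```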
I. The Lazy Agg rewrite rule
When the data volume is large, the aggregation may spill to disk, and its execution time becomes a performance bottleneck that makes the whole query very inefficient. For example:
SELECT t2.b, sum(cc) FROM (SELECT b, sum(c) AS cc FROM t1 GROUP BY b) AS s, t2 WHERE s.b=t2.b GROUP BY t2.b;
The subquery groups by column t1.b and sums t1.c. The outer query also aggregates: it sums cc, the sum column produced by the subquery. For statements like this, when the subquery aggregation is expensive, a query rewrite rule can eliminate it and let the outer aggregate function do all of the aggregation. Eliminating the subquery aggregation may increase the number of rows the subquery returns. However, when column t1.b has many distinct values, the aggregation would not have reduced the row count much relative to the base table anyway, so the outer JOIN does not see a large increase in work. The statement can therefore be rewritten as:
SELECT t2.b, sum(cc) FROM (SELECT b, c AS cc FROM t1) AS s, t2 WHERE s.b=t2.b GROUP BY t2.b;
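To check that the two statements agree, the sketch below runs both against the same data. It uses Python's sqlite3 module as a stand-in engine, and the table contents are invented for the example; it says nothing about how GaussDB(DWS) performs the rewrite internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; t2 contains a duplicate (2) and an unmatched value (4).
conn.executescript("""
CREATE TABLE t1 (b INTEGER, c INTEGER);
CREATE TABLE t2 (b INTEGER);
INSERT INTO t1 VALUES (1, 10), (1, 20), (2, 5), (3, 7), (3, 8);
INSERT INTO t2 VALUES (1), (2), (2), (4);
""")

# Original form: the subquery aggregates before the join.
original = conn.execute("""
    SELECT t2.b, sum(cc)
    FROM (SELECT b, sum(c) AS cc FROM t1 GROUP BY b) AS s, t2
    WHERE s.b = t2.b GROUP BY t2.b ORDER BY t2.b
""").fetchall()

# Lazy Agg form: the subquery passes raw rows and only the outer query aggregates.
rewritten = conn.execute("""
    SELECT t2.b, sum(cc)
    FROM (SELECT b, c AS cc FROM t1) AS s, t2
    WHERE s.b = t2.b GROUP BY t2.b ORDER BY t2.b
""").fetchall()

print(original)   # [(1, 30), (2, 10)]
print(rewritten)  # [(1, 30), (2, 10)]
```

The duplicate row in t2 matters: each t2 row with b = 2 contributes the full group total in the original form and every raw row in the rewritten form, so the sums still match.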
This rewrite rule is called Lazy Agg. It suits scenarios where the base table is large and the grouping column has many distinct values. If the column has few distinct values (that is, many duplicates), eliminating the aggregation makes the number of rows surge after the Join and the Join performs poorly. In that case the Agg should instead be pushed down below the Join, so that aggregating early reduces the number of rows in the Join result. That rewrite rule is called Eager Agg.
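The trade-off can be made concrete by counting how many rows the subquery hands to the join with and without its aggregation. The sketch below is only an illustration: `join_input_rows` is a hypothetical helper, the data is synthetic, and SQLite is used as a stand-in engine.

```python
import sqlite3

def join_input_rows(b_values):
    """Return (rows after subquery aggregation, rows without it) for table t1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (b INTEGER, c INTEGER)")
    conn.executemany("INSERT INTO t1 VALUES (?, 1)", [(b,) for b in b_values])
    aggregated = conn.execute(
        "SELECT count(*) FROM (SELECT b, sum(c) AS cc FROM t1 GROUP BY b)"
    ).fetchone()[0]
    raw = conn.execute(
        "SELECT count(*) FROM (SELECT b, c AS cc FROM t1)"
    ).fetchone()[0]
    conn.close()
    return aggregated, raw

# Every b is unique: aggregating first does not shrink the join input.
high_distinct = join_input_rows(range(1000))
# Only 10 distinct b values: aggregating first shrinks the join input 100x.
low_distinct = join_input_rows(i % 10 for i in range(1000))

print(high_distinct)  # (1000, 1000)
print(low_distinct)   # (10, 1000)
```

In the first case the subquery aggregation is pure overhead, which is where Lazy Agg helps. In the second case the early aggregation is what keeps the join small, which is where Eager Agg helps.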
II. GaussDB(DWS) lazyagg optimization
To make tuning easier and the product simpler to use, GaussDB(DWS) provides the lazyagg query rewrite rule. Lazy Agg query rewrite optimization is enabled by setting the GUC parameter rewrite_rule so that it contains 'lazyagg'. Once it is enabled, the aggregation in the subquery is optimized away for scenarios that meet the conditions. The original plan is as follows:
The plan after the lazyagg rewrite is as follows:
Compared with the original plan, the lazyagg rewrite eliminates the subquery aggregation from the original plan, namely operator 7 (Subquery Scan) and operator 8 (HashAggregate).
III. lazyagg optimization specifications
- The subquery can be a single aggregate query or a set operation whose branches contain aggregate subqueries. Only UNION ALL is supported as the set operation, and the aggregation can be eliminated in some of its branch subqueries. The subquery must be one of the tables in a JOIN (not in the TargetList, the WHERE clause, and so on).
- When every Agg parameter column of the outer query is contained in the Agg function columns of one of its subqueries, the aggregation of that subquery can be eliminated.
- Every aggregate function whose result remains correct after the subquery aggregation is eliminated is supported. See the following table for which aggregate function types give correct results:
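Why only some aggregate functions qualify can be sketched by comparing a two-level aggregation with the equivalent single-level one. The example below is an illustration on SQLite with invented data; `same_result` is a hypothetical helper, and the pairs tested are not a statement of the exact list GaussDB(DWS) supports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (b INTEGER, c INTEGER)")
# Hypothetical data with groups of unequal size (three rows for b=1, one for b=2).
conn.executemany("INSERT INTO t1 VALUES (?, ?)", [(1, 10), (1, 20), (1, 30), (2, 100)])

def same_result(outer_agg, inner_agg, flat_agg):
    """Compare outer_agg(inner_agg per group) with flat_agg over the raw rows."""
    two_level = conn.execute(
        f"SELECT {outer_agg}(cc) FROM "
        f"(SELECT b, {inner_agg}(c) AS cc FROM t1 GROUP BY b)"
    ).fetchone()[0]
    one_level = conn.execute(f"SELECT {flat_agg}(c) FROM t1").fetchone()[0]
    return two_level == one_level

results = {
    "sum of sum vs sum": same_result("sum", "sum", "sum"),
    "max of max vs max": same_result("max", "max", "max"),
    "min of min vs min": same_result("min", "min", "min"),
    "sum of count vs count": same_result("sum", "count", "count"),
    "avg of avg vs avg": same_result("avg", "avg", "avg"),
}
for name, ok in results.items():
    print(name, ok)
```

On this data the first four pairs agree, while the average of per-group averages (60.0) differs from the average over all rows (40.0), so a pair like that could not be collapsed safely.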
4. Scenario constraints
Beyond the supported scenarios above, no query rewrite is performed for scenarios that could produce incorrect results, including but not limited to:
- The Agg function type does not support elimination.
- The subquery contains other conditions or operators that would make the rewritten result wrong, for example HAVING, window agg, LIMIT, OFFSET, AP function, DISTINCT, or recursive.
- An outer Agg parameter column, GROUP BY column, or JOIN column contains a volatile function such as random or timeofday.
- There are other expressions or function operations outside the subquery's Agg function or inside the outer query's Agg function, for example a subquery Agg function column of sum+1 or max+max(d), or an outer query Agg function column of sum(cc+1).
- The outer query's JOIN columns, GROUP BY columns, or other conditions contain a subquery Agg function column.
- The subquery is on the inner side of a LEFT JOIN or RIGHT JOIN, or in a FULL JOIN, and the subquery Agg function is count while the outer query Agg function is sum.
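Two of these constraints can be illustrated by removing the subquery aggregation by hand and comparing results. The sketch below uses SQLite with invented data. The "naive" statements are an assumption about what an unchecked rewrite would look like, not the output of GaussDB(DWS).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: t2 has one value (2) with no match in t1.
conn.executescript("""
CREATE TABLE t1 (b INTEGER, c INTEGER);
CREATE TABLE t2 (b INTEGER);
INSERT INTO t1 VALUES (1, 10), (1, 20);
INSERT INTO t2 VALUES (1), (2);
""")

def run(sql):
    return conn.execute(sql).fetchall()

# Case A: an expression inside the outer Agg, sum(cc + 1).
expr_original = run("""
    SELECT t2.b, sum(cc + 1)
    FROM (SELECT b, sum(c) AS cc FROM t1 GROUP BY b) AS s, t2
    WHERE s.b = t2.b GROUP BY t2.b ORDER BY t2.b""")
expr_naive = run("""
    SELECT t2.b, sum(cc + 1)
    FROM (SELECT b, c AS cc FROM t1) AS s, t2
    WHERE s.b = t2.b GROUP BY t2.b ORDER BY t2.b""")
print(expr_original)  # [(1, 31)]  the +1 is applied once per group
print(expr_naive)     # [(1, 32)]  the +1 is applied once per raw row

# Case B: count in the subquery, sum outside, subquery on the inner side of a LEFT JOIN.
left_original = run("""
    SELECT t2.b, sum(cc)
    FROM t2 LEFT JOIN (SELECT b, count(c) AS cc FROM t1 GROUP BY b) AS s
      ON s.b = t2.b
    GROUP BY t2.b ORDER BY t2.b""")
left_naive = run("""
    SELECT t2.b, count(cc)
    FROM t2 LEFT JOIN (SELECT b, c AS cc FROM t1) AS s
      ON s.b = t2.b
    GROUP BY t2.b ORDER BY t2.b""")
print(left_original)  # [(1, 2), (2, None)]  unmatched row sums to NULL
print(left_naive)     # [(1, 2), (2, 0)]     unmatched row counts to 0
```

In both cases the hand-rewritten statement returns a different answer from the original, which is the kind of mismatch these constraints are meant to rule out.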
IV. Conclusion
After this walkthrough, readers should have a clear picture of the scenarios where the Lazy Agg rewrite applies and of how GaussDB(DWS) implements lazyagg. We hope it encourages users to explore GaussDB(DWS) in more depth and to take an active part in its performance tuning.