Open source SPL redefines OLAP server
2022-06-22 14:34:00 【Superdream dream dream】
OLAP (Online Analytical Processing) refers to online analysis: querying and computing over data and returning the results in real time. Reports, data queries, multidimensional analysis, and other everyday query tasks that need results immediately all fall within the scope of OLAP. The industry has a corresponding class of products to meet these needs: the OLAP Server.
The current state of OLAP Server
Almost all current mainstream OLAP Servers are based on an RDB or on a big data platform wrapped to look like an RDB. This resembles the early ROLAP (a term that is rarely mentioned now), and one of its key features is the use of SQL as the query language.
The characteristics of RDB and SQL create many difficulties for OLAP Server.
Complex reports are difficult
In fact, reports are the mainstay of OLAP business. A large part of OLAP query demand comes from pre-built report query interfaces rather than free drag-and-drop multidimensional analysis, and complex reports often account for more than half of all report requirements. These reports typically involve complex data processing logic, and each one needs its own data preparation code. The most common approach is complex SQL or stored procedures. Scenarios the database cannot handle (external data sources such as files, computing across data sources, front-end and back-end separation, and so on) also have to be completed in Java, which makes the process very cumbersome.
Implementing these calculations in SQL is difficult. Stored procedures have many drawbacks (no portability, potential security risks), so they are used less and less. Java is weak at set operations and cannot be hot-swapped, so it struggles to keep up with complex, changing report requirements. As a result, current OLAP Servers do not handle complex reports well.
Weak self-service association
Even setting complex reports aside and considering only the basic OLAP task of multidimensional analysis, SQL is hardly up to the job as a query language. It can handle only a small amount of single-table analysis without associations and some relatively fixed multidimensional analysis requirements. Its scope of application is very narrow, and it is hard to adapt to flexible self-service analysis.
The system is closed
Current OLAP Servers depend heavily on databases. A database has the concept of a "library": data can be processed only after it has been loaded in, and usually only one database can be processed at a time, so data outside the database cannot be computed together with it. Yet OLAP means online analysis, and the business also requires T+0 real-time query and analysis. Data from other sources must go through ETL into the database before it can be computed, so the results are not real-time. A typical scenario is that OLAP business often needs to query real-time data in the business database, combining real-time data (business database) with historical data (analysis database) in a hybrid query (a T+0 query), which current OLAP Servers find hard to satisfy. In addition, much data in non-relational stores cannot be computed directly by an OLAP Server.
Low performance
Taking a step back: even if you focus only on historical data, ignore real-time production data, and use a single database, current OLAP queries still perform poorly. Reports often take minutes to return, real-time queries are not real-time, and multidimensional analysis stutters. The root cause is again SQL. SQL is based on relational algebra, which makes high-performance algorithms hard to implement, and engineering optimization in the database alone cannot fundamentally solve the problem. When the SQL is complex, database optimization is often ineffective, and performance suffers.
Open source SPL redefines OLAP Server
The arrival of SPL technology greatly eases the OLAP Server difficulties described above.
SPL (Structured Process Language) is a programming language designed for structured data computing. Its rich computing libraries and agile syntax make it quick to implement all kinds of complex data processing. Its computing power does not depend on the database (data source), so it naturally supports diverse data sources and can perform hybrid computing across them, enabling real-time queries over heterogeneous sources. SPL also has many built-in high-performance algorithms, storage schemes, and parallel computing mechanisms to ensure high computing performance.
Agile procedural computing adapts to complex reports
For complex data processing, SPL provides its own agile syntax that supports procedural computing. Compared with SQL, SPL syntax is more concise and well suited to preparing data for complex reports.
For example, consider this calculation: what is the longest run of consecutive days on which a stock rose?
In SQL, even with window functions, this takes four levels of nested statements:
select max(continuousDays)-1
from (select count(*) continuousDays
      from (select sum(changeSign) over(order by tradeDate) unRiseDays
            from (select tradeDate,
                         case when closePrice>lag(closePrice) over(order by tradeDate)
                              then 0 else 1 end changeSign
                  from stock) )
      group by unRiseDays)
The same logic is much easier to write in SPL:
|   | A |
| 1 | =T("/dw/stockRecord.txt") |
| 2 | =A1.group@i(closePrice<closePrice[-1]).max(~.len()) |
SPL encourages step-by-step computation, so a complex calculation can be built up one step at a time, following a natural line of thought.
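The step-by-step idea can be illustrated outside SPL as well. The following is a minimal Python sketch, not SPL itself: it assumes the closing prices are already sorted by trade date, and the function names `rising_runs` and `longest_rising_streak` and the sample prices are invented for illustration. Step one splits the series into runs, starting a new run whenever the price fails to rise; step two takes the longest run. It counts rising days, as the SQL above does with `count(*) - 1`.

```python
def rising_runs(close_prices):
    """Split a date-ordered price list into runs.

    A new run starts whenever the price fails to rise over the previous day.
    """
    runs = []
    for i, price in enumerate(close_prices):
        if i == 0 or price <= close_prices[i - 1]:
            runs.append([price])      # price did not rise: open a new run
        else:
            runs[-1].append(price)    # price rose: extend the current run
    return runs


def longest_rising_streak(close_prices):
    """Return the largest number of consecutive rising days."""
    runs = rising_runs(close_prices)
    # Each run begins with a non-rising day, so it holds len(run) - 1 rises.
    return max((len(run) - 1 for run in runs), default=0)


# Hypothetical sample data.
prices = [10.0, 10.5, 11.0, 10.8, 10.9, 11.2, 11.5, 11.4]
streak = longest_rising_streak(prices)
```

Each step produces an intermediate result that can be inspected on its own, which is the property the text attributes to SPL's cell-by-cell style.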

In addition, SPL's rich computing libraries greatly reduce the difficulty of data processing.

Because SQL is hard to debug, SPL also provides a simple, easy-to-use development environment with single-step execution, breakpoints, and a WYSIWYG result preview window.

Reports are continually added and modified as the business develops. Reporting tools solve the problem of building presentation templates quickly, but they cannot cope with complex, changing report data preparation, which neither SQL/stored procedures nor Java handles well.
Using SPL for report data preparation turns data preparation into a tool-based task as well. Together with the existing reporting tool on the presentation side, report development becomes fully tool-based, so the endless stream of reports can be handled quickly and at low cost.
SPL is an interpreted language and naturally supports hot swapping. Changes to a report (its data preparation) take effect without restarting the service, which suits constantly changing report requirements.
Beyond this, SPL's agile and hot-swappable nature lets it integrate well with development frameworks such as microservices. SPL provides computing power independent of the database, and keeping the algorithm outside the application to handle microservice data processing has advantages over hard-coded Java: it effectively reduces coupling between application modules.
The system is open
In contrast to the closed nature of traditional OLAP Servers, an OLAP Server built on SPL is more open. SPL computation does not depend on a database, so it is no longer limited by the "library" and does not even have that concept. Any data source can be used directly: CSV, Excel, JSON/XML, NoSQL, RestAPI, HDFS, Kafka, Elasticsearch, and SAP are all supported, and they can be mixed in one calculation. Data can come from local application systems, external systems, or remote cloud applications.
This open computing system makes T+0 real-time queries easy: connect to the business database holding hot data and the analysis database (or files) holding cold data at the same time, and a hybrid calculation achieves T+0.
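To make the hybrid calculation concrete, here is a minimal Python sketch of the idea, not SPL's implementation. It assumes the hot data sits in a relational business database (an in-memory SQLite table stands in for it) and the cold data is a CSV export (inlined as a string). The table name `orders`, the column names, and the sample values are all hypothetical.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical cold data: historical orders exported to a CSV file.
HISTORY_CSV = """area,amount
East,100
West,250
East,50
"""


def load_hot_rows(conn):
    """Read today's orders from the business database."""
    return conn.execute("SELECT area, amount FROM orders").fetchall()


def load_cold_rows(text):
    """Read historical orders from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["area"], float(row["amount"])) for row in reader]


def total_by_area(*sources):
    """Aggregate over several sources as if they were one data set."""
    totals = defaultdict(float)
    for rows in sources:
        for area, amount in rows:
            totals[area] += amount
    return dict(totals)


# Stand-in for the live business database holding hot data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (area TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("East", 30.0), ("North", 70.0)])

totals = total_by_area(load_hot_rows(conn), load_cold_rows(HISTORY_CSV))
```

The point of the sketch is that neither source is copied into the other first: the aggregation runs over both at query time, so the hot rows are as fresh as the business database itself.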
High performance
SPL is not based on relational algebra; it is built on a newly invented theory, discrete dataset algebra. As a result, many high-performance algorithms and storage schemes that are hard to implement in SQL are easy in SPL, and algorithms and storage are the key to software performance.
For example, SPL supports a more thorough notion of aggregation. TopN can be treated as an aggregation, so a high-complexity sort becomes a low-complexity aggregation, and the approach applies in more situations.
|   | A |   |
| 1 | =file("data.ctx").create().cursor() |   |
| 2 | =A1.groups(;top(10,amount)) | Top 10 orders by amount |
| 3 | =A1.groups(area;top(10,amount)) | Top 10 orders by amount in each region |
Expressing these operations in SQL involves a large sort, which performs very poorly, and one can only hope the database optimizes it away. In slightly more complicated cases, such as the grouped TopN in A3, the database optimizer fails.
As another example, SPL cursors support reuse, so several aggregate results can be computed in a single traversal.
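The idea of treating TopN as an aggregation can be sketched in Python with a bounded heap. This is an illustration of the general technique, not SPL's internal algorithm; the helper names and the sample orders are hypothetical. Each row is seen once and at most n rows are kept per group, so no full sort of the data is needed.

```python
import heapq
from collections import defaultdict


def top_n(rows, n, key):
    """Whole-set TopN: heapq.nlargest keeps only n candidates while scanning."""
    return heapq.nlargest(n, rows, key=key)


def top_n_by_group(rows, n, group_key, key):
    """Grouped TopN in one pass, with one bounded min-heap per group."""
    heaps = defaultdict(list)
    for seq, row in enumerate(rows):
        # seq is unique, so tuple comparison never reaches the row itself.
        entry = (key(row), seq, row)
        heap = heaps[group_key(row)]
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            heapq.heapreplace(heap, entry)  # evict the group's smallest
    return {group: [e[2] for e in sorted(heap, reverse=True)]
            for group, heap in heaps.items()}


# Hypothetical sample data.
orders = [
    {"area": "East", "amount": 120},
    {"area": "West", "amount": 300},
    {"area": "East", "amount": 80},
    {"area": "East", "amount": 200},
    {"area": "West", "amount": 50},
]

overall = top_n(orders, 2, key=lambda o: o["amount"])
per_area = top_n_by_group(orders, 2,
                          group_key=lambda o: o["area"],
                          key=lambda o: o["amount"])
```

With N rows and a small n, the work is roughly N·log n comparisons instead of the N·log N of a full sort, and the grouped case needs no extra pass.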
|   | A |   |
| 1 | =file("order.ctx").create().cursor() | Prepare the traversal |
| 2 | =channel(A1).groups(product;count(1):N) | Configure the reused computation |
| 3 | =A1.groups(area;sum(amount):amount) | Traverse and get the grouping result |
| 4 | =A2.result() | Get the result of the reused computation |
SQL cannot describe this algorithm. Implementing the operations above inevitably traverses the big data several times, which hurts performance. Because the limitation lies in the theory itself, the database optimization engine cannot help.
Other SPL performance techniques relevant to OLAP include: ordered merge for associating orders with their details; pre-association for multi-layer dimension table association in multidimensional analysis; bitwise storage for statistics over thousands of tags; Boolean sets to speed up filters on multiple enumerated values; time-series grouping for complex funnel analysis; and double-increment segmentation storage for smooth parallel processing of columnar data. Quite a few of these algorithms were invented by SPL.
In tests using the international TPC-H benchmark, SPL on a lower-performance ARM chip ran several times faster than Oracle on a higher-performance Intel chip. This is the advantage of innovative algorithms.
With SPL's high-performance algorithms and storage schemes, computation over historical big data performs better, and hybrid queries that include real-time business hot data make T+0 queries more efficient.
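The single-traversal idea can be sketched in Python as follows. This illustrates the principle only and is not how SPL channels are implemented; `order_cursor` is a hypothetical stand-in for a cursor over a large order file, and a generator is used because, like a cursor, it can be consumed only once.

```python
from collections import Counter, defaultdict


def order_cursor():
    """Hypothetical stand-in for a cursor over a large order file."""
    yield {"product": "pen", "area": "East", "amount": 10.0}
    yield {"product": "ink", "area": "West", "amount": 20.0}
    yield {"product": "pen", "area": "West", "amount": 5.0}


def aggregate_once(rows):
    """Compute two independent groupings while reading each row only once."""
    count_by_product = Counter()
    amount_by_area = defaultdict(float)
    for row in rows:
        count_by_product[row["product"]] += 1         # first aggregation
        amount_by_area[row["area"]] += row["amount"]  # second aggregation
    return dict(count_by_product), dict(amount_by_area)


counts, amounts = aggregate_once(order_cursor())
```

When the data is too large for memory, the cost of each traversal is dominated by disk reads, so folding both groupings into one pass roughly halves the I/O compared with running two separate queries.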
Association query
To address the weak association capability of traditional OLAP Servers in multidimensional analysis, an association query syntax called DQL has also been developed on top of SPL. DQL (Dimensional Query Language) is an SQL-like query language built around dimensions, and it takes a different approach from SQL.
Current SQL-based OLAP Servers have no particularly good way to implement multi-table association. One option is a logical wide table, but it produces too many fields (dimension table fields are copied many times, and multi-level association, self-association, and circular association make this worse), which makes it unusable for users and slow. Some BI products can associate tables automatically on the page according to the fields the user selects, but only in simple cases: when the same dimension appears more than once (for example, a table with two or more region fields) the match fails, and self-association cannot be handled at all. Asking users to associate the tables and fields themselves is obviously even less realistic.
So how does DQL work? Take this SQL statement as an example:
--SQL
SELECT A.* FROM EMPLOYEE A, DEPARTMENT B, EMPLOYEE C
WHERE A.country='USA'
AND C.country='China'
AND A.dept_id=B.dept_id
AND B.manager=C.emp_id
It involves multiple tables and a self-association, and business users find it hard to describe the relationship correctly in a BI interface.
The same query in DQL is written like this:
--DQL
SELECT * FROM EMPLOYEE
WHERE country='USA' AND dept_id.manager.country='China'
The complex multi-table association becomes a simple single-table query that ordinary business users can understand and build in the interface.
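One way to picture a dotted path such as `dept_id.manager.country` is as a chain of foreign-key lookups. The following minimal Python sketch shows that reading; it is an assumption made for illustration, not a description of how DQL is actually implemented. The sample tables, the `FOREIGN_KEYS` mapping, and the `resolve` helper are all hypothetical.

```python
# Hypothetical sample tables, keyed by primary key.
EMPLOYEES = {
    1: {"name": "Alice", "country": "USA", "dept_id": 10},
    2: {"name": "Bob", "country": "USA", "dept_id": 20},
    3: {"name": "Chen", "country": "China", "dept_id": 10},
    4: {"name": "Dana", "country": "USA", "dept_id": 20},
}
DEPARTMENTS = {
    10: {"manager": 3},
    20: {"manager": 4},
}

# Which table each foreign-key field points to (declared once, up front).
FOREIGN_KEYS = {"dept_id": DEPARTMENTS, "manager": EMPLOYEES}


def resolve(record, path):
    """Follow a dotted path through foreign keys and return the final field."""
    *refs, field = path.split(".")
    for ref in refs:
        record = FOREIGN_KEYS[ref][record[ref]]  # hop to the referenced record
    return record[field]


# Equivalent of:
# WHERE country='USA' AND dept_id.manager.country='China'
result = [
    emp["name"]
    for emp in EMPLOYEES.values()
    if resolve(emp, "country") == "USA"
    and resolve(emp, "dept_id.manager.country") == "China"
]
```

Because the relationships are declared once in the metadata, the person writing the query never names the joined tables or the self-association; they only walk from a field to the attributes of the record it points to.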
Summary
The arrival of SPL and DQL will have a profound impact on OLAP Server.
SPL's agility (procedural computing, algorithms kept outside the application, interpreted execution) suits the complex reports found in OLAP business, and its rapid development, hot swapping, and low coupling integrate well with microservices. The open computing system and unconstrained data organization break the closed nature of traditional OLAP products: various data sources can be used directly, which makes T+0 queries easy. DQL, built on SPL, addresses on-the-fly association queries in multidimensional analysis. SPL's high-performance algorithms and storage techniques ensure OLAP computing performance, so report queries, T+0 queries, multidimensional analysis, and other query and analysis tasks complete efficiently.
We look forward to a new generation of OLAP Server and BI products based on SPL technology.