Typical data lake application cases
2022-07-25 04:04:00 【InfoQ】
1. Advertising data analysis
- Concurrency and traffic peaks. Traffic peaks are common in the advertising industry: instantaneous clicks may reach tens of thousands or even hundreds of thousands, so the system needs very good scalability to respond to and process every click quickly.
- Real-time analysis of massive data. To monitor advertising effectiveness, the system must analyze each user click and activation event in real time and, at the same time, pass the relevant data to downstream media.
- Rapid data growth. Business log data is generated and uploaded continuously every day, and exposure, click, and push data is processed around the clock. The platform already adds roughly 10-50 TB of data per day, which places higher demands on the entire data processing system: it must complete offline and near-real-time statistics on advertising data efficiently and aggregate the results along the dimensions that advertisers require.
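The per-dimension aggregation that advertisers ask for can be illustrated with a minimal Python sketch. The record fields (`advertiser`, `channel`, `type`) and the sample events are assumptions made for illustration, not the platform's actual log format; a production system would run the same grouping on a distributed engine rather than in memory.

```python
from collections import defaultdict

# Hypothetical ad events; the field names are illustrative assumptions.
EVENTS = [
    {"advertiser": "A", "channel": "feed", "type": "exposure"},
    {"advertiser": "A", "channel": "feed", "type": "exposure"},
    {"advertiser": "A", "channel": "feed", "type": "click"},
    {"advertiser": "A", "channel": "search", "type": "exposure"},
    {"advertiser": "B", "channel": "feed", "type": "exposure"},
    {"advertiser": "B", "channel": "feed", "type": "click"},
    {"advertiser": "B", "channel": "feed", "type": "activation"},
]


def aggregate(events, dimensions):
    """Count exposures and clicks for each combination of the given dimensions."""
    stats = defaultdict(lambda: {"exposure": 0, "click": 0})
    for event in events:
        # Only exposures and clicks are counted here; other event types are skipped.
        if event["type"] in ("exposure", "click"):
            key = tuple(event[d] for d in dimensions)
            stats[key][event["type"]] += 1

    result = {}
    for key, counts in stats.items():
        # Click-through rate; guard against groups that have no exposures.
        ctr = counts["click"] / counts["exposure"] if counts["exposure"] else 0.0
        result[key] = {**counts, "ctr": ctr}
    return result


if __name__ == "__main__":
    for key, row in sorted(aggregate(EVENTS, ["advertiser", "channel"]).items()):
        print(key, row)
```

Passing a different `dimensions` list (for example only `["advertiser"]`) regroups the same events without changing the counting logic.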


2. Game operation analysis
- Sufficient elasticity. Games often take off within a short period, and data volume surges with them. Whether the platform can absorb explosive data growth and meet elastic demand is therefore a key consideration; both compute and storage need to be elastic.
- Sufficient cost-effectiveness. User behavior data often has to be analyzed and compared over long periods. For retention rate, for example, 90-day or even 180-day customer retention is often needed, so storing massive data for a long time in the most cost-effective way is a key consideration.
- Sufficient, extensible analytical capability. User behavior is often captured as event-tracking data, which has to be analyzed together with structured data such as user registration information, login information, and bills. Data analysis therefore requires at least big-data ETL capability, access to heterogeneous data sources, and modeling capability for complex analysis.
- A match with the company's existing technology stack that also makes later recruitment easier. For YJ, an important factor in technology selection is the technology stack of its staff. Most of YJ's technical team is familiar only with traditional database development, namely MySQL. The team is also short-staffed: only 1 technician works on data operation analysis, so YJ cannot independently build a big data analysis infrastructure in a short time. From YJ's point of view, it is best if most of the analysis can be done in SQL; in the recruitment market, SQL developers are also far more numerous than big data development engineers. In view of the customer's situation, we helped the customer transform its existing solution.
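To show how a retention analysis of this kind can stay in plain SQL, here is a minimal sketch that joins structured registration data with event-tracking data. It uses Python's built-in `sqlite3` module purely as a stand-in engine; the table names (`users`, `events`), column names, and sample rows are assumptions for illustration and not YJ's actual schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical registration and event tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, register_date TEXT)")
    conn.execute("CREATE TABLE events (user_id INTEGER, event_date TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(1, "2022-07-01"), (2, "2022-07-01"), (3, "2022-07-02"), (4, "2022-07-02")],
    )
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [
            (1, "2022-07-02"),
            (1, "2022-07-02"),  # duplicate activity on the same day
            (1, "2022-07-08"),
            (2, "2022-07-03"),
            (3, "2022-07-03"),
        ],
    )
    return conn


def retention_rate(conn, days):
    """Share of registered users with at least one event exactly `days` days after registering."""
    row = conn.execute(
        """
        SELECT COUNT(DISTINCT e.user_id) * 1.0 / COUNT(DISTINCT u.user_id)
        FROM users AS u
        LEFT JOIN events AS e
          ON e.user_id = u.user_id
         AND e.event_date = date(u.register_date, '+' || ? || ' days')
        """,
        (days,),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = build_demo_db()
    for n in (1, 2, 7):
        print(f"{n}-day retention:", retention_rate(conn, n))
```

The same query shape extends to 90-day or 180-day retention by changing the `days` argument, provided the event data is kept that long.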

- Behavioral data and structured data are completely separated, so linked analysis is not possible;
- Behavioral data is served only by an intelligent retrieval function, so in-depth mining analysis is not possible;
- OSS is used only as a data storage resource, and the value of the data is not sufficiently mined.


- Because business types and needs are diverse, the SaaS-style analysis provided by the platform can hardly cover every type of business or meet merchants' customization needs. For example, some merchants focus on sales, some on customer operations, and some on cost optimization, so it is hard to meet every need.
- Unified data analysis services cannot support some advanced analysis functions, such as audience selection based on custom tags or customer-defined extensions. In particular, some custom tags depend on a merchant's own algorithm, so customers' advanced analysis needs go unmet.
- Data asset management requirements. In the age of big data, it is widely accepted that data is an asset of an enterprise or organization. How to let the data that belongs to a merchant accumulate in a sound, long-term way is also something a SaaS service has to consider.

- Data assetization capability. With the data lake, merchants can continuously accumulate their own data; how long the data is kept and how much is spent on it is entirely up to the merchant. The data lake also provides data asset management capabilities: besides managing raw data, it can store intermediate and result data by category, which greatly increases the value of event-tracking data.
- Analytical modeling capability. The data lake holds not only raw data but also the model (schema) of the event-tracking data. This model reflects how the global data intelligence service platform abstracts business logic. Through the data lake, the data model is output as an asset alongside the raw data. With the help of the model, merchants can better understand the user behavior logic behind the event-tracking data, gain better insight into customer behavior, and identify user needs.
- Service customization capability. With the data integration and data development capabilities that the data lake provides, and based on an understanding of the event-tracking data model, merchants can customize the data processing flow, process the raw data iteratively, extract valuable information from it, and ultimately obtain value beyond what the original data analysis services offer.
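As a rough illustration of merchant-side customization, the sketch below applies a merchant-defined tagging rule to records that follow a shared event schema. The `TrackingEvent` fields, the `high_value_rule` function, and its threshold of 100 are hypothetical; the source does not describe the platform's real schema or any specific tagging algorithm.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class TrackingEvent:
    """Hypothetical event-tracking schema; the field names are assumptions."""
    user_id: str
    event: str
    amount: float = 0.0


def tag_users(
    events: Iterable[TrackingEvent],
    rule: Callable[[List[TrackingEvent]], str],
) -> Dict[str, str]:
    """Group events by user and apply a merchant-defined rule to each user's events."""
    by_user: Dict[str, List[TrackingEvent]] = {}
    for event in events:
        by_user.setdefault(event.user_id, []).append(event)
    return {user_id: rule(user_events) for user_id, user_events in by_user.items()}


def high_value_rule(user_events: List[TrackingEvent]) -> str:
    """Example merchant rule: tag users by total purchase amount (threshold is arbitrary)."""
    total = sum(e.amount for e in user_events if e.event == "purchase")
    return "high_value" if total >= 100 else "regular"


if __name__ == "__main__":
    sample = [
        TrackingEvent("u1", "purchase", 80.0),
        TrackingEvent("u1", "purchase", 30.0),
        TrackingEvent("u2", "view"),
        TrackingEvent("u2", "purchase", 20.0),
    ]
    print(tag_users(sample, high_value_rule))
```

Because the rule is passed in as a function, each merchant can swap in its own algorithm while the shared schema and grouping step stay unchanged.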