The "Four Highs" of Data Middle Platform Stability | StartDT Tech Lab 18

2022-06-23 12:56:00 InfoQ

Foreword
This is issue 18 of "StartDT Tech Lab", the technology column of Singularity Cloud (StartDT).
Here we focus on data technology and share methodology and practice: front-line project experience, hands-on practice, and honest lessons learned. Scroll to the end of the article to see our previous issues.
This issue is brought to you by the Singularity Cloud DataSimba team:

Authors: Far wave, Fish fly, Ruoxi
Reading time: about 8 minutes

Cost reduction, efficiency gains, decision support... The industry tends to talk only about how good the data middle platform is. Looking behind the scenes, however, we find that enterprises run into problems when they actually use data:
Data analysis results are not delivered on time, so business departments cannot get them when needed; data processing and calculation are inaccurate, which skews decisions; and unstable platform services lead to user churn and high operation and maintenance costs.
These "behind-the-scenes stories" all point to one concept: the stability of the data middle platform.

0. What is data middle platform "stability"?


We believe stability is an essential property of a data middle platform: it ensures that the platform can finely control and safeguard data storage and computation, the middle platform application architecture, and the platform itself.
Put plainly, only a stable data middle platform can keep data storage and computation running normally and efficiently, govern the application architecture and the platform itself, and handle unexpected problems promptly.

What makes a data middle platform stable?
We believe there are four essential elements: high availability, high concurrency, efficient scheduling, and efficient operation and maintenance.

High availability: availability of at least 99.999%.
High concurrency: offline computing at the 100-million+ level per hour and real-time computing at the tens-of-millions level per hour; data API services sustain 100,000+ QPS (queries per second).
Efficient scheduling: task scheduling at the scale of hundreds of thousands of tasks.
Efficient operation and maintenance: second-level alerting, minute-level fault location and loss containment, and hour-level recovery from failure impact.
 

1. A "highly available" data middle platform


For a data middle platform to be stable, "high availability" is an essential foundation.
"High availability" usually means that a system has been specifically designed to reduce the time during which it cannot provide service, keeping the service available over the long term as far as possible.
For example, if the system is unable to provide service for 1 time unit out of every 100,000 time units it runs, its availability is 99.999%.
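
The arithmetic in this example can be checked with a short sketch. The function names are illustrative and not part of DataSimba, and the yearly downtime figure assumes a 365-day year.

```python
def availability(total_units: float, down_units: float) -> float:
    """Fraction of time during which the system can provide service."""
    if total_units <= 0 or not 0 <= down_units <= total_units:
        raise ValueError("need total_units > 0 and 0 <= down_units <= total_units")
    return (total_units - down_units) / total_units


def allowed_downtime_minutes_per_year(target: float) -> float:
    """Downtime budget implied by an availability target (365-day year assumed)."""
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes
    return (1.0 - target) * minutes_per_year


if __name__ == "__main__":
    a = availability(total_units=100_000, down_units=1)
    print(f"availability: {a:.3%}")
    print(f"downtime budget: {allowed_downtime_minutes_per_year(a):.2f} min/year")
```

Under these assumptions, a 99.999% target leaves a budget of roughly five minutes of downtime per year, which is why each layer of the platform needs redundancy.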

A "highly available" data platform should address five areas: the network, business services, middleware, databases, and the data itself.
1. Network: use device redundancy, link aggregation, ring network technology, and similar measures so that the network recovers quickly and automatically from failures (errors);
2. Business services: service nodes are stateless and can be deployed as multiple nodes, so that the service remains available externally when any node becomes abnormal (goes down);
3. Middleware: deploy middleware with a registry, sentinel, or active/standby mode, so that the service remains accessible when any middleware node becomes abnormal (goes down);
4. Data itself: back up data to improve its fault tolerance and prevent data loss;
5. Databases: distributed databases should keep multiple replicas; non-distributed databases should use active/standby deployment plus regular data backups.

A well-known consumer goods company served by Singularity Cloud processes data at the 100-million level every day, connects with dozens of suppliers, and feeds the operational data dashboards of multiple business lines. A problem in the data middle platform could leave those business lines with abnormal data the next day, so the company has very high availability requirements.
Based on the customer's situation, Singularity Cloud's cloud-native data platform DataSimba was hardened in five areas: network, data itself, database, middleware, and business services. As a result, the customer's data platform reaches 99.999% availability.

2. A "highly concurrent" data middle platform


High concurrency, taken literally, means that a system has been designed to process many requests in parallel at the same time.
Specifically, high concurrency shows up in two areas, data integration and data services:
· Data integration: guarantee concurrent ETL collection capability while minimizing the impact on the source database;
· Data services: deliver processed result data more quickly for customers' decision-making, report analysis, and so on.

Figure: From data integration to data services
DataSimba's high concurrency has the following characteristics:
1. Services are deployed as distributed clusters and are highly scalable;
2. Different data storage schemes are used for different data volumes;
3. The high-concurrency design supports circuit breaking, rate limiting, and degradation;
4. Data services support QPS at the 100,000 level.
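
To make rate limiting and degradation concrete, here is a minimal sketch using a token bucket. The article does not describe DataSimba's internals, so the `TokenBucket` class, the `serve` helper, and the fallback behavior are all assumptions for illustration. Circuit breaking is not shown, and the class is not thread-safe.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket limiter: about `rate` requests per second on average,
    with bursts of up to `capacity` requests."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock            # injectable so tests can control time
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def serve(limiter: TokenBucket, query: Callable[[], str],
          fallback: Callable[[], str]) -> str:
    """Run the real query when under the limit; otherwise degrade to the fallback."""
    if limiter.allow():
        return query()
    return fallback()


if __name__ == "__main__":
    limiter = TokenBucket(rate=2.0, capacity=2)
    for i in range(4):
        print(i, serve(limiter, lambda: "fresh result", lambda: "cached result"))
```

The fallback here stands in for any degraded response, such as a cached or simplified result, so that excess load is absorbed without overwhelming the backing store.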

A leading company in the domestic securities industry has consumer (C-end) users at the 100-million level. It needs to collect C-end data into the data middle platform promptly, enrich and compute on the data through standardization, and ultimately serve its C-end users. DataSimba's high concurrency and data processing capability fully support this customer's needs.

3. A data middle platform with "efficient scheduling"


A stable data middle platform also needs a good scheduling service: faced with a large number of tasks, it executes them in an orderly manner according to the enterprise's resources, task priority, submission time, and other factors. Each task is completed on time while the enterprise's resource costs are kept down.

Figure: "Efficient scheduling" logic diagram
DataSimba's efficient scheduling has the following characteristics:
1. Broad compatibility with task types: supports DataX, Flink, Python, Hive, Spark, and other task types;
2. High resource utilization: the task decision system schedules the execution order sensibly based on each task's resource usage, the remaining physical resources, and task dependencies and priorities;
3. Real-time scheduling optimization: task execution status is monitored in real time and failed tasks are rescheduled promptly.
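
One way to picture dependency- and priority-aware ordering is a topological sort driven by a priority queue. This is only a sketch under stated assumptions: the `Task` fields and the `schedule` function are hypothetical, resource accounting is left out, and nothing here is taken from DataSimba's actual scheduler.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    priority: int        # larger means more urgent
    added_at: int        # submission order or timestamp
    depends_on: list[str] = field(default_factory=list)


def schedule(tasks: list[Task]) -> list[str]:
    """Return an execution order that respects dependencies. Among runnable
    tasks, higher priority goes first, then earlier submission time."""
    by_name = {t.name: t for t in tasks}
    pending = {t.name: len(t.depends_on) for t in tasks}
    dependents: dict[str, list[str]] = {t.name: [] for t in tasks}
    for t in tasks:
        for dep in t.depends_on:
            dependents[dep].append(t.name)  # KeyError if a dependency is unknown

    # Min-heap, so priority is negated to pop the most urgent task first.
    ready = [(-t.priority, t.added_at, t.name) for t in tasks if not t.depends_on]
    heapq.heapify(ready)
    order: list[str] = []
    while ready:
        _, _, name = heapq.heappop(ready)
        order.append(name)
        for child in dependents[name]:
            pending[child] -= 1
            if pending[child] == 0:  # all upstream tasks are done
                c = by_name[child]
                heapq.heappush(ready, (-c.priority, c.added_at, c.name))
    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected")
    return order


if __name__ == "__main__":
    demo = [
        Task("extract_orders", priority=1, added_at=1),
        Task("extract_users", priority=5, added_at=2),
        Task("join", priority=9, added_at=3,
             depends_on=["extract_orders", "extract_users"]),
        Task("report", priority=3, added_at=4, depends_on=["join"]),
    ]
    print(schedule(demo))
```

A production scheduler would also check remaining CPU and memory before dispatching each ready task; that part is omitted to keep the ordering logic visible.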

Take a customer that makes 3D interior design software as an example. The company has a huge C-end user base, schedules tasks at the 100,000 level every day, and has highly complex dependencies between tasks. DataSimba supports the company in processing tasks at the 100,000 level daily.

4. A data middle platform with "efficient operation and maintenance"


Operation and maintenance underpins the stability of the data middle platform.
Efficient operation and maintenance mainly means that problems are found quickly and that the platform can resolve problems and recover on its own, minimizing the enterprise's cost of maintaining the data middle platform.

Taking DataSimba as an example, Singularity Cloud achieves efficient operation and maintenance of the data middle platform through the following 3 points:
1. Distributed tracing
DataSimba adopts distributed tracing, with automatic multi-language probes, compatibility with a variety of open-source infrastructure and components, and automatic probes for infrastructure and components.

Figure: Distributed tracing
2. Intelligent monitoring and alerting
When a reporting task times out, a task fails, or a configured alert rule's threshold is exceeded, DataSimba identifies it automatically in real time and sends an alert message to the corresponding alert recipients.
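
The three trigger conditions described here (timeout, failure, threshold violation) can be sketched as a small rule check. The `TaskRun` and `AlertRule` types, their fields, and the message format are hypothetical; DataSimba's real rule model is not described in this article.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskRun:
    name: str
    status: str                      # "running", "success", or "failed"
    elapsed_seconds: float
    rows_written: Optional[int] = None


@dataclass
class AlertRule:
    timeout_seconds: float
    min_rows: int = 0                # example threshold: minimum expected output
    recipient: str = "on-call"


def evaluate(run: TaskRun, rule: AlertRule) -> list[str]:
    """Return one alert message per violated condition (empty if healthy)."""
    alerts: list[str] = []
    if run.status == "failed":
        alerts.append(f"[{rule.recipient}] {run.name}: task failed")
    if run.status == "running" and run.elapsed_seconds > rule.timeout_seconds:
        alerts.append(
            f"[{rule.recipient}] {run.name}: running for {run.elapsed_seconds:.0f}s, "
            f"limit is {rule.timeout_seconds:.0f}s"
        )
    if run.rows_written is not None and run.rows_written < rule.min_rows:
        alerts.append(
            f"[{rule.recipient}] {run.name}: wrote {run.rows_written} rows, "
            f"expected at least {rule.min_rows}"
        )
    return alerts


if __name__ == "__main__":
    rule = AlertRule(timeout_seconds=600, min_rows=100)
    for message in evaluate(TaskRun("daily_report", "running", 900), rule):
        print(message)
```

In practice a check like this would run on every status update so that alerts can be raised within seconds of a violation.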


3. Self-recovery
The system's core business modules should automatically record the failed node when a failure occurs and should be able to recover on their own, to ensure data integrity and accuracy while tasks run.
DataSimba has a comprehensive monitoring and alerting mechanism, achieving second-level alerting, minute-level fault location and loss containment, and hour-level recovery from failure impact.
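
The idea of recording the failed node and resuming from it can be illustrated with a small sketch. The `Pipeline` class and `run_with_recovery` helper are hypothetical, and the sketch assumes that steps which already completed do not need to be re-run.

```python
from typing import Callable, List, Optional, Tuple


class Pipeline:
    """Runs named steps in order and remembers where a run failed."""

    def __init__(self, steps: List[Tuple[str, Callable[[], None]]]) -> None:
        self.steps = steps
        self.failed_at: Optional[int] = None  # index of the recorded failure node
        self.completed: List[str] = []

    def run(self) -> bool:
        # Resume from the recorded failure node; otherwise start from the top.
        start = self.failed_at if self.failed_at is not None else 0
        self.failed_at = None
        for i in range(start, len(self.steps)):
            name, action = self.steps[i]
            try:
                action()
            except Exception:
                self.failed_at = i  # record the failed node for the next attempt
                return False
            self.completed.append(name)
        return True


def run_with_recovery(pipeline: Pipeline, max_attempts: int = 3) -> bool:
    """Retry from the failed node until the run succeeds or attempts run out."""
    for _ in range(max_attempts):
        if pipeline.run():
            return True
    return False


if __name__ == "__main__":
    attempts = {"transform": 0}

    def flaky_transform() -> None:
        attempts["transform"] += 1
        if attempts["transform"] == 1:
            raise RuntimeError("transient failure")

    p = Pipeline([("load", lambda: None),
                  ("transform", flaky_transform),
                  ("publish", lambda: None)])
    print(run_with_recovery(p), p.completed)
```

Resuming at the failed step rather than restarting avoids repeating upstream work, which matters when task instances are tightly coupled.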



To give an example: a leading FMCG enterprise customer has nearly 10,000 task instances in its data middle platform, with strong computational dependencies and coupling between them. DataSimba's low operation and maintenance cost, timely response, and accurate, complete task data fully met the company's requirements and effectively handled the various abnormal scenarios it encountered, earning positive feedback.

5.  Summary


We believe that a data middle platform that brings business value is the data middle platform customers really need.

Therefore, the data middle platform must serve the enterprise accurately, steadily, and efficiently. Only a stable data middle platform can effectively guarantee the integrity and accuracy of enterprise data and improve service availability, while also enabling intelligent operation and maintenance, lowering maintenance costs, giving users a good experience, supporting accurate decision-making, and reducing costs while increasing efficiency.

To review the key points above: high availability, high concurrency, efficient scheduling, and efficient operation and maintenance. These "four highs" are the four essential elements of data middle platform stability.
Singularity Cloud values the stability of the data middle platform, and has chosen to build these elements into its cloud-native data platform DataSimba, because we stand with our customers.


Copyright notice
This article was created by [InfoQ]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/174/202206231158463827.html