Improving openEuler resource utilization 01: Introduction
2022-07-24 03:00:00 【Euler open source community】
Background
According to a report released by Canalys [1], global spending on cloud infrastructure services grew 34% year on year in the first quarter of 2022, reaching US$55.9 billion. However, several studies have shown that the average CPU utilization of user clusters in data centers worldwide is currently below 20%, which is a huge waste of resources. Improving data center resource utilization is therefore an important and urgent problem [2].
Causes
The main reason for low resource utilization is a mismatch between tasks and resource allocation. This mismatch takes many forms, for example:
- Scheduling systems are isolated per cluster: different jobs use different scheduling systems, so jobs cannot move across a wider cluster and idle resources in other clusters cannot be used effectively.
- Lack of diversity in task types: jobs in a cluster are highly homogeneous and concentrate on a subset of resources, so that subset is heavily utilized while the remaining resources sit idle.
- Lack of priority-tiered management: either there are no low-priority jobs to fill idle resources, or there are low-priority jobs but the cluster has no tiered control capability, which leads to over-allocation of resources.
- A single resource type in the cluster: resource specifications inside the cluster are uniform, so the cluster cannot flexibly scale each kind of resource to follow the changing needs of the overall business, which leads to over-allocation of some resources.
Overall, low utilization results from the lack of diversity of tasks and resources within a cluster, and from the scheduler's weak ability to manage diverse tasks and resources.
Solutions
Deploy different types of jobs together, improving resource utilization in both the space and the time dimension.
- Resource overselling (space-division overselling): the idle resources of online businesses are oversold to offline jobs, raising overall resource utilization.
- Off-peak use (time-division overselling): the idle periods of online businesses are filled with offline jobs, reducing resource idling.
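The two forms of overselling above can be sketched as follows. This is only an illustration under stated assumptions, not the openEuler implementation: the function names, the `safety_margin` headroom and the hourly usage profile are all hypothetical. The first function covers the space dimension (capacity the online business is not using at a given moment); the second applies it hour by hour to show the time dimension, where troughs in the online load yield a larger offline budget.

```python
from typing import List


def oversellable_cpu(capacity: float, online_usage: float,
                     safety_margin: float = 0.1) -> float:
    """CPU cores that could be lent to offline jobs at this moment.

    capacity      -- total cores on the node
    online_usage  -- cores the online business is actually using
    safety_margin -- fraction of capacity held back as headroom (assumed value)
    """
    spare = capacity - online_usage - capacity * safety_margin
    # Never report a negative budget when the online load is at its peak.
    return max(0.0, spare)


def offline_budget_by_hour(capacity: float, hourly_online_usage: List[float],
                           safety_margin: float = 0.1) -> List[float]:
    """Per-hour offline budget: idle hours of the online load get more cores."""
    return [oversellable_cpu(capacity, used, safety_margin)
            for used in hourly_online_usage]


# Hypothetical 32-core node: the online load peaks in hour 2 and idles in hour 3.
profile = [8.0, 12.0, 30.0, 4.0]
budget = offline_budget_by_hour(32.0, profile)
print(budget)
```

During the peak hour the budget drops to zero, which is exactly the situation where offline jobs must yield to the online business.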
Technical challenges
Whether overselling is space-division or time-division, resources run short when the peaks of different businesses coincide, and this can damage the quality of service (QoS) of some businesses. Improving resource utilization while keeping business QoS intact is the key technical challenge.
In addition, the diversity and complexity of cloud businesses make it even harder to guarantee service quality:
On the one hand, by how observable their load characteristics are, applications can be divided into white-box, black-box and gray-box applications. White-box applications are observable by the system, which can obtain their QoS metrics in real time. Black-box businesses are not observable, and the system does not even know what their QoS is. Applications whose observability lies between the two are called gray-box applications. Accurately quantifying the service quality of black-box businesses and locating interference sources is the technical challenge in generalizing this capability, and it is also an active research topic in the industry.
On the other hand, by the business complexity of the load, applications can be divided into lightweight applications (such as microservices and function compute), traditional applications (such as monolithic applications) and very large applications (such as HPC/AI). Technical problems such as full-stack collaborative awareness must be overcome to build a general, unified system.
Solution overview
Following the cause analysis above, scheduling diversified businesses/loads and resources for converged deployment can significantly improve the flexibility of resource allocation, and thereby the efficiency of resource utilization. It also brings greater technical challenges: the more businesses/loads and resource types are managed, the more complex their dependencies become, and the more complex the system's multi-objective optimization requirements are. On this basis, we divide the work into the following development stages:
L0: Independent deployment: each cluster has its own technology stack and its own resource pool, and cluster utilization is low (<20%).
L1: Shared deployment: a unified technology stack enlarges the cluster, a single type of business is deployed on shared resources, and utilization is improved through dynamic elasticity. Cluster resource utilization is still fairly low (<30%).
- Related technologies: technology stack unification, containerization, elastic scaling
L2:「 Hybrid deployment 」: a unified technology stack enlarges the cluster, multiple types of business are deployed on shared resources, and utilization is improved through overselling and isolation technology. Cluster resource utilization is high (>40%).
- Related technologies: resource overselling, tiered resource isolation, feedback control
L3:「 Generalized hybrid deployment 」: the business types that can be deployed together are generalized, supporting shared-resource deployment of thousands of black-box businesses on the public cloud, with QoS quantification and awareness guaranteeing the service quality of key businesses.
- Related technologies: QoS quantification/localization, precise control, QoS-aware scheduling
L4:「 Converged deployment 」: on top of generalized load types, converge containers, virtual machines, lightweight runtimes and other diverse loads, and combine complex scenarios such as HPC/AI with heterogeneous resource awareness, to improve the overall utilization of all kinds of resources.
- Related technologies: heterogeneous-resource-aware scheduling, unified scheduling
Among these, L1 and L2 focus mainly on improving cluster CPU utilization, while L3 and L4 generalize the technology for improving resource utilization.
The industry's L2-level exploration on internal businesses has significantly improved the overall utilization of clusters and even data centers, but generalization for the public cloud is still at an early stage and is not yet commercially available.
Following the trend toward generalized and converged deployment, we have built a solution for sustainably improving resource utilization, as shown in the figure below:

To achieve the best deployment effect, control and optimization are needed at multiple levels of task execution:
「 Cluster management 」: at the scheduling level, businesses that interfere strongly with each other's performance are deployed separately, and unnecessary interference is reduced by optimizing task combinations.
「 Single-node management 」: at the single-node level, resource contention is detected in real time and its impact on key businesses is eliminated.
「 Resource isolation layer 」: tasks are controlled by priority tier, guaranteeing the resource needs of high-priority tasks.
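How single-node management and the isolation layer might cooperate can be sketched as a small feedback loop. The article does not describe the actual control law, so this example swaps in a plain additive-increase/multiplicative-decrease rule as a stand-in; `adjust_offline_quota`, the latency samples and the 50 ms target are all hypothetical.

```python
def adjust_offline_quota(quota: float, latency_ms: float, target_ms: float,
                         max_quota: float, step: float = 0.5) -> float:
    """Return the next CPU quota (in cores) granted to offline tasks.

    latency_ms is an observed QoS metric of the high-priority online
    business; target_ms is the assumed QoS threshold.
    """
    if latency_ms > target_ms:
        # Online QoS is violated: back off quickly by halving the quota.
        return quota / 2.0
    # Online QoS is healthy: give cores back slowly, up to the cap.
    return min(max_quota, quota + step)


quota = 8.0
for latency in [20.0, 25.0, 60.0, 55.0, 30.0]:  # hypothetical latency samples, ms
    quota = adjust_offline_quota(quota, latency, target_ms=50.0, max_quota=8.0)
    print(f"latency={latency:.0f}ms -> offline quota={quota} cores")
```

The asymmetry is deliberate: offline tasks lose resources quickly when the key business is disturbed and regain them gradually, which favors QoS over utilization whenever the two conflict.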
Huawei has implemented an L2-level solution based on the above framework. The relevant features have been verified inside Huawei and are being rolled out progressively. Important technical breakthroughs have been made at every level:
「 Cluster management 」:
- Predictive scheduling: supports predictive scheduling based on the physical resource utilization of nodes [3], load-balancing scheduling, resource preemption scheduling and other features.
- Feature modeling: a general application profiling and modeling component has been designed and implemented. It automates interference injection, metric collection and model output.
「 Single-node management 」:
- QoS quantification: business QoS is detected in real time based on a quantitative model, and interference sources are controlled in real time.
- Topology orchestration: businesses are placed with dynamic affinity according to the hardware topology, improving overall performance without changing resource quotas.
- Power control: higher resource utilization raises the risk of excessive power consumption of the whole machine, so power consumption changes must be monitored in real time and suppressed in a targeted way.
- L3/MB control: current hardware provides L3 cache and memory bandwidth isolation, but dynamic software control is still needed to balance interference control against resource utilization.
「 Resource isolation layer 」:
- Tiered preemption: provides tiered preemption for prioritized, queued resources such as CPU, MEM and IO/NET. CPU absolute suppression (which avoids priority inversion) and NET preemption performance (<100ms) are industry-leading.
- Elastic scheduling: supports tidal affinity, CPU Burst and other elastic scheduling capabilities.
We will also open up these fine-grained features in openEuler. You are welcome to use them and to discuss them in the community.
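The idea behind predictive scheduling on physical node utilization can be illustrated with a small sketch. It is not the algorithm used by Volcano or by Huawei's scheduler: the exponentially weighted moving average, the `limit` threshold, the node names and the utilization histories are assumptions chosen only to show the principle of placing a task by predicted rather than requested load.

```python
from typing import Dict, List, Optional


def predict_utilization(samples: List[float], alpha: float = 0.5) -> float:
    """Exponentially weighted moving average of past utilization (0..1)."""
    estimate = samples[0]
    for sample in samples[1:]:
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate


def pick_node(history: Dict[str, List[float]], request: float,
              limit: float = 0.8) -> Optional[str]:
    """Pick the node with the lowest predicted utilization that would still
    stay at or below `limit` after adding the request; None if none fits."""
    best_name, best_util = None, None
    for name, samples in sorted(history.items()):
        predicted = predict_utilization(samples)
        if predicted + request > limit:
            continue  # placing the task here risks interference
        if best_util is None or predicted < best_util:
            best_name, best_util = name, predicted
    return best_name


# Hypothetical measured CPU utilization per node, oldest sample first.
history = {
    "node-a": [0.50, 0.75, 0.75],
    "node-b": [0.25, 0.25, 0.50],
    "node-c": [0.75, 0.75, 0.75],
}
chosen = pick_node(history, request=0.25)
print(chosen)
```

Scoring by measured utilization instead of declared requests is what lets a scheduler find capacity that is allocated on paper but idle in practice.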
Future plans
We have verified and deployed the hybrid deployment solution in some internal scenarios, reaching stage L2. In the short term, we still need breakthroughs in QoS guarantee technology for black-box businesses in order to enter stage L3, because only at L3 can more users benefit. In the long term, beyond the container scenario, more load types and resource types need better utilization, which requires further technical breakthroughs in cluster scheduling, the OS and other levels.
This article has briefly introduced our thinking on technical solutions for improving resource utilization in the cloud. Follow-up articles will cover the isolation, feedback control and aware scheduling technologies in detail. Stay tuned!
References
- [1] Canalys. Global cloud services spend hits US$55.9 billion in Q1 2022.
- [2] Wang Kangjin, Jia Tong, Li Ying. Survey of research on job scheduling and resource management technology for online-offline colocation. Journal of Software, 2020, 31(10): 3100-3119.
- [3] Volcano: a management platform for online-offline job colocation, providing intelligent resource management and job scheduling.
The resource utilization technologies mentioned in this article are developed jointly by the Cloud Native SIG, High Performance Network SIG, Kernel SIG, OpenStack SIG and Virt SIG, and their source code will gradually be open-sourced in the openEuler community.