Abstract: For Spark users, Volcano's batch scheduling, fine-grained resource management, and other capabilities make it easier to migrate from Hadoop to Kubernetes while significantly improving the performance of large-scale data analytics workloads.

On June 16, 2022, Apache Spark 3.3 was officially released. "Support Customized Kubernetes Schedulers" is a highlight feature of the 3.3 release: it adds framework-level support for customized Kubernetes schedulers and makes Volcano the default batch scheduler for Spark on Kubernetes. This is also the first Apache Spark release to officially support Volcano. For Spark users, Volcano's batch scheduling, fine-grained resource management, and other capabilities make it easier to migrate from Hadoop to Kubernetes while significantly improving the performance of large-scale data analytics workloads.

Initiated by Huawei, built in collaboration with mainstream vendors

This feature was initiated by Huawei and completed jointly by developers from Huawei, Apple, Cloudera, Netflix, Databricks, and other companies. By adding customized-scheduler support to Apache Spark, it allows users to plug in and use various third-party custom schedulers.

Spark + Volcano: more complete scheduling capabilities

Spark's resource management platform is evolving toward Kubernetes. Under Apache Spark's existing architecture, a job's single driver and multiple executors are scheduled independently, which can lead to deadlocks in which Spark driver pods hold resources while executors cannot be scheduled; such problems occur frequently when resources are tight. At the same time, the native Kubernetes scheduler is limited: it cannot provide job-granularity capabilities such as queue scheduling, fair scheduling, and resource reservation.

Volcano, the CNCF community's first cloud-native batch computing project, was officially open-sourced at KubeCon Shanghai in June 2019 and became an official CNCF project in April 2020. In April 2022, Volcano was promoted to a CNCF incubating project. Since being open-sourced, Volcano has been rapidly adopted in massive data computing and analytics scenarios such as artificial intelligence, big data, gene sequencing, transcoding, and rendering, and has built a complete upstream and downstream ecosystem. Enterprises including Tencent, iQIYI, Xiaohongshu, Mogujie, Vipshop, Peng Cheng Laboratory, and Ruitian Investment have put Volcano into production.

Spark's official support for Volcano will further accelerate the migration of big data platforms to Kubernetes and help Spark users handle the following common batch scheduling scenarios.

Common scheduling scenarios:

Job-level fair scheduling (Job-based Fair-share)

When running multiple elastic jobs (such as streaming analytics), resources need to be allocated fairly to each job in order to meet SLA/QoS requirements when the jobs compete for additional resources. In the worst case, a single job may start a large number of pods with low resource utilization, leaving other jobs unable to run for lack of resources. To avoid overly small allocations (for example, starting only one pod per job), Volcano allows an elastic job to declare the minimum number of pods it needs to get started; any pods beyond that minimum share the cluster's resources fairly with other jobs.
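
As a minimal sketch (names, images, and sizes are illustrative, not taken from the Spark integration itself), a Volcano Job declares this minimum through minAvailable; the remaining replicas only run when spare capacity is shared fairly:

    apiVersion: batch.volcano.sh/v1alpha1
    kind: Job
    metadata:
      name: stream-analysis          # illustrative name
    spec:
      schedulerName: volcano
      queue: default
      minAvailable: 2                # 2 pods are enough for the job to make progress
      tasks:
        - name: worker
          replicas: 8                # the remaining 6 replicas share spare capacity fairly
          template:
            spec:
              restartPolicy: Never
              containers:
                - name: worker
                  image: busybox
                  command: ["sh", "-c", "sleep 3600"]
                  resources:
                    requests:
                      cpu: "1"
                      memory: 1Gi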

Queues (Queue)

Queues are also widely used to share resources between elastic and batch workloads. The main purposes of queues are:

  • Sharing resources between different "tenants" or resource pools, for example by mapping each department to a queue so that departments dynamically share cluster resources according to their queue weights.
  • Supporting different scheduling policies or algorithms, such as FIFO, fairness, and priority, for different "tenants" or resource pools.

Queues are implemented as a cluster-wide CRD and are decoupled from namespaces, which allows jobs created in different namespaces to be placed into a shared queue. A queue also provides min and max: min is the queue's guaranteed minimum, so whenever urgent tasks are submitted to the queue, at least min resources are available; max is the upper limit of the queue's resource usage. Resources between min and max, when idle, can be shared with other queues to improve overall resource utilization.
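
A minimal Queue sketch (names and quantities are illustrative; the guarantee field that expresses the "min" is only available in recent Volcano releases, so check your version):

    apiVersion: scheduling.volcano.sh/v1beta1
    kind: Queue
    metadata:
      name: dept-a
    spec:
      weight: 4              # proportional share relative to other queues
      reclaimable: true      # idle resources lent to this queue can be taken back
      capability:            # "max": hard upper limit on the queue's usage
        cpu: "64"
        memory: 256Gi
      guarantee:             # "min": resources reserved for this queue
        resource:
          cpu: "16"
          memory: 64Gi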

User-oriented fair scheduling across queues (Namespace-based fair-share Cross Queue)

Within a queue, each job has a roughly equal chance of being scheduled during a scheduling cycle, which means users with more jobs have a greater chance of getting their jobs scheduled; this is unfair to other users. For example, suppose a queue with a small amount of resources contains 10 pods belonging to UserA and 1,000 pods belonging to UserB. In that situation, UserA's pods have a much smaller probability of being bound to nodes.

A more fine-grained policy is therefore needed to balance resource usage among users in the same queue. Given the multi-user model in Kubernetes, namespaces are used to distinguish users, and each namespace is assigned a weight that controls the priority of its resource usage.

Preemption and reclaim (Preemption & Reclaim)

Fair sharing supports a borrowing model: some jobs or queues can over-use resources while they sit idle elsewhere. However, when the resource "owner" requests those resources back, they are "taken back". Resources can be shared between queues or between jobs: reclaim is used to rebalance resources between queues, and preemption is used to rebalance resources between jobs.
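
For example, preemption between jobs can be driven by standard Kubernetes PriorityClasses. A sketch with illustrative names (the preempt and reclaim actions must also be enabled in the scheduler configuration; see the configuration sketch under Reservation and Backfill):

    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: urgent-batch               # illustrative name
    value: 1000
    globalDefault: false
    description: "High-priority batch jobs that may preempt lower-priority ones"
    ---
    apiVersion: batch.volcano.sh/v1alpha1
    kind: Job
    metadata:
      name: urgent-report
    spec:
      schedulerName: volcano
      queue: dept-a
      minAvailable: 1
      priorityClassName: urgent-batch  # lower-priority jobs may be preempted to run this one
      tasks:
        - name: main
          replicas: 1
          template:
            spec:
              restartPolicy: Never
              containers:
                - name: main
                  image: busybox
                  command: ["sh", "-c", "echo done"]
                  resources:
                    requests:
                      cpu: "2"
                      memory: 2Gi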

Minimum resource reservation (minimal resource reservation)

When running a job with multiple task roles (such as Spark), the Spark driver pod is created and run first, and it then asks kube-apiserver to create the Spark executor pods. In resource-constrained or highly concurrent scenarios, a burst of job submissions often results in all available resources being consumed by Spark driver pods; the executors cannot obtain resources, and in the end none of the Spark jobs can run properly. To work around this, users have created dedicated, statically partitioned nodes for Spark driver pods and executor pods, which causes resource fragmentation and low utilization. Volcano's minimal resource reservation reserves resources for each Spark job as a whole, preventing the deadlock caused by executors failing to obtain resources; compared with static partitioning, it improves performance by more than 30%.
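
In Spark 3.3 this is wired up through configuration: the driver is pointed at Volcano (for example via spark.kubernetes.scheduler.name=volcano) and a PodGroup template is supplied with spark.kubernetes.scheduler.volcano.podGroupTemplateFile; check the Spark 3.3 documentation for the exact property names in your build. A minimal template sketch that reserves capacity for the whole job rather than just the driver (sizes are illustrative):

    # podgroup-template.yaml (illustrative values)
    apiVersion: scheduling.volcano.sh/v1beta1
    kind: PodGroup
    spec:
      minMember: 1           # the driver alone is enough to admit the group
      minResources:          # but capacity is reserved for the driver AND its executors
        cpu: "5"
        memory: 20Gi
      queue: default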

Reservation and backfilling (Reservation & Backfill)

When a "huge" job requesting a large amount of resources is submitted to Kubernetes while many small jobs are in the pipeline, the huge job can starve and eventually be killed under the current scheduling policy or algorithm. To avoid starvation, resources should be reserved for such jobs conditionally, for example based on a timeout. While resources are reserved they may sit idle and unused, so to improve utilization the scheduler conditionally backfills "smaller" jobs into the reserved resources. Both reservation and backfill are triggered by feedback from plugins: Volcano provides several callback interfaces that let developers or users decide which jobs should be backfilled or reserved.
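
These behaviours are ultimately controlled by the Volcano scheduler configuration (typically a ConfigMap mounted as volcano-scheduler.conf). A sketch based on the documented default layout, with the backfill, preempt, and reclaim actions enabled; action and plugin names follow Volcano's documentation but should be verified against your release:

    actions: "enqueue, allocate, preempt, reclaim, backfill"
    tiers:
      - plugins:
          - name: priority      # job ordering and preemption between jobs
          - name: gang          # all-or-nothing scheduling (minAvailable / minMember)
          - name: conformance
      - plugins:
          - name: drf           # dominant-resource fairness between jobs
          - name: predicates
          - name: proportion    # queue shares and reclaim between queues
          - name: nodeorder
          - name: binpack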

Future development

As its scenarios become richer, Volcano keeps adding new algorithms, and the corresponding interfaces are being refined so that users can extend and customize algorithms more easily. Meanwhile, the community continues to expand its technology landscape to support new scenarios such as cross-cloud and cross-cluster scheduling, colocation, FinOps, intelligent elastic scheduling, and fine-grained resource management.

In the near future we will explain in detail the batch scheduling capabilities that Volcano brings to Spark 3.3; stay tuned. Add the Volcano assistant (WeChat: k8s2222) to join the Volcano community group, where experienced contributors are on hand and content is shared regularly.

Spark 3.3 release notes: https://spark.apache.org/releases/spark-release-3-3-0.html

Volcano website: https://volcano.sh/zh/docs/

GitHub: https://github.com/volcano-sh/volcano

