Abstract: For Spark users, Volcano's batch scheduling, fine-grained resource management, and other capabilities make it easier to migrate from Hadoop to Kubernetes, while greatly improving the performance of large-scale data analytics workloads.
On June 16, 2022, Apache Spark 3.3 was officially released. "Support Customized Kubernetes Schedulers" is a highlight feature of the 3.3 release: it adds framework-level support for customizing the Kubernetes scheduler and makes Volcano the default batch scheduler for Spark on Kubernetes. This is also the first Apache Spark release to officially support Volcano. For Spark users, Volcano's batch scheduling, fine-grained resource management, and other capabilities make it easier to migrate from Hadoop to Kubernetes, while greatly improving the performance of large-scale data analytics workloads.

Initiated by Huawei, built in collaboration with mainstream vendors
This feature was initiated by Huawei and completed jointly by developers from Huawei, Apple, Cloudera, Netflix, Databricks, and other companies. By supporting custom scheduling at the Apache Spark level, users can plug in and use a variety of third-party custom schedulers.
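As a sketch of how this looks in practice, Spark 3.3 lets you select Volcano through configuration at submission time. The scheduler name and feature-step class below follow the Spark 3.3 documentation; the API server address, container image, and jar path are illustrative placeholders:

```shell
# Submit a Spark job whose driver and executor pods are scheduled by Volcano.
# <k8s-apiserver> and <spark-image> are placeholders for your environment.
spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=<spark-image> \
  --conf spark.kubernetes.scheduler.name=volcano \
  --conf spark.kubernetes.driver.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep \
  --conf spark.kubernetes.executor.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.3.0.jar
```

The feature steps attach Volcano-specific metadata (such as the PodGroup reference) to the pods Spark creates.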
Spark + Volcano: more complete scheduling capabilities
Spark's resource management platform is evolving toward Kubernetes. Under Apache Spark's existing architecture, a job's single driver and multiple executors are scheduled separately, which can lead to resource deadlocks around the Spark driver pods; this problem occurs frequently when resources are tight. At the same time, the native Kubernetes scheduler's capabilities are limited: it cannot provide job-granularity features such as queue scheduling, fair scheduling, and resource reservation.
Volcano, the CNCF community's first cloud-native batch computing project, was officially open-sourced at KubeCon Shanghai in June 2019 and became an official CNCF project in April 2020. In April 2022, Volcano was promoted to a CNCF incubating project. Since being open-sourced, Volcano has been rapidly adopted in massive data computing and analysis scenarios such as AI, big data, gene sequencing, transcoding, and rendering, and has built a mature upstream and downstream ecosystem. Enterprises including Tencent, iQIYI, Xiaohongshu, Mogujie, Vipshop, Peng Cheng Laboratory, and Ruitian Investment run Volcano in production.
Spark's official support for Volcano will further accelerate the migration of big data platforms to Kubernetes and help Spark users handle the following common batch scheduling scenarios.
Common scheduling scenarios:
Job-level fair scheduling (Job-based Fair-share)
When running multiple elastic jobs (such as streaming analytics), resources need to be allocated fairly to each job to meet SLA/QoS requirements when multiple jobs compete for additional resources. In the worst case, a single job may start a large number of pods with low resource utilization, preventing other jobs from running due to insufficient resources. To avoid excessively small allocations (for example, starting only one pod per job), Volcano allows an elastic job to declare the minimum number of pods that must be schedulable before it starts. Any pods beyond that minimum share the cluster's resources fairly with other jobs.
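In Volcano this minimum is expressed with a PodGroup. A minimal sketch, using Volcano's v1beta1 scheduling API; the name, namespace, and numbers are illustrative:

```yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: stream-analytics-pg
  namespace: default
spec:
  queue: default
  minMember: 4   # the job starts only when 4 pods can be scheduled together;
                 # pods beyond this minimum share spare capacity fairly
```

Pods join the group through the `scheduling.k8s.io/group-name` annotation and are scheduled with `schedulerName: volcano`.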
Queues (Queue)
Queues are also widely used to share resources between elastic and batch workloads. The main purposes of queues are:
- Sharing resources between different "tenants" or resource pools, for example mapping each department to a queue so that multiple departments dynamically share cluster resources according to queue weights.
- Supporting different scheduling policies or algorithms for different "tenants" or resource pools, such as FIFO, fairness, and priority.
Queues are implemented as a cluster-wide CRD, decoupled from namespaces. This allows jobs created in different namespaces to be placed in a shared queue. Queues also provide min and max: min is the queue's minimum guaranteed resources, ensuring that whenever urgent tasks are submitted to the queue, min resources are available; max is the upper limit on the queue's resource usage. When resources between min and max are idle, tasks from other queues are allowed to share them, improving overall resource utilization.
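A queue with a weight, a guaranteed minimum, and a usage cap might look like the following sketch. The resource amounts are illustrative, and the `guarantee` field is available in recent Volcano releases, so check against your version:

```yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: dept-a
spec:
  weight: 2            # relative share when queues compete for idle resources
  guarantee:           # the "min": resources always reserved for this queue
    resource:
      cpu: "16"
      memory: 32Gi
  capability:          # the "max": hard upper limit on the queue's usage
    cpu: "64"
    memory: 128Gi
```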
User-oriented fair scheduling across queues (Namespace-based fair-share Cross Queue)
Within a queue, each job has a roughly equal chance of being scheduled during a scheduling cycle, which means users with more jobs have a greater chance of getting their jobs scheduled; this is unfair to other users. For example, suppose a queue with a small amount of resources holds 10 pods belonging to UserA and 1000 pods belonging to UserB. In this situation, the probability of UserA's pods being bound to a node is small.
To balance resource usage among users in the same queue, a finer-grained policy is needed. Given the multi-user model in Kubernetes, namespaces are used to distinguish users, and each namespace is configured with a weight that controls the priority of its resource usage.
Preemption and reclaim (Preemption & Reclaim)
Fair-share supports a lending model: some jobs or queues can over-use resources while those resources are idle. However, when the resources are requested again, their "owner" takes them back. Resources can be shared between queues or between jobs: reclaim rebalances resources between queues, while preemption rebalances resources between jobs.
Minimum resource reservation (minimal resource reservation)
When running a job with multiple task roles (such as Spark), the Spark driver pod is created and run first, and it then asks kube-apiserver to create the Spark executor pods. In resource-constrained or highly concurrent scenarios, a burst of job submissions often exhausts all available resources with Spark driver pods; the executors cannot obtain resources, and in the end none of the Spark jobs can run. To work around this, users statically partition the cluster into dedicated nodes for Spark driver pods and executor pods, which causes resource fragmentation and low utilization. Volcano's minimal resource reservation reserves resources for each Spark job, preventing the deadlock caused by executors failing to obtain resources; compared with static partitioning, performance improves by more than 30%.
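In Spark 3.3 this reservation can be driven through a PodGroup template passed via `spark.kubernetes.scheduler.volcano.podGroupTemplateFile`. A sketch, with illustrative resource amounts:

```yaml
# podgroup-template.yaml, referenced from spark-submit with
# --conf spark.kubernetes.scheduler.volcano.podGroupTemplateFile=/path/to/podgroup-template.yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
spec:
  queue: default
  minResources:        # reserved for the whole job (driver + executors),
    cpu: "8"           # so a flood of drivers alone cannot exhaust the cluster
    memory: 16Gi
```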
Reservation and backfilling (Reservation & Backfill)
When a "huge" job requesting a large amount of resources is submitted to Kubernetes while many small jobs are in the pipeline, the huge job may starve and eventually be killed under the current scheduling policy or algorithm. To avoid starvation, resources should be conditionally reserved for such jobs, for example after a timeout. While resources are reserved, they may sit idle and unused. To improve resource utilization, the scheduler conditionally backfills "smaller" jobs into those reserved resources. Both reservation and backfill are triggered by feedback from plugins: Volcano provides several callback interfaces so that developers or users can decide which jobs should be backfilled or reserved.
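These behaviors are assembled from actions and plugins in the scheduler's configuration. The shape below follows Volcano's default volcano-scheduler.conf; the exact plugin set varies by release:

```yaml
actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: priority      # job priority ordering
  - name: gang          # all-or-nothing (minMember) scheduling
  - name: conformance
- plugins:
  - name: drf           # dominant-resource fairness
  - name: predicates
  - name: proportion    # queue weight / capability enforcement
  - name: nodeorder
  - name: binpack
```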
Future development
As scenarios grow richer, Volcano keeps adding new algorithms, and the corresponding interfaces are continuously improved to make it easy for users to extend and customize them. At the same time, the community is continuously expanding its technology landscape to support new scenarios such as cross-cloud and cross-cluster scheduling, workload co-location, FinOps, intelligent elastic scheduling, and fine-grained resource management.
In the near future we will explain in detail the batch scheduling capabilities that Volcano brings to Spark 3.3; stay tuned. Add the Volcano assistant (k8s2222) to join the Volcano community chat group, where experts are on hand and share regularly.
Spark 3.3 release notes: https://spark.apache.org/releases/spark-release-3-3-0.html
Volcano website: https://volcano.sh/zh/docs/
GitHub: https://github.com/volcano-sh/volcano