
Three Scheduling Strategies in YARN

2022-06-24 03:01:00 reisende

Introduction

I tested YARN's three scheduling strategies (FIFO, Capacity, and Fair) on my own machine.

FIFO scheduling strategy

There is not much to say about this strategy. Tasks are submitted to a single queue. The scheduler orders them by priority and arrival time, then allocates resources to each application in turn until none are left. It performs poorly, and almost nobody uses it in a production environment.

Advantages

It is simple and works out of the box with no additional configuration. Earlier versions of YARN used FIFO as the default scheduling policy. Newer versions use Capacity by default.

Disadvantages

  1. Applications can starve. A large application that enters the queue first occupies most of the resources, and small applications get stuck without any.
  2. Low-priority tasks can starve. When the queue has no resources left and high-priority tasks keep arriving, low-priority tasks are pushed back again and again and never get resources.
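To make the two shortcomings concrete, here is a toy Python model of one FIFO allocation round. It is not YARN code. The App fields, the integer resource units and the one-shot allocation are simplifying assumptions. In a real cluster, resources free up as containers finish, but the same ordering decides who gets them next.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    demand: int    # resource units requested (hypothetical unit)
    priority: int  # larger value = higher priority
    arrival: int   # submission order


def fifo_allocate(apps, capacity):
    """One allocation round of a toy FIFO scheduler: order apps by
    priority (highest first), then by arrival, and give each app as much
    as it asks for until the cluster is full."""
    free = capacity
    allocation = {}
    for app in sorted(apps, key=lambda a: (-a.priority, a.arrival)):
        granted = min(app.demand, free)
        allocation[app.name] = granted
        free -= granted
    return allocation


if __name__ == "__main__":
    # Shortcoming 1: a large app submitted first leaves nothing for a small one.
    print(fifo_allocate([App("big-job", 100, 0, 0), App("small-job", 5, 0, 1)], 100))
    # {'big-job': 100, 'small-job': 0}

    # Shortcoming 2: later high-priority apps jump ahead of an earlier low-priority app.
    print(fifo_allocate([App("low", 10, 1, 0), App("high-1", 60, 5, 1),
                         App("high-2", 40, 5, 2)], 100))
    # {'high-1': 60, 'high-2': 40, 'low': 0}
```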

Capacity scheduling strategy

A clever strategy. To let small tasks get resources, it divides the cluster into multiple queues. Each queue is still FIFO internally, but each queue has its own slice of resources. Small tasks can therefore go to queues with fewer resources and large tasks to queues with more. The downside is wasted resources: a queue's slice sits unused while that queue is idle.

Configuration

This strategy is YARN's default. Without additional configuration there is only a single default queue, so it behaves much like FIFO.

As the screenshot above shows, I configured three queues, first, second, and third, which share a common parent queue, root. The three sub-queues are allocated 10%, 20%, and 70% of the total resources, respectively.
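A configuration of this shape is normally written by hand in capacity-scheduler.xml. The sketch below generates the equivalent properties from Python so the structure is easy to see. The property names yarn.scheduler.capacity.root.queues and yarn.scheduler.capacity.root.<queue>.capacity are the standard Capacity Scheduler ones. The helper function is a hypothetical convenience. Its check that sibling capacities add up to 100 mirrors the scheduler's own requirement.

```python
import xml.etree.ElementTree as ET


def capacity_scheduler_xml(capacities):
    """Build a capacity-scheduler.xml <configuration> element for leaf
    queues directly under root; `capacities` maps queue name -> percent."""
    if sum(capacities.values()) != 100:
        raise ValueError("capacities of sibling queues must add up to 100")
    conf = ET.Element("configuration")

    def prop(name, value):
        p = ET.SubElement(conf, "property")
        ET.SubElement(p, "name").text = name
        ET.SubElement(p, "value").text = str(value)

    prop("yarn.scheduler.capacity.root.queues", ",".join(capacities))
    for queue, percent in capacities.items():
        prop(f"yarn.scheduler.capacity.root.{queue}.capacity", percent)
    return conf


if __name__ == "__main__":
    conf = capacity_scheduler_xml({"first": 10, "second": 20, "third": 70})
    ET.indent(conf)  # pretty-printing; needs Python 3.9+
    print(ET.tostring(conf, encoding="unicode"))
```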

Elastic resources

Elastic resources address this waste. The Capacity policy lets a queue lend its resources to other queues while it is idle, which improves overall resource utilization.

I set the maximum capacity of all three queues to 100. When the first and second queues have no tasks, the third queue can therefore take all the resources. This creates a problem: once one queue holds a large amount of elastic resources, other queues cannot win them back when new tasks arrive.
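The problem described above can be sketched as a toy model. It assumes the three queues from this article, maximum-capacity set to 100 for each, and preemption disabled. Disabled is YARN's default; preemption is switched on via yarn.resourcemanager.scheduler.monitor.enable. The class and its integer resource units are hypothetical. In this model the guaranteed capacity is stored but never enforced, because without preemption nothing takes resources back.

```python
class ElasticCluster:
    """Toy model of Capacity Scheduler elasticity with preemption disabled.

    Each queue has a guaranteed capacity and a maximum capacity, both in
    percent of the cluster. A request is served from whatever is free, up
    to the queue's maximum; resources are never taken back from a queue.
    """

    def __init__(self, total, queues):
        self.total = total
        self.queues = queues  # name -> (capacity %, maximum-capacity %)
        self.used = {name: 0 for name in queues}

    def free(self):
        return self.total - sum(self.used.values())

    def request(self, queue, amount):
        _, max_pct = self.queues[queue]
        headroom = self.total * max_pct // 100 - self.used[queue]
        granted = max(0, min(amount, self.free(), headroom))
        self.used[queue] += granted
        return granted


if __name__ == "__main__":
    cluster = ElasticCluster(100, {"first": (10, 100), "second": (20, 100),
                                   "third": (70, 100)})
    cluster.request("third", 90)   # borrows idle capacity from first/second
    cluster.request("first", 20)   # only 10 units are still free
    cluster.request("second", 20)  # nothing left, its 20% is not honoured
    print(cluster.used)            # {'first': 10, 'second': 0, 'third': 90}
```

Lowering a queue's maximum-capacity (for example third to 70) stops it from borrowing at all. That is the trade-off between utilization and fairness the article returns to later.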

Test

I submitted four tasks to the third queue. Its utilization reached 125%, which shows that it was holding part of the resources belonging to the first and second queues.

I then submitted a task to the first queue, and it took all the remaining resources.

Finally I submitted a task to the second queue. It could not get any resources and never ran.
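Assume the 125% is the queue's used capacity relative to its own configured capacity. That is how the Capacity Scheduler web UI reports "Used Capacity". A quick calculation then shows how much of the cluster the third queue held and how little was left for the other two:

```python
def absolute_share(queue_capacity_pct, used_capacity_pct):
    """Convert a queue's used capacity, given as a percent of the queue's
    own configured capacity, into a percent of the whole cluster."""
    return queue_capacity_pct * used_capacity_pct / 100


third = absolute_share(70, 125)   # 87.5 % of the cluster
remaining = 100 - third           # 12.5 %, less than first + second's 30 %
print(third, remaining)           # 87.5 12.5
```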

Priority settings

YARN lets us set a priority for each application, and applications with higher priority are scheduled earlier. This only applies to the FIFO ordering policy.

In practice there are two pitfalls that can make priority settings ineffective.

The first is that yarn.cluster.max-application-priority must be set in yarn-site.xml. This parameter caps the highest priority an application can have. In yarn-default.xml its default value is 0. If you don't change it, whatever priority you submit with (1, 2, 4, or 8) ends up as 0. Maddening.

The second is yarn.scheduler.capacity.root.<leaf-queue-path>.default-application-priority, which sets the default priority for applications in a queue. Leaving it unset does not seem to cause problems.

The lesson: read the documentation carefully.
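The interaction of the two pitfalls can be summarised in a toy function. It assumes that an app submitted without a priority receives its queue's default. It also assumes that a priority above yarn.cluster.max-application-priority is lowered to that cap rather than rejected, which matches the behaviour observed above. The function and its parameter names are illustrative, not YARN's API.

```python
def effective_priority(requested, cluster_max=0, queue_default=0):
    """Toy model of the two priority pitfalls.

    requested:     priority given at submission, or None if not set
    cluster_max:   yarn.cluster.max-application-priority (defaults to 0)
    queue_default: the queue's default-application-priority
    """
    priority = queue_default if requested is None else requested
    return min(priority, cluster_max)  # anything above the cluster cap is lowered


if __name__ == "__main__":
    print([effective_priority(p) for p in (1, 2, 4, 8)])                  # [0, 0, 0, 0]
    print([effective_priority(p, cluster_max=10) for p in (1, 2, 4, 8)])  # [1, 2, 4, 8]
```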

Test

First I submitted four apps with priority 3, which took up all the resources.

Next I submitted an app with priority 1. It stayed in the ACCEPTED state, waiting for resources.

I then submitted two more apps with priorities 5 and 6; both were also in the ACCEPTED state.

I used the yarn application -kill command to kill the first app I had submitted.

Result: the apps with priorities 5 and 6 both moved to the RUNNING state. The app with priority 1 was still waiting for resources.
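You get the observed order (priorities 6 and 5 started, priority 1 still waiting) if pending applications are served highest priority first. Below is a minimal sketch of that ordering. The app names are hypothetical. It assumes, from the test, that the killed app freed room for exactly two apps.

```python
import heapq
import itertools


class PendingQueue:
    """Pending (ACCEPTED) apps, served by priority, highest first;
    ties are broken by submission order."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()

    def submit(self, name, priority):
        # heapq is a min-heap, so store the negated priority
        heapq.heappush(self._heap, (-priority, next(self._order), name))

    def next_app(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    pending = PendingQueue()
    for name, prio in [("app-p1", 1), ("app-p5", 5), ("app-p6", 6)]:
        pending.submit(name, prio)
    # Killing one priority-3 app freed room for two apps in the test.
    started = [pending.next_app() for _ in range(2)]
    print(started, len(pending))  # ['app-p6', 'app-p5'] 1
```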

Lifetime settings

The yarn.scheduler.capacity.<queue-path>.maximum-application-lifetime parameter sets the maximum lifetime, in seconds, of an application in a queue.

The yarn.scheduler.capacity.root.<queue-path>.default-application-lifetime parameter sets the default lifetime.

Put simply, an application is killed once it exceeds its lifetime.
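How the two parameters combine can be sketched as follows. This is an assumed model based on the Capacity Scheduler documentation:
- a lifetime requested at submission is used if present, otherwise the queue default;
- a positive queue maximum applies to every app and caps longer values;
- values <= 0 mean "disabled".

The function names are hypothetical.

```python
def effective_lifetime(requested, default, maximum):
    """Lifetime in seconds for a new app, or None for "no limit".

    requested: lifetime set at submission (None or <= 0 means not set)
    default:   the queue's default-application-lifetime (<= 0 means disabled)
    maximum:   the queue's maximum-application-lifetime (<= 0 means disabled)
    """
    lifetime = requested if requested is not None and requested > 0 else default
    if maximum > 0:
        # the queue maximum applies to every app and caps longer requests
        return maximum if lifetime <= 0 else min(lifetime, maximum)
    return lifetime if lifetime > 0 else None


def should_kill(submitted_at, now, lifetime):
    """True once an app has outlived its lifetime (times in seconds)."""
    return lifetime is not None and now - submitted_at > lifetime


if __name__ == "__main__":
    print(effective_lifetime(None, 600, 3600))  # 600: queue default used
    print(effective_lifetime(7200, 600, 3600))  # 3600: capped at the maximum
```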

User resource limits

Advantages

  1. Prevents applications from starving: dividing the cluster into queues allocates resources more sensibly.

Disadvantages

  1. Queue configuration is tedious. There are parent queues and sub-queues, and each queue has its own settings: lifetime, priority, maximum resources, minimum resources, maximum per-user resources and minimum per-user allocation. It is very cumbersome.
  2. Low-priority tasks in the same queue can still starve. With multiple users, capping the resources a single user may occupy in the queue mitigates this, but with a single user there is no good solution.
  3. Queues can still hog each other's resources. This is really a trade-off. To improve overall utilization, allow more elastic resources; to ensure fairness between queues, reduce them.
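For disadvantage 2, the Capacity Scheduler's yarn.scheduler.capacity.<queue-path>.user-limit-factor property can express the per-user cap. It is a multiple of the queue's capacity that one user may consume, 1 by default. The sketch below assumes the cap is simply capacity × user-limit-factor, bounded by maximum-capacity. The function is illustrative only.

```python
def max_user_share(capacity_pct, maximum_capacity_pct, user_limit_factor=1.0):
    """Largest share of the cluster (percent) one user may hold in a queue:
    the queue's capacity times user-limit-factor, capped at maximum-capacity."""
    return min(capacity_pct * user_limit_factor, maximum_capacity_pct)


if __name__ == "__main__":
    print(max_user_share(70, 100))       # 70.0: a single user stays within the queue's capacity
    print(max_user_share(70, 100, 2.0))  # 100: capped by maximum-capacity
```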

Fair scheduling strategy

Configuration

Specify the scheduler type in yarn-site.xml.

Then create a fair-scheduler.xml configuration file in the configuration directory.
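The two configuration steps can be sketched in Python. The scheduler class name is the standard Fair Scheduler class. By default the Fair Scheduler reads its allocation file from fair-scheduler.xml on the classpath, configurable through yarn.scheduler.fair.allocation.file. That is why the file goes in the configuration directory. The queue name first appears later in this article; the other three names are assumptions.

```python
import xml.etree.ElementTree as ET

# yarn-site.xml: switch the ResourceManager to the Fair Scheduler
YARN_SITE_PROPS = {
    "yarn.resourcemanager.scheduler.class":
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler",
}


def fair_scheduler_xml(weights):
    """Build a fair-scheduler.xml <allocations> element; `weights` maps
    queue name -> weight."""
    allocations = ET.Element("allocations")
    for name, weight in weights.items():
        queue = ET.SubElement(allocations, "queue", name=name)
        ET.SubElement(queue, "weight").text = str(weight)
    return allocations


if __name__ == "__main__":
    alloc = fair_scheduler_xml({"first": 1.0, "second": 1.0,
                                "third": 1.0, "fourth": 1.0})
    ET.indent(alloc)  # pretty-printing; needs Python 3.9+
    print(ET.tostring(alloc, encoding="unicode"))
```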

I defined four queues, each with a weight of 1, so in theory they share resources equally.

As you can see, each queue's steady fair share is 25%.
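The 25% follows directly from the weights. Below is a sketch of the steady fair share. It assumes each queue's share is its weight divided by the sum of all configured queue weights. Queue names other than first are hypothetical.

```python
def steady_fair_shares(weights):
    """Steady fair share of each queue, in percent: its weight divided by
    the sum of the weights of ALL configured queues, busy or idle."""
    total = sum(weights.values())
    return {queue: 100 * w / total for queue, w in weights.items()}


if __name__ == "__main__":
    print(steady_fair_shares({"first": 1, "second": 1, "third": 1, "fourth": 1}))
    # {'first': 25.0, 'second': 25.0, 'third': 25.0, 'fourth': 25.0}
```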

After I submitted two tasks to the first queue, the dashed line shows the dynamic resources occupied by first.
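The Fair Scheduler also distinguishes an instantaneous fair share, which is divided only among queues that currently have running applications. A minimal sketch under that assumption shows why first can grow past its 25% steady share while the other queues are idle. Queue names other than first are hypothetical.

```python
def instantaneous_fair_shares(weights, active):
    """Fair share in percent when only the queues in `active` have running
    apps: idle queues get 0 and active ones split the cluster by weight."""
    total = sum(weights[q] for q in active)
    return {q: (100 * weights[q] / total if q in active else 0.0) for q in weights}


if __name__ == "__main__":
    weights = {"first": 1, "second": 1, "third": 1, "fourth": 1}
    print(instantaneous_fair_shares(weights, active={"first"}))
    # {'first': 100.0, 'second': 0.0, 'third': 0.0, 'fourth': 0.0}
```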


Copyright notice
This article was written by reisende. When reposting, please include a link to the original. Thanks.
https://yzsam.com/2021/10/20211020204737485j.html