Kubernetes GPU: The Dilemma and the Way Out
2022-07-24 15:37:00 · InfoQ
Scheduling GPUs in Kubernetes: Usage
To use a GPU inside a container, the container needs access to:
- the GPU devices, e.g. /dev/nvidia0;
- the GPU driver directory, e.g. /usr/local/nvidia/*.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1
The Pod above requests one nvidia.com/gpu in its limits.

Using Device Plugins

Device plugins are currently available from two GPU vendors:
- AMD
- NVIDIA

AMD advertises the extended resource amd.com/gpu and NVIDIA advertises nvidia.com/gpu, i.e. the resource is named <vendor>.com/gpu. GPUs can only be specified in the limits section, which means:
- you can specify a GPU limit without specifying requests, and Kubernetes will use the limit value as the default request;
- you can specify both limits and requests, but the two values must be equal;
- you cannot specify requests without specifying limits.
- Containers (and Pods) do not share GPUs, and GPUs cannot be overcommitted.
- Each container can request one or more GPUs, but requesting a fraction of a GPU is not allowed.
Extended resources

apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      # https://github.com/kubernetes/kubernetes/blob/v1.7.11/test/images/nvidia-cuda/Dockerfile
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU
Device Plugins

Deploying the AMD GPU device plugin
- Kubernetes nodes must have AMD's Linux GPU driver pre-installed.
kubectl create -f https://raw.githubusercontent.com/RadeonOpenCompute/k8s-device-plugin/r1.10/k8s-ds-amdgpu-dp.yaml
Deploying the NVIDIA GPU device plugin
- Kubernetes nodes must have the NVIDIA drivers pre-installed.
- Kubernetes nodes must have nvidia-docker 2.0 pre-installed.
- Docker's default runtime must be set to nvidia-container-runtime instead of runc.
- NVIDIA driver version ~= 384.81.
- Kubernetes version >= 1.10.
Preparing GPU nodes

Install nvidia-docker2 on every GPU node (newer Docker releases use nvidia-container-toolkit together with the --gpus flag, but the device plugin expects the nvidia runtime to be Docker's default runtime):

# Add the package repositories
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update && sudo apt-get install -y nvidia-docker2
$ sudo systemctl restart docker
Then set nvidia-container-runtime as the default runtime in /etc/docker/daemon.json:

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

The runtimes entry registers the nvidia runtime with Docker, and default-runtime makes it the default for all containers.

Enabling GPU support in Kubernetes
$ kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.11.0/nvidia-device-plugin.yml
The nvidia-device-plugin can also be deployed via Helm.

Clusters with different types of GPUs

If different nodes in the cluster carry different types of GPUs, label the nodes and use node selectors to place Pods on the appropriate nodes:
# Label your nodes with the type of accelerator they have
kubectl label nodes <node-with-k80> accelerator=nvidia-tesla-k80
kubectl label nodes <node-with-p100> accelerator=nvidia-tesla-p100
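Once nodes are labeled, a Pod can be pinned to a particular GPU model with a nodeSelector. A minimal sketch, assuming the nvidia-tesla-k80 label applied above (the pod name is illustrative and the image is reused from the earlier example):

apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add-k80            # illustrative name
spec:
  restartPolicy: OnFailure
  nodeSelector:
    accelerator: nvidia-tesla-k80      # only schedule onto nodes labeled with this GPU type
  containers:
    - name: cuda-vector-add
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1            # the GPU itself is still requested via the device plugin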
Without the Device Plugin

When Docker's default runtime is already nvidia-container-runtime, GPU access can also be granted directly through the NVIDIA container runtime environment variables such as NVIDIA_DRIVER_CAPABILITIES (reference: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html):

containers:
  - env:
      - name: NVIDIA_DRIVER_CAPABILITIES
        value: compute,utility,video
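For completeness, a minimal sketch of a whole Pod that relies only on these environment variables, assuming the node's default Docker runtime is the NVIDIA runtime as configured above (the pod name, the CUDA base image, and the choice of exposing GPU 0 via NVIDIA_VISIBLE_DEVICES are illustrative assumptions; no nvidia.com/gpu resource is requested, so the scheduler gives no placement or exclusivity guarantees):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-env-only                       # illustrative name
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-app
      image: "nvidia/cuda:11.0-base"       # illustrative CUDA base image
      command: ["nvidia-smi"]              # just print the GPUs visible to the container
      env:
        - name: NVIDIA_VISIBLE_DEVICES     # which GPUs the NVIDIA runtime injects
          value: "0"
        - name: NVIDIA_DRIVER_CAPABILITIES # which driver capabilities to expose
          value: compute,utility,video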
Scheduling GPUs in Kubernetes: Principles
- The first piece is Extended Resources, which lets users define custom resource names. Resources are measured at integer granularity; the goal is to support different heterogeneous devices (RDMA, FPGA, AMD GPU, and so on) through a common pattern, not something designed only for NVIDIA GPUs.
- The second piece is the Device Plugin Framework, which lets third-party device vendors manage the whole device lifecycle out of tree. The framework is the bridge between Kubernetes and the device plugins: it reports device information to Kubernetes on one side, and handles device selection and allocation on the other.
Reporting Extended Resources
# Start kubectl proxy so you can talk to the Kubernetes API server directly with curl
$ kubectl proxy

# Perform the PATCH operation ('/' in the resource name is escaped as '~1' in the JSON-Patch path)
$ curl --header "Content-Type: application/json-patch+json" \
--request PATCH \
--data '[{"op": "add", "path": "/status/capacity/nvidia.com~1gpu", "value": "1"}]' \
http://localhost:8001/api/v1/nodes/<your-node-name>/status
apiVersion: v1
kind: Node
...
Status:
  Capacity:
    cpu: 2
    memory: 2049008Ki
    nvidia.com/gpu: 1
The node now advertises nvidia.com/gpu in its capacity.

Device Plugin programming model

The way a Device Plugin works splits into two parts:
- resource reporting at plugin start-up time;
- scheduling and allocation when a user workload actually runs.

Of the two gRPC calls involved:
- ListAndWatch handles the reporting of the corresponding resources and also provides a health-check mechanism. When a device becomes unhealthy, the plugin can report the unhealthy device ID to Kubernetes, letting the Device Plugin Framework remove that device from the schedulable devices;
- Allocate is called on the Device Plugin when a container is deployed. The key input is the list of device IDs the container will use; the return value describes the devices, data volumes, and environment variables the container needs at start-up.
Resource reporting and monitoring
- The first step is Device Plugin registration: Kubernetes needs to know which Device Plugin to interact with, because there may be multiple devices on a node. The Device Plugin, acting as a client, reports three things to the kubelet:
- **Who am I?** The name of the device the plugin manages, e.g. GPU or RDMA;
- **Where am I?** The unix socket file the plugin listens on, so that the kubelet can call it;
- The interaction protocol, i.e. the API version number.
- The second step is service start-up: the Device Plugin starts a gRPC server. From then on the Device Plugin serves the kubelet through this server; the listen address and the API version were already reported in the first step.
- The third step: once the gRPC server is up, the kubelet establishes a long-lived ListAndWatch connection to the Device Plugin to discover device IDs and device health. When the Device Plugin detects an unhealthy device, it proactively notifies the kubelet. If that device is idle, the kubelet removes it from the allocatable list; but if the device is already in use by a Pod, the kubelet does nothing, since killing that Pod would be a very dangerous operation.
- The fourth step: the kubelet exposes these devices in the Node status and sends the device count to the Kubernetes api-server. The scheduler then schedules based on this information.
Note that when reporting to the api-server, the kubelet only reports the GPU count; it does not report topology information such as NVLink or PCIe layout.

Pod scheduling and running
- Step one: when a Pod wants to use a GPU, it only needs to declare, as in the earlier example, the GPU resource and its count (e.g. nvidia.com/gpu: 1) under limits in the Pod's resources. Kubernetes finds a node whose GPU count satisfies the request, decrements that node's GPU count by one, and completes the binding of the Pod to the Node.
- Step two: once the binding succeeds, the kubelet on that node creates the container. When the kubelet sees that the Pod's container requests a GPU, **it delegates to its internal Device Plugin Manager module, which selects an available GPU from the GPU IDs it owns and assigns it to the container.** The kubelet then issues an Allocate request to the local Device Plugin; the request carries the list of device IDs to be assigned to the container.
- Step three: when the Device Plugin receives the Allocate request, it looks up the device paths, driver directories, and environment variables corresponding to the device IDs passed in by the kubelet, and returns them to the kubelet as an AllocateResponse.
- Step four: once the device paths and driver directories carried in the AllocateResponse are returned to the kubelet, the kubelet performs the GPU assignment for the container based on that information; for example, Docker creates the container as instructed by the kubelet, with the GPU devices attached and the required driver directories mounted. At this point the process of Kubernetes allocating a GPU to a Pod is complete.
- When the Device Plugin starts, it registers its device ID (e.g. nvidia.com/gpu) with the kubelet over gRPC via /var/lib/kubelet/device-plugins/kubelet.sock.
- The kubelet reports the device count to the APIServer as part of the Node status; the scheduler later schedules based on this information.
- The kubelet also maintains a long-lived ListAndWatch connection to the Device Plugin; when the plugin detects that a device is unhealthy, it proactively notifies the kubelet.
- When a user workload requests GPU resources, the Device Plugin locates the device paths and driver files for the device IDs the kubelet assigns in the Allocate request.
- The kubelet creates the container using the information provided by the Device Plugin.
Scheduling GPUs in Kubernetes: Community Enhancements

Device Plugin: usable, but not easy to use
1. Insufficient granularity of GPU resource scheduling
- Only limits can be set for GPU resources: requests cannot be used alone. You either set only limits or set both, and the two values must be equal; you cannot set requests without limits.
- Requesting a fraction of a GPU is not allowed.

2. Insufficient GPU resource sharing
- Pods and containers cannot share a GPU, and GPUs cannot be overcommitted (which is why our online services run as a DaemonSet).
3. No cluster-wide view of GPU resources
- There is no intuitive way to get cluster-level GPU information, such as the binding relationship between Pods/containers and GPU cards, or the number of GPU cards already in use.
- Today Kubernetes allocates and schedules GPU resources via extended resources, which amounts to adding and subtracting card counts on each node. To find out how GPU cards are allocated, you have to traverse the nodes and compute that information yourself; and because the resource is a scalar, the Pod/container-to-card binding cannot be recovered at all. These problems are not very noticeable in whole-card mode, but become especially serious in fine-grained sharing mode.
- The root of the problem is that GPU resource scheduling is in fact completed entirely by the kubelet.
- The global scheduler participates very little: as a traditional Kubernetes scheduler it can only work with GPU counts. Once devices are heterogeneous and the requirement can no longer be described by a simple number, e.g. "my Pod wants to run on two GPUs connected by NVLink", the Device Plugin simply cannot handle it.

4. No support for multiple GPU backends
- The various GPU technologies (nvidia-docker, qGPU, vCUDA, GPU share, GPU pooling) all have to deploy their components independently and cannot be scheduled and managed in a unified way.
Community GPU sharing practices

Resource isolation mostly relies on virtualization technology, while most community work focuses on GPU resource scheduling.
- GRID: mostly used in virtual-machine scenarios. It is driver-based, so isolation is relatively strong and performance is good, but it is not open source.
- MPS: suited to container scenarios. It is a software-based approach with relatively weak isolation, and it is also not open source.
1. Alibaba's GPU sharing practice
- Lets more inference services share the same GPU card: users can describe a request for a shareable resource through the API, and such resources can be scheduled (a sketch of such a request appears at the end of this subsection).
- Isolation of the shared resource is not supported.
- It keeps using the Kubernetes Extended Resource definition, but the smallest unit of measurement changes from one GPU card to MiB of GPU memory.
- The GPU resources a user requests cannot exceed one card, i.e. the upper limit of a request is a single card.

The implementation is built on three mechanisms: the Extended Resource definition, the Scheduler Extender mechanism, and the Device Plugin mechanism.
- Use the Kubernetes Extended Resource mechanism to redefine GPU resources, mainly in terms of GPU memory and GPU count.
- Use the Device Plugin mechanism to report the total GPU resources on each node to the kubelet, which in turn reports them to the Kubernetes API Server.
- Use the Kubernetes Scheduler Extender mechanism to extend the scheduler: in the global scheduler's Filter and Bind phases it decides whether a single node's GPU cards can provide enough GPU memory, and at Bind time it records the GPU allocation result in an annotation on the Pod spec so that later Filter passes can check the allocation result.
- On the node: when the kubelet receives the event that a Pod has been bound to its node, it creates the actual Pod. In that process the kubelet calls the GPU Share Device Plugin's Allocate method with the gpu-mem requested by the Pod as the parameter; inside Allocate, the device plugin carries out the allocation according to the scheduling decision that the GPU Share Scheduler Extender made for that Pod.
Further roadmap items include:
- integrating Nvidia MPS as an isolation option;
- automated deployment on Kubernetes clusters set up with kubeadm;
- high availability for the scheduler extender;
- applying the scheme to GPU, RDMA, and other devices.
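A minimal sketch of such a shared-GPU request, assuming the gpushare scheduler extender and device plugin are installed and expose GPU memory as the aliyun.com/gpu-mem extended resource (the resource name, pod name, and image are illustrative assumptions; the unit, GiB or MiB, depends on how the device plugin is configured, and the amount must fit on a single card):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-share-inference              # illustrative name
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-app
      image: "k8s.gcr.io/cuda-vector-add:v0.1"   # reusing the image from the earlier examples
      resources:
        limits:
          aliyun.com/gpu-mem: 3          # a slice of one card's GPU memory instead of a whole card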
2. Huawei's GPU sharing practice (Volcano)
- At the scheduling level, multiple containers can share one GPU card.
- Requests are currently expressed as an amount of GPU memory: it can be part of one GPU's memory, or more than one whole card's worth of memory.
- On isolation, the original plan was to isolate both GPU memory and compute power, but it was unclear whether hooking the CUDA API or the driver would carry legal risks, so this was not pursued.
- Directly specifying a GPU count (gpu-num) is not yet supported; at the time, this device plugin was developed specifically for GPU sharing.
- Create a pod that requests the volcano.sh/gpu-memory resource (see the sketch after the env example below).
- Volcano then allocates GPU resources for the pod and adds annotations such as:

annotations:
  volcano.sh/gpu-index: "0"
  volcano.sh/predicate-time: "1593764466550835304"
- The kubelet watches the pods bound to its node and, before running the container, calls the Allocate API to set the environment variables:

env:
  NVIDIA_VISIBLE_DEVICES: "0"    # GPU card index
  VOLCANO_GPU_ALLOCATED: "1024"  # GPU memory allocated to this container
  VOLCANO_GPU_TOTAL: "11178"     # total GPU memory of the card
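A minimal sketch of such a pod, assuming the Volcano GPU-share device plugin is deployed and Volcano is used as the scheduler (the pod name and image are illustrative; volcano.sh/gpu-memory is requested in MiB, matching the VOLCANO_GPU_ALLOCATED value shown above):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-memory-demo                  # illustrative name
spec:
  schedulerName: volcano                 # hand scheduling over to Volcano
  restartPolicy: OnFailure
  containers:
    - name: cuda-app
      image: "k8s.gcr.io/cuda-vector-add:v0.1"   # reusing the earlier example image
      resources:
        limits:
          volcano.sh/gpu-memory: 1024    # request 1024 MiB of GPU memory on one card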
The device plugin exposes two extended resources, volcano.sh/gpu-memory and volcano.sh/gpu-number. It is responsible for:
- collecting GPU count (gpu-number) and GPU memory (gpu-memory) data from the nodes in the cluster;
- monitoring GPU health;
- mounting GPU resources for workloads in the cluster that request GPUs.
- GPU virtualization: in inference and GPU development scenarios, GPU utilization is generally low. Volcano already supports sharing one GPU among multiple containers; future work will further strengthen compute and GPU-memory isolation, so that utilization improves while interference between services is reduced.
- Support for multi-dimensional resource proportion partitioning on GPU nodes.
- "Multi-dimensional resource proportion partitioning on GPU nodes" is one of the notable features of this release. It mainly addresses the problem of GPU nodes whose CPU and other resources are consumed by non-GPU jobs, starving GPU jobs while GPU resources sit idle and wasted. The feature was contributed by a Volcano community partner. In traditional schedulers, scarce resources such as GPUs are allocated independently of CPU and other resources: CPU-only jobs can be scheduled directly onto GPU nodes without regard for the CPU and memory that GPU jobs will need, and nothing is reserved for them. With this feature, users designate a dominant resource (usually GPU) and configure reservation ratios for the supporting resource dimensions (e.g. GPU:CPU:Memory = 1:4:32). The scheduler then always keeps the ratio of idle GPU, CPU, and memory on GPU nodes at or above the configured value, so a GPU job that fits the ratio can be scheduled onto the node at any time without wasting GPUs. Compared with other industry approaches, such as giving GPU nodes a dedicated scheduler or forbidding CPU-only jobs on GPU nodes, this is better for node resource utilization and more flexible to use.
- For the feature design and usage, see: https://github.com/volcano-sh/volcano/blob/master/docs/design/proportional.md
3. Tencent's GPU sharing practice
- Regular GPU usage: scheduled by whole card.
- Using the virtualization technology provided by Nvidia.
- An in-house GPU para-virtualization: implemented at the driver level, intercepting the memory allocation, memory release, and kernel-launch entry points.
Summary
- Judging from the community, Stack Overflow, and the practices above, current GPU sharing mainly shares GPU memory. Resource isolation is weak and resource contention can occur, so whether GPU-sharing development is worthwhile depends on the company's machine-learning GPU usage scenarios.
- GPU sharing comes with restrictions: the resources a user can request are limited to a single card.
- From the perspective of GPU-memory sharing, sharing a single GPU card is feasible. The main implementation steps are:
  - Extend the resource definition: redefine GPU resources, mainly GPU memory and GPU count.
  - Extend the scheduler: in the global scheduler's Filter and Bind phases, decide whether a single node's GPU cards can provide enough GPU memory.
  - Use the Nvidia Device Plugin mechanism to complete resource reporting and allocation.