### Multi-Instance GPUs (MIGs) in Kubernetes
#### Overview
This paper provides a comprehensive overview of Multi-Instance GPU (MIG) support in Kubernetes, highlighting the challenges, design decisions, and implementation strategies.

#### Background
MIG (Multi-Instance GPU) is a feature of NVIDIA GPUs that allows partitioning a single GPU into multiple MIG devices. Each MIG device acts as an independent GPU with a dedicated portion of memory and compute resources. This partitioning is done in fixed-size chunks called slices. For example, an NVIDIA A100 GPU has 8 memory slices and 7 compute slices.
MIG allows for flexible GPU usage by enabling configurations such as the following mix of devices on a single GPU:
- One device with 4 memory slices and 3 compute slices
- One device with 2 memory slices and 2 compute slices
- One device with 1 memory slice and 1 compute slice
This flexibility allows users to utilize GPU resources efficiently based on their specific needs.
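For reference, MIG profiles are named `<compute_slices>g.<memory_size>gb`; on an A100-40GB, for instance, the standard profiles are `1g.5gb`, `2g.10gb`, `3g.20gb`, `4g.20gb`, and `7g.40gb`. The profiles a particular GPU supports can be listed directly with `nvidia-smi`:
```bash
# List the GPU instance profiles supported by the GPUs on this machine
nvidia-smi mig -lgip
```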
#### High-Level Design Decisions
1. **Static Configuration:** MIG devices are pre-configured on GPUs and are not created dynamically within the Kubernetes stack.
2. **Single GPU and Compute Instance:** Each MIG device consists of a single GPU instance and a single compute instance.
3. **Single MIG Device per Container:** Containers request a single MIG device to meet their resource needs.
4. **Node-Level Strategy Configuration:** Different strategies for exposing MIG devices are configurable at the node level via the k8s-device-plugin (a configuration sketch follows this list).
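As a sketch of how the per-node strategy might be set, the k8s-device-plugin reads it from its environment. The image tag and exact field layout below are assumptions; consult the k8s-device-plugin documentation or its Helm chart for your version:
```yaml
# Excerpt (sketch) of the nvidia-device-plugin DaemonSet pod spec
containers:
- name: nvidia-device-plugin-ctr
  image: nvcr.io/nvidia/k8s-device-plugin:v0.14.1  # assumed tag
  env:
  - name: MIG_STRATEGY
    value: "mixed"  # one of: none, single, mixed
```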
#### Supporting MIG on Kubernetes
Two main strategies for exposing MIG devices on Kubernetes nodes are implemented: **single** and **mixed**.
##### Single Strategy
- **Node Configuration:** All GPUs on a node must be of the same type, have MIG enabled, and expose the same type of MIG device.
- **Resource Exposure:** The k8s-device-plugin exposes MIG devices using the traditional `nvidia.com/gpu` resource type. Node labels are applied to indicate the properties of the exposed MIG device.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example
spec:
  containers:
  - name: gpu-example
    image: nvidia/cuda:11.0-base
    resources:
      limits:
        nvidia.com/gpu: 1
  nodeSelector:
    nvidia.com/gpu.product: A100-SXM4-40GB-MIG-3g.20gb
```
##### Mixed Strategy
- **Node Configuration:** All GPUs on a node must be of the same type, but can be configured with a mix of MIG devices.
- **Resource Exposure:** The k8s-device-plugin exposes non-MIG GPUs with `nvidia.com/gpu` and individual MIG devices with `nvidia.com/mig-<slice_count>g.<memory_size>gb`. Node labels are applied to indicate the properties of each MIG device type.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example
spec:
  containers:
  - name: gpu-example
    image: nvidia/cuda:11.0-base
    resources:
      limits:
        nvidia.com/mig-3g.20gb: 1
  nodeSelector:
    nvidia.com/gpu.product: A100-SXM4-40GB
```
#### Discussion
- **Single Strategy:** Suitable for large deployments where nodes are dedicated to a single type of MIG device.
- **Mixed Strategy:** Suitable for smaller deployments that require flexibility in GPU resource allocation.
### Expanding Kubernetes for Machine Learning
To effectively use Kubernetes for Machine Learning (ML), especially for scaling GPU workloads, consider the following additional strategies and best practices:
#### Dynamic GPU Allocation
Dynamic GPU allocation automatically scales GPU-backed workloads based on demand. This can be achieved with Kubernetes' Horizontal Pod Autoscaler (HPA) driven by a custom GPU-utilization metric; note that custom metrics are not built in and require a metrics pipeline (see the adapter sketch after the manifest).
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gpu-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gpu-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: gpu_utilization
      target:
        type: AverageValue
        averageValue: "80"  # target average of 80 (percent utilization) per pod
```
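The `gpu_utilization` metric above must be served by a custom metrics pipeline. One possible sketch, assuming NVIDIA's dcgm-exporter publishes GPU metrics to Prometheus and the Prometheus Adapter is installed (label names vary across exporter versions, so treat this rule as illustrative):
```yaml
# Prometheus Adapter rule (sketch) exposing DCGM GPU utilization
# as the per-pod custom metric `gpu_utilization`
rules:
- seriesQuery: 'DCGM_FI_DEV_GPU_UTIL{exported_namespace!="",exported_pod!=""}'
  resources:
    overrides:
      exported_namespace: {resource: "namespace"}
      exported_pod: {resource: "pod"}
  name:
    as: "gpu_utilization"
  metricsQuery: avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)
```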
#### GPU Sharing
GPU sharing allows multiple containers to use a fraction of a GPU, which is useful for inference workloads that do not require the full capacity of a GPU.
##### NVIDIA GPU Sharing
NVIDIA's k8s-device-plugin supports GPU sharing through time-slicing. Note that Kubernetes extended resources can only be requested in whole integers, so fractional requests such as `nvidia.com/gpu: 0.5` are invalid; instead, the device plugin is configured to advertise each physical GPU as multiple replicas, and each container requests one replica (see the configuration sketch after the manifest).
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-gpu-pod
spec:
  containers:
  - name: container1
    image: nvidia/cuda:11.0-base
    resources:
      limits:
        nvidia.com/gpu: 1  # one time-sliced replica, not a full physical GPU
  - name: container2
    image: nvidia/cuda:11.0-base
    resources:
      limits:
        nvidia.com/gpu: 1
```
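Time-slicing itself is enabled through the device plugin's configuration file. A minimal sketch (the replica count is illustrative; see the k8s-device-plugin documentation for how this config is mounted in your version):
```yaml
# k8s-device-plugin config (sketch): advertise each physical GPU
# as two schedulable nvidia.com/gpu units
version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 2
```
With `replicas: 2`, a node with one physical GPU advertises `nvidia.com/gpu: 2`, so the two containers above can land on the same device. Unlike MIG, time-slicing provides no memory or fault isolation between the sharing containers.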
#### Optimizing for ML Workloads
- **Data Locality:** Ensure data is close to the GPU nodes to minimize latency.
- **Node Affinity and Anti-Affinity:** Use node affinity rules to schedule ML workloads on nodes with GPU capabilities.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ml-pod
spec:
  containers:
  - name: ml-container
    image: my-ml-image
    resources:
      limits:
        nvidia.com/gpu: 1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: nvidia.com/gpu.present
            operator: Exists
```
#### Code Snippets for MIG and Kubernetes Integration
**Configuring MIG on Kubernetes Nodes:**
```bash
# Enable MIG mode on the GPU (the GPU must be idle; a reset may be required)
sudo nvidia-smi -mig 1
# Create seven 1g.5gb GPU instances (profile ID 19 on an A100-40GB)
# and their corresponding compute instances (-C)
sudo nvidia-smi mig -cgi 19,19,19,19,19,19,19 -C
```
**Deploying a Pod with MIG resources:**
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ml-pod
spec:
  containers:
  - name: ml-container
    image: nvidia/cuda:11.2.2-cudnn8-runtime-ubuntu20.04
    resources:
      limits:
        nvidia.com/mig-2g.10gb: 1
  nodeSelector:
    nvidia.com/gpu.product: A100-SXM4-40GB
```
#### Monitoring and Logging
- **Prometheus and Grafana:** Integrate Prometheus for monitoring GPU metrics and Grafana for visualization.
- **Logging:** Use a centralized logging stack such as ELK (Elasticsearch, Logstash, Kibana) for debugging and performance analysis.
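As a starting point for GPU metrics, NVIDIA's dcgm-exporter can be installed from its published Helm chart and scraped by Prometheus (the repository URL and chart name below reflect NVIDIA's published chart and may change; verify against the current dcgm-exporter documentation):
```bash
# Install NVIDIA's DCGM exporter, which publishes GPU metrics
# (utilization, memory, temperature) in Prometheus format
helm repo add gpu-helm-charts https://nvidia.github.io/dcgm-exporter/helm-charts
helm repo update
helm install dcgm-exporter gpu-helm-charts/dcgm-exporter
```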
### Conclusion
Supporting MIG in Kubernetes enhances the flexibility and efficiency of GPU resource allocation. Expanding these capabilities for ML workloads involves dynamic GPU allocation, GPU sharing, optimized scheduling, and robust monitoring and logging practices. These strategies ensure scalable and efficient deployment of ML models in a Kubernetes environment.
References: [MIGs in Kubernetes, by Kevin Klues](https://docs.google.com/document/d/1mdgMQ8g7WmaI_XVVRrCvHPFPOMCm5LQD5JefgAh6N8g/edit)