
Virt-Handler Pools

Custom worker images for heterogeneous clusters

Lee Yarwood, Software Engineer @ Red Hat

lyarwood@redhat.com
https://github.com/lyarwood

The Problem

KubeVirt
  DaemonSet: virt-handler
    Node A (node-role.kubernetes.io/worker)
      DaemonSet pod: virt-handler:v1.8.0
      VMI: virt-launcher:v1.8.0
    Node B (node-role.kubernetes.io/worker)
      DaemonSet pod: virt-handler:v1.8.0
      VMI: virt-launcher:v1.8.0
    Node C (node-role.kubernetes.io/worker, superduper.io/gpu.product: MegaSlop-9000)
      DaemonSet pod: virt-handler:v1.8.0
      VMI: virt-launcher:v1.8.0

The Problem: everyone gets the image

KubeVirt
  DaemonSet: virt-handler
    Node A (node-role.kubernetes.io/worker)
      DaemonSet pod: virt-handler:v1.8.0-megaslop
      VMI: virt-launcher:v1.8.0-megaslop
    Node B (node-role.kubernetes.io/worker)
      DaemonSet pod: virt-handler:v1.8.0-megaslop
      VMI: virt-launcher:v1.8.0-megaslop
    Node C (node-role.kubernetes.io/worker, superduper.io/gpu.product: MegaSlop-9000)
      DaemonSet pod: virt-handler:v1.8.0-megaslop
      VMI: virt-launcher:v1.8.0-megaslop

The Problem: duplicate installations

KubeVirt #1
  DaemonSet: virt-handler
    Node A (node-role.kubernetes.io/worker)
      DaemonSet pod: virt-handler:v1.8.0
      VMI: virt-launcher:v1.8.0
    Node B (node-role.kubernetes.io/worker)
      DaemonSet pod: virt-handler:v1.8.0
      VMI: virt-launcher:v1.8.0
KubeVirt #2
  DaemonSet: virt-handler
    Node C (node-role.kubernetes.io/worker, superduper.io/gpu.product: MegaSlop-9000)
      DaemonSet pod: virt-handler:v1.8.0-megaslop
      VMI: virt-launcher:v1.8.0-megaslop

Intermediate solution: kubevirt-aie-webhook

KubeVirt
  DaemonSet: virt-handler
    Node A (node-role.kubernetes.io/worker)
      DaemonSet pod: virt-handler:v1.8.0
      VMI: virt-launcher:v1.8.0
    Node B (node-role.kubernetes.io/worker)
      DaemonSet pod: virt-handler:v1.8.0
      VMI: virt-launcher:v1.8.0
    Node C (node-role.kubernetes.io/worker, superduper.io/gpu.product: MegaSlop-9000)
      DaemonSet pod: virt-handler:v1.8.0
      VMI: virt-launcher:v1.8.0-megaslop
kubevirt-aie-webhook (MutatingWebhook): mutates the virt-launcher image based on device selectors

The Solution: a single installation

KubeVirt
  DaemonSet: virt-handler
    Node A (node-role.kubernetes.io/worker)
      DaemonSet pod: virt-handler:v1.9.0
      VMI: virt-launcher:v1.9.0
    Node B (node-role.kubernetes.io/worker)
      DaemonSet pod: virt-handler:v1.9.0
      VMI: virt-launcher:v1.9.0
  DaemonSet: virt-handler-megaslop
    Node C (node-role.kubernetes.io/worker, superduper.io/gpu.product: MegaSlop-9000)
      DaemonSet pod: virt-handler:v1.9.0-megaslop
      VMI: virt-launcher:v1.9.0-megaslop

The Solution: virtHandlerPools

apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
spec:
  configuration:
    developerConfiguration:
      featureGates:
        - VirtHandlerPools
    permittedHostDevices:
      pciHostDevices:
        - pciVendorSelector: "DEAD:BEEF"
          resourceName: superduper.io/MegaSlop_9000
  virtHandlerPools:
    - name: megaslop
      virtHandlerImage: registry.example.com/virt-handler:v1.9.0-megaslop
      virtLauncherImage: registry.example.com/virt-launcher:v1.9.0-megaslop
      nodeSelector:
        superduper.io/gpu.product: MegaSlop-9000
      selector:
        deviceNames:
          - superduper.io/MegaSlop_9000

VMI Matching: virt-controller

VMI submitted by user

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: megaslop-vm
spec:
  domain:
    [..]
    devices:
      gpus:
        - name: gpu1
          deviceName: superduper.io/MegaSlop_9000
      [..]
  [..]

virt-launcher Pod rendered by virt-controller

apiVersion: v1
kind: Pod
metadata:
  name: virt-launcher-megaslop-vm
  annotations:
    kubevirt.io/handler-pool: megaslop
    [..]
spec:
  containers:
    - name: compute
      image: registry.example.com/virt-launcher:v1.9.0-megaslop
      [..]
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: superduper.io/gpu.product
                operator: In
                values:
                  - MegaSlop-9000

Future: per-pool hypervisor backends

KubeVirt
  DaemonSet: virt-handler
    Node A (node-role.kubernetes.io/worker)
      DaemonSet pod: virt-handler:v1.9.0
      VMI (KVM): virt-launcher:v1.9.0
  DaemonSet: virt-handler-megaslop
    Node B (superduper.io/gpu.product: MegaSlop-9000)
      DaemonSet pod: virt-handler:v1.9.0-megaslop
      VMI (KVM): virt-launcher:v1.9.0-megaslop
  DaemonSet: virt-handler-mshv
    Node C (kubevirt.io/hypervisor: mshv)
      DaemonSet pod: virt-handler:v1.9.0-mshv
      VMI (MSHV): virt-launcher:v1.9.0-mshv

Future: per-pool hypervisor matching

VMI submitted by user

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: mshv-vm
spec:
  hypervisor: mshv
  domain:
    resources:
      requests:
        memory: 4Gi
    devices:
      disks:
        - name: rootdisk
          disk:
            bus: virtio
  [..]

virt-launcher Pod rendered by virt-controller

apiVersion: v1
kind: Pod
metadata:
  name: virt-launcher-mshv-vm
  annotations:
    kubevirt.io/handler-pool: mshv
    [..]
spec:
  containers:
    - name: compute
      image: registry.example.com/virt-launcher:v1.9.0-mshv
      [..]
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubevirt.io/hypervisor
                operator: In
                values:
                  - mshv

- The GPU node runs the same virt-handler and virt-launcher as every other node
- No native way to use GPU-optimised images for GPU workloads
- Workarounds: separate installs (overhead), bloated images, or external webhooks (kubevirt-aie-webhook)

- One workaround: build a single image containing all specialised components
- Every node runs the GPU-optimised images, even nodes without GPUs
- Unnecessary image bloat, larger attack surface, slower pulls

- Another workaround: run separate KubeVirt installations per node pool
- Doubles operational overhead, upgrades, and configuration management
- Prevents resource sharing between pools

- The kubevirt-aie-webhook project already exists to solve the virt-launcher image problem
- It uses an external MutatingAdmissionWebhook to replace virt-launcher images based on VMI device and label selectors
- Only mutates virt-launcher; cannot customise virt-handler
- External dependency with its own lifecycle, upgrades, and failure modes
- Webhook failures block VMI creation entirely
- This VEP proposes folding that capability natively into KubeVirt and extending it to also support per-pool virt-handler customisation
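
The heart of such a webhook is a lookup from a VMI's requested device resource names to a replacement virt-launcher image. A minimal sketch of that mutation decision is below; the rule map and function name are illustrative, not the kubevirt-aie-webhook's actual configuration format or API.

```go
package main

import "fmt"

// imageRules maps a device resource name to the virt-launcher image to
// substitute when a VMI requests that device. Illustrative only.
var imageRules = map[string]string{
	"superduper.io/MegaSlop_9000": "registry.example.com/virt-launcher:v1.8.0-megaslop",
}

// mutateLauncherImage returns the replacement image for the compute
// container, or ok=false when no rule matches and the default image
// should be left untouched.
func mutateLauncherImage(deviceNames []string, rules map[string]string) (string, bool) {
	for _, name := range deviceNames {
		if img, found := rules[name]; found {
			return img, true
		}
	}
	return "", false
}

func main() {
	img, ok := mutateLauncherImage([]string{"superduper.io/MegaSlop_9000"}, imageRules)
	fmt.Println(ok, img) // true registry.example.com/virt-launcher:v1.8.0-megaslop
}
```

Because this decision sits in an external admission webhook, a webhook outage turns every "no decision" into a blocked VMI creation, which is exactly the failure mode the native approach avoids.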

- A single KubeVirt installation manages all nodes
- The primary virt-handler DaemonSet covers standard nodes, with anti-affinity to avoid MegaSlop nodes
- The virt-handler-megaslop DaemonSet targets only nodes with the MegaSlop-9000 label
- No operational overhead, no image bloat, no external webhooks
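
One way the operator could derive the primary DaemonSet's anti-affinity is to invert every pool's nodeSelector into NotIn requirements, so pool nodes are left to their dedicated DaemonSets. This is a sketch under that assumption; the `Requirement` and `Pool` types below are cut-down stand-ins, not KubeVirt's actual types.

```go
package main

import "fmt"

// Requirement mirrors the shape of a Kubernetes NodeSelectorRequirement.
type Requirement struct {
	Key      string
	Operator string // "In" or "NotIn"
	Values   []string
}

// Pool is a minimal stand-in for a virt-handler pool: a name plus the
// node labels its dedicated DaemonSet targets.
type Pool struct {
	Name         string
	NodeSelector map[string]string
}

// primaryRequirements builds anti-affinity terms for the primary
// virt-handler DaemonSet: NotIn every label value claimed by a pool.
func primaryRequirements(pools []Pool) []Requirement {
	var reqs []Requirement
	for _, p := range pools {
		for key, value := range p.NodeSelector {
			reqs = append(reqs, Requirement{Key: key, Operator: "NotIn", Values: []string{value}})
		}
	}
	return reqs
}

func main() {
	pools := []Pool{{Name: "megaslop", NodeSelector: map[string]string{"superduper.io/gpu.product": "MegaSlop-9000"}}}
	fmt.Println(primaryRequirements(pools))
}
```

With the megaslop pool defined, the primary DaemonSet ends up with a single `superduper.io/gpu.product NotIn [MegaSlop-9000]` term, keeping it off Node C.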

- New virtHandlerPools field in the KubeVirt CR
- Each pool deploys an additional virt-handler DaemonSet
- Custom virtHandlerImage and/or virtLauncherImage per pool
- VMIs matched to pools transparently via deviceNames or vmLabels
- No changes required from VM users

- The VM user submits a VMI as normal; no pool-specific configuration needed
- virt-controller evaluates the VMI against pool selectors
- Matches superduper.io/MegaSlop_9000 to the megaslop pool's deviceNames
- Selects virt-launcher:v1.9.0-megaslop as the launcher image
- Merges the pool's nodeSelector into the virt-launcher pod's node affinity
- The VMI lands on Node C with the correct virt-launcher image
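
The matching step can be sketched as a first-match scan over the configured pools, checking both deviceNames and vmLabels. The types and function below are simplified stand-ins for illustration, not the proposed implementation.

```go
package main

import "fmt"

// PoolSelector mirrors the proposal's selector block: a VMI matches a
// pool when it requests one of the listed device names or carries one
// of the listed labels.
type PoolSelector struct {
	DeviceNames []string
	VMLabels    map[string]string
}

// HandlerPool is a cut-down sketch of a virtHandlerPools entry.
type HandlerPool struct {
	Name              string
	VirtLauncherImage string
	NodeSelector      map[string]string
	Selector          PoolSelector
}

// matchPool returns the first pool whose selector matches the VMI's
// requested devices or labels; nil means the default images apply.
func matchPool(vmiDevices []string, vmiLabels map[string]string, pools []HandlerPool) *HandlerPool {
	for i, p := range pools {
		for _, want := range p.Selector.DeviceNames {
			for _, have := range vmiDevices {
				if want == have {
					return &pools[i]
				}
			}
		}
		for key, value := range p.Selector.VMLabels {
			if vmiLabels[key] == value {
				return &pools[i]
			}
		}
	}
	return nil
}

func main() {
	pools := []HandlerPool{{
		Name:              "megaslop",
		VirtLauncherImage: "registry.example.com/virt-launcher:v1.9.0-megaslop",
		NodeSelector:      map[string]string{"superduper.io/gpu.product": "MegaSlop-9000"},
		Selector:          PoolSelector{DeviceNames: []string{"superduper.io/MegaSlop_9000"}},
	}}
	if p := matchPool([]string{"superduper.io/MegaSlop_9000"}, nil, pools); p != nil {
		fmt.Println(p.Name, p.VirtLauncherImage)
	}
}
```

Once a pool matches, its `VirtLauncherImage` replaces the default launcher image and its `NodeSelector` is folded into the rendered pod's node affinity, as in the Pod shown on the slide.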

- VEP #97 introduces per-cluster hypervisor configuration via the hypervisor abstraction layer
- Virt-handler pools are a natural mechanism for extending this to per-pool hypervisor backends
- Example: KVM on some nodes, MSHV on others within the same cluster
- A future iteration could add an optional hypervisor field to VirtHandlerPoolConfig
- Out of scope for the initial implementation but a clear extension point

- The VM user labels their VMI with kubevirt.io/hypervisor: mshv
- virt-controller matches the VMI to the mshv pool via its vmLabels selector
- Selects the MSHV-specific virt-launcher image
- Merges the pool's nodeSelector into the pod affinity, ensuring it lands on an MSHV node
- The VMI runs under the MSHV hypervisor transparently