Node Features and Hardware Plugins

The kube-system namespace hosts the foundational services required to bridge physical hardware capabilities to the Kubernetes control plane. This section details how the cluster detects hardware features (GPU, CPU instructions), exposes them to specialized workloads like Plex/Jellyfin via device plugins, optimizes image delivery through P2P caching, and maintains system health via metrics and automated pod refreshes.

Hardware Detection and Labeling

The cluster utilizes Node Feature Discovery (NFD) to automatically detect hardware capabilities on each node and apply corresponding Kubernetes labels. This is critical for scheduling workloads that require specific hardware, such as Intel QuickSync for media transcoding.
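As an illustration of how such labels feed into scheduling, the sketch below pins a Pod to nodes carrying an NFD-generated label. It assumes NFD's default PCI label format (device class plus vendor), under which a node with an Intel display controller would carry feature.node.kubernetes.io/pci-0300_8086.present. The Pod name and image are hypothetical, and the actual labels on a node should be checked with kubectl get node <name> --show-labels.

```yaml
# Hypothetical Pod pinned to nodes that NFD has labeled as having an Intel GPU.
apiVersion: v1
kind: Pod
metadata:
  name: transcode-example            # hypothetical name
spec:
  nodeSelector:
    # Assumed label: PCI class 0300 (display controller), vendor 8086 (Intel),
    # in NFD's default <class>_<vendor> label format.
    feature.node.kubernetes.io/pci-0300_8086.present: "true"
  containers:
    - name: app
      image: example.com/media/transcoder:latest   # placeholder image
```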

Node Feature Discovery (NFD)

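NFD is deployed via a HelmRelease using the node-feature-discovery OCI repository (kubernetes/apps/kube-system/node-feature-discovery/app/helmrelease.yaml:2-24). Among other settings, the configuration sets sleepInterval: 0s (helmrelease.yaml:39), which in NFD disables periodic re-detection so that features are detected and labeled once rather than on a timer.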

Intel Device Plugins

To expose Intel iGPU hardware to containers, the cluster employs a two-tier operator pattern:

  1. Intel Device Plugin Operator: Manages the lifecycle of specialized device plugins (kubernetes/apps/kube-system/intel-device-plugin/app/helmrelease.yaml:18-21). It is configured to handle only gpu devices (kubernetes/apps/kube-system/intel-device-plugin/app/helmrelease.yaml:38-39).
  2. GPU Device Plugin: A specific instance managed by the operator that handles the actual allocation of /dev/dri resources. It is configured with sharedDevNum: 5 (kubernetes/apps/kube-system/intel-device-plugin/gpu/helmrelease.yaml:38), allowing up to five concurrent pods to share the same physical iGPU for hardware-accelerated transcoding (QuickSync).
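From the workload side, consuming the shared iGPU could look like the sketch below. It assumes the plugin advertises the extended resource gpu.intel.com/i915 (the name used by the Intel GPU device plugin for i915-driven devices). The Pod name and image are hypothetical and not taken from this repository.

```yaml
# Hypothetical Pod requesting one share of the Intel iGPU.
apiVersion: v1
kind: Pod
metadata:
  name: quicksync-example            # hypothetical name
spec:
  containers:
    - name: media-server
      image: example.com/media/server:latest   # placeholder image
      resources:
        limits:
          # Extended resource assumed to be advertised by the GPU device plugin.
          # With sharedDevNum: 5, a node with one iGPU offers five of these units.
          gpu.intel.com/i915: 1
```

Because extended resources cannot be overcommitted, a sixth Pod requesting this resource on a single-iGPU node would stay Pending until one of the five shares is released.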

Hardware Feature Data Flow

The following diagram illustrates how physical hardware attributes are transformed into schedulable Kubernetes resources.

Title: Hardware Feature Discovery and Resource Exposure

[Flowchart Diagram]

Sources: kubernetes/apps/kube-system/node-feature-discovery/app/helmrelease.yaml:36-47, kubernetes/apps/kube-system/intel-device-plugin/app/helmrelease.yaml:38-39, kubernetes/apps/kube-system/intel-device-plugin/gpu/helmrelease.yaml:35-43


Image Delivery and P2P Caching

To reduce external bandwidth and speed up pod startup times, the cluster uses Spegel, a stateless peer-to-peer (P2P) container image advertisement and proxy mechanism.

Spegel Implementation

Spegel enables nodes to share container image layers with each other directly.
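A values fragment for the Spegel Helm chart might look roughly like the sketch below. Only hostPort: 29999 comes from this page. The key layout (service.registry.hostPort, spegel.containerdRegistryConfigPath) follows the upstream chart as best understood here, and the registry config path is an assumed default that varies by node operating system.

```yaml
# Sketch of Helm values for Spegel; verify key names against the chart version in use.
spegel:
  # Assumption: directory where containerd reads per-registry mirror configuration.
  containerdRegistryConfigPath: /etc/containerd/certs.d
service:
  registry:
    # Host port on each node through which the local Spegel mirror is reached.
    hostPort: 29999
```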

Sources: kubernetes/apps/kube-system/spegel/app/helmrelease.yaml:18-28, bootstrap/helmfile.d/01-apps.yaml:51-57


Configuration Lifecycle Management

Reloader by Stakater ensures that pods pick up changes to ConfigMaps and Secrets without a manual rollout.

Reloader Mechanics

Reloader watches for changes to resources and performs a rolling upgrade on associated Deployments, StatefulSets, or DaemonSets.
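A workload typically opts in through an annotation. The sketch below uses reloader.stakater.com/auto, which tells Reloader to restart the workload when any ConfigMap or Secret it references changes. The Deployment, image, and ConfigMap names are hypothetical.

```yaml
# Hypothetical Deployment that opts in to automatic rollouts via Reloader.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                  # hypothetical name
  annotations:
    # Reloader rolls this Deployment when a referenced ConfigMap or Secret changes.
    reloader.stakater.com/auto: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: example.com/app:latest        # placeholder image
          envFrom:
            - configMapRef:
                name: example-app-config       # hypothetical ConfigMap
```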

Sources: kubernetes/apps/kube-system/reloader/app/helmrelease.yaml:1-41, kubernetes/apps/network/external-dns/app/helmrelease.yaml:45-46


Cluster Metrics and Observability

The Metrics Server provides the core metrics pipeline for the cluster, enabling tools like kubectl top and the Horizontal Pod Autoscaler (HPA).

Metrics Server

  • Deployment: Managed via Helm at kubernetes/apps/kube-system/metrics-server/app/.
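  • Role: Collects resource metrics from Kubelets and exposes them via the Metrics API for use by kubectl top, the Horizontal Pod Autoscaler, and other consumers of resource metrics. The Kubernetes scheduler does not consume these metrics; it places pods based on resource requests.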
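To show what depends on this pipeline, the sketch below defines a HorizontalPodAutoscaler that scales on CPU utilization read from the Metrics API. The target Deployment name and the thresholds are hypothetical and not taken from this repository.

```yaml
# Hypothetical HPA that relies on Metrics Server for CPU utilization data.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-app                  # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app                # hypothetical target
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70     # percent of the containers' CPU requests
```

Utilization targets are computed relative to CPU requests, so the target containers need resources.requests.cpu set for this HPA to function.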

Component Integration Map

This diagram maps the logical system components described above to their specific HelmRelease and OCIRepository entities in the codebase.

Title: Node Infrastructure Code Entity Mapping

[Flowchart Diagram]

Sources: kubernetes/apps/kube-system/node-feature-discovery/app/helmrelease.yaml:2-24, kubernetes/apps/kube-system/intel-device-plugin/gpu/helmrelease.yaml:18-26, kubernetes/apps/kube-system/spegel/app/helmrelease.yaml:2-10, kubernetes/apps/kube-system/reloader/app/helmrelease.yaml:1-20

Summary Table of Hardware Plugins

Component        | Purpose          | Key Configuration            | Source File
NFD              | Node Labeling    | sleepInterval: 0s            | node-feature-discovery/app/helmrelease.yaml
Intel Operator   | Device Lifecycle | --devices=gpu                | intel-device-plugin/app/helmrelease.yaml
Intel GPU Plugin | iGPU Allocation  | sharedDevNum: 5              | intel-device-plugin/gpu/helmrelease.yaml
Spegel           | P2P Image Cache  | hostPort: 29999              | spegel/app/helmrelease.yaml
Reloader         | Config Watcher   | readOnlyRootFileSystem: true | reloader/app/helmrelease.yaml

Sources: kubernetes/apps/kube-system/node-feature-discovery/app/helmrelease.yaml:39, kubernetes/apps/kube-system/intel-device-plugin/app/helmrelease.yaml:39, kubernetes/apps/kube-system/intel-device-plugin/gpu/helmrelease.yaml:38, kubernetes/apps/kube-system/spegel/app/helmrelease.yaml:23, kubernetes/apps/kube-system/reloader/app/helmrelease.yaml:34