Node Features and Hardware Plugins
Relevant source files
- bootstrap/helmfile.d/01-apps.yaml
- kubernetes/apps/cert-manager/cert-manager/app/helmrelease.yaml
- kubernetes/apps/external-secrets/external-secrets/app/ocirepository.yaml
- kubernetes/apps/kube-system/intel-device-plugin/app/helmrelease.yaml
- kubernetes/apps/kube-system/intel-device-plugin/app/kustomization.yaml
- kubernetes/apps/kube-system/intel-device-plugin/gpu/helmrelease.yaml
- kubernetes/apps/kube-system/intel-device-plugin/gpu/kustomization.yaml
- kubernetes/apps/kube-system/intel-device-plugin/ks.yaml
- kubernetes/apps/kube-system/metrics-server/app/helmrelease.yaml
- kubernetes/apps/kube-system/node-feature-discovery/app/helmrelease.yaml
- kubernetes/apps/kube-system/node-feature-discovery/app/kustomization.yaml
- kubernetes/apps/kube-system/node-feature-discovery/features/intel-gpu.yaml
- kubernetes/apps/kube-system/node-feature-discovery/features/kustomization.yaml
- kubernetes/apps/kube-system/node-feature-discovery/ks.yaml
- kubernetes/apps/kube-system/reloader/app/helmrelease.yaml
- kubernetes/apps/kube-system/spegel/app/helmrelease.yaml
- kubernetes/apps/kube-system/spegel/app/kustomization.yaml
- kubernetes/apps/kube-system/spegel/ks.yaml
- kubernetes/apps/media/opencloud/app/ocirepository.yaml
- kubernetes/apps/network/external-dns/app/helmrelease.yaml
- kubernetes/components/common/repos/app-template/ocirepository.yaml
- scripts/migrate_to_oci.py
The kube-system namespace hosts the foundational services required to bridge physical hardware capabilities to the Kubernetes control plane. This section details how the cluster detects hardware features (GPU, CPU instructions), exposes them to specialized workloads like Plex/Jellyfin via device plugins, optimizes image delivery through P2P caching, and maintains system health via metrics and automated pod refreshes.
Hardware Detection and Labeling
The cluster utilizes Node Feature Discovery (NFD) to automatically detect hardware capabilities on each node and apply corresponding Kubernetes labels. This is critical for scheduling workloads that require specific hardware, such as Intel QuickSync for media transcoding.
Node Feature Discovery (NFD)
NFD is deployed via a HelmRelease using the node-feature-discovery OCI repository (kubernetes/apps/kube-system/node-feature-discovery/app/helmrelease.yaml:2-24). The configuration includes:
- Worker Configuration: The worker sleep interval is set to `0s` (kubernetes/apps/kube-system/node-feature-discovery/app/helmrelease.yaml:36-39). Per the upstream NFD worker configuration reference, a non-positive sleep interval disables periodic re-detection, so features are discovered when the worker starts rather than on a recurring timer.
- Feature Rules: Extended feature rules are applied via a separate Kustomization that pulls Intel-specific device plugin rules directly from the upstream Intel repository (kubernetes/apps/kube-system/node-feature-discovery/features/kustomization.yaml:4-6).
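To make the feature rules more concrete, the following is a minimal sketch of a NodeFeatureRule that labels nodes carrying an Intel display-class PCI device. It is modeled on the style of the upstream Intel rules rather than copied from this repository; the object name and rule name are assumptions made for illustration.

```yaml
# Illustrative sketch only; the cluster's real rules come from the upstream Intel repository.
apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
  name: intel-gpu-example            # hypothetical object name
spec:
  rules:
    - name: intel.gpu                # hypothetical rule name
      labels:
        # Label applied to every node where the match below succeeds
        intel.feature.node.kubernetes.io/gpu: "true"
      matchFeatures:
        - feature: pci.device
          matchExpressions:
            vendor: {op: In, value: ["8086"]}          # Intel PCI vendor ID
            class: {op: In, value: ["0300", "0380"]}   # display controller classes
```

Once a rule like this matches, the label can be used in node selectors or affinity rules by workloads that need the iGPU.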
Intel Device Plugins
To expose Intel iGPU hardware to containers, the cluster employs a two-tier operator pattern:
- Intel Device Plugin Operator: Manages the lifecycle of specialized device plugins (kubernetes/apps/kube-system/intel-device-plugin/app/helmrelease.yaml:18-21). It is configured to specifically handle `gpu` devices (kubernetes/apps/kube-system/intel-device-plugin/app/helmrelease.yaml:38-39).
- GPU Device Plugin: A specific instance managed by the operator that handles the actual allocation of `/dev/dri` resources. It is configured with `sharedDevNum: 5` (kubernetes/apps/kube-system/intel-device-plugin/gpu/helmrelease.yaml:38), allowing up to five concurrent pods to share the same physical iGPU for hardware-accelerated transcoding (QuickSync).
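As a hedged illustration of how a workload might consume the shared iGPU, the sketch below shows a Pod that requests one GPU slot. The resource name `gpu.intel.com/i915` and the node label are the names commonly advertised by the Intel GPU plugin and its NFD rules; the Pod name and image are placeholders, and none of this is taken from the repository's media workloads.

```yaml
# Illustrative sketch only; names and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: transcode-example                          # hypothetical name
spec:
  nodeSelector:
    # Optional: the resource limit below already restricts scheduling to GPU nodes
    intel.feature.node.kubernetes.io/gpu: "true"
  containers:
    - name: app
      image: example.invalid/media-server:latest   # placeholder image
      resources:
        limits:
          # One of the five shareable slots exposed per physical iGPU
          gpu.intel.com/i915: 1
```

With `sharedDevNum: 5`, a node with a single iGPU would advertise five such slots, so a sixth Pod requesting the resource on that node would remain pending.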
Hardware Feature Data Flow
The following diagram illustrates how physical hardware attributes are transformed into schedulable Kubernetes resources.
Title: Hardware Feature Discovery and Resource Exposure
[Flowchart Diagram]
Sources: kubernetes/apps/kube-system/node-feature-discovery/app/helmrelease.yaml:36-47, kubernetes/apps/kube-system/intel-device-plugin/app/helmrelease.yaml:38-39, kubernetes/apps/kube-system/intel-device-plugin/gpu/helmrelease.yaml:35-43
Image Delivery and P2P Caching
To reduce external bandwidth and speed up pod startup times, the cluster uses Spegel, a stateless peer-to-peer (P2P) container image advertisement and proxy mechanism.
Spegel Implementation
Spegel enables nodes to share container image layers with each other directly.
- Configuration: It communicates with the local container runtime via `/run/containerd/containerd.sock` (kubernetes/apps/kube-system/spegel/app/helmrelease.yaml:27).
- Registry Mirroring: It configures containerd to use a local host port (`29999`) as a mirror for external registries (kubernetes/apps/kube-system/spegel/app/helmrelease.yaml:21-23).
- Bootstrap Dependency: Spegel is a critical bootstrap component, prioritized in the `helmfile` to ensure it is available before heavier workloads like `cert-manager` are deployed (bootstrap/helmfile.d/01-apps.yaml:51-64).
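The two runtime settings above could be expressed as Helm values roughly as follows. This is a sketch under the assumption that the key names match the upstream Spegel chart; the repository's actual HelmRelease may set additional values.

```yaml
# Illustrative sketch only; key names assumed from the upstream Spegel chart.
spegel:
  # Socket Spegel uses to talk to the node's container runtime
  containerdSock: /run/containerd/containerd.sock
service:
  registry:
    # Host port that containerd is pointed at as a registry mirror
    hostPort: 29999
```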
Sources: kubernetes/apps/kube-system/spegel/app/helmrelease.yaml:18-28, bootstrap/helmfile.d/01-apps.yaml:51-57
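Bootstrap ordering of this kind is usually expressed with Helmfile's `needs` key. The fragment below is a minimal sketch of that pattern, not a copy of bootstrap/helmfile.d/01-apps.yaml; the chart references are hypothetical placeholders and versions are omitted.

```yaml
# Illustrative sketch only; chart references are hypothetical placeholders.
releases:
  - name: spegel
    namespace: kube-system
    chart: example/spegel              # hypothetical chart reference
  - name: cert-manager
    namespace: cert-manager
    chart: example/cert-manager        # hypothetical chart reference
    needs:
      # Helmfile installs the referenced release first (format: namespace/name)
      - kube-system/spegel
```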
Configuration Lifecycle Management
Reloader by Stakater is used to solve the problem of pods not picking up changes to ConfigMaps or Secrets without a manual rollout.
Reloader Mechanics
Reloader watches for changes to resources and performs a rolling upgrade on associated Deployments, StatefulSets, or DaemonSets.
- Global Deployment: Deployed as a single instance in `kube-system` (kubernetes/apps/kube-system/reloader/app/helmrelease.yaml:19-21).
- Usage Pattern: Applications opt in by adding the annotation `reloader.stakater.com/auto: "true"` to their metadata. For example, `external-dns` uses this to ensure it restarts if its Cloudflare API credentials change (kubernetes/apps/network/external-dns/app/helmrelease.yaml:45-46).
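The opt-in pattern can be sketched with a plain Deployment. The application name, image, and Secret name below are placeholders chosen for illustration; only the annotation itself comes from the text above.

```yaml
# Illustrative sketch only; names and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
  annotations:
    # Reloader rolls this Deployment when a referenced ConfigMap or Secret changes
    reloader.stakater.com/auto: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: example.invalid/app:latest     # placeholder image
          envFrom:
            - secretRef:
                name: example-app-credentials   # hypothetical Secret
```

With the `auto` annotation, Reloader discovers the referenced Secret from the Pod template, so no per-resource annotation listing is needed.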
Sources: kubernetes/apps/kube-system/reloader/app/helmrelease.yaml:1-41, kubernetes/apps/network/external-dns/app/helmrelease.yaml:45-46
Cluster Metrics and Observability
The Metrics Server provides the core metrics pipeline for the cluster, enabling tools like kubectl top and the Horizontal Pod Autoscaler (HPA).
Metrics Server
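- Deployment: Managed via Helm at `kubernetes/apps/kube-system/metrics-server/app/`.
- Role: Collects resource metrics from Kubelets and exposes them via the Metrics API for use by autoscalers such as the HPA and by `kubectl top`. The Kubernetes scheduler does not consume these metrics; it places pods based on declared resource requests.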
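As an example of a Metrics API consumer, the sketch below shows a HorizontalPodAutoscaler that scales on CPU utilization. The target Deployment and the thresholds are hypothetical; the source does not state that any workload in this cluster is autoscaled this way.

```yaml
# Illustrative sketch only; target name and thresholds are hypothetical.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          # Average CPU usage as a percentage of the pods' CPU requests,
          # read from the Metrics API that Metrics Server serves
          averageUtilization: 75
```

Utilization targets are computed against container resource requests, so the target pods need CPU requests set for this to work.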
Component Integration Map
This diagram maps the logical system components described above to their specific HelmRelease and OCIRepository entities in the codebase.
Title: Node Infrastructure Code Entity Mapping
[Flowchart Diagram]
Sources: kubernetes/apps/kube-system/node-feature-discovery/app/helmrelease.yaml:2-24, kubernetes/apps/kube-system/intel-device-plugin/gpu/helmrelease.yaml:18-26, kubernetes/apps/kube-system/spegel/app/helmrelease.yaml:2-10, kubernetes/apps/kube-system/reloader/app/helmrelease.yaml:1-20
Summary Table of Hardware Plugins
| Component | Purpose | Key Configuration | Source File |
|---|---|---|---|
| NFD | Node Labeling | sleepInterval: 0s | node-feature-discovery/app/helmrelease.yaml |
| Intel Operator | Device Lifecycle | --devices=gpu | intel-device-plugin/app/helmrelease.yaml |
| Intel GPU Plugin | iGPU Allocation | sharedDevNum: 5 | intel-device-plugin/gpu/helmrelease.yaml |
| Spegel | P2P Image Cache | hostPort: 29999 | spegel/app/helmrelease.yaml |
| Reloader | Config Watcher | readOnlyRootFileSystem: true | reloader/app/helmrelease.yaml |
Sources: kubernetes/apps/kube-system/node-feature-discovery/app/helmrelease.yaml:39, kubernetes/apps/kube-system/intel-device-plugin/app/helmrelease.yaml:39, kubernetes/apps/kube-system/intel-device-plugin/gpu/helmrelease.yaml:38, kubernetes/apps/kube-system/spegel/app/helmrelease.yaml:23, kubernetes/apps/kube-system/reloader/app/helmrelease.yaml:34