System Upgrades and Bootstrap

This page details the lifecycle management of the cluster, from the initial provisioning of Talos Linux and Kubernetes to the automated upgrade orchestration of the underlying system components. It focuses on the system-upgrade namespace and the automated bootstrap sequence that hands off control to Flux CD.

Bootstrap Process

The bootstrap process is a multi-stage sequence designed to bring a bare Talos Linux node to a fully functional GitOps-managed Kubernetes cluster. This process is orchestrated primarily through the bootstrap-cluster.sh script and a dedicated Taskfile.

Phase 1: Talos Provisioning

The initial phase applies machine configurations to the Talos nodes. The apply_talos_config function in scripts/bootstrap-cluster.sh renders Jinja2 templates (e.g., controlplane.yaml.j2) into machine configurations using render-machine-config.sh and applies them via talosctl apply-config (scripts/bootstrap-cluster.sh:10-62). Once the nodes are configured, bootstrap_talos triggers the initial cluster formation on a controller node (scripts/bootstrap-cluster.sh:65-80).
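The apply-then-bootstrap order can be sketched as below. This is a minimal illustration, not the repository's script: the function names, the TALOSCTL override, and the use of --insecure (commonly needed while a node is still in maintenance mode) are assumptions, and template rendering is left out because the arguments of render-machine-config.sh are not shown here.

```shell
# Hypothetical helpers; names and flags are illustrative assumptions.

# Apply an already-rendered machine config to each node.
# Usage: apply_rendered_config <config-file> <node-ip>...
apply_rendered_config() {
  talosctl_cmd="${TALOSCTL:-talosctl}"
  config="$1"
  shift
  for node in "$@"; do
    # Intentionally unquoted so that TALOSCTL="echo talosctl" splits into
    # two words and turns the call into a dry run.
    $talosctl_cmd apply-config --insecure --nodes "$node" --file "$config"
  done
}

# Form the cluster by bootstrapping exactly one controller node.
# Usage: bootstrap_first_controller <controller-ip>
bootstrap_first_controller() {
  talosctl_cmd="${TALOSCTL:-talosctl}"
  $talosctl_cmd bootstrap --nodes "$1"
}
```

Setting TALOSCTL="echo talosctl" before calling either function prints the commands instead of running them, which is a convenient way to review the plan before touching real nodes.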

Phase 2: CRD and Base Resource Installation

Before Flux can take over, certain Custom Resource Definitions (CRDs) and fundamental resources must exist.

  • CRD Pre-installation: The apply_crds function uses helmfile to template OCI-based charts and applies only the CustomResourceDefinition kinds using Server-Side Apply (scripts/bootstrap-cluster.sh:115-132).
  • Core Resources: Fundamental namespaces and secrets are applied via apply_resources using the template at bootstrap/resources.yaml.j2 (scripts/bootstrap-cluster.sh:135-154).
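The "template everything, keep only the CRDs" idea can be sketched as follows. Both function names are hypothetical, and the awk filter is a stand-in chosen to avoid extra dependencies; the actual script may well use a YAML-aware tool instead. The filter only recognizes a top-level `kind: CustomResourceDefinition` line, so CRDs wrapped inside a List object would be missed.

```shell
# Keep only CustomResourceDefinition documents from a multi-document
# YAML stream read on stdin (hypothetical helper).
filter_crds() {
  awk '
    # Print the buffered document if it was marked as a CRD, then reset.
    function emit() {
      if (is_crd) printf "---\n%s", doc
      doc = ""
      is_crd = 0
    }
    /^---[ \t]*$/ { emit(); next }
    /^kind:[ \t]*CustomResourceDefinition[ \t]*$/ { is_crd = 1 }
    { doc = doc $0 "\n" }
    END { emit() }
  '
}

# Sketch of the full pipeline: render charts, drop everything that is
# not a CRD, and apply the rest with Server-Side Apply.
# Usage: apply_crds_sketch <helmfile-path>
apply_crds_sketch() {
  helmfile --quiet --file "$1" template \
    | filter_crds \
    | kubectl apply --server-side --force-conflicts --filename -
}
```

Filtering before applying means the charts' Deployments and Services never reach the cluster during bootstrap; those are left for Flux to install later.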

Phase 3: Flux Handover

The final bootstrap stage involves installing the Flux controllers and the initial GitOps configuration.

  1. Prometheus Operator CRDs: Applied manually to ensure observability primitives exist [.taskfiles/Flux/Taskfile.yaml:15-18](https://github.com/chaijunkin/home-ops/blob/b5f8d898/.taskfiles/Flux/Taskfile.yaml#L15-L18).
  2. Flux Installation: The bootstrap task applies the Flux kustomization and creates the sops-age secret for decrypting Git-stored secrets (.taskfiles/Flux/Taskfile.yaml:19-21).
  3. Cluster Sync: Once Flux is running, it reconciles the cluster-apps Kustomization to deploy the rest of the repository (.taskfiles/Flux/Taskfile.yaml:57).
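The handover steps can be approximated by the sketch below. It is an assumption-laden outline rather than the Taskfile's contents: the run and flux_handover names, the DRY_RUN switch, and the path of the encrypted age secret are hypothetical, and the Prometheus Operator CRD step is omitted because its source is not shown here.

```shell
# Run a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: flux_handover <flux-kustomize-dir> <encrypted-age-secret-file>
flux_handover() {
  flux_dir="$1"
  age_secret="$2"

  # Install the Flux controllers from the bootstrap kustomization.
  run kubectl apply --kustomize "$flux_dir"

  # Create the sops-age secret so Flux can decrypt secrets stored in Git.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ sops --decrypt $age_secret | kubectl apply --filename -"
  else
    sops --decrypt "$age_secret" | kubectl apply --filename -
  fi

  # Trigger the first sync of the root Kustomization.
  run flux reconcile kustomization cluster-apps --with-source
}
```

Running it with DRY_RUN=1 prints the three commands in order, which makes the dependency visible: the decryption key has to exist before the first reconcile, otherwise encrypted manifests cannot be applied.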

Bootstrap Logic Flow

The following diagram illustrates the execution flow within scripts/bootstrap-cluster.sh.

[Flowchart diagram: Bootstrap Execution Flow]

Sources: scripts/bootstrap-cluster.sh:174-197, .taskfiles/Flux/Taskfile.yaml:12-26


System Upgrades with TUPPR

System upgrades for Talos Linux and Kubernetes are managed by TUPPR (Talos/Kubernetes Upgrade Controller), which is deployed in the system-upgrade namespace (kubernetes/apps/system-upgrade/kustomization.yaml:4-11).

Version Pinning

Versions for the core system components are centralized in a versions.env file. This allows Renovate to track and update versions automatically using specific datasources (kubernetes/apps/system-upgrade/versions.env:1-5).

Variable            Renovate Datasource                  Target
KUBERNETES_VERSION  docker:ghcr.io/siderolabs/kubelet    Kubernetes Binaries
TALOS_VERSION       docker:ghcr.io/siderolabs/installer  Talos OS Image
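A pinned value can be read back from such a file with a small helper like the one below. The read_version name is hypothetical, and the file layout shown in the comment (a Renovate hint comment above each KEY=VALUE line, with placeholder versions) is an assumption about how versions.env is structured, not a copy of it.

```shell
# Assumed layout of versions.env (values are placeholders):
#   # renovate: datasource=docker depName=ghcr.io/siderolabs/kubelet
#   KUBERNETES_VERSION=v0.0.0-example
#   # renovate: datasource=docker depName=ghcr.io/siderolabs/installer
#   TALOS_VERSION=v0.0.0-example

# Print the value of KEY from a KEY=VALUE file; fail if KEY is absent.
# Usage: read_version <env-file> <KEY>
read_version() {
  awk -v key="$2" '
    index($0, key "=") == 1 {
      # Everything after the first "=" is the value.
      print substr($0, length(key) + 2)
      found = 1
    }
    END { exit !found }
  ' "$1"
}

# Example:
#   TALOS_VERSION="$(read_version kubernetes/apps/system-upgrade/versions.env TALOS_VERSION)"
```

Keeping the versions in one flat file means Renovate only has to open a single pull request path for system upgrades, and any script or manifest substitution reads from the same source of truth.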

Upgrade Controllers

TUPPR utilizes two primary Custom Resources to manage the rollout of updates:

  1. TalosUpgrade: Manages the Talos OS version. It includes a rebootMode (e.g., powercycle) and health checks to ensure the cluster is stable before proceeding. For example, it checks that volsync is not currently synchronizing data (kubernetes/apps/system-upgrade/tuppr/upgrades/talosupgrade.yaml:1-16).
  2. KubernetesUpgrade: Manages the Kubernetes control plane and worker versions, ensuring they match the pinned KUBERNETES_VERSION (kubernetes/apps/system-upgrade/tuppr/upgrades/kubernetesupgrade.yaml:1-14).

Upgrade Entity Mapping

This diagram maps the logical upgrade concepts to the specific code entities used in the system-upgrade namespace.

[Flowchart diagram: TUPPR Upgrade Entity Mapping]

Sources: kubernetes/apps/system-upgrade/versions.env:1-5, kubernetes/apps/system-upgrade/tuppr/upgrades/talosupgrade.yaml:1-9, kubernetes/apps/system-upgrade/tuppr/app/ocirepository.yaml:1-13


Technical Implementation Details

CRD Management

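During bootstrap, CRDs are applied using --server-side and --force-conflicts (scripts/bootstrap-cluster.sh:123-126). Server-Side Apply matters for large CRDs such as those from kube-prometheus-stack or envoy-gateway: client-side kubectl apply stores the whole manifest in the kubectl.kubernetes.io/last-applied-configuration annotation, and these CRDs can exceed the 256 KiB limit on annotation size. The --force-conflicts flag lets the bootstrap apply take ownership of fields that another field manager already holds.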
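As a rough illustration of the size problem, the check below flags manifests that are large enough to be at risk with client-side apply. The function name is hypothetical and the comparison is only a heuristic: it measures the raw file size, whereas kubectl stores a JSON rendering of the object, so borderline files may behave differently.

```shell
# Total annotation size limit enforced by the Kubernetes API server (256 KiB).
ANNOTATION_LIMIT_BYTES=262144

# Succeed when the manifest file is larger than the annotation limit,
# i.e. when client-side apply is likely to be rejected.
# Usage: likely_needs_server_side <manifest-file>
likely_needs_server_side() {
  # Arithmetic expansion strips the padding some wc implementations add.
  size=$(( $(wc -c < "$1") ))
  [ "$size" -gt "$ANNOTATION_LIMIT_BYTES" ]
}
```

In practice the bootstrap script sidesteps the question by always using Server-Side Apply for CRDs, which avoids the annotation entirely.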

Bootstrap Taskfile

The .taskfiles/Flux/Taskfile.yaml file provides the bootstrap task, which acts as the entry point for operators, alongside a few supporting tasks.

Task               Purpose                  Key Commands
bootstrap          Initial Flux deployment  kubectl apply --kustomize ./bootstrap/flux
apply              Manual Flux build/apply  flux build ks ... | kubectl apply --server-side
reconcile          Force Git sync           flux reconcile kustomization cluster-apps --with-source
github-deploy-key  Secret setup             sops --decrypt github-deploy-key.sops.yaml | kubectl apply

Sources: .taskfiles/Flux/Taskfile.yaml:11-72, scripts/bootstrap-cluster.sh:115-132