Gatus, Exporters, and Health Monitoring
Relevant source files
- infrastructure/ansible/playbooks/monitoring.yaml
- kubernetes/apps/downloads/recyclarr/app/externalsecret.yaml
- kubernetes/apps/downloads/recyclarr/app/helmrelease.yaml
- kubernetes/apps/downloads/recyclarr/app/kustomization.yaml
- kubernetes/apps/downloads/recyclarr/app/resources/recyclarr.yml
- kubernetes/apps/downloads/recyclarr/ks.yaml
- kubernetes/apps/home-automation/home-assistant/ks.yaml
- kubernetes/apps/observability/exporters/blackbox-exporter/app/helmrelease.yaml
- kubernetes/apps/observability/exporters/blackbox-exporter/app/kustomization.yaml
- kubernetes/apps/observability/exporters/blackbox-exporter/app/probe.yaml
- kubernetes/apps/observability/exporters/blackbox-exporter/ks.yaml
- kubernetes/apps/observability/exporters/blackbox-exporter/vpn/helmrelease.yaml
- kubernetes/apps/observability/exporters/blackbox-exporter/vpn/kustomization.yaml
- kubernetes/apps/observability/exporters/blackbox-exporter/vpn/ocirepository.yaml
- kubernetes/apps/observability/exporters/blackbox-exporter/vpn/probes.yaml
- kubernetes/apps/observability/exporters/nut-exporter/ks.yaml
- kubernetes/apps/observability/exporters/opnsense-exporter/app/externalsecret.yaml
- kubernetes/apps/observability/exporters/opnsense-exporter/app/helmrelease.yaml
- kubernetes/apps/observability/exporters/opnsense-exporter/app/kustomization.yaml
- kubernetes/apps/observability/exporters/opnsense-exporter/ks.yaml
- kubernetes/apps/observability/exporters/smartctl-exporter/app/kustomization.yaml
- kubernetes/apps/observability/exporters/smartctl-exporter/ks.yaml
- kubernetes/apps/observability/exporters/speedtest-exporter/app/kustomization.yaml
- kubernetes/apps/observability/exporters/speedtest-exporter/ks.yaml
- kubernetes/apps/observability/gatus/app/externalsecret.yaml
- kubernetes/apps/observability/gatus/app/grafana-dashboard.yaml
- kubernetes/apps/observability/gatus/app/helmrelease.yaml
- kubernetes/apps/observability/gatus/app/kustomization.yaml
- kubernetes/apps/observability/gatus/app/resources/config.yaml
- kubernetes/apps/observability/gatus/ks.yaml
- kubernetes/apps/observability/k8s-monitoring/app/helmrelease.yaml
- kubernetes/apps/observability/k8s-monitoring/app/httproute.yaml
- kubernetes/apps/observability/k8s-monitoring/app/kustomization.yaml
- kubernetes/apps/observability/k8s-monitoring/ks.yaml
- kubernetes/apps/observability/kube-prometheus-stack/app/externalsecret.yaml
- kubernetes/apps/observability/kube-prometheus-stack/app/resources/alertmanager.yaml
- kubernetes/apps/observability/kube-prometheus-stack/app/scrapeconfig.yaml
- kubernetes/apps/observability/loki/app/helmrelease.yaml
This section details the specialized monitoring components of the home-ops cluster. While Prometheus and Loki form the core metrics and logging stack, these exporters and health-checking tools add visibility into external infrastructure, hardware health, and service availability.
Gatus: Status Monitoring
Gatus serves as the primary health dashboard and service checker for the homelab. It performs active probing of endpoints and provides a public-facing status page.
Implementation and Discovery
The Gatus deployment utilizes a gatus-sidecar for automated discovery of cluster resources.
- Auto-Discovery: The `gatus-sidecar` (kubernetes/apps/observability/gatus/app/helmrelease.yaml:32-39) watches the cluster for `HTTPRoute` and `Service` resources, automatically generating Gatus endpoints.
- State Management: Persistence is handled via a SQLite database stored at `/config/sqlite.db` (kubernetes/apps/observability/gatus/app/resources/config.yaml:5-7).
- Configuration: Static endpoints, such as public websites and core services, are defined in a `ConfigMap` (kubernetes/apps/observability/gatus/app/resources/config.yaml:46-61).
- Alerting: Gatus is configured to send alerts to Discord and Pushover when failure thresholds are met (kubernetes/apps/observability/gatus/app/resources/config.yaml:23-31).
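The storage, alerting, and static-endpoint settings listed above map onto a Gatus configuration roughly as follows. This is a minimal sketch, not the repository's actual `config.yaml`: the endpoint name, group, URL, thresholds, and environment variable names are illustrative assumptions. Only the SQLite path and the two alert providers are taken from the description above.

```yaml
# Hypothetical Gatus configuration sketch; most values are placeholders.
storage:
  type: sqlite
  path: /config/sqlite.db                      # path named in this section

alerting:
  discord:
    webhook-url: "${DISCORD_WEBHOOK_URL}"      # assumed env var name
    default-alert:
      failure-threshold: 3                     # assumed threshold
      success-threshold: 2                     # assumed threshold
      send-on-resolved: true
  pushover:
    application-token: "${PUSHOVER_TOKEN}"     # assumed env var name
    user-key: "${PUSHOVER_USER_KEY}"           # assumed env var name
    default-alert:
      failure-threshold: 3                     # assumed threshold
      send-on-resolved: true

endpoints:
  - name: example-site                         # hypothetical static endpoint
    group: external
    url: https://example.com
    interval: 1m
    conditions:
      - "[STATUS] == 200"
    alerts:
      - type: discord
      - type: pushover
```

Endpoints generated by the sidecar would sit alongside static entries like this one; how the two sets are merged is not shown here.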
Data Flow: Gatus Service Monitoring
Title: Gatus Monitoring Logic
[Flowchart Diagram]
Sources: kubernetes/apps/observability/gatus/app/helmrelease.yaml:31-82, kubernetes/apps/observability/gatus/app/resources/config.yaml:5-31
Blackbox Exporter
The Prometheus Blackbox Exporter allows for probing of endpoints over HTTP, HTTPS, DNS, TCP, and ICMP.
- Modules: Configured with `http_2xx`, `icmp`, and `tcp_connect` modules (kubernetes/apps/observability/exporters/blackbox-exporter/app/helmrelease.yaml:43-62).
- Probes: Custom `Probe` resources are used to monitor infrastructure devices (e.g., `smb.cloudjur.com`, `gateway.cloudjur.com`) and network services like NFS on port 2049 (kubernetes/apps/observability/exporters/blackbox-exporter/app/probe.yaml:1-45).
- Alerting: Includes rules for SSL certificate expiry and probe failures (kubernetes/apps/observability/exporters/blackbox-exporter/app/helmrelease.yaml:102-125).
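To make the probe and alerting pieces concrete, the sketch below shows how `Probe` resources and alert rules of this kind are typically written for the Prometheus Operator. It is an illustration under assumptions, not a copy of `probe.yaml`: the resource names, the prober Service address, the NFS host, the choice of ICMP for the two named devices, and the alert names, durations, and 14-day expiry window are all hypothetical. The metrics `probe_success` and `probe_ssl_earliest_cert_expiry` are standard Blackbox Exporter outputs.

```yaml
# Hypothetical sketch; names and addresses are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: devices-icmp
spec:
  module: icmp                       # one of the modules listed above
  prober:
    url: blackbox-exporter.observability.svc.cluster.local:9115  # assumed Service address
  targets:
    staticConfig:
      static:
        - smb.cloudjur.com
        - gateway.cloudjur.com
---
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: nfs-tcp
spec:
  module: tcp_connect
  prober:
    url: blackbox-exporter.observability.svc.cluster.local:9115  # assumed Service address
  targets:
    staticConfig:
      static:
        - nas.example.internal:2049  # placeholder host; 2049 is the NFS port
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: blackbox-exporter-rules      # hypothetical name
spec:
  groups:
    - name: blackbox-exporter.rules
      rules:
        - alert: BlackboxProbeFailed             # hypothetical alert name
          expr: probe_success == 0
          for: 15m
          labels:
            severity: critical
        - alert: BlackboxSslCertificateExpiring  # hypothetical alert name
          expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 14
          for: 15m
          labels:
            severity: warning
```

In this repository the rules live in the exporter's HelmRelease values rather than in a standalone `PrometheusRule`, so the wrapping structure will differ even if the expressions are similar.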
Sources: kubernetes/apps/observability/exporters/blackbox-exporter/app/helmrelease.yaml:42-62, kubernetes/apps/observability/exporters/blackbox-exporter/app/probe.yaml:1-45
Specialized Exporters
The cluster utilizes several specialized exporters to pull metrics from non-standard sources:
| Exporter | Target | Implementation Details |
|---|---|---|
| OPNsense Exporter | Firewall | Scrapes the OPNsense API over HTTPS, with insecure mode enabled for the local connection (kubernetes/apps/observability/exporters/opnsense-exporter/app/helmrelease.yaml:21-42) |
| SMARTCTL Exporter | Disk Health | Monitors S.M.A.R.T. data for physical drives. |
| NUT Exporter | UPS | Interfaces with Network UPS Tools to monitor power status. |
| Speedtest Exporter | WAN Performance | Periodically runs speed tests to track ISP bandwidth. |
Sources: kubernetes/apps/observability/exporters/opnsense-exporter/app/helmrelease.yaml:21-42
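As an illustration of the OPNsense row in the table, the exporter is usually configured through environment variables on its container. The fragment below is a hedged sketch: the variable names are given as recalled from the upstream exporter's README and should be checked against it, and the host, instance label, Secret name, and Secret keys are hypothetical. Sourcing the API credentials from a Secret is an assumption suggested by the `externalsecret.yaml` in this app's directory.

```yaml
# Hypothetical container env fragment; verify variable names upstream.
env:
  - name: OPNSENSE_EXPORTER_OPS_PROTOCOL
    value: https
  - name: OPNSENSE_EXPORTER_OPS_API
    value: opnsense.example.internal      # placeholder firewall address
  - name: OPNSENSE_EXPORTER_OPS_INSECURE
    value: "true"                         # the insecure mode noted in the table
  - name: OPNSENSE_EXPORTER_INSTANCE_LABEL
    value: opnsense                       # assumed label value
  - name: OPNSENSE_EXPORTER_OPS_API_KEY
    valueFrom:
      secretKeyRef:
        name: opnsense-exporter-secret    # hypothetical Secret name
        key: api-key                      # hypothetical key
  - name: OPNSENSE_EXPORTER_OPS_API_SECRET
    valueFrom:
      secretKeyRef:
        name: opnsense-exporter-secret    # hypothetical Secret name
        key: api-secret                   # hypothetical key
```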
k8s-monitoring Alloy Stack
The k8s-monitoring stack (based on Grafana Alloy) acts as the cluster's telemetry collector: it gathers metrics, logs, traces, and profiles and forwards each signal to its backend.
- Destinations: Routes telemetry to `kube-prometheus-stack` (metrics), `loki` (logs), `tempo` (traces), and `pyroscope` (profiling) (kubernetes/apps/observability/k8s-monitoring/app/helmrelease.yaml:42-69).
- Log Collection: Specifically handles `nodeLogs` and `podLogsViaLoki` by tailing `/var/log/pods/*` (kubernetes/apps/observability/k8s-monitoring/app/helmrelease.yaml:93-107).
- Application Observability: Acts as an OTLP receiver for gRPC and HTTP, enriching traces with Kubernetes attributes (namespace, pod UID, etc.) (kubernetes/apps/observability/k8s-monitoring/app/helmrelease.yaml:123-132).
Code Entity Mapping: Alloy Telemetry Pipeline
Title: Alloy Collector Routing
[Flowchart Diagram]
Sources: kubernetes/apps/observability/k8s-monitoring/app/helmrelease.yaml:42-132
Alerting and Remediation
Alertmanager coordinates the response to health monitoring failures.
- Routing: Critical alerts are sent to both Discord and Pushover (kubernetes/apps/observability/kube-prometheus-stack/app/resources/alertmanager.yaml:26-33).
- Automated Remediation: Specific alerts, such as `VolSyncVolumeOutOfSync`, trigger a `remediation-webhook` (kubernetes/apps/observability/kube-prometheus-stack/app/resources/alertmanager.yaml:21-25). This calls an internal service to attempt automated recovery of the volume (kubernetes/apps/observability/kube-prometheus-stack/app/resources/alertmanager.yaml:46-49).
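The routing described above can be sketched as an Alertmanager configuration along these lines. This is an assumption-laden illustration rather than the repository's `alertmanager.yaml`: the default receiver, the `critical` receiver name, the grouping labels, the webhook URL, and the credential placeholders are hypothetical. Only the alert name, the `remediation-webhook` receiver, and the Discord and Pushover targets come from the text.

```yaml
# Hypothetical Alertmanager routing sketch; URLs and credentials are placeholders.
route:
  receiver: "null"                        # assumed default: drop unmatched alerts
  group_by: [alertname, namespace]        # assumed grouping
  routes:
    - receiver: remediation-webhook
      matchers:
        - alertname = "VolSyncVolumeOutOfSync"
    - receiver: critical
      matchers:
        - severity = "critical"

receivers:
  - name: "null"
  - name: remediation-webhook
    webhook_configs:
      - url: http://remediation.example.svc.cluster.local/hooks/volsync  # placeholder internal service
        send_resolved: false
  - name: critical                        # hypothetical receiver name
    discord_configs:
      - webhook_url: "<discord-webhook-url>"   # placeholder, normally injected from a Secret
    pushover_configs:
      - user_key: "<pushover-user-key>"        # placeholder
        token: "<pushover-api-token>"          # placeholder
```

Because the first matching child route wins by default, an alert that should both trigger remediation and notify a human would need `continue: true` on the remediation route. Whether the repository does this is not stated here.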
Sources: kubernetes/apps/observability/kube-prometheus-stack/app/resources/alertmanager.yaml:4-53