Cilium LRP Breaks NodeLocal DNS After Node Reboot
Problem
After a node reboot, DNS resolution breaks for all pods on that node. Symptoms:
- nslookup kubernetes.default.svc.cluster.local times out
- External DNS (google.com) may work while internal cluster DNS fails
- ArgoCD shows Unknown sync status across many applications
- nodelocaldns pod logs show upstream DNS timeouts
- CoreDNS pods on other nodes are healthy and running
Root Cause
Multiple interacting bugs in Cilium v1.18+/v1.19's CiliumLocalRedirectPolicy (LRP):
- addressMatcher frontend guard bypass (PR #45522): When LRP backing pods (nodelocaldns) are not yet Ready after a node reboot, the len(pods)==0 code path runs an unconditional DeleteFrontend that wipes the LRP frontend before the override guard can inspect it. By the time the pods come up, the BPF state is inconsistent: cilium service list shows "active" but the datapath doesn't redirect traffic.
- skipRedirectFromBackend broken (v1.19.4): When using serviceMatcher for kube-dns, nodelocaldns forwards cluster.local queries to the kube-dns ClusterIP (10.43.0.10) via TCP. skipRedirectFromBackend: true should prevent the LRP from redirecting nodelocaldns's own traffic back to itself, but it doesn't work, which creates a redirect loop that times out.
- TCX attachment mode (v1.19 default on kernel 6.6+): Cilium 1.19 switched from tc to tcx BPF attachment. PR #45740 fixed silent packet drops in tcx hooks. This compounds the LRP issues.
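For orientation, the serviceMatcher policy that the skipRedirectFromBackend item refers to might look roughly like the manifest printed below. This is a minimal sketch, not the repository's actual resource: the policy name, the port names (dns, dns-tcp), and the helper name render_lrp are assumptions, and the pod label is taken from the diagnosis commands in this document.

```shell
# Sketch only: prints a CiliumLocalRedirectPolicy that redirects kube-dns
# traffic to the node-local DNS pods on the same node.
# The policy name and port names are assumptions; adjust to the real resources.
render_lrp() {
  cat <<'EOF'
apiVersion: cilium.io/v2
kind: CiliumLocalRedirectPolicy
metadata:
  name: nodelocaldns            # assumed name
  namespace: kube-system
spec:
  # Intended to stop nodelocaldns's own queries from being redirected
  # back to itself; this document reports it as not working on v1.19.4.
  skipRedirectFromBackend: true
  redirectFrontend:
    serviceMatcher:
      serviceName: kube-dns
      namespace: kube-system
  redirectBackend:
    localEndpointSelector:
      matchLabels:
        k8s-app: node-local-dns # label used by the diagnosis commands
    toPorts:
      - port: "53"
        name: dns               # assumed to match the kube-dns port names
        protocol: UDP
      - port: "53"
        name: dns-tcp
        protocol: TCP
EOF
}

# Usage (not run here; needs cluster access):
#   render_lrp | kubectl --context=grigri apply -f -
```

Printing the manifest from a function keeps the sketch inspectable without touching the cluster; piping it to kubectl is a separate, explicit step.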
Architecture
The current solution uses serviceMatcher LRP with a sidecar for dynamic upstream discovery:
Pod → kube-dns (10.43.0.10) → Cilium LRP serviceMatcher → nodelocaldns
                                                               ↓ cache miss
                                                  forward to CoreDNS pod IPs
                                                  (discovered dynamically by
                                                   corefile-watcher sidecar via
                                                   kube-dns-upstream headless service)
Key files:
- system/kube-system/resources/nodelocaldns/ — all nodelocaldns resources
- system/kube-system/cilium-values.yaml — localRedirectPolicy: true
- metal/roles/k3s/templates/config.yaml.j2 — cluster-dns=10.43.0.10
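The sidecar's core step in the diagram above, turning discovered CoreDNS pod IPs into a forward stanza, could be sketched as follows. This is an illustration under assumptions, not the actual corefile-watcher code: render_forward is a hypothetical helper, and force_tcp is assumed because the document describes nodelocaldns forwarding over TCP.

```shell
# Hypothetical helper: render a CoreDNS "forward" block from a list of
# upstream pod IPs passed as arguments. Fails if no IPs are given, so an
# empty endpoint list never produces a broken Corefile.
render_forward() {
  if [ "$#" -eq 0 ]; then
    echo "render_forward: no upstream IPs given" >&2
    return 1
  fi
  # "$*" joins the arguments with single spaces.
  printf 'forward . %s {\n    force_tcp\n}\n' "$*"
}

# Usage (not run here; needs cluster access):
#   ips=$(kubectl --context=grigri get endpoints kube-dns-upstream -n kube-system \
#         -o jsonpath='{.subsets[*].addresses[*].ip}')
#   render_forward $ips
```

Forwarding to pod IPs rather than the kube-dns ClusterIP is what keeps nodelocaldns's upstream queries out of the LRP's redirect path.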
How to Diagnose
# 1. Check nodelocaldns and cilium pods on affected node
kubectl --context=grigri get pods -n kube-system -l k8s-app=node-local-dns -o wide
# 2. Check if LRP is redirecting correctly
kubectl --context=grigri exec -n kube-system cilium-<node-pod> -- \
cilium-dbg service list | grep -E 'kube-dns|LocalRedirect'
# Should show LocalRedirect for kube-dns (10.43.0.10:53) pointing to nodelocaldns pod IP
# 3. Check sidecar discovered CoreDNS IPs
kubectl --context=grigri logs -n kube-system <nodelocaldns-pod> -c corefile-watcher --tail=5
# 4. Check generated Corefile has pod IPs (not 10.43.0.10)
kubectl --context=grigri exec -n kube-system <nodelocaldns-pod> -c node-cache -- \
cat /etc/coredns/Corefile
# 5. Test DNS from a pod
kubectl --context=grigri exec -n <ns> <pod> -- nslookup kubernetes.default.svc.cluster.local
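Step 4 can be automated with a small filter. The sketch below assumes the Corefile uses plain forward lines and that the kube-dns ClusterIP is 10.43.0.10 as stated in this document; the function name is hypothetical, and grep -w is assumed to be available (it is in GNU and BSD grep).

```shell
# Reads a Corefile on stdin and succeeds (exit 0) if any "forward" line
# still points at the kube-dns ClusterIP, i.e. the redirect-loop case.
# The ClusterIP can be overridden with the first argument.
corefile_forwards_to_clusterip() {
  clusterip="${1:-10.43.0.10}"
  # -w avoids matching longer addresses such as 10.43.0.100.
  grep -E '^[[:space:]]*forward[[:space:]]' | grep -qwF -- "$clusterip"
}

# Usage (not run here; needs cluster access):
#   kubectl --context=grigri exec -n kube-system <nodelocaldns-pod> -c node-cache -- \
#     cat /etc/coredns/Corefile | corefile_forwards_to_clusterip \
#     && echo "Corefile still forwards to the ClusterIP"
```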
Fix / Workaround
If DNS breaks after a node reboot:
# Restart cilium pod on the affected node
kubectl --context=grigri delete pod cilium-<node-pod> -n kube-system
If sidecar hasn't discovered endpoints:
# Check kube-dns-upstream endpoints exist
kubectl --context=grigri get endpoints kube-dns-upstream -n kube-system
# Restart nodelocaldns to force sidecar rediscovery
kubectl --context=grigri rollout restart daemonset/nodelocaldns -n kube-system
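The cilium-<node-pod> placeholder above has to be resolved by hand. A possible shortcut is sketched below; it assumes the Cilium agent pods carry the label k8s-app=cilium, and the function name is hypothetical.

```shell
# Prints the name (as pod/<name>) of the Cilium agent pod scheduled on the
# given node. Assumes the agent pods are labelled k8s-app=cilium.
cilium_pod_on_node() {
  node="$1"
  kubectl --context=grigri get pods -n kube-system \
    -l k8s-app=cilium \
    --field-selector "spec.nodeName=${node}" \
    -o name
}

# Usage (not run here; needs cluster access):
#   kubectl --context=grigri delete -n kube-system "$(cilium_pod_on_node grigri)"
```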
Observed Incident: grigri Node Reboot (2026-06-05)
Node: grigri (x86_64, kernel 6.8.0-124-generic, Cilium v1.19.4)
Trigger: Node reboot caused all pods to restart. Cilium and nodelocaldns both have
system-node-critical priority. After restart, the old addressMatcher LRP (169.254.25.10)
had stale BPF state — cilium service list showed "active" but DNS queries to 169.254.25.10
went unanswered.
Resolution: Migrated from addressMatcher (169.254.25.10) to serviceMatcher (kube-dns)
with a corefile-watcher sidecar that dynamically discovers CoreDNS pod IPs from the
kube-dns-upstream headless service endpoints, avoiding the redirect loop caused by broken
skipRedirectFromBackend.
Migration from addressMatcher to serviceMatcher
The old setup used addressMatcher with 169.254.25.10 (link-local IP):
- Pods queried 169.254.25.10 → LRP redirected to nodelocaldns
- Fragile: 169.254.25.10 is not a Kubernetes service, not managed by Cilium
- After node reboot, LRP BPF state became stale
- Complete DNS outage with no fallback (169.254.25.10 goes nowhere without LRP)
The new setup uses serviceMatcher with kube-dns service:
- Pods query kube-dns (10.43.0.10) → LRP redirects to nodelocaldns
- Better failure mode: if LRP breaks, pods fall through to CoreDNS directly
- Sidecar dynamically discovers CoreDNS pod IPs to avoid redirect loop
- kubelet cluster-dns changed from 169.254.25.10 to 10.43.0.10 (requires an Ansible run, since the value is set in metal/roles/k3s/templates/config.yaml.j2)
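To confirm that a pod has picked up the new cluster-dns value, its resolv.conf can be checked. The helpers below are a sketch with hypothetical names; they assume the pod uses the default ClusterFirst DNS policy, so its first nameserver is the kubelet's cluster-dns address.

```shell
# Reads resolv.conf content on stdin and prints each nameserver address.
resolv_nameservers() {
  awk '$1 == "nameserver" { print $2 }'
}

# Succeeds only if the given address (default 10.43.0.10) is the first
# nameserver listed in the resolv.conf content on stdin.
uses_cluster_dns() {
  want="${1:-10.43.0.10}"
  [ "$(resolv_nameservers | head -n 1)" = "$want" ]
}

# Usage (not run here; needs cluster access):
#   kubectl --context=grigri exec -n <ns> <pod> -- cat /etc/resolv.conf | uses_cluster_dns \
#     || echo "pod still uses the old resolver; restart it after the kubelet change"
```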