The Health Inspector
Objective
Escape Room: The Health Inspector
The application starts up fine, but Kubernetes keeps killing it.
Your Mission
- Identify why the pod keeps restarting
- Investigate what's triggering the restarts
- Fix the configuration so the pod stays running
Success Criteria
- The pod is in Running state with the Ready condition
- The pod is stable (not restarting)
- The liveness probe is passing
Getting Started
# Check the pod status - notice the restart count
kubectl get pods -n escape-room-health-inspector
# You might see something like:
# NAME READY STATUS RESTARTS AGE
# escape-app-xxxxxxxxx-xxxxx 0/1 Running 3 (5s ago) 30s
Namespace
All resources are in the escape-room-health-inspector namespace.
Good luck, engineer. Something keeps killing your app, and it's not the app's fault.
Quick Start
Run this command in your terminal to set up the room:
$ make room-apply ROOM=room-health-inspector
This creates the namespace escape-room-health-inspector with the broken resources.
Other useful commands:
$ make room-test ROOM=room-health-inspector
Verify the room is in the expected broken state
$ make room-escape-test ROOM=room-health-inspector
Test if you have successfully fixed all issues
$ make room-reset ROOM=room-health-inspector
Reset the room to try again
Useful Commands
Check pod status
$ kubectl get pods -n escape-room-health-inspector
See the current state of pods in the namespace
View events
$ kubectl get events -n escape-room-health-inspector --sort-by='.lastTimestamp'
Check recent events for error details
Describe pods
$ kubectl describe pods -n escape-room-health-inspector
Get detailed information about pods
Check logs
$ kubectl logs -l app.kubernetes.io/part-of=K8sEscapeRoom -n escape-room-health-inspector
View the application logs
Run this to see the full solution:
$ make room-solution ROOM=room-health-inspector
Show the solution anyway (spoiler)
Solution: Probe Doom
Root Cause
The deployment has liveness and readiness probes configured to check port 8080:
livenessProbe:
  httpGet:
    path: /
    port: 8080  # Nothing is listening here!
The nginx container listens on port 80, not 8080. Since nothing is listening on port 8080, every probe attempt gets "connection refused," which Kubernetes counts as a failure. After the configured failureThreshold (2 failures), Kubernetes kills the container.
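With periodSeconds: 3 and failureThreshold: 2, the container is killed roughly 6 seconds (2 failures × 3 seconds) after each start. The repeated restarts then push the pod into CrashLoopBackOff, where the kubelet waits progressively longer before each restart.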
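The arithmetic above can be written as a small shell helper. It is a rough estimate under a simplified model: the container dies after failureThreshold consecutive failures, one every periodSeconds. The function name probe_kill_window is hypothetical. The model ignores initialDelaySeconds, timeoutSeconds, probe jitter, the termination grace period and the CrashLoopBackOff delay.

```shell
# Hypothetical helper: approximate how long a container with a permanently
# failing liveness probe runs before the kubelet kills it.
# Simplified model: failure_threshold failures, spaced period_seconds apart.
# Ignores initialDelaySeconds, timeoutSeconds, jitter and back-off delays.
probe_kill_window() {
  period_seconds=$1
  failure_threshold=$2
  echo $(( period_seconds * failure_threshold ))
}

# Values from this room's deployment: periodSeconds: 3, failureThreshold: 2
probe_kill_window 3 2   # prints 6
```

Treat the result as an order-of-magnitude check. Real restart timing also depends on the fields the model leaves out.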
Diagnosis Steps
# Step 1: Notice the restart count climbing
kubectl get pods -n escape-room-health-inspector -w
# Output: escape-app-xxx-xxx 0/1 Running 4 (2s ago) 30s
# Step 2: Check events for the cause
kubectl get events -n escape-room-health-inspector --sort-by='.lastTimestamp'
# You'll see:
# Warning Unhealthy Liveness probe failed: Get "http://...:8080/":
# dial tcp ...:8080: connect: connection refused
# Normal Killing Container app failed liveness probe, will be restarted
# Step 3: Compare probe port vs container port
kubectl get deployment escape-app -n escape-room-health-inspector \
-o jsonpath='{.spec.template.spec.containers[0].ports[0].containerPort}'
# Output: 80
kubectl get deployment escape-app -n escape-room-health-inspector \
-o jsonpath='{.spec.template.spec.containers[0].livenessProbe.httpGet.port}'
# Output: 8080 ← MISMATCH!
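The comparison in Step 3 can be wrapped in a helper so the mismatch is reported explicitly. This is a sketch: compare_probe_port is a hypothetical name, and both ports are passed in as plain strings, such as the output of the two jsonpath queries above.

```shell
# Hypothetical helper: compare the declared containerPort with the port a
# probe targets. Returns non-zero on a mismatch.
compare_probe_port() {
  container_port=$1
  probe_port=$2
  if [ "$container_port" = "$probe_port" ]; then
    echo "OK: probe targets containerPort $container_port"
  else
    echo "MISMATCH: probe targets $probe_port, container listens on $container_port"
    return 1
  fi
}

# Usage against a live cluster (not run here):
#   compare_probe_port \
#     "$(kubectl get deployment escape-app -n escape-room-health-inspector \
#         -o jsonpath='{.spec.template.spec.containers[0].ports[0].containerPort}')" \
#     "$(kubectl get deployment escape-app -n escape-room-health-inspector \
#         -o jsonpath='{.spec.template.spec.containers[0].livenessProbe.httpGet.port}')"

# Example with the values from Step 3:
compare_probe_port 80 8080 || true
```

The helper does a plain string comparison, which has two consequences. First, a named probe port such as http is reported as a mismatch even when it resolves correctly. Second, containerPort is largely informational, so a match only proves that the probe agrees with the declaration, not with the port the process actually listens on.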
The Fix
Edit the deployment to change the probe ports from 8080 to 80:
kubectl edit deployment escape-app -n escape-room-health-inspector
Find both probe sections and change the port:
# Before (WRONG):
livenessProbe:
  httpGet:
    path: /
    port: 8080  # connection refused - nothing listening

# After (FIXED):
livenessProbe:
  httpGet:
    path: /
    port: 80  # matches containerPort
Do the same for the readinessProbe section. Save and exit. Because the pod template changed, Kubernetes automatically rolls out a new pod with the corrected probes.
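If you prefer a non-interactive fix to kubectl edit, the same change can be applied with kubectl patch and a JSON Patch. The sketch below builds that patch. It assumes the app is container index 0 and that both probes use httpGet, as the room's deployment appears to. The helper name probe_port_patch is hypothetical.

```shell
# Hypothetical helper: print a JSON Patch (RFC 6902) that sets the httpGet
# port of both the liveness and readiness probes of one container.
# Assumes both probes exist and use httpGet; "replace" fails otherwise.
probe_port_patch() {
  container_index=$1
  new_port=$2
  base="/spec/template/spec/containers/$container_index"
  printf '[{"op":"replace","path":"%s/livenessProbe/httpGet/port","value":%s},' "$base" "$new_port"
  printf '{"op":"replace","path":"%s/readinessProbe/httpGet/port","value":%s}]\n' "$base" "$new_port"
}

# Usage against a live cluster (not run here):
#   kubectl patch deployment escape-app -n escape-room-health-inspector \
#     --type=json -p "$(probe_port_patch 0 80)"
probe_port_patch 0 80
```

A patch is easier to script and review than an interactive edit, and like the edit it changes the pod template, which triggers a new rollout.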
Verification
# Watch the new pod roll out
kubectl get pods -n escape-room-health-inspector -w
# Old pod terminates, new pod starts with 0 restarts
# After ~30 seconds, verify stability
kubectl get pods -n escape-room-health-inspector
# Should show: escape-app-xxx-xxx 1/1 Running 0 30s
# Confirm the probes now target port 80 (passing probes produce no new Unhealthy events)
kubectl describe pod -l app=escape-app -n escape-room-health-inspector | grep -A5 "Liveness:"
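The success criteria can also be checked mechanically against one line of kubectl get pods output (NAME READY STATUS RESTARTS AGE). This is a minimal sketch. The helper pod_line_is_healthy is hypothetical and only inspects the READY, STATUS and RESTARTS columns.

```shell
# Hypothetical helper: succeed if a `kubectl get pods --no-headers` line
# shows all containers ready, STATUS Running and zero restarts.
pod_line_is_healthy() {
  echo "$1" | awk '{
    split($2, ready, "/")            # READY column, e.g. "1/1"
    if (ready[1] == ready[2] && ready[1] > 0 && $3 == "Running" && $4 == "0") exit 0
    exit 1
  }'
}

# Usage against a live cluster (not run here):
#   kubectl get pods -n escape-room-health-inspector --no-headers |
#     while read -r line; do pod_line_is_healthy "$line" && echo "stable: $line"; done
pod_line_is_healthy "escape-app-xxx-xxx 1/1 Running 0 30s" && echo "stable"
```

A single snapshot only shows the pod is healthy right now. As the steps above suggest, re-run the check after roughly 30 seconds so that a probe-induced restart would have had time to appear.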
Lessons Learned
- "Connection refused" means nothing is listening on that port, which here means the probe targets the wrong port. This is different from a 404 (wrong path) or a timeout (port blocked or app too slow).
- Probe port must match the container port - not the Service port or any other port
- Liveness probe failures cause container restarts - they're the "kill switch"
- CrashLoopBackOff isn't always an app crash - it can be probe-induced kills
- Check events for "Unhealthy" and "Killing" messages when debugging restarts
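The first lesson can be turned into a quick triage helper for the text of an Unhealthy event. This is a sketch. The function name is hypothetical, and the matched substrings reflect typical kubelet and Go HTTP client wording ("connection refused", "statuscode: 404", "deadline exceeded", "timeout"), which may vary between versions.

```shell
# Hypothetical triage helper: map a probe failure message to its likely cause.
# Matched substrings are assumptions based on typical kubelet messages.
classify_probe_failure() {
  case "$1" in
    *"connection refused"*)
      echo "nothing listening: check the probe port" ;;
    *"statuscode: 404"*)
      echo "wrong path: check httpGet.path" ;;
    *"deadline exceeded"*|*[Tt]imeout*)
      echo "port blocked or app too slow: check timeoutSeconds" ;;
    *)
      echo "unknown: read the full event message" ;;
  esac
}

classify_probe_failure 'Get "http://10.0.0.5:8080/": dial tcp 10.0.0.5:8080: connect: connection refused'
```

The patterns are checked in order, so a message that mentions both a refused connection and a timeout is classified by the first match.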
Real-World Considerations
This commonly happens when:
- Copying probe configs from one app to another without adjusting ports
- Confusing Service port (what clients connect to) with container port (what the app listens on)
- Adding sidecars that remap ports
- Helm templates using the wrong port variable (service.port vs container.port)
- Framework defaults that differ from the deployment config (e.g., Spring Boot defaults to 8080)
Best practices:
- Always verify the probe port matches containerPort in the pod spec
- Use named ports for clarity: port: http instead of port: 80
- Test probes manually with kubectl exec ... curl before deploying
- Use startupProbe for apps with variable startup times
- Consider tcpSocket probes if HTTP isn't practical
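Named ports work because a probe's port: http is looked up among the container's declared ports by name. The sketch below mimics that lookup so you can check a named probe port before deploying. The helper resolve_probe_port is hypothetical. Container ports are passed as name=number pairs, and the metrics=9113 pair is purely illustrative.

```shell
# Hypothetical helper: resolve a probe port (number or name) against the
# container's declared ports, given as name=number pairs.
resolve_probe_port() {
  probe_port=$1
  shift
  case "$probe_port" in
    *[!0-9]*) ;;                          # contains a non-digit: a port name
    *) echo "$probe_port"; return 0 ;;    # purely numeric: used as-is
  esac
  for pair in "$@"; do
    if [ "${pair%%=*}" = "$probe_port" ]; then
      echo "${pair#*=}"                   # number after the "="
      return 0
    fi
  done
  echo "unresolved port name: $probe_port" >&2
  return 1
}

# The pairs could come from a query like (not run here):
#   kubectl get deployment escape-app -n escape-room-health-inspector -o jsonpath=\
#   '{range .spec.template.spec.containers[0].ports[*]}{.name}={.containerPort}{" "}{end}'
resolve_probe_port http http=80 metrics=9113   # prints 80
```

An unresolved name returns an error here. In a real pod, a probe that names a port the container does not declare would fail in the same way a wrong number does.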