Overzealous Warden
Incident Report
INCIDENT: Overzealous Warden (Boss Room)
Severity: P1 - Application Down
Reported: 09:15 UTC
Status: OPEN - Awaiting remediation
Incident Summary
The security team hardened the production namespace overnight. This morning the escape-app deployment won't start. Pods are stuck and never become Ready.
Initial Report
"Security pushed new pod security policies last night. Now our nginx pods won't even start. We've been told we can't just remove the security settings — we need to make the app work with them." — On-call engineer
What We Know
- The escape-app Deployment was redeployed with new security context settings
- Pods are NOT in Running state
- The security team requires runAsNonRoot and readOnlyRootFilesystem to stay enabled
- There may be more than one issue: fixing the first problem could reveal another
Triage Checklist
Start your investigation here:
# 1. Get overall status
kubectl get all -n escape-boss-overzealous-warden
# 2. Check pod status and events
kubectl get pods -n escape-boss-overzealous-warden
kubectl describe pod -l app=escape-app -n escape-boss-overzealous-warden
# 3. Check the security context
kubectl get deployment escape-app -n escape-boss-overzealous-warden \
  -o jsonpath='{.spec.template.spec.containers[0].securityContext}' | jq .
# 4. Check events for clues
kubectl get events -n escape-boss-overzealous-warden --sort-by='.lastTimestamp'
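The status check in step 2 can be narrowed to the one field that matters here. The following is a minimal sketch, assuming a hypothetical helper name `waiting_reasons` and a single container per pod (index 0); it prints each pod name with the reason its container is waiting.

```shell
# Hypothetical helper: print "<pod><TAB><waiting reason>" for each pod in a namespace.
# Assumes one container per pod; the reason is empty once the container is running.
waiting_reasons() {
  kubectl get pods -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].state.waiting.reason}{"\n"}{end}'
}

# Usage (needs access to the cluster):
#   waiting_reasons escape-boss-overzealous-warden
```

A reason such as CreateContainerConfigError or CrashLoopBackOff tells you which kind of failure you are looking at before you read the full describe output.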
Success Criteria
- All escape-app pods are in Running state AND show 1/1 Ready
- runAsNonRoot: true is still set (don't just remove security)
- readOnlyRootFilesystem: true is still set (don't just remove security)
Namespace
All resources are in the escape-boss-overzealous-warden namespace.
On-call engineer, the security settings must stay. Make the app work within the constraints.
Quick Start
Run this command in your terminal to set up the room:
$ make room-apply ROOM=boss-overzealous-warden
This creates the namespace escape-boss-overzealous-warden with the broken resources.
Other useful commands:
$ make room-test ROOM=boss-overzealous-warden
Verify the room is in the expected broken state
$ make room-escape-test ROOM=boss-overzealous-warden
Test if you have successfully fixed all issues
$ make room-reset ROOM=boss-overzealous-warden
Reset the room to try again
Useful Commands
Check pod status
$ kubectl get pods -n escape-boss-overzealous-warden
See the current state of pods in the namespace
View events
$ kubectl get events -n escape-boss-overzealous-warden --sort-by='.lastTimestamp'
Check recent events for error details
Describe pods
$ kubectl describe pods -n escape-boss-overzealous-warden
Get detailed information about pods
Check logs
$ kubectl logs -l app.kubernetes.io/part-of=K8sEscapeRoom -n escape-boss-overzealous-warden
View the application logs
Run this to see the full solution:
$ make room-solution ROOM=boss-overzealous-warden
Solution: Security Lockdown
Root Causes (MULTIPLE)
This incident has two layered failures — the second is invisible until the first is fixed:
Failure #1: runAsNonRoot Without runAsUser
securityContext:
  runAsNonRoot: true   # Requires non-root user
  # runAsUser: ???     # But no user is specified!
The nginx:1.25-alpine image runs as root by default (UID 0). When runAsNonRoot: true is set without specifying a runAsUser, Kubernetes checks the image's default user, sees it's root, and refuses to start the container.
Result: CreateContainerConfigError — container never starts.
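The rule described above can be modelled in a few lines of shell. This is a simplified sketch of the decision, not the kubelet's actual code; the function name `non_root_check` and its two arguments are hypothetical, and it assumes that an image declaring no user runs as root.

```shell
# Simplified model of the runAsNonRoot check (illustrative, not kubelet source).
# $1 = runAsUser from the securityContext ("" if unset; assumed numeric otherwise)
# $2 = default user declared by the image ("" if none, which means root)
non_root_check() {
  run_as_user=$1
  image_user=${2%%:*}   # keep only the user part of a "user:group" value
  if [ -n "$run_as_user" ]; then
    # An explicit runAsUser takes precedence over the image's default user.
    if [ "$run_as_user" -eq 0 ]; then
      echo "reject: runAsUser is root"
    else
      echo "allow"
    fi
    return
  fi
  case "$image_user" in
    ""|0)     echo "reject: image will run as root" ;;
    *[!0-9]*) echo "reject: cannot verify non-numeric user" ;;
    *)        echo "allow" ;;
  esac
}

non_root_check "" ""    # the broken deployment; prints: reject: image will run as root
non_root_check 101 ""   # after adding runAsUser: 101; prints: allow
```

The non-numeric case shows why a named user in the image is not enough: with only a name to go on, the check cannot confirm that the user is not UID 0.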
Failure #2: Read-Only Filesystem Without Writable /tmp
securityContext:
  readOnlyRootFilesystem: true   # Entire filesystem is read-only
# No emptyDir volume for /tmp!
The nginx.conf is already configured to write its PID file, cache, and all temp files to /tmp. But readOnlyRootFilesystem: true makes /tmp read-only along with everything else. nginx crashes immediately on startup.
Result: CrashLoopBackOff — container starts but crashes on first write.
Why this is tricky: Failure #2 is completely hidden while Failure #1 is active. The container never starts, so nginx never attempts a write and the filesystem error never appears.
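Because the second failure stays hidden until the first is fixed, it can help to inspect the spec statically instead of waiting for the crash. The following is a minimal sketch, assuming a hypothetical helper name `mounts_tmp` and that nginx is the first container in the pod template; it checks whether any volume is mounted at /tmp.

```shell
# Hypothetical helper: succeed if the first container of a deployment mounts a volume at /tmp.
# $1 = deployment name, $2 = namespace
mounts_tmp() {
  kubectl get deployment "$1" -n "$2" \
    -o jsonpath='{range .spec.template.spec.containers[0].volumeMounts[*]}{.mountPath}{"\n"}{end}' \
    | grep -qx '/tmp'
}

# Usage (needs access to the cluster):
#   mounts_tmp escape-app escape-boss-overzealous-warden || echo "no volume mounted at /tmp"
```

A non-zero exit status on the broken deployment points at the missing emptyDir. The check only looks at mount paths; it does not confirm that the mounted volume is writable.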
Diagnosis Steps
# Step 1: Check pod status — notice CreateContainerConfigError
kubectl get pods -n escape-boss-overzealous-warden
# NAME READY STATUS RESTARTS AGE
# escape-app-xxxxx 0/1 CreateContainerConfigError 0 5m
# Step 2: Describe pod for the error message
kubectl describe pod -l app=escape-app -n escape-boss-overzealous-warden
# Events:
# Warning Failed container has runAsNonRoot and image will run as root
# Step 3: Check the security context
kubectl get deployment escape-app -n escape-boss-overzealous-warden \
  -o jsonpath='{.spec.template.spec.containers[0].securityContext}'
# {"readOnlyRootFilesystem":true,"runAsNonRoot":true}
# Notice: no runAsUser!
# Step 4: After fixing runAsUser, pod crashes — check logs
kubectl logs -l app=escape-app -n escape-boss-overzealous-warden --previous
# nginx: [emerg] open() "/tmp/nginx.pid" failed (30: Read-only file system)
The Fix
Open the deployment in your editor:
kubectl edit deployment escape-app -n escape-boss-overzealous-warden
You can fix both bugs in one edit. Here's what to change — lines marked with # <-- ADD are the only additions:
spec:
  containers:
  - name: nginx
    # ...
    volumeMounts:
    - mountPath: /etc/nginx/nginx.conf   # already exists
      name: nginx-config                 # already exists
      subPath: nginx.conf                # already exists
      readOnly: true                     # already exists
    - mountPath: /tmp                    # <-- ADD
      name: tmp                          # <-- ADD
    securityContext:
      runAsNonRoot: true
      runAsUser: 101                     # <-- ADD (nginx user in alpine)
      readOnlyRootFilesystem: true
  volumes:
  - configMap:                           # already exists
      name: nginx-config                 # already exists
    name: nginx-config                   # already exists
  - emptyDir: {}                         # <-- ADD
    name: tmp                            # <-- ADD
Save and close — Kubernetes rolls out a new pod automatically.
What each change does:
- runAsUser: 101 tells Kubernetes to run the container as the nginx user (UID 101) instead of root, satisfying runAsNonRoot
- emptyDir at /tmp provides a writable directory for nginx's PID file, cache, and temp files, while the rest of the filesystem stays read-only
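The same three additions can also be applied without opening an editor. The following is a sketch using `kubectl patch` with a JSON patch; the wrapper name `patch_escape_app` is hypothetical, and the paths assume the nginx container is at index 0 and that the securityContext, volumeMounts, and volumes fields already exist, as in the manifest shown in this section.

```shell
# Hypothetical helper: apply the three additions from the manual edit as one JSON patch.
# $1 = namespace. The "/-" suffix appends to an existing array.
patch_escape_app() {
  kubectl patch deployment escape-app -n "$1" --type=json -p '[
    {"op": "add", "path": "/spec/template/spec/containers/0/securityContext/runAsUser", "value": 101},
    {"op": "add", "path": "/spec/template/spec/containers/0/volumeMounts/-", "value": {"name": "tmp", "mountPath": "/tmp"}},
    {"op": "add", "path": "/spec/template/spec/volumes/-", "value": {"name": "tmp", "emptyDir": {}}}
  ]'
}

# Usage (needs access to the cluster):
#   patch_escape_app escape-boss-overzealous-warden
```

A JSON patch is applied as a whole, so either all three changes land or none do, and the rollout goes straight to a working pod without passing through the intermediate CrashLoopBackOff state.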
Verification
# Wait for rollout
kubectl rollout status deployment/escape-app -n escape-boss-overzealous-warden
# Check pods are Running and Ready
kubectl get pods -n escape-boss-overzealous-warden
# NAME READY STATUS RESTARTS AGE
# escape-app-xxxxx 1/1 Running 0 30s
# Verify security context is still enforced
kubectl get deployment escape-app -n escape-boss-overzealous-warden \
  -o jsonpath='{.spec.template.spec.containers[0].securityContext}'
# Should still have runAsNonRoot: true AND readOnlyRootFilesystem: true
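The last check can be turned into a pass/fail test instead of reading the JSON by eye. The following is a minimal sketch, assuming a hypothetical helper name `security_still_enforced` and the container at index 0.

```shell
# Hypothetical helper: succeed only if both required settings are still true.
# $1 = deployment name, $2 = namespace; assumes the container is at index 0.
security_still_enforced() {
  ctx='.spec.template.spec.containers[0].securityContext'
  flags=$(kubectl get deployment "$1" -n "$2" \
    -o jsonpath="{$ctx.runAsNonRoot},{$ctx.readOnlyRootFilesystem}")
  [ "$flags" = "true,true" ]
}

# Usage (needs access to the cluster):
#   security_still_enforced escape-app escape-boss-overzealous-warden && echo "security intact"
```

If either field was removed or set to false, the output no longer matches true,true and the function returns a non-zero status.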
Lessons Learned
- Layered failures hide each other: the container must start before filesystem errors appear
- runAsNonRoot requires explicit runAsUser when the image defaults to root
- readOnlyRootFilesystem requires writable volumes for any directory the app writes to
- Consolidate writable paths to /tmp: a single emptyDir is simpler than many
- Don't remove security to fix issues; work within the constraints using volumes and user settings
Real-World Considerations
This pattern is extremely common in production:
- The Restricted Pod Security Standard (PSS) requires runAsNonRoot, and Pod Security Admission enforces it at the namespace level
- CIS benchmarks recommend readOnlyRootFilesystem for all containers
- Many popular images (nginx, redis, postgres) default to running as root
- Teams often enable security policies without testing existing deployments
Prevention:
- Use distroless or non-root base images
- Always specify runAsUser alongside runAsNonRoot
- Test with readOnlyRootFilesystem: true during development
- Configure apps to write all temp/cache/pid files under /tmp
- Use Pod Security Admission to catch misconfigurations before deployment
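One way to use Pod Security Admission before enforcing anything is a server-side dry run of the namespace label. The following is a sketch; the wrapper name `psa_dry_run` is hypothetical, while the label key and the `--dry-run=server` flag are standard kubectl and Pod Security Admission features.

```shell
# Hypothetical helper: ask the API server which existing pods would violate a
# Pod Security Standard level, without changing the namespace.
# $1 = namespace, $2 = level (privileged, baseline, or restricted)
psa_dry_run() {
  kubectl label --dry-run=server --overwrite namespace "$1" \
    "pod-security.kubernetes.io/enforce=$2"
}

# Usage (needs access to the cluster):
#   psa_dry_run escape-boss-overzealous-warden restricted
```

The API server answers with warnings for pods that would be rejected at that level. The Restricted level covers runAsNonRoot but not readOnlyRootFilesystem, so the read-only filesystem setting still needs its own testing.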