Level 1 → launchd auto-restart (seconds) Level 2 → watchdog health check every 60s Level 3 → Claude Code AI diagnosis + repair (fires after 30+ min failure) Level 4 → Discord/Telegram human escalation
The system looked great on paper. But v3.0.0 had a critical bug: Level 2 logged "Escalating to Level 3..." but never actually called the script. The chain was broken at the most important handoff — silently, for weeks.
v3.1.0 fixes this plus adds chain verification in the installer so each level is tested end-to-end before calling it done.
The meta-lesson: "I have monitoring" ≠ "my monitoring chain is actually connected."