Show HN: My "Grandma" prompt dropped a production DB. So I built a Kill Switch

  • Posted 5 hours ago by Esrbwt
  • 1 point
I was testing a LangChain agent this weekend and tried a "Grandma Bedtime Story" prompt injection to see if it would roleplay.

It worked too well. The agent bypassed my safeguards and executed a `drop_table` tool call because it thought it was "telling a story."
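
Roughly, the vulnerable setup looks like this (a sketch with hypothetical names, not my exact code): a destructive tool registered with the agent, and nothing between the model's decision and the side effect.

```python
from langchain_core.tools import tool

@tool
def drop_table(table_name: str) -> str:
    """Drop a database table."""
    # db.execute(f"DROP TABLE {table_name}")  # no intent or context check here
    return f"dropped {table_name}"
```

Once the roleplay framing convinces the model that this call is part of the story, nothing downstream stops it.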

Realizing that most tool-calling agents are vulnerable to this kind of context-based injection, I spent the weekend building two things:

1. A repo of 5,000+ Agent Attack Vectors (Grandma, CEO Override, Debug Mode): https://github.com/Esrbwt1/voidgate

2. VoidGate – A semantic firewall and remote kill switch for Python agents: https://voidgate.vercel.app/

It uses Upstash Redis to check a "Kill Flag" in <50ms before allowing any tool execution. If you see your agent going rogue, you flip the flag and every running instance halts immediately.
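
Here's a minimal sketch of the guard, assuming the upstash-redis Python SDK (the key name and decorator are hypothetical; the real VoidGate client may differ):

```python
import functools
from upstash_redis import Redis

redis = Redis(url="https://YOUR-UPSTASH-URL", token="YOUR-TOKEN")

def kill_switch(tool_fn):
    """Block the wrapped tool whenever the shared kill flag is set."""
    @functools.wraps(tool_fn)
    def wrapper(*args, **kwargs):
        # One GET round trip before every tool call. A Redis error also
        # raises here, so the guard fails closed.
        if redis.get("voidgate:kill_flag") == "1":
            raise RuntimeError("Kill switch active: tool execution blocked")
        return tool_fn(*args, **kwargs)
    return wrapper

@kill_switch
def drop_table(table_name: str) -> str:
    return f"dropped {table_name}"  # destructive body would live here
```

Flipping the switch from anywhere with the credentials is then just `redis.set("voidgate:kill_flag", "1")`.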

I'm releasing the Python client as open source. The "Attack DB" is also free to use for your own red-teaming.

Would love feedback on the attack vectors. Is anyone else seeing agents fall for simple roleplay exploits?

1 comment
