I kept rebuilding the same infrastructure every time I wanted to put a Claude Agent SDK into production. Sessions, streaming, sandboxing, persistence, a REST API, file hooks. So I built Ash.
An agent is a folder with a CLAUDE.md file, skills folder, etc, and one-line deploy from cli, or tsx/python sdks.
Some things I cared about when building this:
Sandboxing. Each agent runs in its own isolated process with an environment allowlist, cgroups resource limits, and bubblewrap filesystem isolation on Linux.
Session persistence. All state lives in CRDB. If the server crashes mid-conversation, resume picks up where it left off. You can also snapshot workspaces to S3/GCS and resume on a different machine.
Keeping it thin. The SDK's `Message` types flow through the whole pipeline untranslated, from the Unix socket to SSE to the client. Ash adds session routing, sandbox pooling, and lifecycle management. It doesn't re-wrap or translate the SDK's types.
I've been measuring overhead pretty carefully. Ash adds <0.5ms per message at p99. Warm resume is 1.7ms. Cold resume (restore workspace, spawn process, reconnect) is 32ms.
OSS, Self-hostable, MIT licensed: https://github.com/ash-ai-org/ash-ai