My pain with agents on real projects: big tasks degrade into babysitting (continue -> fix regression -> rerun -> repeat). They stop too early - so you can’t step away.
ProofLoop’s approach: you write a Definition of Done once, then it keeps planning/executing/verifying until the whole contract is satisfied. The “done” contract can include high-level, plain-text acceptance criteria
Example run: proofloop run "Implement a complete feature from scratch: API endpoints, database schema, frontend components, full test coverage" \ --path ./project --provider codex --timeout 8
Providers/adapters: OpenCode, Codex (ChatGPT), Claude Code. Local-first. Apache 2.0. Repo: https://github.com/exiw-ai/proofloop
Install: curl -LsSf https://raw.githubusercontent.com/exiw-ai/proofloop/main/ins... | sh
Would love feedback / war stories from anyone running agents on multi-hour tasks. Thanks!