6 Practices that turned AI from prototyper to workhorse (106 PRs in 14 days)

  • Posted 5 hours ago by waleedk
  • 13 points
1. Specs and plans are source code: Specs and plans live in git alongside source code, not in chat history. A new agent reads arch.md for the big picture, then its specific spec. You always know why something was built.

2. Three models review every phase: Claude, Gemini, and Codex catch almost entirely different bugs. No single model found more than 55% of issues. If you only review with the model that wrote the code, you're missing about half the bugs. The reviews caught 20 bugs before shipping: Claude Code found 5, and Gemini and Codex caught another 15, including a severe security issue Claude missed.

3. Enforce the process, don't suggest it. A state machine forces Spec → Plan → Implement → Review → PR. The AI can't skip steps. Tests must pass before advancing. AIs don't stick to the plan by themselves; you need rails.

4. Annotate, don't edit. Most of the work is writing specs and reviews that guide the code, not hacking at files in an open-ended chat.

5. Agents coordinate agents. An architect agent spawns builder agents into isolated git worktrees. You direct the architect; it directs the builders. They message each other async.

6. Manage the whole lifecycle. Most AI tools help you write code faster, which is maybe 30% of the job. The other 70% is planning, reviewing, integrating, writing deployment scripts, and managing staging vs. prod. Have AI run the whole pipeline from spec to PR and beyond.
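
To make practice 2 concrete, here is a rough sketch of how per-reviewer coverage could be computed once each model's findings are deduplicated into issue IDs. The function name, the reviewer keys, and the issue IDs are all made up for illustration; this is not taken from the codev source and the sample data is not the real result set.

```python
def coverage(findings: dict[str, set[str]]) -> dict[str, float]:
    """Return each reviewer's share of all distinct issues found by anyone."""
    all_issues = set().union(*findings.values())
    if not all_issues:
        return {name: 0.0 for name in findings}
    return {name: len(found) / len(all_issues) for name, found in findings.items()}


# Hypothetical findings: issue IDs are placeholders, not real bugs.
sample = {
    "claude": {"A", "B"},
    "gemini": {"B", "C"},
    "codex": {"D"},
}
shares = coverage(sample)  # {"claude": 0.5, "gemini": 0.5, "codex": 0.25}
```

With mostly disjoint sets like these, dropping any one reviewer loses issues that nobody else reported, which is the argument for reviewing with more than the authoring model.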
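
The rails in practice 3 can be sketched as a small state machine. This is a minimal illustration under assumptions, not codev's actual implementation: the class names and the `tests_pass` hook are hypothetical, and only the phase order comes from the post.

```python
from enum import Enum
from typing import Callable


class Phase(Enum):
    SPEC = "Spec"
    PLAN = "Plan"
    IMPLEMENT = "Implement"
    REVIEW = "Review"
    PR = "PR"


# Fixed phase order taken from the post; there is no way to jump ahead.
ORDER = list(Phase)


class GateFailed(Exception):
    """Raised when an agent tries to advance but is not allowed to."""


class Workflow:
    def __init__(self, tests_pass: Callable[[], bool]) -> None:
        # tests_pass is a hypothetical hook, e.g. a wrapper around a test runner.
        self.tests_pass = tests_pass
        self.phase = Phase.SPEC

    def advance(self) -> Phase:
        """Move to the next phase only, and only if the tests pass."""
        if self.phase is Phase.PR:
            raise GateFailed("already at the final phase")
        if not self.tests_pass():
            raise GateFailed(f"tests failing in {self.phase.value}; cannot advance")
        self.phase = ORDER[ORDER.index(self.phase) + 1]
        return self.phase
```

The only transition exposed is "next", so skipping Review is not something the agent can ask for, and a red test suite keeps it in the current phase.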
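
For practice 5, isolating each builder could look roughly like the following, using the standard `git worktree add -b <branch> <path>` command. The branch naming scheme, the directory layout, and both function names are assumptions for illustration; the post does not describe how codev names things.

```python
import subprocess
from pathlib import Path


def worktree_command(repo: Path, builder_id: str) -> list[str]:
    """Build the git command that gives one builder its own worktree and branch."""
    branch = f"builder/{builder_id}"                  # hypothetical branch naming
    path = repo.parent / f"{repo.name}-{builder_id}"  # hypothetical directory layout
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path)]


def spawn_builder(repo: Path, builder_id: str) -> None:
    """Create the worktree; check=True raises if git reports an error."""
    subprocess.run(worktree_command(repo, builder_id), check=True)
```

Each builder then works in its own checkout on its own branch, so parallel builders do not overwrite each other's files.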

Overall result: One engineer can produce what a team of 3-4 usually would. The code measured 1.2 points better on a 10-point scale vs. Claude Code. Downsides: it takes a lot longer and uses many more tokens, but the cost is still reasonable at $1.60 per PR.
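
As a back-of-the-envelope check on those numbers, assuming the $1.60 figure is an average that applies to all 106 PRs from the title (the post does not say so explicitly):

```python
from decimal import Decimal

PRS = 106                      # from the post title
DAYS = 14                      # from the post title
COST_PER_PR = Decimal("1.60")  # from the post; assumed to be an average

total_cost = PRS * COST_PER_PR  # Decimal("169.60")
prs_per_day = PRS / DAYS        # about 7.57
```

Under that assumption, the two weeks come to roughly $170 in tokens at about 7.6 PRs per day.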

We open-sourced it: https://github.com/cluesmith/codev
More details and raw results: https://cluesmith.com/blog/a-tour-of-codevos/
