Is there real evidence, beyond hype, that agentic coding produces net-positive results? If any of you have actually got it to work, could you share (in detail) how you did it?
By "getting it to work" I mean: * creating more value than technical debt, and * producing code that’s structurally sound enough for someone responsible for the architecture to sign off on.
Lately I’ve seen a push toward minimal or nonexistent code review, with the claim that we should move from “validating architecture” to “validating behavior.” In practice, this seems to mean: don’t look at the code; if tests and CI pass, ship it. I can’t see how this holds up long-term. My expectation is that you end up with "spaghetti" code that works on the happy path but accumulates subtle, hard-to-debug failures over time.
When I tried using Codex on my existing codebases, with or without guardrails, half of my time went into fixing the subtle mistakes it made or the duplication it introduced.
Last weekend I tried building an iOS app for pet feeding reminders from scratch. I instructed Codex to research and propose an architectural blueprint for SwiftUI first. Then, I worked with it to write a spec describing what should be implemented and how.
The first implementation pass was surprisingly good, although it had a number of bugs. Things went downhill fast, however. I spent the rest of my weekend getting Codex to make things work, fix bugs without introducing new ones, and research best practices instead of making stuff up. Although I made it record new guidelines and guardrails as I found them, things didn't improve. In the end I just gave up.
I personally can't accept shipping unreviewed code. It feels wrong. The product has to work, but the code must also be high-quality.