So I set out to run an experimental project with strict architectural rules, pointed an LLM at it, and fed it the BusyBox test suite as ground truth.
The result: 77 POSIX utilities in pure Go, single static binary (~10MB), and it passes 548 of 552 BusyBox tests.
My honest take: this project is ~90% wiring the AI to do the heavy lifting, ~10% steering it in the right direction. It took about 3 weeks of "prompts", plans, and all that. "Harness engineering" works when you have a solid test suite to validate against. I know that's an obvious statement, but now I have hard evidence to back it up.
The test suite is the actual hero here, I cannot stress this enough. Without BusyBox's brutally thorough test suite, this project would be just random hallucinated code.
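To make the "test suite as ground truth" idea concrete, here is a rough sketch of the kind of loop a harness runs: feed each case its input, compare stdout to the expected output, and report what failed so it can go back into the next prompt. Everything here is made up for illustration (`testCase`, `runFunc`, `runSuite`, and the toy `upper` utility); it is not the BusyBox test format and not the harness from the repo.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// testCase is a hypothetical golden test: input on stdin, expected stdout.
type testCase struct {
	name  string
	args  []string
	stdin string
	want  string
}

// runFunc stands in for "execute the utility under test".
type runFunc func(args []string, stdin io.Reader, stdout io.Writer) error

// runSuite runs every case and returns the pass count plus the names of
// the failing cases, which is the feedback the model gets to work with.
func runSuite(run runFunc, cases []testCase) (passed int, failed []string) {
	for _, tc := range cases {
		var out bytes.Buffer
		err := run(tc.args, strings.NewReader(tc.stdin), &out)
		if err == nil && out.String() == tc.want {
			passed++
		} else {
			failed = append(failed, tc.name)
		}
	}
	return passed, failed
}

// upper is a toy utility under test: it upper-cases stdin.
func upper(_ []string, stdin io.Reader, stdout io.Writer) error {
	data, err := io.ReadAll(stdin)
	if err != nil {
		return err
	}
	_, err = io.WriteString(stdout, strings.ToUpper(string(data)))
	return err
}

func main() {
	cases := []testCase{
		{name: "upper-basic", stdin: "abc\n", want: "ABC\n"},
		{name: "upper-empty", stdin: "", want: ""},
		// Deliberately wrong expectation, to show a failure being reported.
		{name: "upper-wrong-golden", stdin: "abc\n", want: "abc\n"},
	}
	passed, failed := runSuite(upper, cases)
	fmt.Printf("%d/%d passed, failed: %v\n", passed, len(cases), failed)
	// prints: 2/3 passed, failed: [upper-wrong-golden]
}
```

The point is that the pass/fail list is objective: the model can't argue with it, and neither can I.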
Also, I'm not trying to replace BusyBox. It's an experiment. I'm happy with how it went, cheers!
Repo: https://github.com/ramayac/goposix