Show HN: oMLX – coding agents on local LLMs without the painful reprefill

  • Posted 9 hours ago by jundot
  • 3 points
https://github.com/jundot/omlx
I was frustrated that coding agents like Claude Code were basically unusable with local models: every few turns the prefix shifts, the KV cache gets invalidated, and your Mac has to re-prefill the entire context from scratch.

So I built oMLX. It persists KV cache blocks to SSD, and when a previous context comes back, it restores them from disk instead of recomputing. This alone made Qwen3-Coder-80B on my M3 Ultra actually usable for real coding sessions.
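The idea can be sketched in a few lines: content-address each KV block by the token prefix it covers, write it to disk, and on a returning context look it up by hash instead of re-prefilling. This is a minimal illustration of the technique, not oMLX's actual code; the block format, keying scheme, and cache directory here are all invented for the example.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

# Stand-in for the SSD cache directory (hypothetical layout).
CACHE_DIR = Path(tempfile.mkdtemp())

def block_key(tokens):
    """Content-address a KV block by the token prefix it covers."""
    return hashlib.sha256(repr(tokens).encode()).hexdigest()

def persist_block(tokens, kv_block):
    """Write a computed KV block to disk, keyed by its prefix hash."""
    (CACHE_DIR / block_key(tokens)).write_bytes(pickle.dumps(kv_block))

def restore_block(tokens):
    """Return the cached KV block for this prefix, or None on a miss."""
    path = CACHE_DIR / block_key(tokens)
    if path.exists():
        return pickle.loads(path.read_bytes())
    return None  # miss: caller falls back to recomputing (re-prefill)

# A returning context hits the disk cache instead of re-prefilling.
prefix = [101, 7592, 2088]
persist_block(prefix, {"keys": [0.1, 0.2], "values": [0.3, 0.4]})
print(restore_block(prefix) is not None)        # cache hit
print(restore_block([101, 9999]) is None)       # unseen prefix: recompute
```

A real server would store tensors rather than pickled dicts and key on fixed-size token blocks so partial prefix matches can still be reused.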

Some other stuff it does: continuous batching, multi-model serving (LLM + embedding + reranker at once), prefix sharing with copy-on-write, and a native Mac menubar app so you don't have to touch the terminal.
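For the curious, prefix sharing with copy-on-write roughly works like this: two sequences with the same prompt share the same KV blocks via refcounts, and a block is only duplicated when a sequence that shares it tries to write. Again a hedged sketch with invented names (`KVBlock`, `fork`, `append_token`), not oMLX internals:

```python
class KVBlock:
    """A slice of the KV cache, shareable across sequences via refcount."""
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.refs = 1

def fork(blocks):
    """A new sequence shares the parent's blocks instead of copying them."""
    for b in blocks:
        b.refs += 1
    return list(blocks)

def append_token(blocks, token):
    """Copy-on-write: duplicate the last block only if it's shared."""
    last = blocks[-1]
    if last.refs > 1:
        last.refs -= 1          # detach from the shared block
        last = KVBlock(last.tokens)
        blocks[-1] = last       # private copy for this sequence
    last.tokens.append(token)

parent = [KVBlock([1, 2]), KVBlock([3, 4])]
child = fork(parent)            # both contexts share the prompt prefix
append_token(child, 5)          # child diverges: only its last block is copied
print(parent[1].tokens)         # unchanged in the parent
print(parent[0] is child[0])    # the common prefix block is never duplicated
```

The payoff for agent workloads is that N concurrent tool calls over the same long system prompt cost one prefill, not N.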

Just shipped a built-in benchmark tool too, so you can test your own setup easily.
