Show HN: I built a coding agent that works with 8k context local models

  • Posted 3 hours ago by razvanneculai
https://github.com/razvanneculai/litecode
Most AI coding agents assume you have a 200k-context model. In reality, the local models most people actually use have 8k windows — barely enough for one large file, let alone a whole project.

This tool works in three steps:

- Map: on init, it writes plain Markdown context files: one project-level overview, one per folder, plus a line-range index for any file over 150 lines.

- Plan: a single LLM call reads the map and turns your request into a task list with dependencies.

- Execute: each LLM call gets exactly one file. A token counter checks the budget before every call, and falls back to loading only the relevant line range if the file is too big.
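The Execute step's budget check can be sketched roughly like this (the names, constants, and ~4-chars-per-token estimate are illustrative assumptions, not litecode's actual API):

```python
# Minimal sketch of a pre-call budget check with a line-range fallback.
# CONTEXT_WINDOW, RESERVED_FOR_REPLY, and the helper names are assumptions.
CONTEXT_WINDOW = 8192
RESERVED_FOR_REPLY = 1024


def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token.
    return len(text) // 4 + 1


def build_file_context(file_text: str, relevant_range: tuple[int, int]) -> str:
    """Return the whole file if it fits the budget, else only the relevant lines."""
    budget = CONTEXT_WINDOW - RESERVED_FOR_REPLY
    if estimate_tokens(file_text) <= budget:
        return file_text
    lines = file_text.splitlines()
    start, end = relevant_range  # 1-indexed, inclusive, from the line-range index
    return "\n".join(lines[start - 1:end])


big_file = "\n".join(f"line {i}: some code here with extra text" for i in range(1, 2001))
ctx = build_file_context(big_file, (100, 150))
print(len(ctx.splitlines()))  # 51 lines instead of 2000
```

The same check runs before every call, so oversized files degrade gracefully instead of blowing past the window.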

Works with Ollama, LM Studio, Groq, OpenRouter, Gemini, DeepSeek, or any OpenAI-compatible endpoint. Local models run sequentially by default, while cloud providers run in parallel.
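The sequential-vs-parallel policy could look something like this (a sketch under assumed names; a local box typically holds one model in RAM, while cloud endpoints tolerate concurrent requests):

```python
# Illustrative dispatch policy: local providers run tasks one at a time,
# cloud providers fan out with a small thread pool. Names are assumptions.
from concurrent.futures import ThreadPoolExecutor

LOCAL_PROVIDERS = {"ollama", "lmstudio"}


def run_tasks(provider: str, tasks, call):
    if provider in LOCAL_PROVIDERS:
        # Sequential: a local model serves one request at a time.
        return [call(t) for t in tasks]
    # Parallel: cloud endpoints handle concurrent requests; cap the fan-out.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(call, tasks))  # map preserves task order


print(run_tasks("ollama", ["a", "b"], str.upper))  # ['A', 'B']
```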

The hardest part was conversation memory. 8k isn't enough for the full history, and I tried compression, but it wasn't going to cut it either. The fix was a ring buffer that keeps summaries of the last two completed actions, evicting the oldest. That gives enough continuity to avoid repeating work, while staying cheap enough to always fit.
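The memory scheme above can be sketched with a bounded deque, which handles the eviction automatically (class and method names here are illustrative, not from the repo):

```python
# Ring buffer of action summaries: a deque with maxlen=2 keeps only the
# two most recent entries and silently evicts the oldest on append.
from collections import deque


class ActionMemory:
    def __init__(self, size: int = 2):
        self.summaries = deque(maxlen=size)

    def record(self, summary: str) -> None:
        self.summaries.append(summary)  # oldest entry drops out at capacity

    def as_prompt(self) -> str:
        # Rendered into each call's prompt as a short bullet list.
        return "\n".join(f"- {s}" for s in self.summaries)


mem = ActionMemory()
mem.record("Edited utils.py: added parse_config()")
mem.record("Ran tests: 3 passed")
mem.record("Edited main.py: wired parse_config() into startup")
print(mem.as_prompt())  # only the last two summaries survive
```

Because the buffer is fixed-size, its token cost is bounded regardless of how long the session runs.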

This is something I've been passionately working on, and I hope you find it useful. I'm open to any feedback and questions.

Thank you for taking the time to check this project out; I hope you enjoy using it as much as I enjoyed building it.
