Instead of reading benchmark numbers, you can feel how fast or slow different configurations are by adjusting TTFT (time to first token), token generation rate, and output length. The simulator streams tokens exactly as an LLM would, but without generating any real content.
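The core idea is simple enough to sketch in a few lines. This is not the project's actual code, just a minimal illustration of pacing a fake token stream with hypothetical TTFT and rate values:

```python
import sys
import time

def simulate_stream(ttft_s: float, tokens_per_s: float, n_tokens: int) -> None:
    """Stream placeholder tokens with the pacing of a real LLM.

    ttft_s: time to first token, in seconds (hypothetical value).
    tokens_per_s: steady-state generation rate (hypothetical value).
    n_tokens: total output length in tokens.
    """
    time.sleep(ttft_s)                  # wait out the "time to first token"
    for _ in range(n_tokens):
        sys.stdout.write("tok ")        # placeholder instead of real content
        sys.stdout.flush()
        time.sleep(1.0 / tokens_per_s)  # pace the remaining tokens
    sys.stdout.write("\n")

# Made-up numbers that feel like a mid-range local setup.
simulate_stream(ttft_s=1.2, tokens_per_s=18.0, n_tokens=120)
```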
I was wondering which Apple machine I should buy, so I built this over the weekend to get a better feel for what running a model locally actually means.
The project/toy is public on GitHub too: https://github.com/htxsrl/localllmsimulation
Thanks to the cited sources for the real benchmarks, which let me fit a small ML model that extrapolates even to futuristic hardware (like an imaginary M9 with 2048 GB of RAM and 3000 GB/s of memory bandwidth).
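I don't show the fitted model here, but the flavor of it can be sketched with a simple linear fit. The data points below are made up for illustration (the real ones come from the cited benchmarks), and the linearity assumption is mine: local generation speed is roughly memory-bandwidth-bound, so a straight line through the benchmark points is a reasonable first approximation.

```python
import numpy as np

# Illustrative only: made-up (bandwidth, tokens/s) pairs standing in
# for the real benchmark data the post refers to.
bandwidth = np.array([100.0, 200.0, 400.0, 800.0])  # GB/s
tok_rate  = np.array([8.0, 16.0, 31.0, 60.0])       # tokens/s

# Fit tokens/s as a linear function of memory bandwidth.
slope, intercept = np.polyfit(bandwidth, tok_rate, 1)

def predict_tokens_per_s(bw_gbps: float) -> float:
    return slope * bw_gbps + intercept

# Extrapolate to the imaginary M9-class machine from the post.
print(predict_tokens_per_s(3000.0))  # predicted tokens/s at 3000 GB/s
```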