Instead of reading benchmark numbers, you can feel how fast or slow different configurations are by adjusting TTFT (time to first token), token generation rate, and output length. The simulator streams tokens exactly as an LLM would, but without generating any real content.
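The core idea is simple enough to sketch in a few lines. This is not the project's actual code, just a minimal illustration of pacing a fake token stream with hypothetical TTFT and rate values:

```python
import sys
import time

def simulate_stream(ttft_s: float, tokens_per_s: float, n_tokens: int) -> None:
    """Stream placeholder tokens with the pacing of a real LLM.

    ttft_s: time to first token, in seconds (hypothetical value).
    tokens_per_s: steady-state generation rate (hypothetical value).
    n_tokens: total output length in tokens.
    """
    time.sleep(ttft_s)                  # wait out the "time to first token"
    for _ in range(n_tokens):
        sys.stdout.write("tok ")        # placeholder instead of real content
        sys.stdout.flush()
        time.sleep(1.0 / tokens_per_s)  # pace the remaining tokens
    sys.stdout.write("\n")

# Made-up numbers that feel like a mid-range local setup.
simulate_stream(ttft_s=1.2, tokens_per_s=18.0, n_tokens=120)
```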
I was wondering which Apple machine I should buy, so I built this over the weekend to get a better feel for what running a model locally actually means.
The project/toy is public on GitHub too: https://github.com/htxsrl/localllmsimulation
Thanks to the cited sources for the real benchmarks, which let me fit a small ML model that extrapolates even to futuristic hardware (like an imaginary M9 with 2048 GB of RAM and 3000 GB/s of memory bandwidth).
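I don't show the fitted model here, but the flavor of it can be sketched with a simple linear fit. The data points below are made up for illustration (the real ones come from the cited benchmarks), and the linearity assumption is mine: local generation speed is roughly memory-bandwidth-bound, so a straight line through the benchmark points is a reasonable first approximation.

```python
import numpy as np

# Illustrative only: made-up (bandwidth, tokens/s) pairs standing in
# for the real benchmark data the post refers to.
bandwidth = np.array([100.0, 200.0, 400.0, 800.0])  # GB/s
tok_rate  = np.array([8.0, 16.0, 31.0, 60.0])       # tokens/s

# Fit tokens/s as a linear function of memory bandwidth.
slope, intercept = np.polyfit(bandwidth, tok_rate, 1)

def predict_tokens_per_s(bw_gbps: float) -> float:
    return slope * bw_gbps + intercept

# Extrapolate to the imaginary M9-class machine from the post.
print(predict_tokens_per_s(3000.0))  # predicted tokens/s at 3000 GB/s
```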