HN – Show HN: Best setup local LLM found for a 5090 (llama.cpp fork + turboquant)

Hi folks, I found a setup on consumer hardware that seems to give great results for running an LLM locally:
- qwen 3.6 q6
- 450K context using turboquant turbo3 mode (llama.cpp fork)
- multimodal support

This AI-generated blog article is a kind of "report" on what I did, how I did it, and some example results.

I hope this can be useful to some people.

Note: I am not much interested in this article being a success; I mainly want to share what I think is an interesting use of a 5090. I generated the blog page by telling the AI to comply with the HN "rules" and remain factual.

It's definitely not perfect: it was done rather quickly and not properly tested over 265K context. Please forgive my laziness :). I am just enthusiastic right now about what can be done on a 5090.
