This is the result of that question itself and some weekend vibecoding (it has the linked library repository in the readme as well), it seems to work, even on consumer gpus, it should work better on professional ones tho
Show HN: Llama 3.1 70B on a single RTX 3090 via NVMe-to-GPU bypassing the CPU
- Posted 1 day ago by xaskasdf
- 364 points
23 comments
Loading..
Loading..
Loading..
Loading..
Loading..
Loading..
Loading..
Loading..
Loading..
Loading..
Loading..
Loading..
Loading..
Loading..
Loading..
Loading..
Loading..
Loading..
Loading..
Loading..
Loading..
Loading..
Loading..