Real-time LLM Inference on Standard GPUs: 3k tokens/s per request
- Posted 5 hours ago by NicoConstant
- 106 points
- 23 comments