Lossless LLM compression for efficient GPU inference via dynamic-length float
- Posted 15 hours ago by CharlesW
- 341 points
- 22 comments