Lossless LLM compression for efficient GPU inference via dynamic-length float
- Posted 9 months ago by CharlesW
- 411 points
- 22 comments