Lossless LLM compression for efficient GPU inference via dynamic-length float

  • Posted 9 months ago by CharlesW
  • 411 points
https://arxiv.org/abs/2504.11651

22 comments
