Lossless LLM compression for efficient GPU inference via dynamic-length float

  • Posted 15 hours ago by CharlesW
  • 341 points
https://arxiv.org/abs/2504.11651

22 comments
