A simple, zero-overhead way to compress models and KV caches via low-rank decomposition
Posted 34 minutes ago by thw20
1 point, 0 comments
https://jeffreywong20.github.io/a3.github.io/