Low-Rank KV Attention: 50% Less Memory, Better Models

Posted 7 hours ago by destraynor
2 points

https://fin.ai/research/low-rank-key-value-attention-reducing-kv-cache-memory-and-maintaining-head-diversity/

1 comment
