Low-Rank KV Attention: 50% Less Memory, Better Models
Posted 7 hours ago by destraynor
2 points
https://fin.ai/research/low-rank-key-value-attention-reducing-kv-cache-memory-and-maintaining-head-diversity/
1 comment