Usual implementation of attention transformers (SDPA) is kind of bad, actually
Posted 1 hour ago by teleforce
1 point
https://gist.github.com/celoyd/6bf10122c3f5f7e64b0c684704e4ffb2
0 comments