Nitsum: Serving Tiered LLM Requests with Adaptive Tensor Parallelism
Posted 2 hours ago by matt_d
2 points
https://mlsys.wuklab.io/posts/nitsum/
0 comments