SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference

  • Posted 2 hours ago by matt_d
  • 2 points
https://supercomputing-system-ai-lab.github.io/projects/superinfer/
