Tokasaurus: An LLM inference engine for high-throughput workloads

  • Posted 1 day ago by rsehrlich
  • 213 points
https://scalingintelligence.stanford.edu/blogs/tokasaurus/

12 comments
