Tokasaurus: An LLM inference engine for high-throughput workloads

  • Posted 8 months ago by rsehrlich
  • 218 points
https://scalingintelligence.stanford.edu/blogs/tokasaurus/

12 comments
