Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA

  • Posted 21 hours ago by yu3zhou4
  • 171 points
https://github.com/jmaczan/tiny-vllm

13 comments
