Show HN: NanoSLG – Hack Your Own Multi-GPU LLM Server (5x Faster, Educational)

  • Posted 2 hours ago by geniusyan
  • 1 point
https://github.com/Guney-olu/nanoslg
I built NanoSLG as a minimal, educational inference server for LLMs like Llama-3.1-8B. It supports Pipeline Parallelism (splitting layers across GPUs), Tensor Parallelism (sharding weights within layers), and Hybrid modes that combine both for scaling.
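To make the "split layers across GPUs" idea concrete, here is a minimal sketch of contiguous layer partitioning for pipeline parallelism. This is a hypothetical illustration, not NanoSLG's actual API: `partition_layers` and the stage layout are assumptions, though the 32-layer count matches Llama-3.1-8B's decoder stack.

```python
# Hypothetical sketch: assign contiguous blocks of transformer layers
# to pipeline stages (one stage per GPU). Not NanoSLG's real code.

def partition_layers(num_layers: int, num_gpus: int) -> list[range]:
    """Return one contiguous range of layer indices per GPU stage."""
    base, extra = divmod(num_layers, num_gpus)
    stages, start = [], 0
    for gpu in range(num_gpus):
        # Spread any remainder over the earliest stages.
        size = base + (1 if gpu < extra else 0)
        stages.append(range(start, start + size))
        start += size
    return stages

# Llama-3.1-8B has 32 decoder layers; across 4 GPUs each stage holds 8.
print(partition_layers(32, 4))
```

At inference time, each stage runs its layer block and forwards activations to the next GPU; tensor parallelism would instead shard the weight matrices inside each layer across devices.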

0 comments