Cutting inference cold starts by 40x with LP, FUSE, C/R, and CUDA-checkpoint

  • Posted 1 hour ago by charles_irl
  • 21 points
https://modal.com/blog/truly-serverless-gpus

1 comment
