Cutting inference cold starts by 40x with LP, FUSE, C/R, and CUDA-checkpoint
Posted 1 hour ago by charles_irl | 21 points
https://modal.com/blog/truly-serverless-gpus
1 comment