Cutting inference cold starts by 40x with LP, FUSE, C/R, and CUDA-checkpoint

Posted 1 hour ago by charles_irl
21 points

https://modal.com/blog/truly-serverless-gpus

1 comment