GPU Inference in K8s: Acceleration, Sharing and Scaling Without Pain
How do you speed up GPU inference in Kubernetes without losing your mind? It comes down to scaling, GPU sharing, faster startup, and choosing shaders. With examples, practical tricks, and lessons learned in real production.