Talk

GPU Inference in K8s: Acceleration, Sharing and Scaling Without Pain

Hall 2 | In Russian | Complexity: for practicing engineers

How do you speed up GPU inference in Kubernetes without losing your mind? This talk covers scaling, GPU sharing, faster startup, and choosing a scheduler, with examples, hacks, and lessons from real production.
