

Talk

Date: 17.09 / Start: 00:00 – Finish: 00:00

GPU Inference in K8s: Acceleration, Sharing and Scaling Without Pain

In Russian


How do you speed up GPU inference in Kubernetes without going crazy? This talk covers scaling, GPU sharing, faster startup, and choosing shaders, with examples, hacks, and lessons from real production.

Speakers

Anton Alekseev
Avito

Invited experts

Timur Gilmullin
Automator, independent expert

Other talks on «K8s»
- Cross-Zone Traffic and Native Ways To Manage It in Kubernetes
  Dmitrii Rybalka
  Lamoda Tech
  Hall 3, in Russian
- Scheduling GPU Workloads in Kubernetes: From Standard Mechanisms to Custom Solutions
  Makarii Balashov
  Yandex Cloud
  Hall 2, in Russian
- L2 Announcements in Cilium: Access to Load Balancer in Bare-Metal Kubernetes
  Mikhail Petrov
  Yandex Cloud
  Hall 2, in Russian
- Kafka in K8s Is Easy!
  Dmitrii Baskakov
  Mindbox
  Hall 1, in Russian
- Deploying the Open-Source Site of the Chelyabinsk Zoo in Kubernetes
  Aleksandr Shinkarev
  Tourmaline Core
  Maxim Rychkov
  Tourmaline Core
  Hall 3, in Russian
- Multitenancy Monitoring in Kubernetes: Why, Whom for, and How
  Vladimir Guryanov
  Flant
  Hall 1, in Russian
- Platforms and Other Adult Toys
  Vasilii Kutsenko
  PochtaTech
  Hall 1, in Russian
Other talks on «Infrastructure»
- Scheduling GPU Workloads in Kubernetes: From Standard Mechanisms to Custom Solutions
  Makarii Balashov
  Yandex Cloud
  Hall 2, in Russian
- Testing Tools for Configuration Management Systems
  Andrei Kolesnikov
  Avito
  Hall 2, in Russian
- Network Drives for Dedicated Servers: Ceph, iSCSI, and Pain-Free Automation
  Vladimir Ivanov
  Selectel
  Hall 2, in Russian
- Implementation of Policy as Code in Apache Kafka
  Danila Malanin
  T2
  Denis Kunichkin
  T2
  Hall 1, in Russian
- Evolution of the Logging System at ecom.tech: From Elastic Stack to VictoriaLogs
  Valerii Evdokimov
  ecom.tech
  Hall 1, in Russian
- Fault-Tolerant Infrastructure: From Knee-Jerk Solutions to More Expensive Ones
  Andrei Radygin
  Flant
  Hall 2, in Russian
- Building Centralized Configuration Management of Infrastructure with K8s and SaltStack
  Ivan Gulakov
  MWS Cloud Platform
  Hall 2, in Russian
- Fast or Slow: The Story of the Struggle to Speed up Image Builds
  Mikhail Kozhukhovskii
  Analytical Program Solutions
  Avenir Voronov
  KORUS Consulting
  Konstantin Dipež
  DeusOps
  In Russian
- Platforms and Other Adult Toys
  Vasilii Kutsenko
  PochtaTech
  Hall 1, in Russian
- The Server Went Down, but the Cache Didn’t Give Up: It Went to S3 and the Production Went Up!
  Nikolay Gubin
  Avito
  Hall 1, in Russian
Other talks on «ML/AI»
- The Battle of the Code Assistants. Efficiency, Safety and Cost
  Anton Chernousov
  Yandex Cloud
  Aleksandr Kirillov
  Evrone
  Hall 1, in Russian
- From Rag for Operators to a Rag Platform for a Large Bank
  Mark Kuznetsov
  Alfa-Bank
  Alexey Fateev
  Alfa-Bank
  Hall 1, in Russian
- The Perfect 'Sandbox' for ML Models: Setting Up Containerization Without Stress
  Daniil Salman
  K2 Tech
  Hall 1, in Russian
- How To Get the Most out of GPU and Ray: Our Production ML Infrastructure Pipeline
  Mikhail Untura
  Orion soft
  Hall 1, in Russian
- n8n + AI for DevOps processes
  Evgeny Dekhtyarev
  2GIS
  Hall 1, in Russian
- Break Me Completely: How AI (Does Not) Help with Pentesting
  Viktor Chaplygin
  Avenir Voronov
  KORUS Consulting
  Konstantin Dipež
  DeusOps
  In Russian
- AI in SDLC
  Avenir Voronov
  KORUS Consulting
  Ilia Atarshchikov
  KORUS Consulting
  Hall 2, in Russian