Conference Opening Session
We will be talking about the schedule, sessions, and activities. Join us in the hall or online to find out what's in store for you!
All times in the program are shown in your time zone.
The program has not been finalized yet, so changes are still possible.
Why K8s is the best platform for deploying and testing ML models. We will demonstrate a step-by-step plan for building high-quality machine learning environments in Kubernetes that let you automate the creation of production ML environments and codebase management while making efficient use of GPUs.
K2 Tech
Testing roles in infrastructure administration: Infrastructure as Code tools and testing practices for configuration management systems such as Ansible, Puppet, Salt, and Chef. Attendees will gain a comprehensive understanding of the available tools, their advantages and disadvantages, and Avito's best testing practices.
Avito
Inter-zone traffic can increase both latency and cost of ownership. For a long time, it was believed that the only solution was a service mesh. I will show you how to solve these problems with native Kubernetes mechanisms.
Lamoda Tech
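To see why inter-zone traffic adds up so quickly, here is a back-of-the-envelope model. It is our own illustration, not the speaker's method: assume each request picks a backend endpoint uniformly at random from all endpoints of a Service (kube-proxy in iptables mode, for instance, selects endpoints at random unless topology-aware routing is enabled). Under that assumption, a request from a given zone stays in-zone only with probability equal to that zone's share of endpoints. The function name `cross_zone_fraction` is hypothetical.

```python
from collections.abc import Sequence


def cross_zone_fraction(endpoints_per_zone: Sequence[int],
                        requests_per_zone: Sequence[int]) -> float:
    """Expected share of requests that leave their zone, assuming every
    request picks an endpoint uniformly at random from all endpoints.

    Hypothetical model for illustration only; real traffic depends on the
    proxy mode, topology hints, and client behavior.
    """
    total_endpoints = sum(endpoints_per_zone)
    total_requests = sum(requests_per_zone)
    same_zone = 0.0
    for endpoints, requests in zip(endpoints_per_zone, requests_per_zone):
        # Requests from this zone stay local with probability
        # (endpoints in this zone) / (all endpoints).
        same_zone += requests * endpoints / total_endpoints
    return 1 - same_zone / total_requests


if __name__ == "__main__":
    # Three zones, two endpoints and equal load in each zone.
    share = cross_zone_fraction([2, 2, 2], [100, 100, 100])
    print(f"cross-zone share: {share:.2%}")
```

With endpoints spread evenly over N zones, the model gives (N - 1) / N, so about two thirds of requests cross zones in a three-zone cluster. Multiplying that share by your traffic volume and your provider's inter-zone rate gives a rough idea of the cost that keeping traffic in-zone can save.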
How we’re implementing n8n for self-service automation without involving the Data Science team: from first steps to real use cases, pitfalls, and memes. How we ended up with an on-prem solution that fits seamlessly into our infrastructure and is understandable even beyond the developer crowd.
Not many people know how to break Kubernetes, let alone how to break Kubernetes that doesn't even exist yet. I'll share my experience of auditing clusters at the design stage, when all you have on hand are the Cluster API manifests of the future Kubernetes cluster. I'll tell you which types of flaws can be detected at this stage and which ones can't. I'll spice this up with interesting stories and show how to automate the process.
Luntry
Immutable infrastructure is one of the main trends of recent years, but the harsh reality of enterprise makes its own adjustments. At MWS, we really love Cluster API and its core concepts, but at some point we needed to make child K8s nodes mutable. We'll talk about this.
MWS Cloud Platform
How Cilium’s built-in L2 announcement feature enables native Kubernetes LoadBalancer services in bare-metal clusters without external components or complex setups, leveraging modern eBPF technology. This approach provides reliable external access to services with minimal operational overhead.
Yandex Cloud
The graphs are green, reliability is at five nines, and the user is unhappy. Sound familiar? It means your math went wrong somewhere. I will tell you how we at VK calculated reliability for infrastructure products by focusing on critical user paths.
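One common way "five nines" on a dashboard turns into an unhappy user is a serial dependency chain. The sketch below is a generic illustration, not VK's actual method: if a user request needs every component on its critical path and failures are independent (an assumption), the path's availability is the product of the component availabilities, so it is always lower than the best component. Both function names are hypothetical.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def path_availability(component_availabilities: list[float]) -> float:
    """Availability of a critical user path where every component is
    required (serial dependency), assuming independent failures."""
    return prod(component_availabilities)


def yearly_downtime_minutes(availability: float) -> float:
    """Expected unavailable minutes per year for a given availability."""
    return (1 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # A five-nines component behind two three-nines dependencies.
    path = [0.99999, 0.999, 0.999]
    a = path_availability(path)
    print(f"path availability: {a:.5f}")
    print(f"downtime: {yearly_downtime_minutes(a):.0f} min/year")
```

Five components at 99.9% each give a path availability of roughly 99.5%, while a single five-nines component alone allows only about 5.3 minutes of downtime per year. This is why measuring along user paths, rather than per component, changes the picture.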
A straight talk about templating CI/CD pipelines with GitLab components. Isolate them, test them, document them.
Kuper
Transformation of the logging system in the ecom.tech infrastructure. You will learn how the company decided to optimize the system using VictoriaLogs. This reduced the volume of stored data tenfold, simplified the architecture, and ensured stable logging without losing critical data.
ecom.tech
How QA and SRE/DevOps teams come together to test complex changes.
T-Bank
How to build secure multi-tenant monitoring in Kubernetes: from requirements to a custom solution, with an analysis of why standard approaches do not work.
Flant
Let's look at how non-human identities (NHI) are fundamentally different from regular accounts, why multi-factor authentication (MFA) and “change your password once a year” are not applicable to them, and how this leads to incidents.
Independent consultant
We built an open-source website for a public zoo, for free, with a full-fledged development team, and even wrapped it in Kubernetes. We'll tell you why and how we did it.
Tourmaline Core
Tourmaline Core
A talk on building scalable ML infrastructure based on Ray and Kubernetes, with an emphasis on efficient GPU utilization, distributed task management, and integration with external orchestrators. Using real examples, I'll show you how to build a fault-tolerant production pipeline and avoid typical mistakes when scaling workloads.
K2 Cloud
I'll tell you about our Enabling team: how we created it and how we are developing it, why it was needed in the first place, its stages of development, the problems we faced, and its areas of work.
Raiffeisen Bank
Let's figure out whether it is possible to calculate the economic effect of practices such as CI/CD, monitoring, code review, and engineering culture in general. Using Raiffeisen Bank cases as examples, we will try to work out how much it costs to implement a given practice and whether it actually saves the business money.
Raiffeisen Bank
The standard Kubernetes scheduler, kube-scheduler, was designed around general-purpose load distribution principles and is not tailored to the unique characteristics of GPU workloads. I propose examining the full spectrum of options: from built-in K8s scheduling mechanisms to customizing the standard scheduler and specialized schedulers such as Volcano, Apache YuniKorn, and KAI-Scheduler.
Yandex Cloud
I will tell you about our approach to development and about the architecture of the "Network Drives for Dedicated Servers" product, in which each disk is an RBD volume in Ceph connected to a dedicated server via the iSCSI Gateway.
Selectel
Cluster deployment methods that significantly reduce the attack surface of a Kubernetes cluster. I will focus on the podsec-k8s package, which deploys a native Kubernetes cluster (version 1.26 and higher) in rootless mode. I will also touch on a second way to reduce the attack surface: an SSHless cluster (a fork of the Talos@SideroLabs project).
BaseALT
How do you make a system fault-tolerant without Google's budget? We will analyze solutions for each level of the architecture, from DNS to the database, in three variants: minimal, optimal, and industrial. Using examples from on-premise setups and Russian clouds such as Yandex, VK, and Selectel, I will show you what mistakes to avoid and how to save money without losing reliability.
Fevlake
A talk on ensuring a high level of observability of distributed systems using existing telemetry tools. Using examples, let's look at how to manage the growing cognitive complexity of supporting large systems in terms of monitoring and finding the root causes of degradation.
Kontur
Let's look at one part of the cloud security incident monitoring process. We'll compile a list of current cloud security risks and go from threat analytics to monitoring those threats.
MTS Web Services (MWS)
How do you speed up GPU inference in Kubernetes without going crazy? It's all about scaling, sharing, faster startup, and choosing shaders. With examples, hacks, and conclusions from real production.
Selectel
How to use AI tools to automate various stages of penetration testing — from reconnaissance to exploit development. Which tasks can realistically be delegated to algorithms and where human expertise is still essential. You will learn how AI can become your partner in security.
How our fallback caching system keeps Avito available to users even when production is down. You will learn how we collect static content from all of Avito, how the cache gets into production, and why we are confident that it works.
Avito
Kubernetes is a complex ecosystem with many components and dependencies, and with them come many known vulnerabilities (CVEs). Not all of them are real threats, yet scanners often flag them as such, producing an overabundance of false positives and complicating CI/CD processes. Here's how to make sense of this noise.
Flant
A talk on a declarative approach to managing access to Kafka based on Open Policy Agent. We will look at how Open Policy Agent works, answer the most common questions about this approach, and share experience from real-world operation.
Vibe coding and the active use of AI coding assistants are changing the typical developer environment. In this talk, we will discuss current issues around using AI coding assistants and, right on stage, investigate and compare the efficiency and speed of different agents.
Yandex Cloud
Evrone
I often see engineers climbing the grade ladder to grow their skills and paychecks: from junior to middle, then to senior and... what's next? Becoming a team lead? What about a tech lead? Or maybe there's some kind of engineering track? I'll tell you about it! Not everyone needs to be a team lead (or maybe they do).
Lamoda Tech
We will be summarising the results of the conference, recalling the highlights and talking about future plans. Join us in the hall or online so you don't miss a thing!
We are actively adding to the program. Sign up for our newsletter to stay informed.