Sergey Tovpeko

Platform Engineer · MLOps · AI Infrastructure

Specialization

Platform & AI Infrastructure Engineer

Experience

4 years 5 months

Profile

Platform / SRE engineer with deep expertise in automation, observability and infrastructure for ML workloads. Specialized in high-load local model inference (vLLM/Triton), GPU utilization optimization and event-driven RAG systems. Product-minded: beyond configuring infrastructure, I build internal automation tools, CLI utilities and lightweight client interfaces for monitoring and operating systems end-to-end.

Tech Stack

AI Infrastructure & MLOps

vLLM, Ollama, CUDA optimization, NVIDIA GPU Operator, RAG pipelines, pgvector, Qdrant, n8n

Platform & DevOps

Kubernetes (k3s), Docker, GitLab CI, Automated Delivery Pipelines, Ansible, Linux, eBPF

Backend & Observability

Python (FastAPI), Go, C#, PostgreSQL, ClickHouse, Apache Airflow, Prometheus, Grafana, VictoriaMetrics, ELK Stack, MinIO/S3

Internal Tooling & Client Interfaces

Kotlin (Jetpack Compose), Flutter (Dart), REST/WebSocket clients, CLI utilities

Professional Experience

RTI JSC

Platform / DevOps Engineer

Apr 2025 – Apr 2026 · 1 year 1 month
  • Designed and deployed high-load LLM inference for Qwen models on vLLM and Ollama; tuned GPU utilization with CUDA, reaching 150+ tokens/sec on a single GPU.
  • Implemented a scalable RAG pipeline with vector search (pgvector, Qdrant) and event-driven orchestration based on n8n and Supabase.
  • Deployed and configured a ClickHouse cluster from scratch for infrastructure metrics collection.
  • Automated ML model delivery to Kubernetes (k3s) and configured GPU metrics monitoring with Prometheus and Grafana.
  • Built internal cross-platform client applications for IoT telemetry monitoring and remote diagnostics of microcomputers.

Tech Stack: vLLM, Ollama, Qwen, CUDA, Kubernetes/k3s, Docker, Prometheus, Grafana, ClickHouse, pgvector, Qdrant, n8n, Supabase, Python/FastAPI, Kotlin, Flutter

Sber

DevOps / Automation Engineer

Feb 2024 – Apr 2025 · 1 year 3 months
  • Developed load-testing tooling: custom CLI wrappers and traffic generators in Go/Python.
  • Improved system stability under load through chaos engineering experiments and stress testing (k6, wrk).
  • Created isolated Docker-based test environments that emulate heavy external services (Spark/Hadoop).
  • Reworked observability by adding business metrics and long-term storage in VictoriaMetrics and ClickHouse.

Tech Stack: Go, Python, k6, wrk, Chaos Engineering, Docker, Ansible, Prometheus, VictoriaMetrics, Grafana, ClickHouse, ELK Stack, Spark, Hadoop

Rosseti Digital JSC

DevOps / Data Platform Engineer

Jul 2023 – Feb 2024 · 8 months
  • Designed and operated a fault-tolerant process automation platform on Apache Airflow and optimized its DAG structure.
  • Implemented centralized logging and tracing on the ELK Stack, reducing mean time to recovery (MTTR).
  • Containerized system components and standardized dev/prod environments with Docker.
  • Administered distributed data stores (PostgreSQL, MariaDB, MinIO/S3) and tuned their performance.

Tech Stack: Python, Apache Airflow, PostgreSQL, MariaDB, MinIO/S3, Docker, ELK Stack, Grafana, Linux

Franklins Burger

Fullstack Developer / Automation Engineer

Dec 2022 – Apr 2024 · 1 year 5 months
  • Built automated software update delivery infrastructure for endpoints in Yandex Cloud from scratch, using Docker and Ansible.
  • Designed and implemented a distributed data bus for centralized outlet configuration and advertising management.
  • Developed server-side POS integration modules (IIKO) in C#/.NET and Python (FastAPI).
  • Designed and launched internal web tools for business automation, including CRM systems and HR bots.

Tech Stack: Python/FastAPI, C#, .NET, PostgreSQL, Docker, Ansible, Yandex Cloud, IIKO SDK, JavaScript, Vue.js, Svelte

Platform / MLOps Projects

High-Performance ML Inference Serving

vLLM/Ollama inference platform

  • Local LLM deployment testbed for Qwen/Llama built on vLLM/Ollama with production-like containerization.
  • Configured autoscaling and throughput/latency controls, and set up CUDA core utilization monitoring with Prometheus and Grafana.
  • Prepared Python tooling for smoke tests, health checks and operational inference diagnostics.

Tech Stack: vLLM, Ollama, Docker, CUDA, Prometheus, Grafana, Python

AI-Driven Enterprise Automation Core

Agent workflows and RAG automation

  • Orchestration stack for agent workflows integrated with vector databases for contextual search (RAG).
  • Connected n8n workflows, FastAPI services and Supabase into an event-driven enterprise automation core.
  • Added vector storage, retrieval and context control for internal knowledge workflows.

Tech Stack: n8n, pgvector, Qdrant, FastAPI, Supabase

Multi-Platform CI/CD Automation Framework

Universal delivery automation template

  • Reusable software delivery automation template that shortens release cycles and keeps builds stable.
  • Includes containerization, automatic artifact signing, release notes generation and crash reporting integration.
  • Supports reproducible environments, quality gates and fast rollback paths for internal products.

Tech Stack: GitLab CI, Fastlane, Docker, Ansible, Gradle