Profile
Platform / SRE engineer with deep expertise in automation, observability and infrastructure for ML workloads. Specialized in high-load local model inference (vLLM/Triton), GPU utilization optimization and event-driven RAG systems. Product-minded: I not only configure infrastructure but also build internal automation tools, CLI utilities and lightweight client interfaces for monitoring and operating systems end-to-end.
Tech Stack
Platform & DevOps
Backend & Observability
Internal Tooling & Client Interfaces
Services
AI Infrastructure & Inference Serving
Local high-load LLM inference with vLLM/Ollama, GPU observability and production-grade operations.
I design production-like inference environments for local models: containerization, CUDA/GPU monitoring, autoscaling, latency/throughput metrics and cost-aware operations.
MLOps and RAG Platforms
RAG pipelines, vector search, agent workflows and model delivery into Kubernetes.
I build event-driven AI pipelines with FastAPI, n8n, Supabase and pgvector/Qdrant: indexing, retrieval, orchestration, quality signals and repeatable deployments.
Platform Engineering & DevOps
Kubernetes/k3s, Docker, GitLab CI, Ansible, Linux and reproducible delivery pipelines.
I build internal platforms for teams: dev/prod environments, IaC automation, GitLab CI, release flows, containerization, secrets and operational runbooks.
Observability and Data Platforms
Prometheus, Grafana, VictoriaMetrics, ClickHouse, ELK, Airflow, PostgreSQL and MinIO/S3.
I implement metrics, logs, alerts and long-term storage, design data pipelines and reduce MTTR through clear dashboards and operational signals.
End-to-End Internal Tooling
CLI utilities, dashboards and lightweight client interfaces for complex platforms.
I deliver work end-to-end: from backend/API and automation to internal Kotlin/Flutter/Web tools for diagnostics, telemetry and system management.