RTI JSC
Middle MLOps Engineer
- Designed and deployed a production-ready, fault-tolerant architecture for high-load LLM inference built on vLLM and Ollama.
- Built MLOps processes from scratch: CI/CD pipelines for delivering ML models to Kubernetes (k3s), plus observability via Prometheus/Grafana.
- Implemented a RAG system on Qdrant/pgvector for semantic search to improve LLM answer accuracy.
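The retrieval step behind the RAG bullet can be sketched in a few lines. This is an illustrative, self-contained sketch only: the function names and toy vectors are not from the project, and in production Qdrant or pgvector would perform this nearest-neighbor search server-side over real embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, store, k=2):
    """Return the ids of the k stored chunks most similar to the query.

    `store` maps chunk id -> embedding; a vector DB (Qdrant/pgvector)
    replaces this dict and does the ranking with an ANN index.
    """
    ranked = sorted(store.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]

# Toy document store with 3-dimensional "embeddings" for demonstration.
store = {
    "doc1": [0.9, 0.1, 0.0],
    "doc2": [0.1, 0.9, 0.0],
    "doc3": [0.8, 0.2, 0.1],
}

print(top_k([1.0, 0.0, 0.0], store, k=2))  # → ['doc1', 'doc3']
```

The retrieved chunks are then injected into the LLM prompt as context, which is what improves answer accuracy over the bare model.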