MLOps Engineer (Python Backend + AI/GenAI Experience)
Full-time, Mid-Senior Level
Job Overview
We are looking for a senior MLOps Engineer with strong Python backend engineering expertise to design, build, and manage scalable ML and AI platforms. The ideal candidate has hands-on experience with AWS SageMaker, ML pipelines, Infrastructure as Code, GenAI/RAG workflows, and containerized deployments.
You will collaborate closely with Data Scientists, ML Engineers, and AI Engineers to build robust pipelines, automate workflows, deploy models at scale, and support the end-to-end ML lifecycle in production.
Key Responsibilities
MLOps & ML Pipeline Engineering
- Build, maintain, and optimize ML pipelines in AWS (SageMaker, Lambda, Step Functions, ECR, S3).
- Manage model training, evaluation, versioning, deployment, and monitoring using MLOps best practices.
- Implement CI/CD for ML workflows using GitHub Actions / CodePipeline / GitLab CI.
- Set up and maintain Infrastructure as Code (IaC) using CloudFormation or Terraform.
Backend Engineering (Python)
- Design and build scalable backend services using Python (FastAPI/Flask).
- Build APIs for model inference, feature retrieval, data access, and microservices.
- Develop automation scripts, SDKs, and utilities to streamline ML workflows.
AI/GenAI & RAG Workflows (Nice to Have)
- Implement RAG pipelines, vector indexing, and document retrieval workflows.
- Build and deploy multi-agent systems using frameworks like LangChain, CrewAI, or Google ADK.
- Apply prompt engineering strategies to optimize LLM behavior.
- Integrate LLMs with existing microservices and production data.
Model Deployment & Observability
- Deploy models using Docker + Kubernetes (EKS/ECS) or SageMaker endpoints.
- Implement monitoring for model drift, data drift, usage patterns, latency, and system health.
- Maintain logs, metrics, and alerts using CloudWatch, Prometheus, Grafana, or ELK.
Collaboration & Documentation
- Work directly with data scientists to support experiments, deployments, and re-platforming efforts.
- Document design decisions, architectures, and infrastructure using Confluence, GitHub Wikis, or architectural diagrams.
- Provide guidance and best practices for reproducibility, scalability, and cost optimization.