MLOps Engineer (Python Backend + AI/GenAI Experience)
Full-time | Mid-Senior Level
Job Overview
We are looking for a senior MLOps Engineer with strong Python backend engineering expertise to design, build, and manage scalable ML and AI platforms. The ideal candidate has hands-on experience with AWS SageMaker, ML pipelines, Infrastructure as Code, GenAI/RAG workflows, and containerized deployments.
You will collaborate closely with Data Scientists, ML Engineers, and AI Engineers to build robust pipelines, automate workflows, deploy models at scale, and support the end-to-end ML lifecycle in production.
Key Responsibilities
MLOps & ML Pipeline Engineering
Build, maintain, and optimize ML pipelines on AWS (SageMaker, Lambda, Step Functions, ECR, S3).
Manage model training, evaluation, versioning, deployment, and monitoring using MLOps best practices.
Implement CI/CD for ML workflows using GitHub Actions, AWS CodePipeline, or GitLab CI.
Set up and maintain Infrastructure as Code (IaC) using CloudFormation or Terraform.
Backend Engineering (Python)
Design and build scalable backend services using Python (FastAPI / Flask).
Build APIs and microservices for model inference, feature retrieval, and data access.
Develop automation scripts, SDKs, and utilities to streamline ML workflows.
AI/GenAI & RAG Workflows (Nice to Have)
Implement RAG pipelines, vector indexing, and document retrieval workflows.
Build and deploy multi-agent systems using frameworks like LangChain, CrewAI, or Google ADK.
Apply prompt engineering strategies to optimize LLM behavior.
Integrate LLMs with existing microservices and production data.
Model Deployment & Observability
Deploy containerized models (Docker) on Kubernetes (EKS), on ECS, or as SageMaker endpoints.
Implement monitoring for model drift, data drift, usage patterns, latency, and system health.
Maintain logs, metrics, and alerts using CloudWatch, Prometheus, Grafana, or ELK.
Collaboration & Documentation
Work directly with data scientists to support experiments, deployments, and re-platforming efforts.
Document design decisions, architectures, and infrastructure in Confluence or GitHub wikis, including architecture diagrams.
Provide guidance and best practices for reproducibility, scalability, and cost optimization.