MLOps Engineer - ML Platforms
Full-time, Mid-Senior Level
Job Overview
Join our Data Science Platform team as an MLOps Engineer, where you'll build and maintain production ML infrastructure across AWS SageMaker and Databricks. You'll enable data scientists to develop, deploy, and monitor ML models efficiently at scale while establishing governance and best practices for our multi-platform ML ecosystem.
What You'll Do
ML Infrastructure & Operations
- Build and maintain end-to-end ML pipelines for training, deployment, and monitoring across AWS SageMaker and Databricks
- Create feature stores, informed by feature engineering and exploratory data analysis (EDA)
- Implement MLflow for experiment tracking, model versioning, and registry management across platforms
- Develop automated CI/CD pipelines for model deployment using Jenkins/GitHub Actions/GitLab CI
- Create reusable Python libraries and Terraform modules for standardized ML operations
Feature Engineering & Management
- Develop feature pipelines using Databricks Feature Store and SageMaker Feature Store
- Implement feature versioning, lineage tracking, and governance through Unity Catalog
- Build feature serving infrastructure for online and offline access
- Ensure feature discoverability and reusability across ML projects
ML Governance & Monitoring
- Leverage Unity Catalog for ML model governance, access control, and lineage tracking
- Implement Databricks Lakehouse Monitoring and SageMaker Model Monitor for drift detection
- Build dashboards and alerting for model performance, data quality, and prediction monitoring
- Deploy ML explainability frameworks (SHAP, LIME) for model interpretability
Platform Interoperability
- Design cross-platform ML workflows that integrate seamlessly across AWS, Databricks, and other platforms
- Implement deployment strategies: A/B testing, canary deployments, blue-green rollouts
- Optimize distributed training and hyperparameter tuning (Ray Tune, Optuna, SageMaker HPO)
- Collaborate with data scientists to productionize models and establish best practices
What You Bring
Required Experience
- 4-7 years in ML engineering, MLOps, or data science engineering
- 1-2+ years of hands-on experience with Databricks and/or AWS SageMaker in production
- Proven track record deploying and maintaining production ML models at scale
Technical Skills
- ML Platforms: Databricks (Workflows, Jobs, Delta Lake, Unity Catalog), AWS SageMaker (Pipelines, Training, Endpoints, Feature Store)
- MLOps: MLflow (tracking, registry, deployment), model monitoring, drift detection, deployment strategies
- Programming: Strong Python (pandas, scikit-learn, PyTorch/TensorFlow, XGBoost), PySpark, SQL
- Infrastructure: Terraform (modular code, Databricks/AWS providers), Docker, Git
- Cloud: AWS (S3, Lambda, ECR, IAM, CloudWatch), distributed training frameworks
- CI/CD: Jenkins/GitHub Actions/GitLab CI, automated testing, deployment automation
ML & Data Science Knowledge
- ML algorithms, model evaluation, validation techniques, and statistical testing
- Feature engineering, hyperparameter optimization, and model optimization
- Understanding of model drift, retraining strategies, and ML explainability
- Experience with Feature Stores and feature governance