Associate Staff Engineer, DevOps

Nagarro

India

Posted January 09, 2026

Full-time Not Applicable

Job Overview

Requirements:

Experience: 5+ years
Strong experience in DevOps or Site Reliability Engineering (SRE) roles.
Strong knowledge of Docker, Kubernetes, Terraform, and CI/CD pipelines.
Hands-on experience with AWS, Azure, or other cloud platforms.
Familiarity with GPU infrastructure and ML workloads is a plus.
Good understanding of monitoring and logging systems (Prometheus, Grafana).
Ability to collaborate with ML teams to optimize inference and deployment.
Strong troubleshooting and problem-solving skills in high-scale environments.
Knowledge of infrastructure security best practices, cost optimization, and performance tuning.
Exposure to vector databases and AI/ML deployment pipelines is highly desirable.

Responsibilities:

Maintain and manage Kubernetes clusters, AWS/Azure environments, and GPU infrastructure for high-performance workloads.
Design and implement CI/CD pipelines for seamless deployments and faster release cycles.
Set up and maintain monitoring and logging systems using Prometheus and Grafana to ensure system health and reliability.
Support vector database scaling and model deployment for AI/ML workloads.
Collaborate with ML engineering teams to optimize inference performance and resource utilization.
Ensure high availability, security, and scalability of infrastructure across multiple environments.
Automate infrastructure provisioning and configuration using Terraform and other IaC tools.
Troubleshoot production issues and implement proactive measures to prevent downtime.
Continuously improve deployment processes and infrastructure reliability through automation and best practices.
Participate in architecture reviews, capacity planning, and disaster recovery strategies.
Drive cost optimization initiatives for cloud resources and GPU utilization.
Stay up to date with emerging technologies in cloud-native computing, AI infrastructure, and DevOps automation.