Senior DevOps Engineer

Posted March 02, 2026

Job Overview

AppZen is the leader in autonomous spend-to-pay software. Its patented artificial intelligence accurately and efficiently processes information from thousands of data sources so that organizations can better understand enterprise spend at scale to make smarter business decisions. It seamlessly integrates with existing accounts payable, expense, and card workflows to read, understand, and make real-time decisions based on your unique spend profile, leading to faster processing times and fewer instances of fraud or wasteful spend. Global enterprises, including one-third of the Fortune 500, use AppZen’s invoice, expense, and card transaction solutions to replace manual finance processes and accelerate the speed and agility of their businesses. To learn more, visit us at www.appzen.com.

About the Role:

  • We are seeking a highly skilled Senior DevOps Engineer to lead the design, implementation, and continuous improvement of our cloud infrastructure, Kubernetes, CI/CD pipelines, observability systems, and reliability practices. This role is critical in ensuring platform stability, scalability, security, and operational excellence across production and non-production environments. You will work closely with Engineering, Security, and Product teams to build resilient, automated, and high-performing infrastructure systems.

Key Responsibilities:

Infrastructure & Cloud Engineering:

  • Design, implement, and manage scalable cloud infrastructure (AWS preferred)
  • Lead infrastructure-as-code initiatives (Terraform / CloudFormation)
  • Improve high availability, disaster recovery, and multi-region resilience
  • Optimize cloud cost and resource utilization

Kubernetes & Container Platform:

  • Architect and manage production-grade Kubernetes clusters
  • Improve cluster reliability, auto-scaling, and performance
  • Implement workload monitoring, alerting, and SLO-based reliability standards
  • Enforce namespace isolation and resource governance

CI/CD & Automation:

  • Design and optimize CI/CD pipelines (Jenkins, ArgoCD)
  • Implement zero-downtime deployment strategies
  • Automate environment provisioning (fully touchless builds with seed data)
  • Improve deployment reliability and rollback mechanisms

Observability & Reliability:

  • Own monitoring, alerting, and logging strategy (Prometheus, Grafana, Datadog, etc.)
  • Ensure 100% monitoring coverage for critical services
  • Reduce Sev1/Sev2 incidents caused by infrastructure
  • Create and maintain runbooks (COPs) for incident response
  • Define SLOs, SLIs, and error budgets

Security & Compliance:

  • Implement IAM best practices and least privilege access
  • Improve secrets management and credential rotation
  • Partner with security team on audits and compliance controls

Incident Management:

  • Lead root cause analysis for major incidents
  • Drive postmortems and preventive improvements
  • Improve MTTR and overall operational maturity

Required Skills & Experience:

  • 6+ years in DevOps / SRE / Cloud Engineering
  • Strong experience with AWS (VPC, IAM, EC2, S3, RDS, EKS, etc.)
  • Deep Kubernetes experience (production clusters)
  • Strong understanding of networking and Linux systems
  • Experience with Infrastructure as Code (Terraform preferred)
  • Experience implementing monitoring & alerting systems (Datadog, Prometheus, Grafana)
  • Strong scripting skills (Python / Bash)
  • Experience managing production systems with high availability requirements
  • Good understanding of databases such as Postgres and MySQL
  • Strong written and verbal communication skills
  • Ability to follow structured processes while being proactive in identifying improvements
  • Analytical and problem-solving mindset
  • Willingness to work the night shift on a long-term basis
