Site Reliability Engineer (SRE) – Azure & SaaS Platforms
Full-time, Mid-Senior Level
Job Overview
Site Reliability Engineering (SRE) is what you get when you treat operations as a software problem. Our mission is to safeguard and optimize the systems behind our services, with a constant focus on availability, performance, scalability, and security.
We are looking for a seasoned Site Reliability Engineer to help evolve and support our Azure-based SaaS platform, ideally with exposure to integrated payments systems. You will focus on building scalable infrastructure, optimizing secure CI/CD pipelines, and enabling full observability and automation in a fast-paced, cloud-native environment.
Essential Duties and Responsibilities
- Design and maintain secure, scalable CI/CD pipelines, incorporating tools such as SonarCloud for code quality and security scanning
- Build resilient, automated cloud infrastructure on Azure, with occasional work on AWS as needed
- Optimize platform performance, reliability, and cost-efficiency across distributed systems and cloud workloads
- Contribute to architecture and automation strategies for PCI-compliant, integrated payments services
- Lead incident response efforts and implement automation to reduce recurrence of production issues
- Implement and maintain observability across the platform using Coralogix, OpenTelemetry, Azure Monitor, and related tools
- Write and maintain Infrastructure as Code using Terraform, Ansible, or equivalent tools
- Eliminate complexity and manual operations through thoughtful automation and platform tooling
- Collaborate across engineering teams to embed reliability, scalability, and security into the development lifecycle
- Participate in on-call rotations for production support
- Other responsibilities as assigned
Relevant Technologies
- Languages: Python, Bash, PowerShell, Java, C#
- Cloud Platforms: Microsoft Azure (primary), AWS (secondary)
- CI/CD & DevSecOps Tools: Azure DevOps, GitHub Actions, Bitbucket, Bamboo, SonarCloud, Snyk
- Infrastructure as Code: Terraform, Ansible, Spacelift
- Observability & Monitoring: Coralogix, OpenTelemetry, Azure Application Insights, CloudWatch, APM tools
- Architecture: Kubernetes, Docker, microservices, serverless (Azure Functions)