Platform Site Reliability Engineer

Posted January 16, 2026
Full-time Mid-Senior Level

Job Overview

Nexthink is looking for a strong Platform Engineer with SRE operations experience to strengthen our infrastructure and accelerate our ability to deploy, monitor, and scale systems effectively. As a SaaS provider, our customers rely on us to deliver a seamless, reliable, and scalable experience 24/7. This role must be based in the West or Mountain time zone.

Join Nexthink's vibrant team where cutting-edge technology meets innovation. Be a part of Nexthink's Digital Employee Experience technological revolution, ensuring our global customers enjoy a seamless user experience. Embrace the future with Nexthink in the US; apply now and become a key player in our dynamic Platform Engineering/SRE organization.

What You'll Do:

  • Design, build, and maintain the infrastructure powering our multi-tenant SaaS platform with reliability, security, and scalability in mind.
  • Implement and manage cloud-native systems (AWS) using best-in-class tools and automation.
  • Operate and enhance Kubernetes clusters, deployment pipelines, and service meshes to support continuous delivery.
  • Establish and enforce SLOs, SLAs, and error budgets, and proactively address availability and performance issues.
  • Develop infrastructure as code (Terraform or similar) for repeatable and auditable provisioning.
  • Bring experience programming platform tools for automation, monitoring, and provisioning.
  • Apply a solid understanding of the network stack (TCP/IP, VPN, HTTP, SSL, routing, etc.), cloud topologies (VPC, virtual subnets, NACLs, NSG, ILB, ELB, etc.), and storage (S3, EBS, Azure Files, etc.).
  • Monitor system health, application performance, and user-facing SLAs using tools like Datadog, Prometheus, and Grafana.
  • Play a leading role in improving incident response practices, and help reduce mean time to detect (MTTD) and mean time to recover (MTTR). Bring experience coordinating teams and individuals to maintain an SLA.
  • Troubleshoot, isolate, and fix incidents with minimal intervention from other functions.
  • Participate in a shared on-call rotation, responding to incidents, troubleshooting outages, and driving timely resolution and communication.
  • Work closely with software engineers to embed reliability and observability into every service.
  • Develop automated runbooks, health checks, and alerting to support reliable operations with minimal manual intervention.
  • Support automated testing, canary deployments, and rollback strategies to ensure safe, fast, and reliable releases.
  • Contribute to security best practices, compliance automation, and cost optimization.
