Senior Platform Engineer/SRE - Tech Lead, Critical Infrastructure Transformation

Posted September 26, 2025
Full-time
Mid-Senior level

Job Overview

Build the internal platform that powers the engineering teams delivering mission-critical software to 4,000+ cloud hosting providers worldwide.

CloudLinux powers 4,000+ hosting providers managing millions of websites globally. Our infrastructure team is at a critical inflection point: after 8+ years of accumulated technical debt, we are building a modern platform. This isn't a typical SRE role; it's a chance to architect the future of infrastructure that cannot fail.

Where we are: legacy systems, reactive operations, and a bus factor of 1. OpenNebula bottlenecks are blocking releases, and 70% of the team's time goes to firefighting.

Where we're going: Self-service platform, Infrastructure as Code, proactive engineering. You'll be one of 2-3 senior engineers leading this transformation alongside a new Infrastructure Director with full B-level support.

What You'll Actually Do

Stabilize & Assess:

  • Deep dive into OpenNebula issues with the existing team
  • Map critical dependencies and single points of failure
  • Implement quick wins (automated VM cleanup, monitoring gaps)
  • Begin documenting undocumented systems

Build Foundation:

  • Lead the design and development of an Internal Developer Platform (IDP)
  • Implement GitOps for critical workflows
  • Establish SLIs/SLOs for core services
  • Create runbooks for top incidents

Transform Platform:

  • Architect a self-service Internal Developer Platform
  • Drive Infrastructure as Code to 60%+ coverage
  • Eliminate single points of failure
  • Drive complex architectural decisions from design through implementation

Technical Stack You'll Transform

Current:

  • Virtualization: OpenNebula (main bottleneck), oVirt/OpenStack/CloudStack, KVM
  • Storage: Ceph (recently stabilized), Cephadm, Rook
  • Network: Juniper
  • Infrastructure: bare metal (3 data centers) + AWS + Google Cloud + Azure
  • Automation: ~5% Terraform coverage, manual operations dominant
  • CI/CD: GitLab, Jenkins, Gerrit, GitHub

Your Tools for Transformation:

  • Kubernetes & KubeVirt, plus whatever else the job requires
  • Terraform/Terragrunt + Ansible
  • GitOps (ArgoCD/Flux)
  • Python/Go for custom tooling
  • Modern observability stack

Requirements

To thrive in this role, we are looking for someone who has:

  • Migrated legacy systems to modern platforms at scale
  • Strong Kubernetes production experience (multi-tenant, federation)
  • Infrastructure as Code expertise (Terraform/Ansible in production)
  • Linux at scale (RHEL/CentOS/AlmaLinux, 1000+ servers)
  • Networking: fundamentals, underlay and overlay (EVPN, BGP, VXLAN), DNS, network architecture and segmentation, and native pod networking at scale
  • Proven ability to work independently with minimal documentation
  • Experience building self-service platforms
  • English B2+ and excellent documentation skills

Critical Mindset:

  • Comfortable with ambiguity and technical debt
  • Pragmatic: know when to fix vs. replace vs. work around
  • Can balance firefighting with strategic improvements
  • Strong opinions, loosely held
  • Teaching mentality – you'll help upskill the team

What Makes You Successful Here:

  • You'll have significant technical decision-making power and direct impact
  • New Infrastructure Director + B-level backing for transformation
  • Approved investment in people and technology
  • Full authority to simplify and modernize
  • Protected time for strategic work, not just operations

The Opportunity

This isn't about maintaining the status quo. You'll:

  • Define infrastructure strategy affecting 4,000+ companies
  • Build an Internal Developer Platform
  • Lead technical transformation with real budget and support
  • Become the principal architect of a modern platform
  • Work directly with the Infrastructure Director
  • Shape how critical infrastructure software gets delivered globally

Benefits

What's in it for you?

  • Competitive senior-level compensation.
  • A focus on professional development.
  • Interesting and challenging projects.
  • Fully remote work with flexible working hours, which allows you to schedule your day and work from any location worldwide.
  • 24 days of paid vacation per year, 10 national holidays, and unlimited sick leave.
  • Compensation for private medical insurance.
  • Co-working and gym/sports reimbursement.
  • Budget for education.
  • The opportunity to receive a reward for the most innovative idea that the company can patent.

Apply If You:

  • Thrive in high-impact, high-autonomy environments
  • Want to transform, not just maintain
  • Can see through chaos to architectural solutions
  • Are excited by the challenge, not scared by the current state
  • Believe infrastructure should be invisible when working, invaluable when measured

We're specifically looking for someone who has successfully navigated similar transformations. If you've only worked in already-stable environments, this role will be challenging. But if you've turned chaos into platform excellence before – let's talk.

By applying for this position, you consent to the processing of your personal data as described in our Privacy Policy (https://cloudlinux.com/candidate-privacy-notice), which provides detailed information on how we maintain and handle your data.