Senior Platform Engineer/SRE - Tech Lead, Critical Infrastructure Transformation
Job Overview
Build the internal platform that powers our engineering teams as they deliver mission-critical software to 4,000+ cloud hosting providers worldwide.
CloudLinux powers 4,000+ hosting providers managing millions of websites globally. Our infrastructure team is at a critical inflection point – moving from 8+ years of technical debt to building a modern platform. This isn't a typical SRE role; it's a chance to architect the future of infrastructure that cannot fail.
Where we are: Legacy systems, reactive operations, bus factor = 1. OpenNebula bottlenecks are blocking releases. 70% of the team's time goes to firefighting.
Where we're going: Self-service platform, Infrastructure as Code, proactive engineering. You'll be one of 2-3 senior engineers leading this transformation alongside a new Infrastructure Director with full B-level support.
What You'll Actually Do
Stabilize & Assess:
- Deep dive into OpenNebula issues with the existing team
- Map critical dependencies and single points of failure
- Implement quick wins (automated VM cleanup, monitoring gaps)
- Begin documenting undocumented systems
Build Foundation:
- Lead the design and development of an Internal Developer Platform (IDP)
- Implement GitOps for critical workflows
- Establish SLIs/SLOs for core services
- Create runbooks for top incidents
Transform Platform:
- Architect a self-service Internal Developer Platform
- Drive Infrastructure as Code to 60%+ coverage
- Eliminate single points of failure
- Drive development and implementation of complex architectural decisions
Technical Stack You'll Transform
Current:
- Virtualization: OpenNebula (main bottleneck), oVirt/OpenStack/CloudStack, KVM
- Storage: Ceph (recently stabilized), Cephadm, Rook
- Network: Juniper
- Bare metal (3 datacenters) + AWS + Google Cloud + Azure
- Automation: ~5% Terraform coverage, manual operations dominant
- CI/CD: GitLab, Jenkins, Gerrit, GitHub
Your Tools for Transformation:
- Kubernetes and KubeVirt, and/or whatever else is necessary
- Terraform/Terragrunt + Ansible
- GitOps (ArgoCD/Flux)
- Python/Go for custom tooling
- Modern observability stack
Requirements
To thrive in this role, we are looking for someone who has:
- Migrated legacy systems to modern platforms at scale
- Strong Kubernetes production experience (multi-tenant, federation)
- Infrastructure as Code expertise (Terraform/Ansible in production)
- Experience running Linux at scale (RHEL/CentOS/AlmaLinux, 1000+ servers)
- Network fundamentals: underlay and overlay (EVPN, BGP, VXLAN, DNS, network architecture and segmentation, native pod networking at scale)
- Proven ability to work independently with minimal documentation
- Experience building self-service platforms
- English B2+ and excellent documentation skills
Critical Mindset:
- Comfortable with ambiguity and technical debt
- Pragmatic: know when to fix vs. replace vs. work around
- Can balance firefighting with strategic improvements
- Strong opinions, loosely held
- Teaching mentality – you'll help upskill the team
What Makes You Successful Here:
- You'll have significant technical decision-making power and direct impact
- New Infrastructure Director + B-level backing for transformation
- Approved investment in people and technology
- Full authority to simplify and modernize
- Protected time for strategic work, not just operations
The Opportunity
This isn't about maintaining the status quo. You'll:
- Define infrastructure strategy affecting 4,000+ companies
- Build an Internal Developer Platform
- Lead technical transformation with real budget and support
- Become the principal architect of a modern platform
- Work directly with the Infrastructure Director
- Shape how critical infrastructure software gets delivered globally
Benefits
What's in it for you?
- Competitive senior-level compensation.
- A focus on professional development.
- Interesting and challenging projects.
- Fully remote work with flexible working hours, which allows you to schedule your day and work from any location worldwide.
- 24 days of paid vacation per year, 10 days of national holidays, and unlimited sick leave.
- Compensation for private medical insurance.
- Co-working and gym/sports reimbursement.
- Budget for education.
- The opportunity to receive a reward for the most innovative idea that the company can patent.
Apply If You:
- Thrive in high-impact, high-autonomy environments
- Want to transform, not just maintain
- Can see through chaos to architectural solutions
- Are excited by the challenge, not scared by the current state
- Believe infrastructure should be invisible when working, invaluable when measured
We're specifically looking for someone who has successfully navigated similar transformations. If you've only worked in already-stable environments, this role will be challenging. But if you've turned chaos into platform excellence before – let's talk.
By applying for this position, you consent to the processing of your personal data as described in our Privacy Policy (https://cloudlinux.com/candidate-privacy-notice), which provides detailed information on how we maintain and handle your data.