Senior Site Reliability Engineer

Posted April 13, 2026
Permanent - Full Time

Job Overview

Trading Technologies builds professional trading software, infrastructure, and data solutions for a diverse community of professional traders and market participants. With more than 500 employees across Chicago, New York, London, Singapore, India, Japan, Australia and beyond, TT powers some of the world’s most sophisticated trading operations with low-latency, highly reliable technology.

The Senior Site Reliability Engineer (SRE) is a development-first role focused on coding, automation, and platform stability. The ideal candidate is a skilled software developer who values a balance between building new tools and maintaining operational excellence. To ensure this balance, we follow a 4-week rotation: a 3-week pure development cycle, then a 1-week dispatch cycle spent responding to alerts and mitigating production issues.

What you’ll be doing


Platform Reliability & Automation
  • Design, build, and maintain advanced telemetry and automation tooling to monitor global platform health and trigger automated corrective actions.
  • Own and improve incident response runbooks and automated remediation workflows, reducing MTTR over time.
  • Participate in on-call rotations, diagnosing and resolving system issues and escalations from the customer support team (this is an internal-facing role, not customer-facing).
  • Drive continuous improvement through post-incident reviews (PIRs) and engineering initiatives that eliminate classes of failure.
Software Development
  • Develop advanced monitoring software in Python and Go.
  • Contribute to full-stack troubleshooting across our React.js frontend, Python backend services (Flask, Litestar, Celery), and AWS-managed Kafka (MSK/ESK).
  • Write infrastructure-as-code using Terraform, building reusable modules and submodules to provision and manage cloud resources.
Key Responsibilities
  • Development Cycle (3 Weeks): Focus on coding advanced telemetry, implementing automation strategies, and building tools that proactively monitor platform health.
  • Operations Cycle (1 Week): Rotate into an operational role to swiftly diagnose system issues and handle internal escalations, ensuring continuous platform stability.
  • Continuous Improvement: Use insights gained during the operations week to develop automated solutions that reduce future incidents and optimize system performance.


Skills, Knowledge and Expertise


Essential Skills & Experience

Software Development
  • Extensive professional Python development experience, including object-oriented design and multi-threaded applications.
  • Substantial hands-on Terraform experience, including the ability to author modules and submodules from scratch.
  • Experience building or supporting React.js applications.

Cloud & Infrastructure
  • Substantial hands-on AWS experience across EC2, Lambda, CloudWatch, EKS, ECS, MSK, ELB, RDS, DynamoDB, and SQS.
  • Solid Linux systems experience, including monitoring critical system health parameters.

Desirable Skills & Experience
  • Familiarity with trading systems, financial markets, or low-latency environments.
  • AWS Associate-level certification or higher (preferred but not required).
  • Experience with chaos engineering, SLO/SLI frameworks, or formal reliability programs.
  • Prior on-call experience at a high-traffic or mission-critical platform.
  • Working understanding of TCP/IP, DNS, HTTP, and load balancing concepts.
  • Experience with Go, or a clear eagerness and ability to learn it quickly.
