Senior Site Reliability Engineer

Trading Technologies

Czechia

Posted April 13, 2026

Permanent - Full Time

Job Overview

Trading Technologies builds professional trading software, infrastructure, and data solutions for a diverse community of professional traders and market participants. With more than 500 employees across Chicago, New York, London, Singapore, India, Japan, Australia and beyond, TT powers some of the world’s most sophisticated trading operations with low-latency, highly reliable technology.

The Senior Site Reliability Engineer (SRE) is a development-first role focused on coding, automation, and platform stability. The ideal candidate is a skilled software developer who values a balance between building new tools and maintaining operational excellence. To ensure this balance, we follow a 4-week rotation: a 3-week pure development cycle, then a 1-week dispatch cycle spent responding to alerts and mitigating production issues.

What you’ll be doing

Platform Reliability & Automation

Design, build, and maintain advanced telemetry and automation tooling to monitor global platform health and trigger automated corrective actions.
Own and improve incident response runbooks and automated remediation workflows, reducing MTTR over time.
Participate in on-call rotations, diagnosing and resolving system issues and escalations from the customer support team (this is an internal-facing role, not customer-facing).
Drive continuous improvement through post-incident reviews (PIRs) and engineering initiatives that eliminate classes of failure.

Software Development

Develop advanced monitoring software in Python and Golang.
Contribute to full-stack troubleshooting across our React.js frontend, Python backend services (Flask, Litestar, Celery), and AWS-managed Kafka (MSK/ESK).
Write infrastructure-as-code using Terraform, building reusable modules and submodules to provision and manage cloud resources.

Key Responsibilities

Development Cycle (3 Weeks): Focus on coding advanced telemetry, implementing automation strategies, and building tools that proactively monitor platform health.
Operations Cycle (1 Week): Rotate into an operational role to swiftly diagnose system issues and handle internal escalations, ensuring continuous platform stability.
Continuous Improvement: Use insights gained during the operations week to develop automated solutions that reduce future incidents and optimize system performance.

Skills, Knowledge and Expertise

Essential Skills & Experience

Software Development

Extensive professional Python development experience, including object-oriented design and multi-threaded applications.
Substantial hands-on Terraform experience—able to author modules and submodules from scratch.
Experience building or supporting React.js applications.

Cloud & Infrastructure

Substantial hands-on AWS experience across EC2, Lambda, CloudWatch, EKS, ECS, MSK, ELB, RDS, DynamoDB, and SQS.
Solid Linux systems experience, including monitoring critical system health parameters.

Desirable Skills & Experience

Familiarity with trading systems, financial markets, or low-latency environments.
AWS Associate-level certification or higher (preferred but not required).
Experience with chaos engineering, SLO/SLI frameworks, or formal reliability programs.
Prior on-call experience on a high-traffic or mission-critical platform.
Working understanding of TCP/IP, DNS, HTTP, and load balancing concepts.
Experience with Golang, or a clear eagerness and ability to learn it quickly.
