Senior DevOps and SRE Engineer

Posted October 10, 2025
Contract
Mid-Senior level

Job Overview

Job Title: Senior DevOps and SRE Engineer

Location: Washington, DC

Employment Type: Contract

About Us

DMV IT Service LLC, founded in 2020, is a trusted IT consulting firm specializing in IT infrastructure optimization, cybersecurity, networking, and staffing solutions. We partner with clients to achieve technology goals through expert guidance, workforce support, and innovative solutions. With a client-focused approach, we also provide online training and job placements, ensuring long-term IT success.

Job Purpose

We are seeking an accomplished and technically skilled Senior DevOps and Site Reliability Engineer (SRE) to strengthen the reliability, scalability, and performance of cloud-based production systems. This senior-level role requires a blend of leadership, automation expertise, and hands-on technical ability to drive operational excellence and ensure high service availability in a dynamic environment.

Requirements

Key Responsibilities:

Deployment & Automation Engineering

  • Design, build, and optimize continuous integration and delivery (CI/CD) pipelines using GitHub Actions, Jenkins, or AWS CodePipeline.
  • Implement infrastructure automation and configuration management through Infrastructure-as-Code (IaC) tools such as Terraform, AWS CDK, or CloudFormation.
  • Develop automation scripts and self-service frameworks to streamline development and operations.
  • Write and maintain automation tools using programming languages such as Python, Go, or Java.

Site Reliability & Observability

  • Serve as an on-call responder for critical production systems, leading incident management and recovery operations.
  • Conduct post-incident reviews and implement long-term reliability improvements.
  • Define and manage Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets to track performance and reliability.
  • Use observability and monitoring tools (Dynatrace preferred; also AppDynamics and ELK Stack) to monitor system health and performance.
  • Apply distributed tracing and root cause analysis to detect and resolve performance bottlenecks.
  • Develop custom dashboards and automated alerts to enhance visibility and proactive issue detection.

Capacity, Performance & Cost Management

  • Create and maintain capacity models to ensure system scalability and readiness for growth.
  • Lead performance tuning and optimization across infrastructure and applications.
  • Implement cost-efficiency measures across cloud services to optimize spending.
  • Design and execute resiliency and performance testing strategies for production systems.

Security & Governance

  • Investigate and respond to security incidents with timely corrective actions.
  • Develop automated compliance and security validation workflows.
  • Contribute to the adoption of zero-trust architecture practices.
  • Apply ITIL-based methodologies and use ITSM tools (e.g., ServiceNow) for change and incident management.

Required Skills & Experience:

Education & Experience

  • Bachelor’s degree in Computer Science, Engineering, or a related technical discipline.
  • 5–8 years of experience in DevOps, Site Reliability Engineering (SRE), or Platform Engineering.
  • At least 3 years of experience managing and optimizing high-availability production environments.
  • Proven experience leading complex technical initiatives from design through deployment.

Technical Expertise

  • Advanced proficiency with AWS or other major cloud platforms.
  • Deep understanding of cloud infrastructure, networking, and core services.
  • Expertise with Infrastructure-as-Code tools (Terraform, AWS CDK, CloudFormation).
  • Strong knowledge of observability tools, especially Dynatrace.
  • Proficient in programming languages such as Python, Go, or Java.
  • Familiarity with relational, cloud-native, and NoSQL databases.

Professional & Leadership Skills

  • Excellent leadership and mentoring capabilities with the ability to guide technical teams.
  • Strong collaboration skills with an ability to influence across teams and departments.
  • Exceptional documentation and reporting skills, including Root Cause Analysis and process documentation.
  • Willingness to participate in on-call rotations and support production environments during non-standard hours.
