Observability Engineer
Full-time Mid-Senior Level
Job Overview
We are looking for an Observability Engineer to design, implement, and optimize enterprise observability solutions across applications, infrastructure, and cloud environments. This role focuses on monitoring, telemetry, automation, reliability engineering, and AIOps capabilities to improve system visibility, operational efficiency, and service reliability. The ideal candidate will have hands-on experience with observability platforms, cloud technologies, automation, and incident management practices while collaborating with engineering and operations teams to establish observability standards and best practices.
Responsibilities
- Design and implement end-to-end observability solutions across applications, infrastructure, and cloud environments.
- Develop dashboards, alerts, and telemetry frameworks to provide real-time visibility into system health and performance.
- Build automation solutions to eliminate repetitive operational tasks and improve efficiency.
- Enable runbook automation, self-healing capabilities, and automated incident triage workflows.
- Define and implement SLIs, SLOs, and alerting strategies to improve service reliability.
- Drive improvements in MTTD and MTTR through actionable alerts and telemetry-driven insights.
- Implement proactive monitoring, anomaly detection, and predictive alerting to identify issues before customer impact.
- Leverage AIOps capabilities for alert correlation and intelligent incident response.
- Integrate observability platforms with CI/CD pipelines, cloud services, and ITSM tools such as ServiceNow.
- Collaborate with engineering, product, and operations teams to establish observability standards and operational readiness practices.
- Mentor teams and drive adoption of observability best practices across the organization.