Site Reliability Engineer (SRE)

Koreai-india-careers

India

Posted March 25, 2026

Salaried, full-time

Job Overview

JD – Site Reliability Engineer (SRE)

Kore.ai is a pioneering force in enterprise AI transformation, empowering organizations through our
comprehensive agentic AI platform. With innovative offerings across "AI for Service," "AI for Work," and
"AI for Process," we're enabling more than 400 Global 2000 companies to fundamentally reimagine their
operations, customer experiences, and employee productivity.

Our end-to-end platform enables enterprises to build, deploy, manage, monitor, and continuously
improve agentic applications at scale. We automate over 1 billion interactions every year with voice
and digital AI in customer service and have transformed employee experiences for tens of thousands of
employees through productivity and AI-driven workflow automation.

Recognized as a leader by Gartner, Forrester, IDC, ISG, and Everest, Kore.ai has secured Series D
funding of $150M, including strategic investment from NVIDIA to drive Enterprise AI innovation.

Founded in 2014 and headquartered in Florida, we maintain a global presence with offices in India, the UK,
Germany, Korea, and Japan.

About the Role

We are seeking an experienced Site Reliability Engineer (SRE) with a strong focus on Kubernetes
ecosystems to join our growing team. You will play a critical role in designing, operating and scaling our
cloud-native infrastructure, ensuring high availability, performance and resilience of our production
services.

The ideal candidate has deep hands-on expertise in Kubernetes orchestration, advanced autoscaling
strategies, GitOps workflows, infrastructure-as-code provisioning and modern observability practices.
You will work closely with engineering and product support teams to embed reliability into every layer of
our stack.

RESPONSIBILITIES

Design, manage, and optimize large-scale Kubernetes clusters (EKS/AKS/GKE or self-managed) for reliability, security and cost efficiency.
Implement and maintain advanced autoscaling solutions using HPA, VPA and event-driven scaling with KEDA.
Provision and manage cloud infrastructure and Kubernetes resources declaratively using Crossplane for multi-cloud/hybrid environments.
Drive GitOps practices by owning and enhancing Argo CD deployments, application sets, and progressive delivery workflows (canary, blue-green).
Define, monitor, and enforce Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets using observability data.
Build and maintain comprehensive observability pipelines using tools based on OpenTelemetry or eBPF.
Participate in on-call rotations, lead incident response, perform root cause analysis and facilitate blameless postmortems.
Collaborate on capacity planning, chaos engineering experiments and disaster recovery strategies.

EXPERIENCE REQUIRED

Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
8+ years of experience in SRE, Platform Engineering or DevOps roles with a heavy Kubernetes focus.
Expert-level knowledge of Kubernetes, including custom resource definitions, operators, networking (CNI), storage (CSI), and security (Pod Security Standards, OPA/Gatekeeper).
Good production experience with:
    Autoscaling: HPA (metrics-based), VPA, and KEDA (event-driven scaling for queues, databases, etc.).
    Crossplane for provisioning cloud resources and composing control planes.
    Argo CD for declarative GitOps deployments, multi-cluster management, and application lifecycle.
Strong hands-on experience with observability platforms, particularly distributed tracing and performance analytics or eBPF-based full-stack observability.
Proficiency in Infrastructure as Code tools (Terraform, Helm, Jsonnet/Kustomize).
Programming skills in Python, Go, or similar for automation and tooling.
Solid understanding of CI/CD pipelines (GitHub Actions, GitLab CI, Argo Workflows).

PREFERRED SKILLS

Experience with multi-region/multi-cluster Kubernetes architectures and service meshes (Istio/Linkerd).
Contributions to or deep usage of chaos engineering tools (Chaos Mesh, Litmus).
Familiarity with cost optimization tools (Kubecost, CloudZero) and FinOps practices.
Relevant certifications (CKA/CKS, Google Professional Cloud Architect, etc.).
Experience implementing SLO-driven development and reliability budgeting.

EDUCATION QUALIFICATION

Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience.