Sr. SRE II
United States
Posted on Jun 20, 2026
What you will do
- Own and evolve observability strategy, including monitoring, alerting, dashboards, logging, and distributed tracing.
- Define and manage SLIs, SLOs, and reliability metrics.
- Lead incident response, postmortems, and continuous improvement initiatives.
- Reduce MTTD and MTTR through automation and operational excellence.
- Integrate observability into CI/CD pipelines and software delivery workflows.
- Build and maintain reliable cloud infrastructure on AWS and Kubernetes.
- Mentor engineers and promote SRE best practices across the organization.
What we are looking for
- 8+ years of experience in software engineering, infrastructure, or operations.
- 5+ years of Site Reliability Engineering experience.
- Deep expertise with observability platforms such as New Relic, Datadog, Dynatrace, Grafana, or Prometheus.
- Strong experience with monitoring, alerting, incident management, and reliability engineering practices.
- Hands-on experience with AWS, Kubernetes, and cloud-native technologies.
- Proficiency in Python, Bash, PowerShell, or similar scripting languages.
- Excellent communication and collaboration skills.
Preferred experience
- Leading observability platform implementations or migrations at scale.
- Building SLI/SLO frameworks and reliability programs.
- Experience with OpenTelemetry, distributed tracing, and modern observability architectures.