Sr. SRE II

Filevine

United States

Posted on Jun 20, 2026

Own and evolve observability strategy, including monitoring, alerting, dashboards, logging, and distributed tracing.
Define and manage SLIs, SLOs, and reliability metrics.
Lead incident response, postmortems, and continuous improvement initiatives.
Improve MTTD and MTTR through automation and operational excellence.
Integrate observability into CI/CD pipelines and software delivery workflows.
Build and maintain reliable cloud infrastructure on AWS and Kubernetes.
Mentor engineers and promote SRE best practices across the organization.

8+ years of experience in software engineering, infrastructure, or operations.
5+ years of Site Reliability Engineering experience.
Deep expertise with observability platforms such as New Relic, Datadog, Dynatrace, Grafana, or Prometheus.
Strong experience with monitoring, alerting, incident management, and reliability engineering practices.
Hands-on experience with AWS, Kubernetes, and cloud-native technologies.
Proficiency in Python, Bash, PowerShell, or similar scripting languages.
Excellent communication and collaboration skills.

Leading observability platform implementations or migrations at scale.
Building SLI/SLO frameworks and reliability programs.
Experience with OpenTelemetry, distributed tracing, and modern observability architectures.