Site Reliability Engineer (Salt Lake City)
Filevine
Software Engineering
Salt Lake City, UT, USA
Posted on Dec 23, 2025
Responsibilities
- Design, implement, and maintain highly available, scalable infrastructure on AWS.
- Automate infrastructure provisioning, deployment, and monitoring using IaC tools (e.g., Terraform, CloudFormation).
- Monitor system health, performance, and capacity; proactively identify and resolve issues.
- Participate in the on-call rotation to respond to and resolve production incidents.
- Collaborate with development teams to improve observability, logging, and alerting.
- Drive continuous improvement in reliability through chaos engineering, load testing, and post-incident reviews.
- Ensure security best practices and compliance requirements are embedded in our infrastructure.
- Optimize costs while maintaining performance and reliability standards.
Qualifications
- 3-5 years of experience in Site Reliability Engineering, DevOps, or similar roles.
- Deep expertise with AWS services (e.g., EC2, ECS/EKS, RDS, Lambda, S3, VPC, CloudWatch).
- Proficiency in infrastructure as code (Terraform preferred) and CI/CD pipelines.
- Strong scripting/programming skills (e.g., Python, Bash, Go).
- Experience with monitoring and observability tools (e.g., Datadog, Prometheus, Grafana, ELK stack).
- Solid understanding of networking, Linux systems, and container orchestration (Docker).
- Proven ability to troubleshoot complex, distributed systems issues.
- Bachelor's degree in Computer Science, Engineering, or equivalent experience.
Nice-to-have
- AWS certifications (e.g., Solutions Architect, DevOps Engineer).
- Experience in SaaS environments or regulated industries (legal tech a plus).
- Familiarity with microservices, serverless architectures, and database reliability.
- Passion for building resilient systems that support high-stakes workflows.