Site Reliability Engineer (AI Forms Platform)

Filevine

Software Engineering, Data Science
United States
Posted on Jan 7, 2026

Responsibilities

  • Infrastructure as Code: Architect and deploy secure, scalable infrastructure using Terraform, CloudFormation, or similar tools to support the new Forms Platform.
  • Availability & Uptime: Ensure the platform meets strict SLA requirements for enterprise clients, minimizing downtime and P1 incidents.
  • Observability: Implement comprehensive monitoring, logging, and alerting (Datadog, New Relic, etc.) to provide deep visibility into AI model performance and system health.
  • Security & Compliance: Design architecture that aligns with SOC standards and ensures proper handling of PII/PHI data and audit trails for model outputs.
  • Release Engineering: Build and maintain efficient CI/CD pipelines to support the "tapering" of legacy systems and the rapid deployment of new features.
  • Incident Response: Lead incident response efforts for the Forms Platform and conduct post-mortems to drive continuous improvement.
  • Automation: Aggressively automate manual operations tasks using scripting (Python/Go) and AI tools to reduce toil.

Qualifications

  • Bachelor’s degree in Computer Science, Computer Engineering, or a related field.
  • 3+ years of SRE or DevOps experience, specifically in high-availability production environments.
  • Cloud Proficiency: Deep expertise in the AWS or Azure ecosystem, including container orchestration (Kubernetes/Docker).
  • Security Mindset: Experience implementing security best practices (SOC2, HIPAA) in a cloud environment.
  • Scripting: Proficiency in Python, Go, or Bash for automation.
  • Agile/Scrum: 1 to 3 years of experience with Scrum/Agile development methodologies.
  • AI Adaptability: Willingness and ability to use AI/LLMs to accelerate infrastructure development and debugging.
  • Communication: Excellent verbal and written communication skills to document architecture and incident reports.