Senior Manager, Site Reliability Engineering (Job 3015480)
Category: Engineering
This role requires you to be onsite three days a week at one of our locations: Irving, TX; Blue Bell, PA; or Boca Raton, FL. The other two days are remote, offering the flexibility you need while still engaging in meaningful collaboration with cross-functional teams.
Applicants must be authorized to work for any employer in the U.S. We are unable to sponsor or take over sponsorship of an employment visa at this time.
What You’ll Do:
ADT is seeking a passionate and experienced Senior Manager of Site Reliability Engineering (SRE) to lead and grow our SRE team. You will be a critical leader in ensuring the reliability, performance, and scalability of our product platform and services, directly impacting customer experience and business success. This role requires a blend of technical expertise, leadership skills, and a deep understanding of SRE principles and practices.
- Build, mentor, and grow a high-performing SRE team, fostering a culture of collaboration, innovation, and ownership.
- Develop and implement a comprehensive SRE strategy aligned with business objectives and engineering roadmaps.
- Define and maintain service level objectives (SLOs), service level indicators (SLIs), and service level agreements (SLAs) for critical systems.
- Drive the adoption of automation and self-healing systems running primarily on GCP and AWS.
- Lead the implementation of robust monitoring, alerting, and observability solutions, such as Dynatrace or Datadog, to proactively identify and resolve issues.
- Champion a data-driven approach to reliability, leveraging metrics and analytics to drive improvements.
- Develop and maintain incident management processes and playbooks.
- Lead incident response efforts for critical service outages, ensuring timely resolution and effective communication.
- Partner closely with development, operations, security, and product teams to ensure reliability is integrated throughout the software development lifecycle.
What You’ll Need:
- Four-year degree or equivalent experience.
- 8+ years of experience in Site Reliability Engineering, DevOps, or related operations roles.
- 3+ years of experience in a management or leadership role, leading and building SRE teams.
- Proficiency in scripting languages (Python, Bash, Go, etc.) and automation tools (Ansible, Terraform, Chef, Puppet).
- Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, New Relic, etc.).
- Familiarity with CI/CD pipelines and DevOps practices.
- Knowledge of networking, security, and database technologies.
- Experience with containerization and orchestration technologies (Docker, Kubernetes).
Compensation & Benefits:
The salary range for this role is $140,800 – $211,200 and is based on experience and qualifications.
Certain roles are eligible for annual bonus and may include equity. These awards are allocated based on company and individual performance.
We offer employees access to healthcare benefits, a 401(k) plan with company match, short-term and long-term disability coverage, life insurance, wellbeing benefits, and paid time off, among others. Employees accrue up to 120 hours of paid time off in their first year, and the accrual rate increases after the first year. We also offer 6 paid holidays.
The anticipated application end date is 2/28/2024.