
Acquire Intelligence

Head of Site Reliability Engineering


Job Description

We're an award-winning global outsourcer providing contact center and back office services on behalf of our global clients. Come work at a place where innovation and teamwork come together to support the most exciting missions in the world!

Acquire Intelligence exists to help businesses unlock smarter ways of working. We believe that by combining the best of people, process, and automation, companies can grow faster and operate with greater confidence. Our purpose is to remove complexity, improve performance, and drive intelligent transformation for organizations around the world.

As an Acquire Intelligence employee, your role is vital in achieving and exceeding individual and team targets that support company objectives, while building and maintaining stakeholder relationships. You're also responsible for complying with and enforcing procedures aligned with our information security policies.

As a values-led organization, we expect all our team members to exemplify our four values: Curious and Clever, Entrepreneurial Energy, Fast with Intent, and Laugh and Learn.

A SNAPSHOT OF YOUR ROLE

Leadership & People Management

Build an initial SRE team of 3-6 engineers and lead it through goal setting, career development, regular 1:1s, and annual performance reviews.

Ensure operational system knowledge is captured and that the team stays current on operating and troubleshooting procedures.

Recruit, onboard, and mentor new engineers; scale the team to meet business growth.

Maintain an inclusive, psychologically safe culture centered on learning and continuous improvement.

Own, and participate in, the on-call roster for the team, ensuring equitable rotations and sustainable workloads.

Service Level Management & Reliability

Define, monitor, and enforce SLOs and error budgets across all production systems.

Continuously analyse error-budget burn to halt risky deployments and guide capacity decisions.

Champion a data-driven reliability mindset throughout engineering and product teams.

Infrastructure Automation & Management

Architect and implement Infrastructure-as-Code in Pulumi/TypeScript for AWS resources (EKS, MSK, SingleStore, MongoDB, S3, etc.).

Lead large-scale migration or modernization projects (e.g., Kubernetes upgrades, multi-AZ resilience).

Eliminate toil: any manual task that takes more than 2 engineer-days per quarter, or that is frequently repeated, becomes an automation candidate.

Incident Response & Post-Mortem Leadership

Participate in the on-call monitoring and response roster.

Serve as escalation point and incident commander.

Ensure post-mortems are published within 48 hours, with actionable "never again" tasks tracked to closure.

Improve runbooks and game-day exercises; train engineers in incident command principles.

Security & Compliance

Enforce least-privilege IAM policies and champion DevSecOps practices.

Contribute to SOC 2 & ISO 27001 evidence collection and continuous control monitoring.

Oversee security patch pipelines, vulnerability management, and secrets hygiene.

Operational Excellence & Continuous Improvement

Own reliability KPIs (MTTR, change failure rate, mean time between failures).

Lead quarterly reliability reviews and drive the reliability roadmap.

Partner with Product on capacity forecasts and cost-optimization initiatives.

A BIT ABOUT YOU

Minimum Experience

10+ years operating production systems at scale, including 3+ years in an SRE/DevOps capacity.

2+ years of people or technical leadership: mentoring, performance coaching, or line management.

Proven expertise with AWS EKS, MSK, and large-scale databases (SingleStore, PostgreSQL, MongoDB).

Demonstrated incident commander experience with strong communication under pressure.

Hands-on Infrastructure-as-Code experience with Pulumi/TypeScript or Terraform.

Familiarity with high-volume data pipelines (10k msgs/sec) and IoT workloads.

Technical Proficiency

Expert-level TypeScript (Node.js services, AWS Lambda, Pulumi tooling).

Deep understanding of AWS networking, container networking (CNI), TLS, HTTP, DNS.

Advanced observability: Prometheus, Grafana, Loki, PagerDuty, AWS CloudWatch.

CI/CD (GitLab or GitHub Actions), automated testing & rollout strategies (blue/green, canary).

Security best practices: IAM, KMS, secrets management, compliance frameworks.

Education

Bachelor's in Computer Science, Engineering, or equivalent practical experience.

WHAT WE VALUE

  • Curious and Clever: Smart questions spark smart solutions
  • Entrepreneurial Energy: Think like an owner. Solve like a founder
  • Fast with Intent: We move fast and deliver real results
  • Laugh and Learn: We don't take ourselves too seriously, just our results

What Are You Waiting For?

Apply now and help turn data into action with Acquire Intelligence!

Join the A-Team and experience the A-Life!

More Info

Job ID: 135887307