Service Delivery and Incident Response Lead

3-5 Years
  • Posted a day ago

Job Description

Your Role

People & Team Leadership

  • Lead, coach, and mentor IT engineers to build strong technical and leadership capabilities.
  • Set clear performance goals aligned with our Beliefs, Vision, Mission, Methods (BVMM).
  • Conduct 1:1s, performance reviews, and career growth discussions.
  • Foster a culture of ownership, collaboration, and continuous learning.
  • Maintain balanced workloads, shift coverage, and clear succession plans to sustain healthy 24/7 operations.

Service Operations & Reliability

  • Oversee daily service health, capacity, and reliability across all supported environments.
  • Ensure compliance with operational KPIs through proactive planning and improvement.
  • Balance demand vs. capacity and manage shift coverage to prevent burnout.
  • Partner with engineering teams to maintain runbooks, knowledge bases, and escalation paths.
  • Drive automation and workflow optimization to reduce manual overhead.
  • Use data insights to guide decisions and improvements.

Incident & Problem Management

  • Lead end-to-end incident response, triage, communication, and resolution in real time.
  • Act as Incident Commander for high-impact events across a global environment.
  • Track and improve metrics like MTTD, MTTM, and MTTR.
  • Champion blameless Post-Incident Reviews (PIRs) and translate learnings into long-term system and process improvements.

Strategic & Cross-Functional Impact

  • Represent the team in customer reviews, operational syncs, and briefings.
  • Collaborate with SREs, product owners, and partner engineers to align priorities and reliability goals.
  • Contribute to frameworks and governance initiatives.
  • Lead service onboarding/off-boarding and strengthen operational readiness checkpoints.
  • Identify and close systemic operational gaps through process and tool improvements.

Your Qualifications

  • Bachelor's degree in Computer Science, Information Technology, Engineering, or a related discipline.
  • 3+ years in Service Delivery, Incident Response, or Operations Leadership within enterprise-scale, 24/7 environments.
  • Proven experience managing technical teams, driving performance, and leading through critical situations.
  • Strong grounding in ITSM / ITIL principles (Incident & Problem Management).
  • Familiarity with cloud, distributed systems, or enterprise infrastructure.
  • Skilled in monitoring, alerting, and ticketing tools (e.g., PagerDuty, Datadog, Grafana, Splunk, ServiceNow).

Core Competencies

  • People and Performance Leadership
  • Incident Command and Escalation Management
  • Analytical and Problem-Solving Skills
  • Communication and Decision-Making Under Pressure
  • Root Cause and Post-Incident Analysis
  • Operational Planning and Service Governance
  • Stakeholder and Partner Management
  • IT Service Management (Incident & Problem Management)
  • Observability, Monitoring, and Automation Tools
  • Passion for People Development, Operational Discipline, and Continuous Improvement

Plus points if you have:

  • ITIL V3 or V4 certification
  • AWS Certified SysOps Administrator
  • SRE Foundation or Crisis/Incident Management certifications
  • Background in SRE practices and operational frameworks that promote reliability and automation


Job ID: 136152285