Department
Digital & Technology Office
Employee Type
Probationary
At Cebu Pacific, we embrace challenges head-on, staying at the forefront of innovation through agile technology, data-driven insights, and a relentless focus on customer and operational excellence. A career with our Digital Team offers you the opportunity to shape the future of air travel, whether by developing cutting-edge digital products, harnessing the power of analytics, or driving impactful projects that enhance passenger experiences and streamline operations.
We don't just build technology; we create experiences and innovations that inspire and redefine the future of travel. Be part of this exciting journey and let your expertise take flight as a moment maker in the ever-evolving field of Digital as a Senior Site Reliability Engineer.
Visit our careers site to learn more about how your moment matters at Cebu Pacific: Cebu Pacific Careers Site
The Senior Site Reliability Engineer will serve as the first line of defense for our 24/7 operations. You will act as the guardian of our production environment, utilizing Dynatrace to maintain a holistic view of both Infrastructure and Application health.
You will not just monitor uptime; you will actively test system resilience, manage major incidents, and facilitate stability reporting. You will be the primary notification point for all P1/P2 incidents, responsible for deep-dive triage, quick remediation, and coordinating Major Incident Management (MIM).
Primary Responsibilities:
24/7 Incident Command & Alerting
- 24/7 Availability: Participate in a shift rotation or on-call schedule to ensure continuous coverage. You are the eyes on glass for the organization.
- Unified Alerting: Manage the notification workflow. Ensure that Critical Alerts for both Infrastructure failures and Application failures trigger immediate notifications to the 24/7 team.
- Major Incident Management (MIM): Lead the technical response during critical outages. Coordinate cross-functional teams to restore service rapidly.
Observability Strategy (Dynatrace Focus)
- Dynatrace Administration: Act as the Subject Matter Expert (SME) for our Dynatrace implementation.
  - Configure Management Zones, Alerting Profiles, and Dashboards to provide a Single Pane of Glass.
  - Utilize Dynatrace PurePath for distributed tracing to identify bottlenecks in microservices.
  - Leverage Davis AI to automatically detect anomalies and reduce alert noise.
- Comprehensive Monitoring Scope:
  - Network Health: Monitor VPN Tunnel status, Load Balancer (ALB/NLB) health, and DNS latency. Trigger: Alert on packet loss or high latency.
  - Infrastructure Health: Monitor Disk/Volume usage, CPU/Memory saturation, and SSL Certificate expiry.
  - Security: Monitor for DDoS attack patterns and WAF spikes.
Resilience & Chaos Engineering
- Chaos Engineering: Plan and execute Chaos Engineering exercises (e.g., simulating pod failures, network latency, zone outages) to test the system's resilience and verify that failover mechanisms work as expected.
- Reliability Recommendations: Proactively analyze trends and provide architectural recommendations to development and infrastructure teams to improve system stability.
- First Line Troubleshooting: Serve as the L1/L2 troubleshooter for Kubernetes (EKS), AWS, and Linux issues. Execute Quick Fix runbooks to mitigate impact before escalating to platform engineering.
Application Triage & Analysis
- Deep-Dive Triage: Go beyond system checks to perform deep analysis using Dynatrace. Analyze stack traces and exception logs to pinpoint the exact line of code causing the failure.
- Root Cause Differentiation: Rapidly differentiate between an Infrastructure Issue (e.g., a network timeout) and an Application Logic Error (e.g., a NullPointerException caused by bad data).
- Blameless RCA: Facilitate Root Cause Analysis sessions to ensure permanent fixes are applied to recurring problems.
Governance & Reporting (Stability Cadence)
- Stability Calls: Facilitate and lead the Weekly/Bi-Weekly Stability Call. Present the health status of all technical towers to leadership and stakeholders.
- Reporting: Generate regular reports on system uptime, error budgets, incident trends, and MTTR (Mean Time To Recovery).
- Cross-Tower Visibility: Ensure that dashboards and reports provide value to all teams (Network, App, Cloud), leaving no siloed blind spots in production.
Automation & Toil Reduction
- Remediation Scripting: Develop scripts (Python/Bash) to Auto-Heal common issues (e.g., clearing logs when disk is full, restarting stuck services).
- Process Improvement: Identify manual checks and convert them into automated Dynatrace alerts or synthetic tests.
Qualifications:
- Shift Availability: Must be willing to work in a 24/7 shift environment or a strictly defined on-call rotation.
- Dynatrace Expertise: Deep experience administering and using Dynatrace in a production environment (Dashboards, OneAgent, PurePaths).
- Troubleshooting Expertise:
  - Network: Understanding of DNS, TCP/IP, Load Balancing, and Firewalls.
  - Compute/Storage: Understanding of block vs. object storage, CPU steal time, and memory management.
- Governance: Experience facilitating technical management calls and producing executive-level reliability reports.
- Application Debugging: Ability to read application logs (Java, Node, Python) to understand why a service failed.
- Cloud (AWS) & K8s: Solid understanding of EKS, EC2, and other AWS services.
Why Join Us:
- We are the first Great Place to Work® certified airline in Southeast Asia.
- We have been recognized as Best Employer Brand on LinkedIn for two consecutive years.
- Be part of a forward-thinking team that values innovation and continuous improvement.
- Play a key role in developing and nurturing the talents that drive our success.
- Accelerate your career with access to extensive learning programs and leadership development initiatives, all under Ceb U, our corporate university.
- Enjoy unique employee perks such as free travel for you and your family. Coverage is extended to common-law partners and same-sex partners!
- Be assured of comprehensive healthcare coverage upon hire.
Note: This position is for an Individual Contributor and will be based in Pasay City, Metro Manila, but currently follows a hybrid workplace flexibility arrangement.
Your moment matters. Be a Moment Maker!
Cebu Pacific warns the public against fake hiring and training advertisements by unknown groups. We do not require payment from candidates during the recruitment process, nor do we require submission of physical application documents. For official information on our job openings, please visit our LinkedIn page or our careers site: Cebu Pacific Careers Site.
Experience Range (Years)
4 - 8 years
Job posted on
2026-03-12