CORE PROFILE
As a Site Reliability Engineer, you will be working on the critical API gateways, backend services, and infrastructure that make Maya function smoothly for millions of people in the Philippines and beyond. Your work will improve features that users rely on every day.
Maya operates at a large scale, so we need someone with a keen eye for detail. You will work on the infrastructure for new services, analyze and optimize existing infrastructure, improve reliability, reduce costs, and collaborate with software engineers to maintain high quality and availability. This is an opportunity to make a real impact in a fast-paced environment.
NATURE OF WORK
- Work with product owners, developers, and other SREs to understand requirements and deliver projects.
- Build and maintain highly available, reliable, robust, scalable, and cost-efficient infrastructure.
- Automate deployments, monitoring, and system management to reduce manual work and improve delivery speed and operational efficiency.
- Develop reusable infrastructure templates to simplify and standardize resource provisioning.
- Manage and optimize the budget while balancing cost and performance.
- Ensure compliance and security by adhering to industry standards and regulatory requirements, such as PCI DSS and BSP regulations.
- Lead incident response, troubleshoot issues, and conduct root cause analysis.
- Stay up to date on SRE and DevOps best practices and new technologies.
- Support and mentor team members.
REQUIRED QUALIFICATIONS
- 5+ years of experience in Site Reliability Engineering, working in a DevOps culture.
- AWS certification at the Associate level or higher.
- Strong expertise in Kubernetes (EKS) and container orchestration.
- Hands-on experience with Infrastructure-as-Code (IaC) using Terraform.
- Experience with CI/CD pipeline management using GitLab CI or similar tools.
- Experience using monitoring, logging, and telemetry tools such as Splunk, AWS CloudWatch, or Dynatrace effectively.
- Strong knowledge of WAF rules and policies and of security configurations.
- Experience with service mesh technologies (Istio, Envoy, AWS App Mesh).
- Experience with deployment strategies (Blue/Green, Canary, Rolling).
- Hands-on networking experience covering VPCs, VLANs, peering, and routing.
- Knowledge of operating, scaling, and optimizing relational database systems (such as PostgreSQL), NoSQL database systems (such as DynamoDB and MongoDB), and key-value stores (such as Redis).
- Proficient in shell scripting and/or Python.
- Familiarity with Java, Node.js, or other programming languages.
- Strong troubleshooting and problem-solving skills, especially under pressure.
- Effective communication and teamwork in an Agile environment (we use Scrum).
- Nice to have: experience leading a small SRE team (up to 5 members).