Senior Data Platform Reliability Engineer

Opswerks

Philippines

5-7 Years

Posted 3 hours ago

Job Description

Your Role

As a Senior Data Platform Engineer, you will be responsible for operating, maintaining, and continuously improving the company's data platforms running on Kubernetes (on-premises and/or on AWS/GCP), similar to the DoEKS (Data on EKS) / AIoEKS (AI on EKS) deployment frameworks.

Deploy new releases and configuration changes through GitOps/DevOps
Monitor platform and service health using logs, metrics, and observability tools
Participate in incident response, root cause analysis and 24x7 operational rotations
Improve platform observability, operational tooling/automations, self-service capabilities and reliability practices to reduce recurring issues
Investigate and troubleshoot user concerns by correlating them to system-related issues, broken integrations, and/or user-specific errors or misconfigurations, then recommend or execute resolutions
Provide technical mentorship to junior engineers
Advocate for platform standards, security best practices, and operational excellence

Your Qualifications

Minimum 5 years of solid experience supporting production data workloads/platforms (Spark/Airflow/Jupyter)
5+ years of hands-on experience in ETL/ELT pipeline development and data transformations (Python/Java and SQL)
Practical proficiency in Kubernetes environments, including cloud-provider managed Kubernetes flavors (AWS EKS/GCP GKE)
Comprehensive knowledge of Linux environments, microservice architectures, and service communication patterns
Strong troubleshooting fundamentals for issues such as application crashes, resource contention, service latency, and scaling behavior
Well-rounded competency in analyzing logs, metrics, monitoring systems, and service KPIs

Plus points if you have: