Senior Data Engineer (Kafka / Hybrid 2x a week onsite)

5-14 Years

Posted a month ago

Job Description

Are you a Data Engineer with a strong background in distributed systems and real-time data streaming, looking to work on large-scale, high-impact data platforms?

We are looking for a modern data engineer to work in a cloud-native, data-as-code environment built on AWS, with a heavy focus on highly scalable, real-time data processing.

The ideal candidate has hands-on, production-level experience architecting solutions with streaming technologies such as Apache Kafka, Spark, Flink, or RabbitMQ as a core skill, alongside Apache Airflow for orchestration and Python or Java for building high-throughput data pipelines.

This role suits engineers who think like software developers—comfortable with version control, CI/CD, testing, and distributed computing frameworks—rather than traditional ETL or legacy data warehouse practitioners.

Work Setup

Hybrid: onsite 2x per week
Office Location: Mandaluyong (Rockwell Business Center, Sheridan)
Schedule: Monday to Friday, 10:00 AM to 7:00 PM

What You'll Do

Design, build, and maintain high-throughput, scalable data pipelines integrating internal and external data sources within the Databricks ecosystem.
Develop, optimize, and maintain real-time streaming and complex batch data processing workflows.
Architect distributed data transformation layers using code-first frameworks (e.g., PySpark, Spark SQL, Flink).
Ensure data quality, reliability, and observability across live data streams through validation and monitoring frameworks.
Build high-performance data APIs and services for internal and external consumption.
Troubleshoot and resolve production infrastructure and streaming pipeline issues.
Work closely with Product, BI, Engineering, and Infrastructure teams.
Participate in code reviews and Agile ceremonies.

Must-Have Qualifications

MUST have advanced, hands-on experience with streaming and real-time data technologies such as Apache Kafka, Spark (Core/Streaming), or RabbitMQ.
MUST have hands-on experience with Databricks and its ecosystem.
Proven experience building and maintaining distributed data pipelines using Spark and Apache Airflow.
Strong programming skills in Python or Java with solid software engineering fundamentals.
Advanced SQL skills and hands-on experience with relational, columnar, and NoSQL data stores (e.g., MySQL, PostgreSQL, MongoDB, Elasticsearch).
Experience architecting solutions within the AWS cloud ecosystem (e.g., EMR, MSK, Glue, S3).
Strong understanding of modern data architectures, specifically data lake and lakehouse patterns (e.g., Apache Iceberg, Delta Lake).
Experience building and exposing APIs / REST services for data consumption.
Strong data modeling, data quality, and pipeline observability practices.
Excellent communication and collaboration skills.