Role summary
- We're hiring a hands-on Data Engineer to build and own the reporting/analytics data pipeline. You'll turn raw event streams and legacy data-source records into well-structured, high-performance datasets used by Looker and our web applications. This role sits at the intersection of real-time streaming, analytical storage (ClickHouse and other OLAP engines), and product instrumentation. You'll design data models, author rollup jobs, perform gap analysis, and help improve legacy reporting.
Key responsibilities
- Design, implement, and operate ETL/ELT pipelines that consume raw events and legacy data sources (batch + stream).
- Build and maintain a core, highly granular event table (the single source of truth) and a set of optimized rolled-up reporting tables for fast Looker/BI queries and front-end reporting.
- Implement minute-level / near-real-time rollups and support re-rolls / re-computation when late data arrives (see the rollup sketch after this list).
- Work with streaming infrastructure (Kafka or similar) to ingest and process events; design low-latency pipelines and minute-level aggregation jobs.
- Tune schema design, partitioning, and clustering strategy for analytical stores to enable predictable performance at scale.
- Query raw event data for research and debugging; produce ad-hoc analysis and root-cause investigations.
- Read application code to understand what is being logged, propose additional instrumentation, and ensure event schemas are robust for analytics.
- Perform gap analysis for business reporting requests and propose pragmatic technical solutions and implementation plans.
- Support and improve legacy reporting (multiple older DBs and calculation logic); document and standardize transformation logic.
- Design/partner on BI models (LookML, Tableau data sources, Power BI datasets) and collaborate with analysts and Product to expose the right metrics and dimensions.
- Write clear documentation, tests, and monitoring/alerts for data quality and pipeline health.
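For candidates unfamiliar with the rollup / re-roll pattern above, the sketch below illustrates the general idea in Python (pandas). It is a minimal, hypothetical example: column names such as event_ts, campaign_id, impressions, and revenue are placeholders, and a real pipeline would read from streaming/OLAP infrastructure rather than in-memory DataFrames.

```python
# Minimal sketch of the minute-rollup / re-roll idea (hypothetical column
# names: event_ts, campaign_id, impressions, revenue).
import pandas as pd

def rollup_minutes(events: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw events into per-minute, per-campaign rows."""
    out = events.copy()
    out["minute"] = out["event_ts"].dt.floor("min")
    return (out.groupby(["minute", "campaign_id"], as_index=False)
               [["impressions", "revenue"]].sum())

def re_roll(rollup: pd.DataFrame, late_events: pd.DataFrame,
            all_events: pd.DataFrame) -> pd.DataFrame:
    """Recompute only the minutes touched by late-arriving events.

    all_events is the full, corrected event set (late arrivals included).
    """
    affected = late_events["event_ts"].dt.floor("min").unique()
    keep = rollup[~rollup["minute"].isin(affected)]
    redone = rollup_minutes(
        all_events[all_events["event_ts"].dt.floor("min").isin(affected)])
    return (pd.concat([keep, redone], ignore_index=True)
              .sort_values(["minute", "campaign_id"]))
```

In production this pattern is usually expressed as SQL or materialized views in the analytical store; the key point is that only the minutes touched by late events are recomputed, which keeps backfills cheap and repeatable.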
Must-have skills / experience
- 4+ years working with large datasets in production (event data, clickstreams, ad metrics).
- Strong SQL skills and ability to author performant analytical queries (window functions, aggregates, time-series).
- Experience with analytic columnar/OLAP stores (ClickHouse strongly preferred).
- Experience with data-lake query engines (Trino/Presto, Hive) or comparable technologies.
- Practical experience with streaming systems (Kafka or equivalent) and real-time processing patterns (minute rollups, stream materialized views).
- Experience working with BI tools (Looker, Tableau, Power BI, etc.): Building datasets, modeling metrics/dimensions, and making datasets performant for dashboarding and self-serve analytics.
- Familiarity with CI/CD for data jobs, dbt, or orchestration tools (Airflow, Prefect): Able to ship repeatable, tested pipelines and support automated backfills/re-rolls.
- Solid understanding of data partitioning, sharding, and physical schema choices that affect query performance in clustered stores.
- Comfortable reading application code to discover what is being logged and to propose instrumentation changes.
- Good statistics/math literacy (means, standard deviation, percentiles, sampling) and working knowledge of the ad metrics (CPM, RPM, CTR) used to validate reporting and optimize ad delivery (see the metrics sketch after this list).
- Strong debugging mindset: Able to investigate data lineage, late/duplicate events, and data quality issues.
- Clear communicator who can explain technical tradeoffs and write documentation.
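For reference, the ad-metric calculations mentioned above follow the standard definitions sketched below. Exact revenue and pageview bases vary by team, so treat this as illustrative rather than canonical reporting logic.

```python
# Hedged sketch of standard ad-metric formulas; definitions (e.g. the
# revenue basis for RPM) differ between teams and products.
def cpm(cost: float, impressions: int) -> float:
    """Cost per 1,000 impressions."""
    return cost / impressions * 1000 if impressions else 0.0

def rpm(revenue: float, pageviews: int) -> float:
    """Revenue per 1,000 pageviews."""
    return revenue / pageviews * 1000 if pageviews else 0.0

def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate (clicks divided by impressions)."""
    return clicks / impressions if impressions else 0.0

assert round(cpm(25.0, 10_000), 2) == 2.5    # $25 for 10k impressions -> $2.50 CPM
assert round(ctr(150, 10_000), 4) == 0.015   # 150 clicks on 10k impressions -> 1.5% CTR
```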
Nice-to-have
- Hands-on ClickHouse schema design and tuning experience.
- Experience managing or operating Hadoop/Trino/Hive clusters.
- Experience with additional stream processors (e.g., Flink, Spark Structured Streaming) or real-time materialized view systems.
- Experience with LookML specifically (or deep Tableau/Power BI modeling experience).
- Experience with monitoring, SLAs for data latency, and data quality frameworks (tests, alerts, observability).
What success looks like in the first 3-6 months
- Delivered at least one new reporting pipeline from raw events to Looker/Tableau-ready tables with documentation and monitoring.
- Implemented a robust minute-level / near-real-time rollup.
- Performed a gap analysis for 2 legacy reports and delivered a plan (and at least one quick win) to standardize/improve accuracy.
- Demonstrated reproducible processes to re-compute or backfill aggregated tables when late data arrives.
Work details
- Schedule: Monday to Friday, 6:00 AM to 3:00 PM or 7:00 AM to 4:00 PM (PH time), depending on business needs
- Location: Makati (work from home until further notice)
- Status: Full-time employment