Role summary
- We're hiring a hands-on Data Engineer to build and own the reporting/analytics data pipeline. You'll turn raw event streams and legacy data-source records into well-structured, high-performance datasets used by Looker and our web applications. This role sits at the intersection of real-time streaming, analytical storage (ClickHouse and other OLAP engines), and product instrumentation. You'll design data models, author rollup jobs, perform gap analysis, and help improve legacy reporting.
Key responsibilities
- Design, implement, and operate ETL/ELT pipelines that consume raw events and legacy data sources (batch + stream).
- Build and maintain a core, highly granular event table (the single source of truth) and a set of optimized rolled-up reporting tables for fast Looker/BI queries and front-end reporting.
- Implement minute-level / near-real-time rollups and support re-rolls / re-computation when late data arrives (see the rollup sketch after this list).
- Work with streaming infrastructure (Kafka or similar) to ingest and process events; design low-latency pipelines and minute-level aggregation jobs.
- Tune schema design, partitioning, and clustering strategy for analytical stores to enable predictable performance at scale.
- Query raw event data for research and debugging; produce ad-hoc analysis and root-cause investigations.
- Read application code to understand what is being logged, propose additional instrumentation, and ensure event schemas are robust for analytics.
- Perform gap analysis for business reporting requests and propose pragmatic technical solutions and implementation plans.
- Support and improve legacy reporting (multiple older DBs and calculation logic); document and standardize transformation logic.
- Design/partner on BI models (LookML, Tableau data sources, Power BI datasets) and collaborate with analysts and Product to expose the right metrics and dimensions.
- Write clear documentation, tests, and monitoring/alerts for data quality and pipeline health.
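For candidates unfamiliar with the rollup / re-roll pattern above, the sketch below illustrates the general idea in Python (pandas). It is a minimal, hypothetical example: column names such as event_ts, campaign_id, impressions, and revenue are placeholders, and a real pipeline would read from streaming/OLAP infrastructure rather than in-memory DataFrames.

```python
# Minimal sketch of the minute-rollup / re-roll idea (hypothetical column
# names: event_ts, campaign_id, impressions, revenue).
import pandas as pd

def rollup_minutes(events: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw events into per-minute, per-campaign rows."""
    out = events.copy()
    out["minute"] = out["event_ts"].dt.floor("min")
    return (out.groupby(["minute", "campaign_id"], as_index=False)
               [["impressions", "revenue"]].sum())

def re_roll(rollup: pd.DataFrame, late_events: pd.DataFrame,
            all_events: pd.DataFrame) -> pd.DataFrame:
    """Recompute only the minutes touched by late-arriving events.

    all_events is the full, corrected event set (late arrivals included).
    """
    affected = late_events["event_ts"].dt.floor("min").unique()
    keep = rollup[~rollup["minute"].isin(affected)]
    redone = rollup_minutes(
        all_events[all_events["event_ts"].dt.floor("min").isin(affected)])
    return (pd.concat([keep, redone], ignore_index=True)
              .sort_values(["minute", "campaign_id"]))
```

In production this pattern is usually expressed as SQL or materialized views in the analytical store; the key point is that only the minutes touched by late events are recomputed, which keeps backfills cheap and repeatable.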
Must-have skills / experience
- 4+ years working with large datasets in production (event data, clickstreams, ad metrics).
- Strong SQL skills and ability to author performant analytical queries (window functions, aggregates, time-series).
- Experience with analytic columnar/OLAP stores (ClickHouse strongly preferred).
- Experience with data-lake query engines (Trino/Presto, Hive) or comparable technologies.
- Practical experience with streaming systems (Kafka or equivalent) and real-time processing patterns (minute rollups, stream materialized views).
- Experience working with BI tools (Looker, Tableau, Power BI, etc.): Building datasets, modeling metrics/dimensions, and making datasets performant for dashboarding and self-serve analytics.
- Familiarity with CI/CD for data jobs, dbt, or orchestration tools (Airflow, Prefect): Able to ship repeatable, tested pipelines and support automated backfills/re-rolls.
- Solid understanding of data partitioning, sharding, and physical schema choices that affect query performance in clustered stores.
- Comfortable reading application code to discover what is being logged and to propose instrumentation changes.
- Good statistics/math literacy (means, standard deviation, percentiles, sampling) and working knowledge of the ad metrics (CPM, RPM, CTR) used to validate reporting and optimize ad delivery (see the metrics sketch after this list).
- Strong debugging mindset: Able to investigate data lineage, late/duplicate events, and data quality issues.
- Clear communicator who can explain technical tradeoffs and write documentation.
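For reference, the ad-metric calculations mentioned above follow the standard definitions sketched below. Exact revenue and pageview bases vary by team, so treat this as illustrative rather than canonical reporting logic.

```python
# Hedged sketch of standard ad-metric formulas; definitions (e.g. the
# revenue basis for RPM) differ between teams and products.
def cpm(cost: float, impressions: int) -> float:
    """Cost per 1,000 impressions."""
    return cost / impressions * 1000 if impressions else 0.0

def rpm(revenue: float, pageviews: int) -> float:
    """Revenue per 1,000 pageviews."""
    return revenue / pageviews * 1000 if pageviews else 0.0

def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate (clicks divided by impressions)."""
    return clicks / impressions if impressions else 0.0

assert round(cpm(25.0, 10_000), 2) == 2.5    # $25 for 10k impressions -> $2.50 CPM
assert round(ctr(150, 10_000), 4) == 0.015   # 150 clicks on 10k impressions -> 1.5% CTR
```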
Nice-to-have
- Hands-on ClickHouse schema design and tuning experience.
- Experience managing or operating Hadoop/Trino/Hive clusters.
- Experience with additional stream processors (e.g., Flink, Spark Structured Streaming) or real-time materialized view systems.
- Experience with LookML specifically (or deep Tableau/Power BI modeling experience).
- Experience with monitoring, SLAs for data latency, and data quality frameworks (tests, alerts, observability).
What success looks like in the first 3-6 months
- Delivered at least one new reporting pipeline from raw events to Looker/Tableau-ready tables with documentation and monitoring.
- Implemented a robust minute-level / near-real-time rollup.
- Performed a gap analysis for 2 legacy reports and delivered a plan (and at least one quick win) to standardize/improve accuracy.
- Demonstrated reproducible processes to re-compute or backfill aggregated tables when late data arrives.
Work details
- Schedule: Monday to Friday, 6:00 AM to 3:00 PM or 7:00 AM to 4:00 PM (PH time), depending on business needs
- Location: Makati (work from home until further notice)
- Status: Full-time employment