Flexisource IT

Senior Data Engineer (Big Data)

4-6 Years
  • Posted 3 days ago

Job Description

Role summary

  • We're hiring a hands-on Data Engineer to build and own the reporting/analytics data pipeline. You'll turn raw event streams and legacy data source records into well-structured, high-performance datasets used by Looker and our web applications. This role sits at the intersection of real-time streaming, analytical storage (ClickHouse, OLAP engines), and product instrumentation. You'll design data models, author roll-up jobs, perform gap analysis, and help improve legacy reporting.

Key responsibilities

  • Design, implement, and operate ETL/ELT pipelines that consume raw events and legacy data sources (batch + stream).
  • Build and maintain a core, highly granular event table (single source of truth) and a set of optimized rolled-up reporting tables for fast Looker/BI queries and front-end reporting.
  • Implement minute-level / near-real-time rollups and support re-rolls/re-computation when late data arrives (see the sketch after this list).
  • Work with streaming infrastructure (Kafka or similar) to ingest and process events; design low-latency pipelines and minute-level aggregation jobs.
  • Tune schema design, partitioning, and clustering strategy for analytical stores to enable predictable performance at scale.
  • Query raw event data for research and debugging; produce ad-hoc analysis and root-cause investigations.
  • Read application code to understand what is being logged, propose additional instrumentation, and ensure event schemas are robust for analytics.
  • Perform gap analysis for business reporting requests and propose pragmatic technical solutions and implementation plans.
  • Support and improve legacy reporting (multiple older DBs and calculation logic); document and standardize transformation logic.
  • Design/partner on BI models (LookML, Tableau data sources, Power BI datasets) and collaborate with analysts and Product to expose the right metrics and dimensions.
  • Write clear documentation, tests, and monitoring/alerts for data quality and pipeline health.
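
The minute-level rollup and re-roll responsibility above is easiest to see in code. The sketch below is a minimal, illustrative example only, with hypothetical event fields and an in-memory table standing in for a real streaming pipeline or ClickHouse job: it buckets raw events by minute and re-computes only the bucket touched by a late-arriving event.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical raw events: (event_time, campaign_id, impressions, clicks)
raw_events = [
    (datetime(2024, 5, 1, 9, 0, 12, tzinfo=timezone.utc), "c1", 1, 0),
    (datetime(2024, 5, 1, 9, 0, 48, tzinfo=timezone.utc), "c1", 1, 1),
    (datetime(2024, 5, 1, 9, 1, 5, tzinfo=timezone.utc), "c2", 1, 0),
]

def minute_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to its one-minute bucket."""
    return ts.replace(second=0, microsecond=0)

def roll_up(events):
    """Aggregate raw events into per-minute, per-campaign rollup rows."""
    rollup = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    for ts, campaign_id, impressions, clicks in events:
        key = (minute_bucket(ts), campaign_id)
        rollup[key]["impressions"] += impressions
        rollup[key]["clicks"] += clicks
    return dict(rollup)

# Initial rollup over the raw event table.
rollup = roll_up(raw_events)

# A late-arriving event: re-compute only the affected minute bucket
# so the rollup stays consistent with the raw events.
late_event = (datetime(2024, 5, 1, 9, 0, 59, tzinfo=timezone.utc), "c1", 1, 0)
raw_events.append(late_event)
affected_bucket = minute_bucket(late_event[0])
affected = [e for e in raw_events if minute_bucket(e[0]) == affected_bucket]
for key in [k for k in rollup if k[0] == affected_bucket]:
    del rollup[key]
rollup.update(roll_up(affected))

print(rollup)
```

In production this pattern usually lives in a materialized view or a scheduled aggregation job rather than in-process Python; the part that matters here is re-rolling only the affected buckets instead of the whole table.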

Must-have skills / experience

  • 4+ years working with large datasets in production (event data, clickstreams, ad metrics).
  • Strong SQL skills and ability to author performant analytical queries (window functions, aggregates, time-series).
  • Experience with analytic columnar/OLAP stores (ClickHouse strongly preferred).
  • Experience with data-lake query engines (Trino/Presto, Hive) or comparable engines.
  • Practical experience with streaming systems (Kafka or equivalent) and real-time processing patterns (minute rollups, stream materialized views).
  • Experience working with BI tools (Looker, Tableau, Power BI, etc.): Building datasets, modeling metrics/dimensions, and making datasets performant for dashboarding and self-serve analytics.
  • Familiarity with CI/CD for data jobs, dbt, or orchestration tools (Airflow, Prefect): Able to ship repeatable, tested pipelines and support automated backfills/re-rolls.
  • Solid understanding of data partitioning, sharding, and physical schema choices that affect query performance in clustered stores.
  • Comfortable reading application code to discover what is being logged and to propose instrumentation changes.
  • Good statistics/math literacy (means, standard deviation, percentiles, sampling) and working knowledge of ad metrics (CPM, RPM, CTR) used to validate reporting and optimize ad delivery (see the formulas after this list).
  • Strong debugging mindset: Able to investigate data lineage, late/duplicate events, and data quality issues.
  • Clear communicator who can explain technical tradeoffs and write documentation.
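
For the statistics and ad-metrics literacy above, these are the standard CPM/RPM/CTR formulas used to sanity-check reporting numbers. The figures in the usage lines are illustrative only, and note that some teams define RPM per 1,000 pageviews rather than per 1,000 impressions.

```python
def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate: clicks per impression."""
    return clicks / impressions if impressions else 0.0

def cpm(cost: float, impressions: int) -> float:
    """Cost per mille: advertiser cost per 1,000 impressions."""
    return cost / impressions * 1000 if impressions else 0.0

def rpm(revenue: float, impressions: int) -> float:
    """Revenue per mille: publisher revenue per 1,000 impressions."""
    return revenue / impressions * 1000 if impressions else 0.0

# Illustrative numbers only.
print(ctr(clicks=25, impressions=10_000))     # 0.0025 (0.25%)
print(cpm(cost=42.0, impressions=10_000))     # 4.2
print(rpm(revenue=31.5, impressions=10_000))  # 3.15
```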

Nice-to-have

  • Hands-on ClickHouse schema design and tuning experience.
  • Experience managing or operating Hadoop/Trino/Hive clusters.
  • Experience with additional stream processors (e.g., Flink, Spark Structured Streaming) or real-time materialized view systems.
  • Experience with LookML specifically (or deep Tableau/Power BI modeling experience).
  • Experience with monitoring, SLAs for data latency, and data quality frameworks (tests, alerts, observability).

What success looks like in the first 3-6 months

  • Delivered at least one new reporting pipeline from raw events to Looker/Tableau-ready tables with documentation and monitoring.
  • Implemented a robust minute-level / near-real-time rollup.
  • Performed a gap analysis for 2 legacy reports and delivered a plan (and at least one quick win) to standardize/improve accuracy.
  • Demonstrated reproducible processes to re-compute or backfill aggregated tables when late data arrives.
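
That last expectation typically reduces to an idempotent "delete the affected window, then re-aggregate from the raw event table" routine. The sketch below is a simplified, in-memory illustration with hypothetical day-level partitions and field names; it is not tied to any particular warehouse or orchestrator.

```python
from datetime import date, timedelta

def backfill_rollup(rollup_table: dict, raw_events_by_day: dict, start: date, end: date) -> None:
    """Idempotently rebuild daily rollup rows for [start, end].

    rollup_table:       {date: aggregated_row} -- hypothetical in-memory stand-in
    raw_events_by_day:  {date: [event, ...]}   -- source-of-truth raw events
    """
    day = start
    while day <= end:
        # Drop whatever is currently stored for the day (safe to re-run).
        rollup_table.pop(day, None)
        # Re-aggregate from the raw, highly granular event table.
        events = raw_events_by_day.get(day, [])
        rollup_table[day] = {
            "events": len(events),
            "impressions": sum(e.get("impressions", 0) for e in events),
        }
        day += timedelta(days=1)

# Usage: re-roll three days after late data landed.
raw = {
    date(2024, 5, 1): [{"impressions": 3}],
    date(2024, 5, 2): [{"impressions": 5}, {"impressions": 1}],
}
rollup = {}
backfill_rollup(rollup, raw, date(2024, 5, 1), date(2024, 5, 3))
print(rollup)
```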

Work Details:

  • Schedule: Monday to Friday, 6:00am-3:00pm or 7:00am-4:00pm (PH time), depending on business needs
  • Location: Makati (work from home until further notice)
  • Status: Full-time employment

Job ID: 135693505