Data Engineer Intern - Speech and Language Intelligence

CSI Interfusion

Philippines, Quezon City

Fresher

Posted 23 hours ago

Job Description

This role is not just an internship.

It is an entry point into world-class AI collaboration.

Your Impact & Responsibilities

As a Data Engineer Intern, you will operate as a hands-on contributor to our ASR data pipeline, not a passive assistant.

You Will

Engineer, preprocess, and quality-validate large-scale speech and text datasets that directly influence ASR model performance
Design and execute data transformations including text normalization, data chunking, format conversion, and structured analysis
Optimize audio pipelines through segmentation, merging, transcoding, and subtitle/caption quality assurance
Strengthen data pipelines by improving robustness, traceability, and reproducibility through clean logs and documentation
Proactively identify data quality risks, triage issues at scale, and close the feedback loop with clarity and ownership

Your work feeds production speech models, not toy datasets.

Qualifications

We are looking for individuals who value engineering rigor, data quality, and long-term growth.

Required

Undergraduate or Master's student from a top-tier university (Top 10 preferred) in Computer Science, Electrical Engineering, Statistics, Data Science, or related fields
Strong Python fundamentals, with the ability to write, debug, and improve data-processing scripts
High ownership mindset with exceptional attention to data quality, standards, and reproducibility
Able to commit to 6 months or longer to ensure meaningful technical depth and impact

Nice To Have

Exposure to Speech, ASR, or NLP through coursework or hands-on projects
Experience with speech/audio processing, data collection workflows, or multimedia QA (e.g., captions/subtitles)
Chinese language proficiency is a strong plus, enabling smoother collaboration with cross-regional teams