Company Description
NCS (formerly Yondu Inc) is a leading AI Tech Services company. With a 15,000-strong team across the Asia Pacific, NCS scales its platforms and capabilities to provide clients with greater agility and AI expertise across a range of industries. Embracing a strong ecosystem of global partners, NCS transforms technology services delivery, combining AI with digital resilience to drive real business impact. NCS is a subsidiary of the Singtel Group.
Role Description
We are seeking a dedicated and skilled Machine Learning Operations Specialist to join our team. In this full-time, hybrid role (partly based in Taguig with work-from-home flexibility), you will design, implement, and optimize machine learning models for business applications. Your responsibilities will include overseeing end-to-end machine learning pipelines, automating data workflows, evaluating algorithm performance, and collaborating with cross-functional teams to ensure seamless integration of solutions into existing systems.
Responsibilities:
Model Monitoring and Maintenance
- Monitor performance, health, and operational status of ML models deployed to production.
- Detect anomalies, data drift, concept drift, or degradation in prediction quality.
- Implement corrective actions or coordinate with Data Science teams for retraining or model refinement.
Incident Management
- Triage and diagnose issues affecting model availability, accuracy, latency, or data quality.
- Execute immediate stabilization tasks or escalate to appropriate teams when needed.
- Document all incidents, resolutions, and preventive actions.
Deployment and Release Management
- Plan and execute rollout of new model versions, platform updates, or configuration changes.
- Validate deployments through regression checks, functional testing, and performance validation.
- Manage model versioning, rollback procedures, and release documentation.
Model Lifecycle Operations
- Manage routine operational tasks such as data refreshes, threshold tuning, and configuration updates.
- Automate workflows for recurring operational tasks where applicable.
- Support periodic model retraining cycles initiated by Data Science or ML Engineering teams.
Model Management Platform Support
- Support the enhancement and maintenance of the Model Management Platform, including model registry, monitoring tools, lineage dashboards, and pipeline components.
- Collaborate in implementing CI/CD pipelines for ML models (training, validation, deployment).
- Integrate new tools or operational improvements to enhance model reliability and observability.
Operational Stability and Risk Mitigation
- Address high-impact issues due to upstream data inconsistencies, environment drift, or infrastructure instability.
- Identify operational risks and propose preventive measures to ensure model uptime, accuracy, and compliance.
- Monitor and optimize resource usage related to ML tooling, serving infrastructure, and pipelines.
Cross-Functional Collaboration
- Work closely with Data Scientists, ML Engineers, DevOps, SysOps, and Platform teams to maintain seamless ML operations.
- Participate in post-incident reviews and recommend improvements for pipeline reliability, monitoring, and tooling.