Machine Learning Specialist LLM

Job Description

  • Execute performance tuning activities for model serving infrastructure to maintain optimal latency and throughput.
  • Conduct post-deployment validation checks to ensure model prediction stability, API responsiveness, and overall service quality.
  • Support the enhancement of operational pipelines, including CI/CD workflows, configuration templates, and automated monitoring scripts.
  • Participate in service reliability reviews to improve platform uptime, incident response processes, and operational readiness.
  • Coordinate closely with DevOps and Platform Engineering to address infrastructure-level concerns related to model hosting and deployment.
  • Assist in the rollout of platform-level improvements, including model registry enhancements, container optimization, and new monitoring tools.

Minimum Qualifications

Key Requirements (Must Have)

  • Machine Learning Operations (MLOps)
  • Cloud background (AWS, GCP, Azure, Alibaba, etc.)
  • Understanding of SQL and data engineering (data pipelines)
  • Containerization (Docker and Kubernetes)

Other Qualifications

  • At least 2 years of hands-on experience in a production environment covering MLOps, data engineering, or software engineering.
  • Demonstrated ability to meet and exceed strict Service Level Agreements (SLAs), especially those covering system uptime, stability, and incident response and resolution times.
  • Experience supporting cloud-hosted ML systems in distributed, high-availability environments.

Knowledge

  • Strong understanding of model deployment workflows, including model versioning, serving, rollout strategies, and post-deployment validation.
  • Knowledge of cloud platforms (e.g., AWS) and their native ML services for hosting, monitoring, and managing model endpoints.
  • Familiarity with containerization (Docker) and orchestration (Kubernetes) for scalable ML serving infrastructure.
  • Understanding of performance monitoring concepts, including latency tracking, model health indicators, and drift signals.
  • Knowledge of CI/CD processes, configuration templates, and automated operational workflows specific to ML systems.

Skills

  • Proven expertise in MLOps, specifically managing model deployment, proactive monitoring, incident resolution, and performance tuning.
  • Ability to write and maintain automation scripts, validation utilities, and operational workflows to support ML pipelines.
  • Ability to collaborate effectively with DevOps, Data Science, and Platform Engineering teams to improve model reliability and system stability.
  • Skilled in applying structured software development methodologies (e.g., Agile/Scrum) to support platform enhancements and iterative delivery.
  • Strong analytical, troubleshooting, and root-cause diagnosis skills in production environments.

More Info

Job ID: 150587353