
Innodata Inc.

AI/LLM ENGINEER

  • Posted 14 hours ago

Job Description

We are looking for an AI Engineer to lead our efforts to train, align, and optimize large language models. RLHF and reinforcement learning are at the core of this role: you will own the full post-training pipeline, from supervised fine-tuning through reward modeling and RL optimization, while also ensuring models run efficiently in production. The role bridges alignment research and systems engineering.

What You'll Do

  • Own and drive the full RLHF pipeline: data collection, reward model training, and RL fine-tuning using PPO, DPO, GRPO, and RLAIF
  • Design and run Supervised Fine-Tuning (SFT) pipelines on open-weight models (LLaMA, Mistral, Qwen) as the foundation for RLHF
  • Build and train reward models that accurately capture human preferences from annotation data
  • Design human feedback collection pipelines: labeling rubrics, annotator calibration, and preference dataset curation
  • Implement Constitutional AI and RLAIF techniques to reduce reliance on costly human annotation
  • Red-team models post-training, probing for jailbreaks, regressions, unsafe outputs, and alignment failures
  • Design and maintain evaluation benchmarks to measure alignment, safety, and capability before and after RL training
  • Optimize inference pipelines and runtimes (llama.cpp, vLLM, TensorRT) to serve aligned models efficiently at scale
  • Implement quantization strategies (INT4/INT8/FP8, LoRA, QLoRA) to deploy fine-tuned models on target hardware
  • Write and tune low-level C/C++ and Rust code for inference performance where Python cannot reach
  • Diagnose and resolve training instabilities, reward hacking, and production inference bugs under pressure
  • Stay at the frontier: read alignment and RL papers weekly and translate findings into working experiments
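As a rough illustration of the preference-optimization methods named above, here is a minimal sketch of the Direct Preference Optimization (DPO) loss for a single preference pair in plain Python. The function name and inputs are illustrative, not any specific library's API; inputs are summed token log-probabilities under the policy being trained and under a frozen reference model.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    beta controls how far the policy may drift from the reference
    model (it acts as an implicit KL constraint).
    """
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)), written stably as log(1 + exp(-x))
    return math.log1p(math.exp(-logits))

# The loss falls below log(2) once the policy prefers the chosen
# response more strongly than the reference model does.
print(dpo_loss(-10.0, -12.0, -11.0, -11.5))
```

The same pairwise-margin structure underlies DPO's variants; PPO-based RLHF instead optimizes an explicit reward model score under a KL penalty.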

Core Requirements and Technical Skills

  • Hands-on experience implementing RLHF end-to-end — not just using libraries, but understanding the mechanics
  • Deep familiarity with policy gradient methods: PPO stability, KL divergence constraints, reward shaping
  • Experience with Direct Preference Optimization (DPO) and its variants as an RLHF alternative
  • Understanding of reward hacking, Goodhart's Law, and mitigation strategies in RL training
  • Familiarity with RLAIF (RL from AI Feedback) and Constitutional AI approaches
  • Ability to design preference datasets and annotation rubrics that produce high-quality reward signal
  • Experience diagnosing training instabilities: reward collapse, mode collapse, KL divergence blowup
  • Python as the primary language for all training, fine-tuning, and evaluation pipelines
  • Strong mathematical foundation: RL theory, probability, linear algebra, optimization — deep enough to derive loss functions and debug training dynamics
  • C and C++ for systems-level inference work, runtime contributions, and performance-critical paths
  • Experience with Rust in the ML tooling ecosystem
  • Familiarity with transformer architecture, attention, tokenization, and how post-training interacts with pretraining
  • Experience with distributed training frameworks for large-scale fine-tuning
  • Experience with vector databases such as FAISS or Milvus
  • Familiarity with retrieval-augmented generation (RAG) pipelines
  • Experience integrating LLMs with external tools, APIs, and agent-based systems
  • Exposure to Rapid Application Development (RAD) approaches for building and iterating on AI solutions efficiently
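The points above on PPO stability, KL divergence constraints, and reward shaping can be made concrete with a small sketch of KL-penalized per-token rewards, the standard construction in PPO-style RLHF. Names and the per-token KL estimate (policy log-prob minus reference log-prob) are illustrative assumptions, not a particular framework's interface.

```python
def shaped_rewards(reward_model_score, policy_logprobs, ref_logprobs,
                   kl_coef=0.05):
    """Per-token rewards for a PPO-style RLHF update.

    Each token is penalized by kl_coef times the per-token KL
    estimate (log pi_theta - log pi_ref), which keeps the policy
    near the reference model and discourages reward hacking; the
    scalar reward-model score for the whole response is credited
    at the final token.
    """
    rewards = [-kl_coef * (lp - ref_lp)
               for lp, ref_lp in zip(policy_logprobs, ref_logprobs)]
    rewards[-1] += reward_model_score
    return rewards

# Two-token response: the first token drifted above the reference
# likelihood, so it picks up a small negative KL penalty.
print(shaped_rewards(1.0, [-1.0, -2.0], [-1.5, -2.0], kl_coef=0.1))
```

Tuning `kl_coef` trades reward-model exploitation against divergence from the reference policy; KL blowup or reward collapse during training usually shows up first in these per-token terms.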

Job ID: 146402357