Job Overview:
We are looking for a highly skilled AI / LLM Engineer to lead the training, alignment, and optimization of large language models. This role focuses on Reinforcement Learning from Human Feedback (RLHF) and end-to-end post-training pipelines, while ensuring models are efficient and production-ready.
Key Responsibilities:
- Lead and manage the end-to-end RLHF pipeline (data collection, reward modeling, and policy optimization with PPO, DPO, or GRPO, using human or AI feedback)
- Design and implement Supervised Fine-Tuning (SFT) pipelines for open-weight models such as LLaMA, Mistral, and Qwen
- Build and train reward models based on human feedback
- Develop annotation pipelines (guidelines, calibration, dataset curation)
- Apply Constitutional AI & RLAIF to reduce manual labeling
- Perform model evaluation & red teaming for safety and quality
- Create benchmarks for performance, alignment, and reliability
- Optimize inference pipelines (llama.cpp, vLLM, TensorRT)
- Apply model optimization techniques: quantization (INT4/INT8/FP8) and parameter-efficient fine-tuning (LoRA, QLoRA)
- Troubleshoot training issues & production bugs
- Collaborate with teams to bring research into production
- Stay updated with the latest in AI, RL, and LLM advancements
Qualifications:
- Bachelor's degree in Computer Science, Engineering, Mathematics, or a related field
- 3-5 years of proven experience in AI/ML, NLP, or LLM development
- Strong understanding of Reinforcement Learning and RLHF
Required Skills:
- Hands-on experience with end-to-end RLHF pipelines
- Strong knowledge of PPO, KL divergence, reward shaping
- Experience with DPO and related techniques
- Familiarity with RLAIF & Constitutional AI
- Strong Python programming skills
- Solid background in math (probability, linear algebra, optimization)
- Experience troubleshooting training instabilities
- Understanding of transformers & LLM workflows
- Experience with distributed training
Technical Skills:
- Languages: Python, C/C++, Rust (preferred)
- Frameworks: PyTorch, TensorFlow, Hugging Face
- Inference Tools: llama.cpp, vLLM, TensorRT
- Data Tools: FAISS, Milvus, RAG pipelines
- Integration: APIs, agent systems, external tools
Good to Have:
- Experience with vector databases & retrieval systems
- Exposure to Rapid Application Development (RAD)
- Strong interest in AI alignment & safety
Why Join Us:
- Work on cutting-edge AI technologies
- Build impactful, real-world LLM solutions
- Be part of an innovative and collaborative team
Additional Information:
Location: Candidates based in the Philippines are preferred.
Availability: Able to start immediately or as soon as possible.
Flexibility: Open to any shift assignment, including the graveyard shift.