Maria Health

AI Engineer

  • Posted 15 hours ago

Job Description

AI Engineer — Maria Health

Location: Remote (Philippines) · Hybrid if based in Metro Manila

Compensation: ₱120,000–₱160,000 per month (competitive) · ₱100,000–₱130,000 per month (alternative)

About Maria Health

We're the Philippines' first fully online HMO broker, operating with a full broker license from the Insurance Commission and backed by InLife (Insular Life). We sit at the intersection of insurance, healthcare, and technology, a category barely touched by AI in this market. We're building the operating system that runs HMO brokerage in the Philippines, and AI is at the core of it.

The Role

As an AI Engineer, you'll be responsible for the AI's accuracy and reliability across every workflow the people running HMO operations depend on. You'll design the RAG architecture, curate the knowledge bases the AI draws from, write the prompts that shape how it reasons, and build the evaluations that catch regressions before users do. This is the AI quality function on a small team, and the bar is straightforward: the AI has to produce correct, grounded outputs that hold up to scrutiny from users, reviewers, and regulators.

Scope of Work
  • Owns AI quality and accuracy across the platform end to end. Designs, builds, evaluates, and iterates on the AI logic that powers each workflow.
  • Designs the RAG architecture, curates the knowledge bases the AI reasons over, writes the prompts and agent logic for each workflow, and partners with the Platform Engineer on the Guardrails policies that shape what the AI is allowed to say.
  • Tunes AI outputs for accuracy and reliability, secures the AI logic against prompt injection and unsafe outputs, monitors AI quality across every workflow, troubleshoots regressions and quality issues, and makes sure AI decisions are auditable and well-documented for whoever needs to review them later.
  • Works with the Platform Engineer and Product Lead to define what correctness means for each workflow, how the AI improves over time, and how quality concerns surface in product decisions.
Responsibilities
  • Design and maintain the RAG architecture for each AI-driven workflow: chunking strategies, retrieval methods, reranking, and context assembly
  • Curate and maintain the content of AI knowledge bases: document selection, organization, and content updates as insurer policies and procedures change
  • Write and version prompts and prompt templates for each workflow, iterating based on evaluation results and observed AI behavior
  • Design agent logic and tool use patterns for multi-step workflows where the AI needs to reason across multiple tools or steps
  • Build evaluation datasets and quality metrics for each AI-driven workflow: golden datasets, accuracy thresholds, regression detection
  • Measure AI accuracy over time and surface regressions when prompts, models, or knowledge base content changes
  • Work with the Platform Engineer to define Guardrails policies that shape what the AI is allowed to say, do, or refuse, working from product and compliance requirements
  • Investigate and resolve AI quality issues: hallucinations, retrieval failures, prompt drift, unexpected outputs
  • Make AI decisions auditable for review: maintain decision traces for AI-driven outputs that affect users (claims approvals, endorsement validations, and similar), capturing which prompt version, knowledge base version, and model version produced each output, so any future security engineer, internal reviewer, or external regulator can understand how the AI arrived at a given decision
  • Partner with the Product Lead on what AI accuracy and quality should mean for each workflow, and how those standards evolve as new processes come online
  • Partner with the Platform Engineer on how AI logic runs in the platform, agreeing on conventions for prompt deployment, knowledge base sync, evaluation runs, and how quality metrics surface in monitoring
  • Document the AI layer for the team: maintain a versioned prompt library, knowledge base content maps, evaluation suites and quality baselines, agent logic and tool use patterns, and Guardrails policy intent
  • Respond to AI quality incidents within agreed response windows
Required Skills
  • 3+ years in software engineering, with at least 2 years building production LLM applications (RAG systems, agent workflows, evaluation pipelines, or prompt-driven features)
  • Hands-on application-level work on Amazon Bedrock, including Knowledge Bases, Guardrails, Evaluations, Prompt Management, and Agents
  • Designing and operating RAG systems in production: chunking strategies, embedding model selection, retrieval methods, reranking, and context window management
  • Curating knowledge base content for AI consumption: document selection, organization, lifecycle management, and adapting content as source material changes
  • Prompt engineering and prompt versioning: writing, testing, and iterating on prompts; managing prompt versions across environments; structured output and tool use patterns
  • Designing agent logic and tool use patterns: multi-step reasoning, tool selection, state management, and failure handling in agentic workflows
  • Building evaluation systems for AI outputs: golden dataset construction, accuracy measurement, regression detection, and human review workflows
  • Handling sensitive data in AI workflows: PII redaction, prompt injection defense, content safety patterns, and avoiding data leakage through prompts or outputs
  • Strong Python for production-grade AI applications and data pipelines, beyond scripting
  • Working with vector stores and embedding models for production RAG systems
  • Measuring AI quality in production: dashboards and metrics for accuracy, retrieval quality, and regression detection
  • Troubleshooting AI quality issues: diagnosing hallucinations, retrieval failures, prompt drift, and unexpected outputs across the AI logic stack
Preferred Skills
  • Familiarity with agent frameworks for designing and orchestrating multi-step LLM workflows
  • Experience with prompt management tools for versioning prompts, tracking outputs, and measuring quality in production
  • Familiarity with other LLM providers, for evaluating model trade-offs
  • Domain familiarity with insurance, healthcare, or financial services: claims processes, enrollment workflows, billing, or regulatory contexts
Why this role matters

Most AI hires get to play with prompts. This one ships AI that real users — account managers, BD reps, insurer counterparts, and eventually regulators — will depend on. Every output is auditable. Every decision is traceable. And every workflow you transform compounds into the operating system we're building for HMO brokerage in the Philippines.

How to apply

Send your CV and a short note on the most interesting RAG, agent, or evaluation system you've shipped to production to [Confidential Information].

More Info


Job ID: 148239931
