  • Posted a day ago

Job Description

Job Overview:

  • The AI / DevOps Engineer is responsible for building, deploying, and operating AI-powered software and the infrastructure that runs it. The role combines hands-on software development with DevOps and platform engineering, with a strong focus on Large Language Model (LLM) applications, agentic systems, and workflow automation.
  • The individual in this role designs and develops AI-enabled applications and automations, integrates LLM APIs and self-hosted models, and builds the pipelines, infrastructure, and observability needed to ship and run these systems reliably across cloud, on-premise, private cloud, and GPU environments.
  • This position bridges development and operations across the full lifecycle, from requirements and system design through CI/CD, deployment, monitoring, security, and post-production support, while continuously evaluating emerging AI tools and practices to improve efficiency and quality.
  • Continuous learning is essential, as the AI / DevOps Engineer must stay current with a fast-moving AI and infrastructure landscape, including new models, agentic coding tools, orchestration frameworks, and automation techniques.

Responsibilities:

AI & LLM Application Development

  • Design, develop, and maintain AI-powered applications and services that integrate Large Language Models (LLMs) and other machine learning models into business workflows.
  • Integrate LLM APIs (such as Anthropic Claude and other providers) as well as self-hosted and open-source models running on private GPU infrastructure.
  • Build retrieval-augmented generation (RAG) pipelines using vector databases, embeddings, and semantic search to ground model outputs in enterprise data.
  • Design and implement agentic systems and tool-use workflows, including integrations through the Model Context Protocol (MCP) and connections to internal and third-party services.
  • Apply prompt engineering, evaluation, and guardrail techniques to improve accuracy, safety, reliability, and cost-efficiency of AI features.
  • Write clean, efficient, reusable, and well-tested code following established standards and secure coding practices.

Software Development & Integration

  • Design and develop scalable, secure backend services, APIs, and integrations that support AI and automation use cases.
  • Perform application and data integration with internal and external systems using RESTful APIs, web services, webhooks, and message queues.
  • Translate business requirements into functional and technical specifications, and participate in architecture and design discussions.
  • Ensure solutions are compatible across multiple platforms and environments, including cloud, on-premise, and private cloud deployments.

Workflow & Process Automation

  • Design, develop, and deploy automation workflows that combine traditional automation with AI-driven decision-making.
  • Build automations using modern tools and platforms such as Power Automate, n8n, Zapier-style iPaaS, and custom scripts, replacing manual and repetitive processes.
  • Develop and operate desktop and agentic automation, including AI desktop agents (for example, Cowork or Open Claw-style assistants) that perform tasks across applications.
  • Implement web automation and data extraction where required using tools such as Playwright, Puppeteer, or Selenium.
  • Use agentic coding tools such as Claude Code to accelerate development, automate engineering tasks, and build internal tooling.
  • Automate IT and operational workflows such as provisioning, monitoring, alerting, ticketing, and incident response.

Infrastructure, Cloud & DevOps

  • Build, maintain, and optimize infrastructure across cloud (AWS, GCP), on-premise, and private cloud environments for efficiency, scalability, and reliability.
  • Provision and manage GPU compute for model inference and AI workloads, optimizing for performance and cost.
  • Design and maintain CI/CD pipelines to automate building, testing, and deployment of applications, models, and automations.
  • Manage source control and Git-based workflows on platforms such as GitHub, GitLab, or Bitbucket, including branching strategies, pull/merge requests, and code review processes.
  • Containerize and orchestrate workloads using Docker and Kubernetes, and manage infrastructure as code (e.g., Terraform).
  • Manage deployment, release, and configuration management, and support smooth promotion of changes from development to production.
  • Administer Linux/Unix and Windows environments supporting development and production systems.

Monitoring, Reliability & Performance

  • Implement monitoring, logging, alerting, and observability for applications, infrastructure, and AI/LLM workloads using tools such as Prometheus, Grafana, the ELK/Loki stack, Datadog, or cloud-native services (e.g., AWS CloudWatch).
  • Track AI-specific metrics such as latency, token usage, cost, accuracy, and quality, and act on the results.
  • Proactively identify, troubleshoot, and resolve performance issues and production incidents within agreed timelines.
  • Participate in root cause analysis and drive preventive improvements to system reliability and stability.

Security & Compliance

  • Apply security best practices across development, automation, and operations, including secrets management, access control, and network security.
  • Address AI-specific security and governance concerns such as data privacy, prompt injection, safe handling of sensitive data, and responsible use of models.

Collaboration & Continuous Improvement

  • Work closely with developers, data and AI engineers, operations staff, business analysts, and other stakeholders to deliver end-to-end solutions.
  • Participate in Agile/Scrum ceremonies including sprint planning, daily stand-ups, reviews, and retrospectives.
  • Act as a liaison between technical teams and stakeholders, and communicate solutions, trade-offs, and results clearly.
  • Prepare and maintain technical documentation, including system designs, runbooks, and operational procedures.

Qualifications:

  • Bachelor's degree in Computer Science, Information Technology, Software Engineering, or a related field (or equivalent practical experience).
  • 2–4 years of combined experience in software development, DevOps, or automation; experience with AI/LLM-based solutions is strongly preferred.
  • Demonstrated experience building and deploying applications in cloud, on-premise, or private cloud environments.
  • Experience integrating APIs and third-party services, and building automated workflows.
  • Working knowledge of Agile/Scrum methodologies and collaboration tools.
  • Relevant certifications in cloud (AWS, GCP), DevOps, or AI/ML are an advantage.

Technical Skills

Programming Languages & Core Stack

  • Python (required, primary): main language for AI/LLM development, automation, data work, and scripting; experience with frameworks and libraries such as FastAPI or Flask, plus LangChain, LlamaIndex, or the Anthropic and OpenAI SDKs.
  • TypeScript / JavaScript (required): for backend services (Node.js) and front-end or full-stack work (React or similar), API integrations, and building agentic and MCP-based tooling.
  • Bash / Shell scripting (required): for automation, CI/CD, and Linux/Unix system administration.
  • SQL (required): for querying and managing relational databases such as PostgreSQL, MySQL, or SQL Server.
  • PowerShell (preferred): for Windows administration and automation in mixed environments.
  • Go and/or C# (advantageous): for performant backend services, infrastructure tooling (Go), or .NET-based enterprise integrations (C#).
  • Configuration and infrastructure-as-code languages: YAML and JSON for pipelines and config, and HCL (Terraform) for provisioning infrastructure.
  • Strong proficiency in the core languages above, with the ability to pick up additional languages as project needs evolve.
  • Hands-on experience integrating LLM APIs (e.g., Anthropic Claude, OpenAI) and building AI features such as RAG, agents, and tool use.
  • Familiarity with AI frameworks and libraries such as LangChain, LlamaIndex, or similar, and with vector databases (e.g., Pinecone, Weaviate, pgvector, or FAISS).
  • Experience with the Model Context Protocol (MCP) and agentic coding tools such as Claude Code is an advantage.
  • Strong knowledge of cloud platforms, especially AWS and GCP, plus on-premise and private cloud deployment.
  • Experience provisioning and using GPU compute for model inference and training.
  • Solid DevOps skills: CI/CD pipelines (e.g., GitHub Actions, GitLab CI/CD, Jenkins), Docker, Kubernetes, and infrastructure as code (e.g., Terraform).
  • Experience with Linux/Unix and Windows administration and scripting (e.g., Bash, Python).
  • Knowledge of RESTful APIs, JSON, webhooks, and system integration concepts.
  • Experience with databases including PostgreSQL, MySQL, MongoDB, or SQL Server.
  • Familiarity with workflow automation tools (e.g., Power Automate, n8n) and web automation (e.g., Playwright, Puppeteer, Selenium).
  • Strong experience with Git and Git-based platforms such as GitHub, GitLab, and Bitbucket, including branching strategies, pull/merge requests, code review, and repository management, along with modern software development life cycle practices.
  • Hands-on experience with monitoring and observability tooling such as Prometheus, Grafana, the ELK/Loki stack, Datadog, or cloud-native services (e.g., AWS CloudWatch), including metrics, logs, dashboards, and alerting for applications, infrastructure, and AI workloads.
  • Awareness of AI safety, security, and governance considerations, including data privacy and prompt-injection risks.

Soft Skills

  • Strong analytical and problem-solving abilities.
  • Excellent verbal and written communication skills.
  • Ability to work both independently and collaboratively within a team.
  • Strong organizational and time-management skills.
  • Ability to manage multiple priorities and meet deadlines in a fast-paced environment.
  • Attention to detail and a commitment to quality.
  • Curiosity and a strong willingness to learn and adapt to rapidly evolving AI and infrastructure technologies.

More Info

Job ID: 148681973
