Project Description:
- We are seeking a Senior DevOps / Platform Engineer with deep AWS expertise to evolve and operate our core cloud platform while enabling enterprise-grade agentic AI capabilities on top of established infrastructure.
- This role focuses on platform stability, scalability, and developer enablement, ensuring that traditional services and emerging agentic systems coexist securely and reliably.
- You will play a key role in transforming our platform into a foundation that supports AI-augmented and agentic SDLC workflows, without compromising operational excellence.
Responsibilities:
- Design, build, and operate a shared AWS cloud platform that supports both traditional services and agentic AI workloads.
- Own and evolve Infrastructure as Code (IaC) using Terraform, ensuring consistency, security, and repeatability across environments.
- Extend existing infrastructure to support enterprise-grade agentic AI systems, including:
  - Execution runtimes for autonomous and semi-autonomous agents
  - Secure access to data, services, and APIs
  - Platform-level guardrails for safety, governance, and cost control
- Build platform abstractions, templates, and tooling that enable teams to safely consume agentic capabilities.
- Support and integrate agentic SDLC tools and processes, including AI-assisted development, testing, and release automation.
- Develop and maintain CI/CD pipelines for both traditional applications and AI-driven components.
- Implement platform-level observability (metrics, logs, traces) across services and agent workloads.
- Enforce security and compliance best practices (IAM, secrets management, encryption, least privilege).
- Collaborate closely with application, AI/ML, and security teams to improve developer experience and platform reliability.
- Act as a technical leader in architecture discussions, platform standards, and operational readiness.
Mandatory Skills Description:
- 8+ years of experience in DevOps, Platform Engineering, or Site Reliability Engineering roles.
- Deep, hands-on AWS expertise, including services such as:
  - EC2, EKS/ECS, VPC, IAM, S3, RDS/DynamoDB, CloudWatch, Lambda
- Strong production experience with Terraform, including:
  - Designing modular Terraform architectures
  - Managing state, environments, and multi-account setups
- Proven experience rolling out enterprise-grade agentic AI infrastructure on top of existing platforms, including:
  - Supporting agent execution within established networking, security, and compliance boundaries
  - Enabling scalability, observability, and governance for agent behavior
- Hands-on experience supporting agentic SDLC tools and processes, such as:
  - AI-assisted coding, testing, and deployment workflows
  - Agent-based automation within CI/CD and operational processes
- Solid understanding of Linux, networking, and cloud security fundamentals.
- Experience with containerized and Kubernetes-based platforms (Docker, EKS).
- Familiarity with MLOps or AI platform components (model serving, vector databases, feature stores).
Nice-to-Have Skills Description:
- Experience designing internal developer platforms (IDPs).
- Knowledge of policy-as-code and governance frameworks (OPA, SCPs, tagging strategies).
- AWS certifications (Solutions Architect, DevOps Engineer).
- Experience operating platforms at enterprise scale or in regulated environments.