About the Role
We are looking for a hands-on and driven DevOps Engineer to help scale and secure our Kubernetes-based production platform. Nearly all our applications run on Kubernetes, primarily on managed on-premises clusters and Amazon EKS. This role is central to ensuring system reliability, operational excellence, and continuous improvement across our cloud-native environment. You will work closely with Engineering and Operations teams to operate production workloads, improve automation, strengthen security practices, and enhance our CI/CD pipelines. If you enjoy owning systems end-to-end and working on real production infrastructure, this role is for you.
Your Role
Kubernetes & Container Platform Management
- Operate and maintain production workloads running on Kubernetes (managed on-prem clusters and Amazon EKS)
- Deploy, upgrade, and manage applications using Helm or similar package management tools
- Troubleshoot cluster-level and workload-level issues (networking, storage, scaling, resource constraints)
- Implement and maintain Kubernetes best practices for security, RBAC, and namespace isolation
- Optimize resource utilization and autoscaling configurations
Cloud & Infrastructure Management
- Support and manage AWS infrastructure (EKS, EC2, S3, IAM, VPC, networking, security groups)
- Implement and maintain Infrastructure as Code using Terraform and AWS CloudFormation
- Participate in infrastructure design reviews and capacity planning
System Reliability, Security & Production Support
- Maintain high availability and performance across production environments
- Participate in a structured on-call rotation to support production systems
- Respond to incidents, perform root cause analysis (RCA), and drive preventive improvements
- Contribute to post-incident reviews focused on learning and long-term resilience
- Continuously monitor and remediate infrastructure, container image, and dependency vulnerabilities
- Ensure timely upgrades of base images, Helm charts, system packages, and third-party libraries
- Collaborate with development teams to resolve security findings (e.g., CVEs) in code repositories
- Strengthen infrastructure and container security posture across all environments
CI/CD, Automation & Continuous Improvement
- Maintain and enhance CI/CD pipelines supporting containerized deployments
- Automate builds, testing, and Kubernetes-based deployments
- Improve release processes to minimize downtime and deployment risk
- Use tools such as GitHub Actions, Jenkins, and Ansible to reduce manual operational tasks
- Promote DevOps and cloud-native best practices across teams
Your Qualifications
- At least 2 years of experience in DevOps, SRE, or Systems Engineering
- Hands-on experience managing production workloads in Kubernetes
- Experience with Amazon EKS
- Experience deploying applications using Helm (or similar tooling)
- Experience with containerization (Docker)
- Experience with vulnerability scanning and remediation workflows
- Experience with Bash and/or Python scripting
- Strong troubleshooting skills and operational mindset
- Willingness to participate in a structured on-call rotation
Plus points if you have:
- Experience operating multi-environment Kubernetes clusters (dev/staging/prod)
- Experience implementing autoscaling strategies (HPA, Cluster Autoscaler)
- Experience with monitoring and observability tools (Prometheus, Grafana, ELK Stack, Graylog)
- Familiarity with Kubernetes networking concepts (Ingress, Services, CNI)
- Experience supporting production systems with monitoring and alerting
- Cloud certifications (e.g., AWS Certified DevOps Engineer or equivalent)