We are seeking a Site Reliability Engineer (SRE) with a strong software engineering background and a passion for building reliable, scalable, and highly observable systems. As an SRE, you will focus on improving service reliability through automation, reducing operational toil, implementing SLOs and error budgets, and partnering closely with software engineering teams to ensure smooth and stable production operations.
Essential Functions
Reliability Engineering
- Define, measure, and manage SLIs, SLOs, and error budgets across critical services.
- Analyze system performance and identify opportunities to improve reliability, resilience, and scalability.
- Lead reliability reviews and proactively prevent incidents before they impact customers.
Observability & Monitoring
- Build and optimize monitoring, logging, and alerting systems.
- Ensure alerts and dashboards are meaningful and actionable.
- Implement distributed tracing.
Automation & Tooling
- Reduce toil through automation.
- Build reliability-focused tools and automated remediation.
CI/CD & Production Readiness
- Enhance CI/CD pipelines for safe, reliable deployments.
- Implement canary, blue/green, and automated rollback mechanisms.
- Enforce production readiness standards.
Incident Management
- Participate in on-call rotations.
- Lead incident response and post-incident reviews.
- Promote a blameless incident culture.
Cloud & Infrastructure Engineering
- Design and maintain cloud-native infrastructure across AWS and Azure.
- Work with containers, serverless, and event-driven systems.
- Ensure systems are secure, scalable, and cost-efficient.
Infrastructure as Code (IaC)
- Build infrastructure using Terraform.
- Maintain consistent, automated provisioning.
Security & Compliance
- Integrate security into pipelines.
- Support audits and compliance processes.
Essential Qualifications
Education/Certification: Bachelor's degree in Computer Science, Software Engineering, or a related field.
Experience Required
- 4+ years in SRE, DevOps, or Platform Engineering.
- Strong programming or scripting experience.
- Hands-on cloud experience.
- Understanding of distributed systems.
- Experience with observability tools.
- Familiarity with chaos engineering and resilience patterns.
Preferred Skills
- Experience with SLIs/SLOs and error budgets.
- Strong software development background.
- Experience building self-service platforms.
BEL USA is proud to be a certified Great Place to Work and Top Workplace!
80% of our employees think BEL is a great place to work.
Grow your career with BEL, a company that puts its people first.