Senior Site Reliability Engineer

5-7 Years
  • Posted 12 days ago

Job Description

Your Role

  • Serve as Subject Matter Expert (SME) for distributed applications on hybrid cloud platforms, documenting best practices and providing guidance to peers.
  • Champion continuous operational improvements informed by metrics analysis and customer feedback.
  • Lead incident management, troubleshooting, response coordination, and conduct comprehensive post-incident reviews.
  • Clearly communicate complex technical issues to development teams, document root causes, and collaborate internally to create robust solutions.
  • Manage, deploy, and maintain enterprise applications and cloud-based systems using secure, scalable, and reliable frameworks.
  • Proactively monitor, troubleshoot, and optimize the health, performance, and reliability of applications and platforms.
  • Perform detailed log analysis and utilize stack traces to debug and resolve issues reported by partners and end-users.
  • Develop comprehensive documentation covering operational procedures, system configurations, and environment setups.
  • Continuously identify and implement automation opportunities to reduce manual tasks and operational overhead.
  • Train junior engineers across different areas of expertise.
  • Participate in a 24x7 shift rotation.

Your Qualifications

  • Bachelor's degree in Information Technology, Engineering, or a related technical field.
  • Minimum 5 years of experience supporting critical, high-availability production systems with a focus on automation, reliability, and operational excellence.
  • At least 5 years of hands-on experience with at least 1-2 tools per domain:
      • Linux Administration & Troubleshooting: RHEL, CentOS, Ubuntu, or similar Unix-based OS.
      • Distributed Applications: Microservices architecture and distributed application support.
      • Logging & Monitoring: Splunk, Grafana, Prometheus.
      • Incident Management: PagerDuty, ServiceNow.
      • Version Control: Git, GitHub, GitLab.

Plus points if you have:

  • Certifications such as CKA, CKAD, or cloud certifications (AWS, Azure, GCP).
  • Experience supporting and maintaining PaaS environments, CDNs, message queues, API gateways, and proxies in scalable, resilient architectures.
  • Proven success in cross-functional collaboration within modern DevOps environments.
  • Ability to drive operational efficiency through automation, using Bash, Python, or similar scripting languages.

Job ID: 141925751