Amadeus

Lead Service Reliability Engineer

  • Posted a day ago

Job Description

Job Title

Lead Service Reliability Engineer

Purpose of the role

The Lead Site Reliability Engineer for Stratos will be responsible for ensuring the reliability, performance, and scalability of our mission-critical platforms. In this role, you will safeguard operational excellence in the products under Stratos, influence reliability strategies, play an integral part in production incident response, and help improve operational metrics.

The role requires deep and/or broad expertise in our environment architecture to drive efficiency improvements. It involves recommending solutions and best practices, shaping departmental strategy, and converting strategic objectives into actionable plans for the area. Additionally, the role includes setting clear targets for the team and monitoring progress to ensure alignment with goals. Collaboration is key, as you will work closely with teams such as Development and Amadeus Production Support to make configuration changes or design and develop code that meets target SLOs. You will identify opportunities to optimize costs while maintaining stability, which may include leading toil-reduction initiatives, managing capacity planning and tuning, updating SOPs, and developing code for performance improvements. This is a hybrid role requiring on-site presence 2-3 days per week.

In This Role You'll

  • Define and track Service Level Indicators (SLIs), Objectives (SLOs), and Error Budgets in partnership with engineering and product leads
  • Collaborate with Operations and Development teams to drive service reliability, availability, and scalability
  • Influence architecture and deployment standards to align with SRE principles
  • Drive and participate in toil-reduction projects to minimize, if not eliminate, recurring manual activities performed by the team
  • Champion observability, automation, and infrastructure-as-code practices to reduce manual intervention and improve system health
  • Establish a feedback loop with development teams so they have visibility into how stable and reliable their services are in client environments
  • Drive production incident response and lead root cause analysis and continuous improvement
  • Design and develop operational improvement items with development teams, working closely with them to prioritize these improvements
  • Provide input on process improvements to Change, Release, and Incident Management
  • Create and implement support playbooks that team members can use as part of emergency response to production issues

About The Ideal Candidate

  • Knowledgeable and experienced in utilizing different Azure resources such as Storage, Network, Functions, Logic Apps, App Services, and AKS
  • Strong technical expertise in Azure DevOps, including developing in Git and working with GitOps repositories and build/release pipelines
  • Hands-on experience in developing Azure PowerShell scripts, Azure Runbooks, or other infrastructure automation tools
  • Knowledgeable in cloud platform and AI technologies
  • Experienced with monitoring and logging tools (Grafana, Dynatrace, Splunk)
  • Proven ability to adapt to emerging cloud technologies and industry leading DevOps applications such as Terraform, Docker Containers, and Kubernetes
  • Knowledgeable in cloud implementation of Navitaire products across different cloud infrastructure models
  • Understands production environments and processes, and how they can be further optimized through various Azure features and other cloud technologies/services
  • Proven ability to drive problem solving efforts through effective issue analysis
  • Has the ability to lead efforts to implement infrastructure changes to increase environment stability and support scalability
  • Has the ability to drive collaborations with different Navitaire teams in enforcing environment standards and policies
  • Works effectively in a team environment and contributes to building the capabilities of team members
  • Proficient in C#
  • Proven ability to work in a dynamic, fast-paced and multi-cultural environment
  • Willing to work on shifting schedules

Diversity & Inclusion

Amadeus aspires to be a leader in Diversity, Equity and Inclusion in the tech industry, enabling every employee to reach their full potential by fostering a culture of belonging and fair treatment, attracting the best talent from all backgrounds, and serving as a role model for an inclusive employee experience.

Amadeus is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to gender, race, ethnicity, sexual orientation, age, beliefs, disability or any other characteristics protected by law.

Job ID: 138600303