Mahindra Satyam

Site Reliability Engineering Lead

  • Posted 20 hours ago

Job Description

Tech Mahindra represents the connected world, offering innovative and customer-centric information technology experiences, enabling Enterprises, Associates, and the Society to Rise. It has 150,000+ professionals working for 1000+ Global Customers (including Fortune 500 companies) in 90 Countries. We're part of the esteemed Mahindra group, headquartered in India. Under a new CEO, Tech Mahindra is committed to a transformative journey with Scale @ Speed as our guiding principle.

Job description:

Site Reliability Engineering (SRE) combines software and systems engineering with the art of machine learning to build and run large-scale, massively distributed, and fault-tolerant systems. You will have the opportunity to sharpen your expertise in coding, performance analysis, and large-scale system design while making a tangible impact on the future of Infrastructure services and AML systems.

Responsibilities

• Design, build, and maintain highly available, scalable, and fault-tolerant systems. Collaborate with software engineering teams to ensure applications are designed with reliability and performance in mind.

• Develop and maintain automation procedures to maximize system efficiency, minimize human intervention, and optimize routine tasks.

• Monitor and analyze system performance to identify and address bottlenecks before they impact users. Ensure the infrastructure can handle rapid growth in web traffic and ML data processing.

• Participate in 24/7 on-call rotations (including scheduled shifts and holidays). Practice sustainable on-call response, conduct root-cause analysis, and lead blameless post-mortems to prevent recurrence.

• Implement monitoring tools (SLIs/SLOs/SLAs) and set up automated alerting and metrics to track system health and performance.

• Implement and maintain security best practices and ensure all systems meet regulatory requirements.

Job Requirements

Minimum Qualifications:

• Education: Bachelor's or Master's degree in Computer Science, Information Technology, Computer Engineering, or a related field.

• Experience: 3+ years of experience as a Site Reliability Engineer, Systems Engineer, or Software Engineer.

• Coding: Proficient in at least one high-level programming language (e.g., Python, Go, C++, or Java) and shell scripting. Strong understanding of data structures and algorithms.

• Systems: Strong understanding of Linux operating systems and open-source technologies and a solid understanding of network architecture.

• Databases: Competent knowledge of relational database systems and database modeling.

Preferred Qualifications:

• Experience with containers and container orchestration platforms such as Docker and Kubernetes.

• Proficiency in or exposure to machine learning frameworks such as TensorFlow, PyTorch, MXNet, or PaddlePaddle.

• Hands-on experience with monitoring tools and methodologies (e.g., Prometheus, Grafana).

• Soft Skills: Strategic thinking, exceptional communication, and the ability to collaborate effectively with cross-functional teams in a fast-paced environment.

Job ID: 147182831