
Urgent Hiring!
Senior Manager-Technical Operations Engineer
Tasks:
Hands-on engineer with expertise in developing complex, large-scale enterprise applications and tools, including self-healing capabilities, automation, and AI solutions.
Responsible for technical aspects of software engineering for assigned applications including design, developing prototypes, and coding assignments.
Empower teams to automate demand-driven, scalable application deployments in test or production environments.
Apply specialized knowledge of industry standards or practices to assigned initiatives to identify complex and/or broad problems and issues and formulate recommendations.
Collaborate with leadership across teams to define solutions and technical implementations that drive software maturity and practices.
Drive the technical roadmap for runtime systems, ensuring the reliability, scalability, and performance of platforms.
Establish and monitor key performance indicators (KPIs) for runtime and resiliency and drive continuous improvement efforts to meet or exceed these metrics.
Provide technical thought leadership and guidance to the team, fostering a culture of innovation/automation, collaboration, and accountability.
Act as a technical contributor by participating in architecture design, code reviews, and troubleshooting complex technical issues.
Design and implement innovative solutions and frameworks that improve software engineering velocity, infrastructure resiliency and security, and data availability.
Develop common framework components (to be leveraged by enterprise applications) and define standards for configuration, monitoring, reliability, and performance engineering.
Qualifications:
Minimum 5 years of relevant IT work experience, including systems development
Bachelor's degree in Computer Science, Information Systems, or another related field
Extensive knowledge of cloud and distributed systems - Java, Python, Unix, databases (MongoDB, Oracle, Db2, Couchbase, etc.), SOAP & REST APIs, Kafka, etc.
Extensive knowledge of observability - Dynatrace, Grafana, Prometheus, ELK, Kibana, etc.
Experience in building automations (using scripting, Ansible, Python, Java, etc.)
Strong troubleshooting and analytical skills, including debugging techniques for root cause analysis.
Expertise in identifying application and infrastructure risks to highly available systems, defining mitigation strategies, and working well with others to ensure risks are mitigated.
ITIL (ServiceNow) and Agile knowledge.
Experience using CI/CD tools (Jenkins, GitHub, etc.)
Strong knowledge of site reliability engineering (SRE) and non-functional requirements (NFRs)
Send CVs for screening!
Job ID: 142160657