We are seeking a highly skilled Systems and Monitoring Engineering Lead to oversee our global infrastructure health and monitoring strategy. With 7–10 years of experience in infrastructure operations and a proven track record of leadership, you will ensure the stability, performance, and availability of our critical systems.
This role is a blend of technical expertise and strategic leadership, requiring a deep understanding of enterprise monitoring suites, server administration, and the ability to lead a 24x7 NOC/operations environment in a fast-paced global setting.
Key Responsibilities
- Team Leadership: Supervise and mentor a team of engineers within the NOC/Operations environment, driving technical excellence and professional growth.
- Monitoring Strategy: Design, implement, and maintain enterprise monitoring solutions using tools such as WhatsUp Gold, SolarWinds, SCOM, and Nagios.
- Infrastructure Oversight: Administer and troubleshoot Windows Server and Linux environments, ensuring seamless integration with VMware vSphere virtualization.
- Incident Management: Serve as the lead for critical incidents, coordinating technical teams and stakeholders to ensure rapid resolution and adherence to SLAs.
- Process Optimization: Champion ITIL processes, focusing on Event Management, Incident Management, and Change Management to improve operational maturity.
- Network Collaboration: Apply a strong understanding of TCP/IP, DNS, and firewall configurations to diagnose complex connectivity issues across the global stack.
- ITSM Excellence: Manage workflows and reporting within ServiceNow (or JIRA/Rally) to ensure high-quality data and efficient ticket lifecycles.
Required Qualifications
- Experience: 7–10 years in Infrastructure Operations, Compute Administration, or 24x7 NOC environments.
- Leadership: 2+ years of experience in a team lead or supervisory role within a technical operations team.
- Monitoring Expertise: Deep hands-on experience with WhatsUp Gold, SolarWinds, and SCOM; experience with Nagios is highly preferred.
- Systems Knowledge: Solid administration skills in Windows Server; foundational knowledge of Linux and VMware vSphere concepts.
- Networking: Working knowledge of TCP/IP, DNS, routing, ports, and firewalls.
- Tools: Experience with ServiceNow (preferred), JIRA, or Rally.
Preferred Skills & Attributes
- ITIL Certification: Highly desirable.
- Communication: Exceptional verbal and written communication skills with the ability to translate technical issues for non-technical stakeholders.
- Agility: Proven ability to operate effectively in a high-pressure, fast-paced global environment.
- Analytical Mindset: A proactive approach to identifying potential system bottlenecks before they impact the business.