Observability SME

Job Description

Role Overview

We are seeking a highly skilled Observability SME/Lead to head the Observability tower. The SME will be responsible for designing, implementing, and optimizing observability frameworks that provide seamless monitoring of, and visibility into, applications, infrastructure, and networks.

Required qualifications:

10+ years of deep expertise in observability and monitoring platforms (Prometheus, Grafana, Splunk, Datadog, Dynatrace, ELK, AppDynamics, etc.).

Prior experience leading enterprise-wide Observability transformations.

Key Responsibilities

1. Design and implement end-to-end observability strategies and frameworks.

2. Develop and maintain architecture standards, best practices, and governance models for observability solutions.

3. Integrate observability with other tools (e.g., ITSM, AIOps, and logging platforms).

4. Ensure scalability, high availability, and performance of monitoring solutions.

5. Collaborate with DevOps, IT, and business teams to align observability strategies with organizational goals.

6. Conduct training sessions to empower teams with observability best practices.

7. Analyze metrics, logs, and traces to detect anomalies and performance bottlenecks.

8. Generate and distribute performance reports to stakeholders.

9. Fine-tune alerting thresholds and configurations.

10. Collaborate with incident response teams to troubleshoot and resolve issues.

Required Skills

Expertise in implementing and administering observability platforms.

Strong knowledge of OpenTelemetry.

Solid experience in data enrichment and data standardization.

Proficiency in cloud platforms (AWS, Azure, GCP) and their monitoring capabilities.

Experience in containerized environments (Kubernetes, Docker) and related monitoring.

Knowledge of scripting (Python, Bash, PowerShell) for automation.

Understanding of AIOps and integration with observability platforms.

Familiarity with SNMP, REST APIs, and log forwarding.

Proficiency in creating dashboards, custom queries, and alerts in Dynatrace and Zabbix.

Understanding of monitoring key performance indicators (KPIs) for applications and infrastructure.

Soft Skills

Leadership in cross-functional team environments.

Excellent problem-solving and analytical skills.

Ability to convey complex observability concepts to stakeholders.

Job ID: 148959349