Novare

Senior Cloud Engineer (Experience with Grafana Cloud)


Job Description

We are looking for a highly skilled Senior Cloud Engineer (Experience with Grafana Cloud) to join our infrastructure team. In this role, you will be the architect and guardian of our monitoring ecosystem, ensuring that our complex multi-cloud environments are transparent, resilient, and high-performing.

You will focus on the LGTM stack (Loki, Grafana, Tempo, Mimir) to provide deep insight into our distributed systems across public and private cloud environments.

Key Responsibilities

  • Observability Architecture: Design, implement, and maintain a unified observability platform using the Grafana Labs stack to monitor microservices and infrastructure.
  • Telemetry Pipelines: Configure and optimize the collection and storage of metrics (Mimir), logs (Loki), and distributed traces (Tempo).
  • Infrastructure as Code: Automate the provisioning of observability resources and cloud infrastructure using Terraform.
  • Multi-Cloud Management: Manage and scale monitoring solutions across AWS, GCP, and Private Cloud environments.
  • Kubernetes Orchestration: Deploy and manage observability agents (such as the OpenTelemetry Collector or Promtail) within K8s clusters.
  • Dashboarding & Alerting: Create sophisticated Grafana dashboards and proactive alerting strategies to reduce Mean Time to Detection (MTTD).
  • Performance Tuning: Optimize storage and query performance for high-cardinality data within Mimir and Loki.

Technical Skills Required

Primary Stack (The LGTM Stack)

  • Grafana: Expert-level dashboard creation, template variables, and visualization.
  • Loki: Deep understanding of log aggregation, label management, and LogQL.
  • Mimir: Experience running Prometheus-compatible metrics storage at massive scale.
  • Tempo: Implementation of distributed tracing to debug latency and request flows.

Infrastructure & Cloud

  • Containerization: Advanced knowledge of Kubernetes (K8s), including operators, Helm charts, and networking.
  • Public Cloud: Hands-on experience with AWS (CloudWatch, EKS) and GCP (Operations Suite, GKE).
  • Private Cloud: Experience managing infrastructure in on-premises or colocation environments.
  • IaC: Proficiency in Terraform for maintaining Observability as Code.

Qualifications

  • Experience: 3+ years in DevOps, SRE, or Systems Engineering with a heavy focus on monitoring.
  • Philosophy: A "monitor everything" mindset with a focus on the Four Golden Signals (Latency, Traffic, Errors, and Saturation).
  • Problem Solving: Strong debugging skills across the full stack, from the network layer to application code.
  • Communication: Ability to translate complex telemetry data into actionable insights for developers and stakeholders.

More Info

Job ID: 148349223