Maya

Operations Engineer

4-6 Years
  • Posted 20 hours ago

Job Description

CORE PROFILE

The Technical Operations Engineer acts as a subject matter expert, owning root cause analysis, automation design, and resiliency improvements across core services. Core competencies include Advanced Troubleshooting (performing deep diagnostics across infrastructure, applications, and integrations), Root Cause & Postmortem Ownership (leading RCAs and implementing permanent fixes), Automation & Scripting Proficiency (building workflows or tools that eliminate recurring manual effort), Observability Architecture (designing meaningful dashboards, alerting strategies, and health checks), and Continuous Optimization (proactively identifying performance bottlenecks and resiliency gaps). Success is measured by a visible reduction in incident recurrence, automation coverage of repetitive tasks (20-30%+), improved service uptime, and documented best practices that elevate lower support tiers.

NATURE OF WORK

  • Work on a shifting schedule (Morning and Mid shifts, 12x4) to ensure 24/7 coverage.
  • Act as the escalation point for high-severity and complex technical incidents.
  • Perform deep diagnostics across infrastructure, databases, applications, APIs, and integrations.
  • Design and develop automation scripts, workflows, or tools to eliminate repetitive manual tasks.
  • Integrate automation into operational processes, monitoring, and remediation workflows.
  • Design dashboards, alerting strategies, and health checks that provide actionable insights.
  • Reduce noise by improving signal-to-noise ratio in monitoring and alerting systems.
  • Work with engineering teams to strengthen system design, redundancy, and self-healing mechanisms.
  • Document best practices, troubleshooting guides, runbooks, and technical standards.

DISPLAYED SKILL MASTERY

  • Advanced Troubleshooting & Diagnostics
  • Automation & Scripting Expertise
  • Observability & Monitoring Design
  • Root Cause Analysis & Postmortem Leadership
  • System Performance & Resiliency Optimization
  • Cross-Team Communication & Influence
  • Continuous Improvement Mindset
  • Operational Excellence & Ownership

REQUIRED QUALIFICATIONS

  • Bachelor's degree in Computer Science or related field
  • 4+ years of experience in Technical Operations, Site Reliability Engineering, DevOps, or similar roles
  • Proficiency in scripting/automation (Python, Bash, JavaScript, or similar)
  • Solid understanding of monitoring platforms, log aggregation, traces, and metrics
  • Experience with cloud platforms (AWS, GCP, Azure)
  • Familiarity with automation frameworks, CI/CD pipelines, or configuration management tools
  • Exposure to observability solutions (Datadog, Splunk, Dynatrace, Prometheus, etc.)
  • Experience tuning performance for high-availability or distributed systems

Job ID: 144535399