

Firmus Technologies
Founded in Australia in 2019 by a visionary team of entrepreneurs, Firmus Technologies is a global leader pioneering the solution to AI's energy challenge. Our mission is to create the most energy-efficient AI infrastructure, combining cutting-edge technology with a steadfast commitment to sustainability.
Through ground-breaking research and development, we invented a verticalized AI Factory, a new class of digital infrastructure that replaces traditional data centres. Built on new approaches to liquid cooling, energy management, water use and modular construction, the Firmus AI Factory delivers low-cost AI tokens across Asia-Pacific.
Firmus AI Cloud
We provide customers with access to energy savings via our large-scale GPU cloud, Firmus AI Cloud. Rated Silver in The GPU Cloud ClusterMAX™ Rating System, our cloud empowers developers, enterprise, education and government users to train AI models with unmatched efficiency and cost savings. With an ever-growing list of services and applications, we are committed to building a cloud experience for our customers that is market-leading, proprietary and built to scale.
Why you'll love working here
ROLE SUMMARY
Firmus Technologies is seeking a skilled Site Reliability Engineer to join our Operations team, supporting the daily operations and maintenance of our AI-accelerated High-Performance Computing (HPC) infrastructure. This role will work closely with Field Service Engineers, HPC and Network Engineering teams, and assist the Global Operations Centre (GOC). This is a unique opportunity to contribute directly to the stability and growth of cutting-edge AI infrastructure.
SKILLS AND EXPERIENCE
Location & Reporting
Employment Basis
Full-time
Diversity
At Firmus, we are committed to building a diverse and inclusive workplace. We encourage applications from candidates of all backgrounds who are passionate about creating a more sustainable future through innovative engineering solutions.
Join us in our mission to revolutionize the AI industry through sustainable practices and cutting-edge engineering. Apply now to be part of shaping the future of sustainable AI infrastructure.
Job ID: 149111513
Skills:
ELK, Prometheus, Bash, Grafana, Distributed Systems, Containers, Scripting Languages, Automation, Python, Kubernetes, Infrastructure as Code tools, Go, Cloud Platforms, Monitoring and Observability tools, OpenTelemetry, Linux systems