Site Reliability Engineer, Hybrid Cloud Operation and Delivery - Data Infrastructure

Experience: 5-7 years

Job Description

Responsibilities

Our team is responsible for the infrastructure systems of our hybrid cloud, including IaaS, PaaS, SaaS, and AI model products. We strive to be a leading Site Reliability Engineering (SRE) team in the industry, driving reliability, scalability, and performance at scale. As part of the SRE team, you will tackle complex, large-scale challenges, leveraging your expertise in coding, algorithms, complexity analysis, and distributed system design. We foster a culture of diversity, intellectual curiosity, and open collaboration. Engineers are empowered with strong ownership, autonomy, and the opportunity to work across a wide range of impactful projects.

What you will be doing:
- Deliver products in hybrid cloud scenarios, including cloud platform planning, software deployment, and resource expansion, and collaborate with R&D teams to complete project delivery.
- Operate cloud platform environments for internal and external customers, including daily alarm handling, on-call support, and change management, and ensure the stability of the cloud platform during critical event periods.
- Work with R&D teams on the stability engineering of cloud products, continuously improving capabilities such as high-availability architecture, disaster recovery, and alarm monitoring, based on experience operating large-scale systems on site.
- Continuously improve hybrid cloud serviceability: help define the standardized statement of work (SOW) for operations and maintenance (O&M) and delivery of new product versions, and build SRE serviceability acceptance standards to improve implementation efficiency.

Qualifications

Minimum Qualification(s):
- Bachelor's or Master's degree in Computer Science or a related major, with at least 5 years of relevant experience.
- Solid foundational knowledge of computer software, with an understanding of the principles of Linux operating systems, networking, middleware, and related areas.
- Familiarity with one or more programming languages, such as Shell, Python, Go, or Java, and experience building scripts or tools to solve a variety of problems.
- Operations and maintenance experience in one or more areas, such as virtual machines, containers, K8s, load balancing, middleware, or AI models.

Preferred Qualification(s):
- Experience operating and maintaining IDC equipment such as switches and GPU servers.
- Work experience at a cloud platform vendor.

About Company

ByteDance is a technology company operating a range of content platforms that inform, educate, entertain and inspire people across languages, cultures, and geographies.
Dedicated to building global platforms of creation and interaction, ByteDance now has a portfolio of applications available in over 150 markets and 75 languages, including TikTok, Helo, Vigo Video, Douyin, and Huoshan.

Job ID: 149229297
