Virtual Service Reliability Engineer

Posted about 2 years ago

Service Reliability Engineer

US-Remote | Posting date: 14 hours ago (10/28/2019 8:36 PM)

Job ID: 74041 | Category: Software Engineering, Systems Engineering, Technical Support

At Red Hat, we connect an innovative community of customers, partners, and contributors to deliver an open source stack of trusted, high-performing solutions. We offer cloud, Linux, middleware, storage, and virtualization technologies, together with award-winning global customer support, consulting, and implementation services. Red Hat is a rapidly growing company supporting more than 90% of Fortune 500 companies.

Job summary

The Red Hat OpenShift Service Reliability Engineering (SRE) team is looking for a Service Reliability Engineer to join us. In this role, you will work on OpenShift, which is enterprise Kubernetes, as part of the first team to host and manage the code in the public cloud. You'll play a key part within the team, as you'll be responsible for keeping the Red Hat OpenShift platform environment available and secure.

Along with the rest of your team, you will interact with other site reliability engineers and product engineering associates around the world to deliver large, containerized cluster environments. You'll be responsible for provisioning, upgrades, problem detection and automated recovery scenarios, incident management, and understanding complicated, interconnected data points to resolve faults when issues arise.

As a Service Reliability Engineer, you'll need to be able to work in a complicated and fast-paced environment while quickly learning new skills and creating ways to consistently meet service-level agreements (SLAs) and keep a globally-distributed, cloud-based, containerized service (enterprise Kubernetes) running for our customers.

Successful applicants must reside in a state where Red Hat is registered to do business.

Primary job responsibilities

Interact with automated monitoring and healing infrastructure to ensure healthy environments

Develop automation to autocorrect or completely prevent issues in our online solutions

Participate in release cycles of our offerings, deploying code to integration, staging, and production environments, integrating with continuous integration (CI) and continuous delivery (CD) tools, monitoring, and participating in change management

Perform software updates, peer code reviews, testing, and Common Vulnerabilities and Exposures (CVE) analysis; respond to security threats

Identify single points of failure and other high-risk architecture issues; propose and implement more resilient resolutions

Resolve customer issues in cooperation with Red Hat's global customer support team

Create and maintain standard operating procedures (SOPs) for performing maintenance tasks, applying configuration changes, and remediating problems in our environment

Participate in a regular shift and on-call rotation; this will include a weekend working schedule

Required skills

5+ years of experience managing Linux servers running Red Hat Enterprise Linux (RHEL), CentOS, or Fedora hosted at a cloud provider like Amazon Web Services (AWS), Google Compute Engine (GCE), or Microsoft Azure

3+ years of experience with enterprise system monitoring; knowledge of Prometheus is a plus

3+ years of experience with enterprise configuration management software like Red Hat Ansible Automation, Puppet, or Chef

2+ years of experience with programming languages like Go, C#, Java, PHP, Python, or Ruby

Experience delivering a hosted service

Demonstrated ability to quickly and accurately troubleshoot system issues

Solid understanding of standard TCP/IP networking and common protocols like DNS and HTTP

Solid communication skills and experience working directly with and presenting to customers

Experience with Kubernetes is a plus

Experience with Docker-based containers is a plus
