Sr. Site Reliability Engineer

Sr. Site Reliability Engineer || 3 Positions ||

North Carolina, United States | 2023-01-09 18:05:56 | Posted by : N/A

Job Code : JPC - 6364

No of Positions : 1

Title: Site Reliability Engineer

Duration: Direct Hire

Location: Charlotte, NC Region (Required to go in office 1 day/2-3 weeks)

Job Description

As part of the cloud operations organization, this position makes significant contributions towards the delivery of DevOps solutions that support best-in-class, cloud-based microservice applications. Our team is looking for an engineer who is excited about automation. SREs discover ways to help promote the availability of services and applications, improve processes through remediation of manual and/or repetitive tasks, and solve complex technical problems in a fast-paced, collaborative, inclusive, and iterative environment.

The candidate will support the Xcelerator platform and will be responsible for identifying, managing, improving, and reporting on availability, resiliency, reliability, and stability efficiencies. This includes providing technical guidance and leadership to drive solutions and to create and enhance processes that deliver excellence. A strong relationship with the various product teams of the Xcelerator platform is necessary to support core objectives. This role's success will be defined by product teams within DISW business units meeting their SLAs.

Responsibilities

Lead the design, deployment, automation, and scripting of solutions that drive new capabilities, visibility, and efficiency
Collaborate with other technical platforms and partners to engineer automated and integrated solutions across tools, services, and teams that increase availability, reliability, and performance.
Own the internal and external SLAs and ensure they meet or exceed expectations
Be part of maintaining a 24x7, global, highly available SaaS environment
Participate in an on-call rotation that supports our production infrastructure
Troubleshoot production availability incidents that often span across multiple teams and services.
Lead production incident post-mortems and contribute to solutions that prevent problem recurrence, with the goal of automating the response to all non-exceptional service conditions
Communicate with business and technical partners about incidents as they occur when those incidents impact system performance or availability at a critical level

Required Knowledge/Skills, Education, and Experience

Bachelor’s Degree with at least 2 years of IT experience or equivalent experience.
4+ years experience with Amazon Web Services (AWS)
3+ years experience as a Site Reliability Engineer or equivalent role
3+ years experience with automation via scripting & API development
2+ years experience with monitoring tools
2+ years experience with containerization, specifically Kubernetes
2+ years experience with Terraform, CloudFormation, Ansible, or equivalent tools

Qualified Applicants must be legally authorized for employment in the United States. Qualified Applicants will not require employer-sponsored work authorization now or in the future for employment in the United States.

Preferred Knowledge/Skills, Education, and Experience

Siemens Teamcenter software
Desired certifications include security, Kubernetes, and AWS or Azure certifications
2+ years experience with issue/incident tracking tools (ServiceNow, ServiceDesk, Jira, or equivalent tools)
2+ years experience with open-source tools (Linux, Python, Git, Ansible)
2+ years experience in an enterprise IT environment with distributed environments
Networking concepts, including firewalls, VPN, routing, load balancers, security and DNS
Senior level system administration experience, including troubleshooting, support, mentorship/training, and oversight