Software Engineer (Site Reliability), Retail Engineering – 200566590 - Austin, Texas, United States

Apple

Carrier Services offers seamless integration of Apple Retail Stores and the Apple Online Store with major US carriers for iPhone activations.

We are looking for a talented Site Reliability Engineer to join our growing team.

As an SRE, you will be responsible for ensuring the reliability, scalability, and performance of our systems and services. You will work closely with our engineering and operations teams to design, build, and maintain robust infrastructure and automation solutions.

If you are an SRE who thrives in a dynamic environment and can make a meaningful impact through your technical expertise and dedication to excellence, come join our team.

This role demands extensive hands-on experience working as an SRE for large-scale, customer-facing cloud applications. The candidate should have a good understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and other common reliability engineering concepts. The candidate should also have excellent troubleshooting and problem-solving skills.

The candidate will be expected to represent the SRE organization in design reviews and operational readiness exercises for new and existing services. They will also be required to collaborate with technical and non-technical teams and to analyze statistics to build a clear picture of the current state of our systems. A good working knowledge of Oracle and Cassandra databases will be beneficial in this regard.

The candidate should have a passion for automating manual operations and improving them through repeated iteration. They should have a good understanding of networking and load balancing concepts, and should be able to lead a small team and come up with innovative solutions. They should be self-motivated, capable of making business-critical decisions, and comfortable working in a dynamic, ever-changing environment. The candidate should be proactive in dealing with critical production issues and in taking them to closure while working with the required partners.

Participate in an on-call rotation, providing hands-on technical expertise during service-impacting events.

2 years of hands-on experience as an SRE supporting large-scale microservices applications.
2 years of experience deploying, supporting, and monitoring cloud services in a large-scale, customer-facing environment.
2 years of hands-on experience developing Java-based applications.
2 years of hands-on experience building complex queries and dashboards using Splunk.
2 years of promoting observability of systems for monitoring, alerting, and metrics reporting using Datadog, Prometheus, and similar tools.
2 years of proficiency with at least one scripting language, such as Python.
2 years of hands-on experience working with Kubernetes, Docker, and containerization.
Proven track record of eliminating repetitive manual processes using automation.
2 years of working on maintenance tasks for Oracle and Cassandra databases.

 

Job Overview