People at Apple don’t just build products — they craft experiences our customers love and depend on. Apple Services Engineering (ASE) builds and supports the systems that make many of these daily experiences possible. If you’ve used Apple products, you’ve likely interacted with us. iCloud Services SRE teams are responsible for the systems and services that directly support our customers and their experiences. We are looking for passionate and talented Site Reliability Engineers to continue our focus on providing our customers the highest quality Apple Services experience. Our services have to scale globally, stay highly available, and “just work.” If you love designing, engineering, and running systems and infrastructure that will help millions of customers, then this is the place for you!
Do you love engineering and running systems and infrastructure that will delight millions of customers? Imagine what you could do here. At Apple, new ideas have a way of becoming extraordinary products, services, and customer experiences very quickly. When you bring passion and dedication to your job, there’s no telling what you could accomplish. Our best candidates have proven software development skills and strong distributed systems expertise, understand SRE principles, and know what it takes to run services at Apple scale. We play a critical role in the day-to-day operations of services that are relied upon across Apple.
Our team leads the reliability engineering for iCloud Identity core services.
IN THIS ROLE, YOU ARE EXPECTED TO
1. Lead the data-driven roadmap and quarterly planning for a subset of core services from a reliability perspective.
2. Be responsible for that subset of core services across the entire software lifecycle from a reliability perspective, including infrastructure setup, capacity planning, deployment, monitoring, architecture, and software implementation, in close collaboration with the development team.
We’re looking for a creative, versatile, and passionate person who loves solving engineering problems and working broadly across a fast-moving and collaborative organization. If this sounds compelling, please reach out!
5+ years of software development or production operations experience in a large-scale environment
Attention to detail, allergy to ambiguity, and a firm grip on deadlines
SRE Principles – Skills and experience in monitoring, alerting, error budgets, fault analysis, and automation
Coding experience using an object-oriented programming language like Java, Golang, or Python
Excellent troubleshooting and problem solving skills
Excellent written and verbal communication skills
Experience in improving the whole lifecycle of global services from inception through deployment, operations, and refinement
Experience with Linux/Unix, Networking, Systems Management, Systems Security
Experience managing large numbers of diverse systems
Ability to participate in on call service support