People at Apple don’t just build products — they craft experiences our customers love and depend on. Apple Services Engineering (ASE) builds and supports the systems that make many of these daily experiences possible. If you’ve used Apple products, you’ve likely interacted with us. iCloud Services SRE teams are responsible for the systems and services that directly support those customers and their experiences. We focus on availability and automation of key services that run iCloud every minute of every day all around the world.
We are looking for an SRE with experience building and supporting machine learning (ML) infrastructure. You will apply SRE best practices to ensure the availability, reliability, and performance of our ML systems and services. You will engage regularly with our development partners and product teams so that the ML services are well aligned with business needs.
If you love designing and running systems and infrastructure that will delight millions of customers, this team is for you!
Responsibilities will include:
Support and maintain ML services by measuring and monitoring availability, latency, and overall system health
Deploy and support existing and new ML models and infrastructure
Provide insights to partner stakeholders through log and telemetry analysis
Maintain documentation and automate manual processes where possible
Be part of an on-call rotation providing hands-on technical expertise during service-impacting events
Collaborate with other engineers on code, infrastructure, and design reviews, as well as process enhancements