Apple's Compute Frameworks team, part of the GPU, Graphics and Displays organization, provides a suite of high-performance data-parallel algorithms for developers inside and outside Apple across iOS, macOS, and Apple TV. Our efforts are currently focused on the key areas of linear algebra, image processing, and machine learning, along with other projects of key interest to Apple. We are always looking for exceptionally dedicated individuals to grow our outstanding team and help lay the foundation for technologies like Apple Intelligence.
Our team is seeking extraordinary machine learning and GPU programming engineers who are passionate about providing robust compute solutions for accelerating machine learning networks on Apple Silicon using the GPU and Neural Engine.
This role offers the opportunity to influence the design of compute and programming models in next-generation GPU and Neural Engine architectures.
Responsibilities:
* Adding optimizations to the machine learning computation graph.
* Defining and implementing APIs in Metal Performance Shaders Graph, and investigating new algorithms.
* Developing and maintaining an MLIR dialect at Apple and in open source, keeping it up to date with the latest LLVM.
* Performing in-depth analysis and compiler- and kernel-level optimizations to ensure the best possible performance across hardware families.
* Tuning GPU- and Neural Engine-accelerated compute across products.
* Tuning the cost model and optimizing runtime dispatch across multiple IP blocks to achieve the best performance on Apple Silicon.
Intended deliverables:
* GPU Compute acceleration technology.
* Apple Intelligence implementation and acceleration.
* Optimized compute graphs across products.
If this sounds of interest, we would love to hear from you!
Qualifications:
* Proven programming and problem-solving skills.
* Good understanding of machine learning fundamentals.
* GPU compute programming models & optimization techniques.
* GPU compute framework development, maintenance, and optimization.
* Experience with system-level programming and computer architecture.
* Experience with high-performance parallel programming, GPU programming, or LLVM/MLIR compiler infrastructure is a plus.