What to Expect
As a member of the Dojo Machine Learning team, you will
be responsible for enabling Tesla’s neural networks to train efficiently on our
upcoming in-house custom silicon supercomputer systems. Join a small team of
experienced developers in optimizing and scaling the deployment of our
PyTorch-derived neural networks on Tesla’s custom massively parallel Dojo accelerators.
Work with many of the same great engineers who delivered Tesla’s custom FSD
Computer. The ideal candidate has experience with writing software for large
distributed systems.

What You’ll Do
Understand and model the end-to-end training performance of the Autopilot SW team’s PyTorch-derived neural networks on the Dojo system
Develop software that scales and improves training performance based on your analysis of the bottlenecks
Collaborate with the Dojo HW team to understand current HW architecture and propose future improvements

What You’ll Bring
Degree in Engineering, Computer Science, or equivalent in experience and evidence of exceptional ability
Experience scaling neural network training systems or other large distributed systems
Familiarity with the internals of PyTorch and/or JAX
Performance analysis experience
Experience coding parallel programs
Able to work from Palo Alto office

PALO ALTO, California
Full time