What to Expect
Our team productionizes ML models: we train and deploy large neural networks for efficient inference on compute-constrained edge devices (CPU / GPU / AI ASIC). This role is multi-disciplinary. You will work at the intersection of machine learning and systems, building the ML frameworks and infrastructure that enable seamless training, deployment, and inference of all neural networks that run on Autopilot and Optimus.
What You’ll Do
Build robust AI frameworks to lower neural networks to edge devices
Build robust AI infrastructure to train and fine-tune networks for Autopilot and Optimus on large GPU clusters
Deploy state-of-the-art neural networks on heterogeneous compute, including Tesla’s in-house AI ASIC, with the aim of maximizing network performance while minimizing latency
Collaborate with AI scientists and compiler engineers to effectively compress large models to run in low precision
Design and implement custom GPU kernels (CUDA / OpenCL) for efficient training and post-processing of network outputs
What You’ll Bring
Proficiency with Python and C++, including modern C++ (14/17/20)
Proficiency with PyTorch or another machine learning framework
Proficiency in training and deploying neural networks for real-world AI applications
Proficiency with computer systems and computer architecture
Experience with CUDA
Palo Alto, California
Full time