What to Expect
At Tesla, you will have access to unparalleled resources that set us apart from other companies in the AI industry. You will work with the largest self-driving dataset in the world, which provides a unique, and perhaps the only, environment for investigating scaling laws for sequential decision-making problems. Tesla also offers one of the highest levels of GPU resources per engineer in the industry, giving you a significantly larger computational budget than typical AI research environments.
These unique resources enable you to conduct experiments and scaling analyses at a level unmatched by any other company, allowing you to grow as a researcher by tackling challenges that others simply cannot offer. By working at Tesla, you’ll have the opportunity to push the boundaries of AI and autonomous driving technology while advancing your skills in an environment that truly values innovation and cutting-edge research.
What You’ll Do
Perform scaling law analyses on model size, data size, data mixture, training compute, and other critical parameters to optimize our AI models using the largest self-driving dataset in the world
Develop and implement novel architectures and algorithms to effectively scale large End-to-End (E2E) self-driving models
Create and maintain infrastructure for efficient, large-scale distributed training of E2E models, resolving compute and memory bottlenecks for training and inference
Evaluate and enhance model performance, with a focus on increasing miles driven without human intervention
Work closely with cross-functional teams to deploy AI models in production, ensuring they meet stringent performance and reliability standards
Contribute to the development of tools and frameworks that improve the scalability and efficiency of model training and deployment processes
What You’ll Bring
Proven experience in scaling and optimizing large AI models, with a strong understanding of infrastructure challenges and solutions
Proficiency in Python and a deep understanding of software engineering best practices
In-depth knowledge of deep learning fundamentals, including optimization techniques, loss functions, and neural network architectures
Experience with deep learning frameworks such as PyTorch, TensorFlow, or JAX
Strong expertise in distributed computing and parallel processing techniques
Demonstrated ability to work collaboratively in a cross-functional team environment
Strong problem-solving skills and the ability to troubleshoot complex system-level issues
Palo Alto, California
Full time