Data is central to training and validating every aspect of autonomous vehicle software, and handling it at scale requires redefining the entire AI software infrastructure. In this talk we’ll look at MagLev, NVIDIA’s internal end-to-end AI platform. MagLev supports continuous data ingest from cars producing multiple terabytes of data per hour, and lets AI designers iterate on new neural network designs across thousands of GPU systems and validate their behavior over petabyte-scale datasets. We’ll cover the overall stack, from data center deployment to pipeline automation, large-scale dataset management, training, and testing. We’ll also review how the platform provides traceability and reproducibility for the models it produces.