Today we’re joined by Mido Assran, a research scientist at Meta’s Fundamental AI Research (FAIR). In this conversation, we discuss V-JEPA, a new model being billed as “the next step in Yann LeCun’s vision” for true artificial reasoning. V-JEPA, the video version of Meta’s Joint Embedding Predictive Architecture, aims to bridge the gap between human and machine intelligence by training models to learn abstract concepts in a more efficient predictive manner than generative models. V-JEPA uses a novel self-supervised training approach that allows it to learn from unlabeled video data without being distracted by pixel-level detail.  Mido walks us through the process of developing the architecture and explains why it has the potential to revolutionize AI.
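To make the contrast with pixel-level generative training concrete, here is a minimal, hypothetical sketch of a JEPA-style objective in PyTorch. The tiny linear layers, the crude context/target split, and the pooling are illustrative stand-ins of our own, not the V-JEPA implementation; the point is only that the prediction loss lives in representation space rather than pixel space.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM = 64
encoder = nn.Linear(DIM, DIM)         # context encoder (trained); a toy stand-in
target_encoder = nn.Linear(DIM, DIM)  # in JEPA-style setups, an EMA copy with no gradients
predictor = nn.Linear(DIM, DIM)       # predicts masked-region latents from visible context

patches = torch.randn(8, 16, DIM)             # (batch, patches, features) from a video clip
ctx, tgt = patches[:, :12], patches[:, 12:]   # visible context vs. masked-out target regions

ctx_repr = encoder(ctx).mean(dim=1)           # pooled context representation
with torch.no_grad():                         # targets are latents, never pixels
    tgt_repr = target_encoder(tgt).mean(dim=1)

# The loss is computed in representation space, so unpredictable pixel-level
# detail (noise, texture) never has to be modeled.
loss = F.smooth_l1_loss(predictor(ctx_repr), tgt_repr)
loss.backward()
```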
Mido Assran is a Research Scientist at Meta in Fundamental AI Research (FAIR). Previously, Mido obtained his PhD in Electrical and Computer Engineering from McGill University and Mila, the Quebec AI Institute, where he was primarily advised by Michael Rabbat. His current research focuses on advancing self-supervised representation learning and low-shot prediction. In the past, Mido worked on parallelizing deep reinforcement learning and on large-scale optimization. His research on representation learning has been featured in several media outlets, including VentureBeat, TechCrunch, and SiliconANGLE. He was also fortunate to have a featured profile piece in the ICCV Daily magazine. Mido has served as an expert reviewer for ICML and NeurIPS, and received Best Reviewer awards at NeurIPS'20, ICML'20, ICML'21, and AAAI'20.
Today we’re joined by Sherry Yang, senior research scientist at Google DeepMind and a PhD student at UC Berkeley. In this interview, we discuss her new paper, “Video as the New Language for Real-World Decision Making,” which explores how generative video models can play a role similar to language models as a way to solve tasks in the real world. Sherry draws the analogy between natural language as a unified representation of information and text prediction as a common task interface, and demonstrates how video as a medium and generative video as a task exhibit similar properties. This formulation enables video generation models to play a variety of real-world roles as planners, agents, compute engines, and environment simulators. Finally, we explore UniSim, an interactive demo of Sherry’s work and a preview of her vision for interacting with AI-generated environments.
Sherry is a PhD student at UC Berkeley advised by Pieter Abbeel and a senior research scientist at Google DeepMind. Her research aims to develop machine learning models with internet-scale knowledge to make better-than-human decisions. To this end, she has developed techniques for generative modeling and representation learning from large-scale vision, language, and structured data, coupled with algorithms for sequential decision making such as imitation learning, planning, and reinforcement learning. Sherry initiated and led the Foundation Models for Decision Making workshop at NeurIPS 2022 and 2023, bringing together research communities in vision, language, planning, and reinforcement learning to solve complex decision making tasks at scale. Before her current role, Sherry received her Bachelor’s and Master’s degrees from MIT, where she was advised by Patrick Winston and Julian Shun.
Today we’re joined by Sayash Kapoor, a Ph.D. student in the Department of Computer Science at Princeton University. Sayash walks us through his paper, “On the Societal Impact of Open Foundation Models.” We dig into the controversy around AI safety, the risks and benefits of releasing open model weights, and how we can establish common ground for assessing the threats posed by AI. We discuss the application of the framework presented in the paper to specific risks, such as the biosecurity risk of open LLMs, as well as the growing problem of “Non-Consensual Intimate Imagery” created using open diffusion models.
Sayash Kapoor is a Ph.D. student in the Department of Computer Science at Princeton University. His research critically investigates machine learning methods and their use in science and has been featured in WIRED, the Los Angeles Times, and Nature, among other media outlets. At Princeton, Kapoor organized The Reproducibility Crisis in ML-based Science, a workshop that drew more than 1,700 registrations. He has worked on machine learning at several institutions across industry and academia, including Facebook, Columbia University, and EPFL in Switzerland. He is a recipient of a Best Paper Award from ACM FAccT and an Impact Recognition Award from ACM CSCW. Kapoor is co-authoring a book, “AI Snake Oil,” with Princeton Prof. Arvind Narayanan that looks critically at what AI cannot do.
Today we’re joined by Akshita Bhagia, a senior research engineer at the Allen Institute for AI. Akshita joins us to discuss OLMo, a new open source language model released in 7-billion- and 1-billion-parameter variants, with a key difference compared to similar models offered by Meta, Mistral, and others: AI2 has also published the dataset and key tools used to train the model. In our chat with Akshita, we dig into the OLMo models and the various projects falling under the OLMo umbrella, including Dolma, an open three-trillion-token corpus for language model pretraining, and Paloma, a benchmark and tooling for evaluating language model performance across a variety of domains.
Akshita is a Senior Research Engineer on the AllenNLP team, involved in R&D for natural language processing (NLP). Most recently, Akshita has been working on the OLMo project, where she has contributed to pretraining dataset construction, model training and inference, and evaluation tools and benchmarks. She has also worked on open-source libraries such as allennlp and ai2-tango. Akshita graduated with a Master’s degree in Computer Science from the University of Massachusetts Amherst in 2020, where she worked with Prof. Mohit Iyyer at the intersection of NLP and digital humanities. Previously, Akshita worked at Cerebellum Capital (Summer 2019) and at InFoCusp (2015-2018), where she helped build a data science platform. In her spare time, Akshita enjoys reading novels, writing (especially poetry), and dancing.
Today we’re joined by Ben Prystawski, a PhD student in the Department of Psychology at Stanford University working at the intersection of cognitive science and machine learning. Our conversation centers on Ben’s recent paper, “Why think step by step? Reasoning emerges from the locality of experience,” which he recently presented at NeurIPS 2023. In this conversation, we start out exploring basic questions about LLM reasoning, including whether it exists, how we can define it, and how techniques like chain-of-thought reasoning appear to strengthen it. We then dig into the details of Ben’s paper, which aims to understand why thinking step-by-step is effective and demonstrates that local structure is the key property of LLM training data that enables it.
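As a toy illustration of the paper’s core claim (our own sketch, not code from the paper): if training data only ever pairs adjacent variables in a chain A → B → C, the relation between A and C cannot be read off any single example, but chaining the two locally estimable conditionals through the intermediate variable recovers it, which is the statistical analogue of reasoning step by step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth chain A -> B -> C with noisy links.
n = 100_000
a = rng.random(n) < 0.5
b = np.where(a, rng.random(n) < 0.9, rng.random(n) < 0.1)
c = np.where(b, rng.random(n) < 0.8, rng.random(n) < 0.2)

# "Local" training data: only (A,B) and (B,C) pairs are ever observed together,
# so P(C|A) cannot be read off any single training example.
p_b_given_a = b[a].mean()          # estimated from (A,B) co-occurrences
p_c_given_b = c[b].mean()          # estimated from (B,C) co-occurrences
p_c_given_not_b = c[~b].mean()

# "Step-by-step" estimate: marginalize over the intermediate variable B.
chained = p_b_given_a * p_c_given_b + (1 - p_b_given_a) * p_c_given_not_b

print(f"chained estimate P(C|A=1) = {chained:.3f}")   # ~0.74
print(f"direct (held-out) P(C|A=1) = {c[a].mean():.3f}")
```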
Ben Prystawski is a third-year PhD student in the Department of Psychology at Stanford University, advised by Professor Noah Goodman. He works at the intersection of cognitive science and machine learning, studying reasoning and cultural learning. His research combines machine learning methods, Bayesian models, and human experiments to understand human and machine intelligence.
Today we’re joined by Armineh Nourbakhsh of J.P. Morgan AI Research to discuss the development and capabilities of DocLLM, a layout-aware large language model for multimodal document understanding. Armineh provides a historical overview of the challenges of document AI and an introduction to the DocLLM model, and explains how this model, distinct from both traditional LLMs and document AI models, incorporates both textual semantics and spatial layout in processing enterprise documents like reports and complex contracts. We dig into her team’s approach to training DocLLM, their choice of a generative model as opposed to an encoder-based approach, the datasets they used to build the model, their approach to incorporating layout information, and the various ways they evaluated the model’s performance.
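For intuition, here is a hypothetical sketch of layout-aware attention in PyTorch, in which bounding-box embeddings contribute a separate term to the attention logits so that spatial structure modulates which tokens attend to one another. The decomposition, module names, and dimensions are our own illustrative assumptions, not the DocLLM architecture as published.

```python
import torch
import torch.nn as nn

class LayoutAwareAttention(nn.Module):
    """Illustrative layout-aware attention: the score combines a standard
    text-to-text term with a term computed from bounding-box embeddings."""
    def __init__(self, dim):
        super().__init__()
        self.q_t, self.k_t = nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.q_s, self.k_s = nn.Linear(4, dim), nn.Linear(4, dim)  # boxes: (x0, y0, x1, y1)
        self.scale = dim ** -0.5

    def forward(self, text_emb, boxes):
        # Text-only attention term.
        score_text = self.q_t(text_emb) @ self.k_t(text_emb).transpose(-2, -1)
        # Layout term from bounding-box embeddings, added to the logits.
        score_layout = self.q_s(boxes) @ self.k_s(boxes).transpose(-2, -1)
        attn = torch.softmax((score_text + score_layout) * self.scale, dim=-1)
        return attn @ self.v(text_emb)

x = torch.randn(1, 10, 64)   # 10 tokens of an OCR'd document
bx = torch.rand(1, 10, 4)    # normalized bounding box per token
out = LayoutAwareAttention(64)(x, bx)
```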
Armineh Nourbakhsh is an Executive Director at J.P. Morgan AI Research, where she leads the Document AI team. Her career spans 15 years of research in Natural Language Processing and Multimodal Machine Learning, and her work has been deployed in award-winning technologies such as Reuters Tracer and Westlaw Quick Check. In 2020, Armineh was the recipient of the WBC Rising Star Award, celebrating the next generation of women leading technology and innovation in Financial Services.
Today we’re joined by Sanmi Koyejo, assistant professor at Stanford University, to continue our NeurIPS 2023 series. In our conversation, Sanmi discusses his two recent award-winning papers. First, we dive into his paper, “Are Emergent Abilities of Large Language Models a Mirage?” We discuss the different ways LLMs are evaluated and the excitement surrounding their “emergent abilities,” such as the ability to perform arithmetic. Sanmi describes how evaluating model performance using nonlinear metrics can lead to the illusion that the model is rapidly gaining new capabilities, whereas linear metrics show smooth improvement as expected, casting doubt on the significance of emergence. We continue on to his next paper, “DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models,” discussing the methodology it describes for evaluating concerns such as the toxicity, privacy, fairness, and robustness of LLMs.
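The nonlinear-metric argument can be made concrete with a toy calculation (our own illustration of the paper’s point, not its code): if per-token accuracy improves smoothly with scale, an exact-match metric that requires every token of a multi-token answer to be correct stays near zero for a long time and then shoots upward, looking “emergent” even though the underlying capability improved gradually.

```python
import numpy as np

# Smoothly improving per-token accuracy, as a stand-in for increasing scale.
per_token_acc = np.linspace(0.5, 0.99, 10)

# Nonlinear metric: exact match on a 10-token answer requires every token
# to be correct, so success behaves like p**10 and appears to "jump".
exact_match = per_token_acc ** 10

for p, em in zip(per_token_acc, exact_match):
    print(f"per-token {p:.2f} -> exact-match {em:.3f}")
```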
Today we’re joined by Kamyar Azizzadenesheli, a staff researcher at Nvidia, to continue our AI Trends 2024 series. In our conversation, Kamyar updates us on the latest developments in reinforcement learning (RL), and how the RL community is taking advantage of the abstract reasoning abilities of large language models (LLMs). Kamyar shares his insights on how LLMs are pushing RL performance forward in a variety of applications, such as ALOHA, a robot that can learn to fold clothes, and Voyager, an RL agent that uses GPT-4 to outperform prior systems at playing Minecraft. We also explore the progress being made in assessing and addressing the risks of RL-based decision-making in domains such as finance, healthcare, and agriculture. Finally, we discuss the future of deep reinforcement learning, Kamyar’s top predictions for the field, and how greater compute capabilities will be critical in achieving general intelligence.
Today we’re joined by Ram Sriharsha, VP of engineering at Pinecone. In our conversation, we dive into the topic of vector databases and retrieval augmented generation (RAG). We explore the trade-offs between relying solely on LLMs for retrieval tasks versus combining retrieval in vector databases and LLMs, the advantages and complexities of RAG with vector databases, the key considerations for building and deploying real-world RAG-based applications, and an in-depth look at Pinecone’s new serverless offering. Currently in public preview, Pinecone Serverless is a vector database that enables on-demand data loading, flexible scaling, and cost-effective query processing. Ram discusses how the serverless paradigm impacts the vector database’s core architecture, key features, and other considerations. Lastly, Ram shares his perspective on the future of vector databases in helping enterprises deliver RAG systems.
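As a minimal sketch of the RAG pattern discussed here (a generic illustration, not Pinecone’s API): documents are embedded into vectors, a query is matched against the index by similarity, and the top hits are placed into the LLM prompt as grounding context. The embed() and answer() functions below are placeholder assumptions standing in for a real embedding model and a real LLM call.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding: a hashed bag-of-words vector, normalized so
    that dot product equals cosine similarity. A real system would call
    an embedding model here."""
    vec = np.zeros(256)
    for tok in text.lower().split():
        vec[hash(tok) % 256] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

docs = [
    "Pinecone Serverless separates storage from compute.",
    "RAG grounds LLM answers in retrieved documents.",
    "Vector databases index embeddings for similarity search.",
]
index = np.stack([embed(d) for d in docs])   # build the vector index once

def retrieve(query: str, top_k: int = 2) -> list[str]:
    scores = index @ embed(query)            # cosine similarity over unit vectors
    return [docs[i] for i in np.argsort(scores)[::-1][:top_k]]

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    # A real system would send this prompt to an LLM; here we just return it.
    return f"Context:\n{context}\n\nQuestion: {query}"

print(answer("How does RAG improve LLM accuracy?"))
```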