I studied physics in Munich at the University of Technology, Munich, at the Università degli Studi di Pavia and at AT&T Research in Holmdel. During this time I was at the Maximilianeum München and the Collegio Ghislieri in Pavia. In 1996 I received the Master's degree at the University of Technology, Munich and in 1998 the doctoral degree in computer science at the University of Technology Berlin. Until 1999 I was a researcher at the IDA Group of the GMD Institute for Software Engineering and Computer Architecture in Berlin (now part of the Fraunhofer Gesellschaft). After that, I worked as a Researcher and Group Leader at the Research School for Information Sciences and Engineering of the Australian National University. From 2004 onwards I worked as a Senior Principal Researcher and Program Leader at the Statistical Machine Learning Program at NICTA. From 2008 to 2012 I worked at Yahoo Research. In spring of 2012 I moved to Google Research to spend a wonderful year in Mountain View and I continued working there until the end of 2014. From 2013 to 2017 I was a professor at Carnegie Mellon University. I co-founded Marianas Labs in early 2015. In July 2016 I moved to Amazon Web Services to help build AI and Machine Learning tools for everyone.
There are few things I love more than cuddling up with an exciting new book. There are always more things I want to learn than time I have in the day, and I think books are such a fun, long-form way of engaging (one where I won’t be tempted to check Twitter partway through). This book roundup is a selection from the last few years of TWIML guests, counting only the ones related to ML/AI published in the past 10 years. We hope that some of their insights are useful to you! If you liked their book or want to hear more about them before taking the leap into longform writing, check out the accompanying podcast episode (linked on the guest’s name). (Note: These links are affiliate links, which means that ordering through them helps support our show!)

Adversarial ML
- Generative Adversarial Learning: Architectures and Applications (2022), Jürgen Schmidhuber

AI Ethics
- Sex, Race, and Robots: How to Be Human in the Age of AI (2019), Ayanna Howard
- Ethics and Data Science (2018), Hilary Mason

AI Sci-Fi
- AI 2041: Ten Visions for Our Future (2021), Kai-Fu Lee

AI Analysis
- AI Superpowers: China, Silicon Valley, And The New World Order (2018), Kai-Fu Lee
- Rebooting AI: Building Artificial Intelligence We Can Trust (2019), Gary Marcus
- Artificial Unintelligence: How Computers Misunderstand the World (The MIT Press) (2019), Meredith Broussard
- Complexity: A Guided Tour (2011), Melanie Mitchell
- Artificial Intelligence: A Guide for Thinking Humans (2019), Melanie Mitchell

Career Insights
- My Journey into AI (2018), Kai-Fu Lee
- Build a Career in Data Science (2020), Jacqueline Nolis

Computational Neuroscience
- The Computational Brain (2016), Terrence Sejnowski

Computer Vision
- Large-Scale Visual Geo-Localization (Advances in Computer Vision and Pattern Recognition) (2016), Amir Zamir
- Image Understanding using Sparse Representations (2014), Pavan Turaga
- Visual Attributes (Advances in Computer Vision and Pattern Recognition) (2017), Devi Parikh
- Crowdsourcing in Computer Vision (Foundations and Trends® in Computer Graphics and Vision) (2016), Adriana Kovashka
- Riemannian Computing in Computer Vision (2015), Pavan Turaga

Databases
- Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases (2021), Xin Luna Dong
- Big Data Integration (Synthesis Lectures on Data Management) (2015), Xin Luna Dong

Deep Learning
- The Deep Learning Revolution (2016), Terrence Sejnowski
- Dive into Deep Learning (2021), Zachary Lipton

Introduction to Machine Learning
- A Course in Machine Learning (2020), Hal Daume III
- Approaching (Almost) Any Machine Learning Problem (2020), Abhishek Thakur
- Building Machine Learning Powered Applications: Going from Idea to Product (2020), Emmanuel Ameisen

ML Organization
- Data Driven (2015), Hilary Mason
- The AI Organization: Learn from Real Companies and Microsoft’s Journey How to Redefine Your Organization with AI (2019), David Carmona

MLOps
- Effective Data Science Infrastructure: How to make data scientists productive (2022), Ville Tuulos

Model Specifics
- An Introduction to Variational Autoencoders (Foundations and Trends® in Machine Learning) (2019), Max Welling

NLP
- Linguistic Fundamentals for Natural Language Processing II: 100 Essentials from Semantics and Pragmatics (2013), Emily M. Bender

Robotics
- What to Expect When You’re Expecting Robots (2021), Julie Shah
- The New Breed: What Our History with Animals Reveals about Our Future with Robots (2021), Kate Darling

Software How To
- Kernel-based Approximation Methods Using Matlab (2015), Michael McCourt
Sam Charrington: [00:00:00] Welcome to the TWIML AI podcast. I'm your host, Sam Charrington. Hey, what's up, everyone. Before we jump into today's interview, I'd like to give a huge thanks to our friends at Microsoft for their continued support of the podcast. Microsoft's mission is to empower every single person on the planet to achieve more, to inspire customers to reimagine their businesses and the world. Learn more at Microsoft.com/AI and Microsoft.com/innovation. And now, onto the show. All right, everyone. I am here with David Carmona. David is the general manager of artificial intelligence and innovation at Microsoft. David, welcome to the TWIML AI podcast. David Carmona: [00:01:01] Thank you, Sam. Pleasure to be here with you. Sam Charrington: [00:01:04] It is great to have you on the show. And I'm looking forward to digging into our conversation, which will focus on AI at scale and large scale language models, and a bunch of really interesting things you're doing there. Before we jump into the topic, though, I'd love to have you share a little bit about your background and how you came to work on all this cool stuff. David Carmona: [00:01:25] Yeah. Well, I've been in Microsoft for almost 20 years, 19 and a half. Sam Charrington: [00:01:30] Wow. David Carmona: [00:01:30] So, almost getting to that magical [laughs], magical moment. And it's funny because my beginning with Microsoft, I was [inaudible 00:01:37] to Microsoft. That was 20 years ago. So, that was the big Windows moment. Right? But actually, I didn't come to Microsoft because of Windows. I came to Microsoft because of, … At that time, my favorite product, which was Visual Studio. So, I was a developer. I still am a developer. I will always be a developer no matter what I am. Sam Charrington: [00:01:57] [laughs]. David Carmona: [00:01:58] And for me, working in Visual Studio has been like my entire career. So, [inaudible 00:02:04] I started with AI and, and VR probably way too early [laughs]. That didn't end well. So, I ended in traditional development. And I had a ton of fun with that. And I, when I move … I'm originally from Spain. When I moved here to the US [inaudible 00:02:17], I worked in, in, in Visual Studio. So, I ended managing the business for Visual Studio and all our tools like .NET and, and all of that. It was a super fun time because it was that big transition in Microsoft to open development. So, I was lucky to do things like launching TypeScript. Right? Or- Sam Charrington: [00:02:36] Oh, wow. David Carmona: [00:02:36] … open-sourcing .NET or making it cross-platform, or releasing Visual Studio code. Right? So, super fun stuff. But then like five years ago, this AI thing started to become super real. So, [laughs] I was, I was offered to lead a new team in Microsoft, focused on the business, on creating a new business for AI. And I, I didn't think about it twice. So, yeah, that's where I am. So, it's interesting … So, as you can see, my career is always like, between technology and businesses. I think … I, I mean, knock on wood, but I think I'm in, in that great balance right now [laughs]. So, I have both. I'm super fortunate to have both because I work, connecting with Microsoft research and, and the entire organization of technology and research in, Microsoft. My goal, my team's goal is really to connect that with the business. So, we work on … We define it as themes, like bigger themes of innovation in Microsoft. 
And then we connect those themes to actual real products and technologies that we can take to market. it's super cool. And one of those things … We have many, but one of them … I think like, probably the start of the themes is, is AI at scale. Sam Charrington: [00:03:46] Okay. And so is the role primarily focused on taking innovations that are happening in research to existing Microsoft products? Or is it more focused on creating new business opportunities? Or is there some balance between the two? David Carmona: [00:04:01] Yeah. It's a balance. So, we have … The way that we work in Microsoft on our framework for innovation is based on Horizon. So, we have … We refer to them as the three [inaudible 00:04:10] Horizon. Right? So, we have Horizon 1, two, and three. Three, Horizon 3 are the like, the moonshots, right? Like, longer-term new business creation, new category creation for Microsoft. A lot of that is, driven by curiosity, in most cases, in research. So, we leave a lot of room for researchers to work on those themes. But then we go all the way to Horizon 2, which are things that are really about opening new opportunities or creating new opportunities for existing products. And you can go to Horizon 1 even, which is extending existing products. Right? So, making them better. So, we work in that, in that balance, between the three. Sam Charrington: [00:04:52] Nice. And so you mentioned AI at scale as being one of your big focus areas. What exactly does that mean at Microsoft? David Carmona: [00:05:00] Yeah. So, AI at scale, I mean, we, we named that as a new category. So, it's not that it's a product or anything like that. So, it's how we refer to what we believe is a huge change in the way that we are going to see people developing AI. And it's driven by m- many different things, many different trends and technology breakthroughs. But I think the most important one is this concept of massive models and, and what they mean. Right? So, this, this ability to create now, like, this huge [laughs], massive models with billions of, of parameters. And beyond the technical achievement, the reality is that those massive models are opening new opportunities that go beyond the technology and get into the business. Right? So, we can discuss it today. So, [inaudible 00:05:47] … So, we can spend a lot of time on the technology behind it. And then- Sam Charrington: [00:05:47] Mm-hmm [affirmative]. David Carmona: [00:05:47] … we can, we can focus a little bit on, "Hey, but what does it really mean?" So, how is this going to change the way that any company can develop AI? Right? And, and [inaudible 00:05:59] it's really interesting. And then there's a whole ecosystem around this concept like, that, that you need to, for example, train these models, you need an AI supercomputer. So, that's another piece of the puzzle, right, for AI at scale. Sam Charrington: [00:06:14] So, we talk a lot about the increasing size of models and, you know, particularly in the context of NLP and language models. But help us contextualize that. You know, we throw around, you know, millions of parameters and, you know, hundreds of layers, and things like that. How is it shaking out? Or how do you think of this progression towards larger-size models? David Carmona: [00:06:41] Yeah. I think in, in a sense, you probably remember [laughs] [inaudible 00:06:45] ImageNet moment for, [laughs]- Sam Charrington: [00:06:46] [laughs]. David Carmona: [00:06:47] … for [inaudible 00:06:48] learning. Right? 
So eh- Sam Charrington: [00:06:49] Uh-huh [affirmative]. David Carmona: [00:06:49] That was, … I mean, [inaudible 00:06:51] many people referring to this moment, like the ImageNet moment for NLP. Right? So, because we get to a point that there's something that allows us to increase the size of the model. So, we go for it. And then we see, "Hey, wait a second. This is getting better. So, the more parameters that I add, the better that this is getting." Right? So, that was the moment in ImageNet with ResNet, for example. Right? That we added so many layers, and, "Hey, this, this image classifier is, is working so much better." So, we are kind of in the same place, but at a totally different scale, right, or order of magnitude. Right? For example, that model, the ResNet model for ImageNet, I think had like 60 million parameters. I mean, a completely different domain. That was computer vision. Now, we're talking about billions of parameters. And, and, and when we see progression, it's been like, very [laughs], very quick. So, [crosstalk 00:07:44]- Sam Charrington: [00:07:46] Mm-hmm [affirmative]. David Carmona: [00:07:46] I don't know. GPT, so, the first version, was like 100 million parameters. Then, I think BERT was like 300. Then you have Turing NLR. I think it, at that time, was like 1.2 billion. Then you have GPT-2, 1.5. Then you have Turing NLG. That was 17 billion parameters. That was last year [laughs]. We're not talking months ago. That, … We're not talking about, about years ago. And then we had just, just a couple of months after that, GPT-3 with 175 billion [laughs] parameters. Right? So- Sam Charrington: [00:08:18] Yeah. David Carmona: [00:08:18] Every step is 10 times [laughs] [inaudible 00:08:21]. It's a new order of magnitude [crosstalk 00:08:22]- Sam Charrington: [00:08:22] Mm-hmm [affirmative]. David Carmona: [00:08:22] … which is super impressive [laughs]. Sam Charrington: [00:08:24] So, we've kind of transitioned from … In the domain of vision, you know, we would always talk about the number of layers as an indication of the size and complexity of the model. And now, when we talk about these language models, we tend to talk about parameters. What is that? And how does that tie to the architecture of these models? David Carmona: [00:08:45] Yeah. I mean, behind … It's not that we didn't want to build these massive models before. It's that we couldn't [laughs]. That's the reality. Sam Charrington: [00:08:52] Mm-hmm [affirmative]. David Carmona: [00:08:52] And I think the big breakthrough to really enable these, these sizes of the model is the transformer architecture. And yeah, definitely a lot to say about that. But, yeah, the transformer architecture, it has … I mean, it's also based in layers. In this case, they are symmetric. So, it scales very well because it always has the same number of inputs and outputs. So, you can stack up all the layers. And, and it was a huge change because that removed the blocker that we had before with scaling these NLP models, which is that we were using techniques, as you know, like recurrent neural networks. Right? Like, LSTM and things like those. And those things are great because it allows you to connect, for example, in a text, the relationships between words. You can have some kind of memory. So, a word right now can be impacted by words in the text before. Right? And, and you keep that memory. The problem is that the way that we were doing that was very sequential. 
So, and I mean, by definition, a recurrent neural network taking the previous step as an input. So, you need to finish that step to go to the next one. So, that impacted on the scalability of the models. So, I think with the transformer architecture, we kind of broke that ceiling because now, suddenly, we don't have an architecture that is [inaudible 00:10:05]. So now, in this case, it's all in parallel. We take the, all the inputs in parallel and with some techniques, in particular, … I think the most important ones [inaudible 00:10:16] I would highlight two. But definitely, for that work, two things have to happen. One, it's the concept of the positional embedding, so how every word needs to get an input in the, in the model, the position somehow, a flag of an indication of where that word is because that's [laughs], of course, important [laughs]. It's very important- Sam Charrington: [00:10:36] Mm-hmm [affirmative]. David Carmona: [00:10:37] … Where a word is in a sentence to understand the sentence. But then the second thing is this concept of attention or, in this case, self attention, which is a way to kind of replicate that concept of connecting or changing the meaning of words, depending on the words that were happening before, or even in the case of bidirectional [inaudible 00:10:56] words are happening after that. Right? And that's, that's a whole new construct applied to NLP that is proving to be, not only super scalable, but even, performing even better [inaudible 00:11:08] the traditional approach to NLP. Sam Charrington: [00:10:43] Hmm. And so how should we think about how attention works in these kinds of models? David Carmona: [00:10:43] So, I, I, I mean, it's a very simplistic view, but I like to think of it … Because attention is not new. So, we've been using attention- Sam Charrington: [00:10:44] Mm-hmm [affirmative]. David Carmona: [00:11:23] … in, in others … Even in other domains. Right? Like, vision or i- image generation, or … I mean, the most simple example that I use all the time is movie recommendation. Right? So, how do you know if, if a user is gonna like a movie or not? So, the way that you do that is that you take a vector defining the movie in, you know, in any dimensional space. And then you take another vector defining the taste of the user. And then you multiply those vectors, right, to get the distance, the, like, the cosine distance or similarity between those two vectors. And that's an indication how much the user will like the movie. That's, that's attention, but in the case of two different entities. Right? My taste and the movie. In this case, self attention is like doing something similar, but with a sentence with itself or with a text with itself. Right? So, but in this case, the w- the attention that we want to measure is the connection between the words. So, how one word is related or connected to the rest of the words. And at the end, you're gonna have like, a heat map, right, so, where every word is connected in some manner with other words. So, if you're saying, "The kid hit the ball, and he was happy." So, he will be super connected with the kid. Right? So, I mean, super simple because at the end, you have multi [inaudible 00:12:42] attention blocks. And, and then you have all these different layers. It's like trying to understand [inaudible 00:12:49] networks. After three layers, you're lost [laughs]. You are completely lost on [crosstalk 00:12:53]. Sam Charrington: [00:12:53] [laughs]. 
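A rough editorial sketch of the self-attention idea David describes here, in plain NumPy: each token vector (plus a positional signal) is compared against every other token by dot products, and the softmax of those scores is the "heat map" of connections he mentions. This is a toy, single-head illustration under simplified assumptions, not the Turing or GPT implementation:

```python
# Toy single-head self-attention with sinusoidal positional encodings.
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal position signal added to each token embedding."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles[:, 0::2])
    enc[:, 1::2] = np.cos(angles[:, 1::2])
    return enc

def self_attention(x, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors x."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])        # how much each word "attends" to every other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax rows: the "heat map"
    return weights @ v, weights

# Toy usage: 5 tokens ("the kid hit the ball"), embedding size 8.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8)) + positional_encoding(5, 8)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = self_attention(tokens, Wq, Wk, Wv)
print(attn.round(2))   # each row sums to 1: one word's attention over the whole sentence
```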
David Carmona: [00:12:53] But I mean, that's the core principle of it. Sam Charrington: [00:12:56] Mm-hmm [affirmative]. Part of what's interesting here is that, you know, we've transitioned from an approach to NLP that was, like you mentioned … Prior to capturing positionality, you know, we'd take a bag of words of things that was at document level, didn't capture where those words were, didn't really do a good job of capturing the relationships, but we're just looking at the statistical properties of a document or sentence or- David Carmona: [00:13:22] Yeah. Sam Charrington: [00:13:23] … corpus to now looking at the relationships between all of these entities that make up language. Is that part of the power of this [crosstalk 00:13:31]? David Carmona: [00:13:32] Yeah. Yeah. E- exactly. I would say that and then the concept of, of training these models with self supervised algorithms. Right? So- Sam Charrington: [00:13:42] Mm-hmm [affirmative]. David Carmona: [00:13:42] [inaudible 00:13:43] supervised training. I think that's the other thing that, that … It was the explosion in all these models, is how now, … Because this scales amazingly well, now, you can afford training these things with huge amounts of data. Like, for example, the entire internet [inaudible 00:14:00] kind of. Right? Which is kind of what we're doing with this model. So, we take the text on the internet. And then depending on the model we can go in, in a little more detail in there if it's a [inaudible 00:14:10] model or representation model. With smart techniques, you take that. You take … You mask that text, so the, so the model can try to guess either the missing words or the words that are happening after a given text. And by training that with that input, that you are almost not touching at all. Right? So, it's all self supervised, [inaudible 00:14:31] and, and all of that. The model can actually learn very complex concepts and relationships. Sam Charrington: [00:14:37] Mm-hmm [affirmative]. You mentioned different types of models. Elaborate on that a bit. David Carmona: [00:14:41] Yeah. So, I think, the way that … And, and we can talk that more about because at the end, these same concepts can apply beyond NLP. But if we focus just on NLP, there are two main families of models. One is that I think people are super excited also because of Turing NLG and because of GPT-3. Those models are generation models. So, they are a natural language generation model, so NLG. And in that case, what … The way that that model is trained, they are called autoregressive models because you train the model with the, a lot of text. But then you train it to guess what is gonna happen, what text goes after a particular text. Right? So, they generate … They are super good, generating text, like guessing the end of a sentence or guessing an entire document, or guessing how a movie will, will end, or whatever [laughs] we want to, to guess or [inaudible 00:15:37] text, things, things like those. And that's one big family of models. You have em … Again, like, GPT-3 is an example of that. Turing NLG is an example of that. And then you have another family, which is more about representation, so natural language representation models. And the goal of those is more like, representing the text. So, in that case, the architecture that is, that is used, instead of trying to guess … Or the way that it's trained. Instead of trying to guess what's next, what we do is that you mask some words in the text. And then the model will try to guess it. 
And they are called bidirectional because in that case, not only do they look at what happened before a certain moment, but also after that. So, they will look at the words before and after a particular word to understand the context there. Right? So, those are really good to map like, text to representation, then I fine tune to do whatever I want. Right? So, from super basic sentiment analysis to question answering, or whatever I want to fine tune the model. So, those are like, the two big blocks. Then I like to go a little bit deeper 'cause for each of them, there are two other families that I think are very relevant to understand, which is how, … So, then there's more than one language in the world [laughs]. Right? So- Sam Charrington: [00:16:58] [crosstalk 00:16:59]. David Carmona: [00:16:59] You need to address that. Right? So, in particular, where you are creating real products. So, we are using these models in, in Office, for example. Office is working [inaudible 00:17:07], I feel like, 100 languages. So, imagine doing this for every language would be very [crosstalk 00:17:13]. Sam Charrington: [00:17:13] Mm-hmm [affirmative]. David Carmona: [00:17:13] And that would be the traditional approach of, of doing this. So, we, … And, and Microsoft has been a big believer on the need of doing this thing in a universal way. So, that creates a new family of models that are universal models, right, universal language models. And in the case of Turing, for example, we have both. We have a regular model. And then we have the universal language representation, ULR, so T, Turing ULR, universal language representation. And that is super powerful 'cause what it allows us, for example, in, in Microsoft, is to implement features in Word using this, like, … I don't know. Em, semantic search. We don't need to train that feature or that model for every language. We just need to fine tune it for one language. And then you have the feature for free in 100 languages. Right? Sam Charrington: [00:18:03] [crosstalk 00:18:04]. David Carmona: [00:18:03] Which is super cool. So, I very, very much recommend using those models for that. Th- this was, by the way … For people who want to go deeper, there's a paper that I like a lot, [inaudible 00:18:14] 2017, where it explains this, this concept. And, the example that it uses is how you learn math. Right? So, you look at … Well, not me. I wouldn't consider myself bilingual. I speak Spanish and a little bit of English, but [laughs] my kids are truly bilingual. And when they learn math, they don't need to learn that two plus two is equal four in English, but then [Spanish 00:18:39] in Spanish. Right? So, they just need to learn math once. And then- Sam Charrington: [00:18:43] [crosstalk 00:18:44]. David Carmona: [00:18:43] … they can apply that in different languages. So- Sam Charrington: [00:18:46] Mm. David Carmona: [00:18:46] It's the same thing for models. So you can focus on teaching or training the core concepts, fine tuning for the concept. And then you have it for free in all the languages. Sam Charrington: [00:18:56] Mm-hmm [affirmative]. Yeah. [inaudible 00:18:57] I wanna dig into transfer learning and multitask. These are all things that are coming to mind as you're explaining this. But before we do that, we started out talking about language models as an example of these massive models that require a new way of thinking about, you know, AI at scale. And you mentioned, you know, the progression of the sizes of these models … And you know, it's 10X each time. 
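A small editorial sketch of the two families of training objectives described above: generation models (GPT-3, Turing NLG style) are trained to predict what comes next, while representation models (BERT or Turing NLR style) are trained to fill in randomly masked words using context on both sides. Toy data preparation only, not the production pipelines:

```python
# Build toy training examples for the two objectives from the same sentence.
import random

text = "the kid hit the ball and he was happy".split()

def autoregressive_examples(tokens):
    """Each prefix is an input; the token that follows it is the target."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def masked_examples(tokens, mask_rate=0.15, seed=0):
    """Randomly hide tokens; the model sees both sides of each [MASK]."""
    rng = random.Random(seed)
    masked, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked[i] = "[MASK]"
            targets[i] = tok
    return masked, targets

print(autoregressive_examples(text)[:3])
# [(['the'], 'kid'), (['the', 'kid'], 'hit'), (['the', 'kid', 'hit'], 'the')]
print(masked_examples(text))
```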
GPT-3 is, you know, 10X Turing. And one question that occurs to me is, you know, is size the, you know, the most important or the only factor? You know, does it mean that each time we jump a generation, you know, "Let's just forget about the, you know … We shouldn't be using Turing anymore. Let's just use GPT-3 because it's 10X better." I think, you know, there are some obvious reasons why that might be the case, like if they're trained on, on different corpuses. Like, we know that GPT-3 has kind of a very broad public internet. And at least with GPT-2, like, there was a lot of critique about, you know, Reddit, you know, and, and the biases that get introduced there. So, the training set is going to be an obvious differentiator that separates from the size. But I'm wondering if there are other things that we need to be thinking about beyond just the size of the model. David Carmona: [00:20:24] Yeah. Yeah. No, you are right. And I think … So, it's a very simplistic thing to just discuss the models of … Or the parameters of a, of a model. [crosstalk 00:20:35]. Sam Charrington: [00:20:32] Mm-hmm [affirmative]. David Carmona: [00:20:33] There's way more. I have to say, though, that the one thing that we are, we are seeing is that the more parameters that you add … Right now, we are not seeing the ceiling of this. So, we keep improving the accuracy and the generality of the, of the model. So, hey, parameters are important. But then at the same time, it is true that it really … So, there's not one model for everything. So, different models are good for different things. Right? And in our case, for example, we, we … Turing, our family of models. It's actually a family because of that. So, we don't believe that one model will … At least right now, will be useful for every single scenario that you are targeting. Right? So, in, in our case, we created that, that family of models, which are inclusive of, of many things, including many different languages, like, this basic [inaudible 00:21:27] that I was providing before or, or this, these metrics- Sam Charrington: [00:21:30] Mm-hmm [affirmative]. David Carmona: [00:21:30] … of, of different models. You're gonna need a model for each of them, depending on what you want to accomplish. But then even beyond that, 'cause not everything that you do is NLP. So, in the family of Turing in Microsoft, we have models that are even multi-modal, that include image and text or that are focused on image. And that thing will keep growing. So, that's something important to keep in mind. The other thing is, of course, the eternal debate on the importance of the architectures, right, that, that you're using. So, I think there's a … And I don't have a super strong opinion. I think it's like everything. It will go through phases. It will get to a moment that just by adding brute force parameters, the thing will be very difficult to improve. And we'll need to be a little bit smarter on how we can improve those models. We can optimize those models in, in another different way. But again, I don't want to diminish the fact that we keep seeing that we add more parameters and, and we get more power. Right? One thing that you said, though, Sam, I, I want to, I want to double click on that 'cause it's super important. So, it's the responsible AI implications of the model. 
I think that will be an area for models to differentiate and to keep in, in mind when you're using a model 'cause the reality is that, right now, these models, they have a lot of challenges from the bias, transparency, and, and, and others that, that we need to keep in mind. So, just as we innovate on the power, accuracy and, you know, multitask aspect of generality of these models, we also need to innovate on the responsible side of them. And eh- Sam Charrington: [00:23:08] [crosstalk 00:23:09]. David Carmona: [00:23:09] As, as you said, the training corpus, that's important. I think right now, we are probably way too late in the pipeline to apply responsible AI principles to these models, meaning that we create things with these models. And then, just then, we apply those things like … I don't know. Like, you know, filtering or many, many other techniques that you can use there. I think we need to go earlier in the process, even at the point of the training, so we can make those models responsible by design. Sam Charrington: [00:23:41] Do you have a sense for how we can do that? A lot of the power of these models comes from, essentially, taking the entire internet and building a language model based on it or, you know, large parts of the internet. How do you apply the, you know, how … What are the techniques that we can use to build responsibility earlier at that scale? David Carmona: [00:24:08] So just as an example, but one example in Microsoft could be the Office or the Outlook auto reply. Right? So, what is … So, that is the typical example of a massive NLP model that is taking as an input, an email and, as an output, is creating a likely reply that you want to, that you want to do. Right? So- Sam Charrington: [00:24:28] Mm-hmm [affirmative]. David Carmona: [00:24:28] That scenario, on paper, it looks so simple [laughs], extremely simple. But when you get into the responsible side of [inaudible 00:24:37] extremely complex. And you need to, you need to pay a lot of attention. And it's not like a one-shot thing that you do, and done. You are, you are, you are golden. The reality is that you need to apply that across the entire lifecycle of the model from, as you said … So, you mentioned one that is important, which is the training data. So yes, of course, we need to get a subset of the training data to make sure that there's no toxic data that is training the model. But that is not, that is not enough. So, we need to keep in mind things like the privacy of the user. Right? So, think of, "How can we … " So, actually, for this feature, we use differential privacy to make sure that the instances that we use [inaudible 00:25:20] surface, they are not … They cannot identify a user or things like those. And you can also think of the input as something that we also manage, that we make sure that they are short answers, that they are not like, long emails [laughs], of course, things like those. So, it's something that you need to do at every stage. There's a ton of research, active research happening right now to really tackle this super complex challenge that we have with these models. Sam Charrington: [00:25:47] Mm-hmm [affirmative]. So, before we jump into how we achieve this kind of scale, you mentioned something in our pre-call that really stuck with me, is this idea that models are becoming a platform. And you know, transfer is a piece of that. Fine tuning is a piece of that. I'd love to hear you riff on, on that idea. 
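As a rough editorial illustration of the differential-privacy idea David mentions above for features like suggested replies: add calibrated noise to the aggregate statistics used to select candidate suggestions, so the output cannot reveal whether any single user's email contributed. The mechanism, thresholds, and numbers below are hypothetical, not Microsoft's actual pipeline:

```python
# Laplace-mechanism sketch: noisy counts decide which candidate replies are
# common enough to surface, so no individual message can be inferred.
import numpy as np

def dp_noisy_counts(counts, epsilon=1.0, sensitivity=1.0, seed=0):
    """Add Laplace noise scaled to sensitivity/epsilon to each count."""
    rng = np.random.default_rng(seed)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=len(counts))
    return {reply: c + n for (reply, c), n in zip(counts.items(), noise)}

reply_counts = {
    "Sounds good!": 1200,
    "Thanks, I'll take a look.": 950,
    "Let's sync tomorrow.": 40,
}
noisy = dp_noisy_counts(reply_counts, epsilon=0.5)
surfaced = [reply for reply, count in noisy.items() if count > 100]  # keep only broadly common replies
print(surfaced)
```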
I think it's a really interesting way to think about models. David Carmona: [00:26:14] Yeah, yeah. It's not a new concept. So definitely, we've been, seeing … So, you see our services [inaudible 00:26:23] services in Azure. And they support the concept of transfer learning. So, you don't need to train a model from scratch. Right? So, it's … But the reality is that a lot of what we do in AI is training models from scratch for your particular scenario. So, we're doing everything that we can to try to simplify that process because if we don't simplify that process, it's gonna be very difficult to really scale AI in an organization, in a, in a company. So, there are definitely many techniques to do that. I think in the area of NLP, fine tuning is the most relevant now. And then we can talk about some emerging ones that are super interesting and cool. But with the fine tuning process, the idea is that you pre-train … You can use a model that is pre-trained, like our Turing model, pre-train on that [inaudible 00:27:10] information from the internet, multi domain, totally general. And then you fine tune that model. So, fine tuning, meaning adding something to it. Like, for example, you want to fine tune the model to do a sentiment analysis. So, you would add then like, a classifier or something like that, a binary classifier. And then you use label data. In this case, you use like, sentences that are, you know, positive, negative sentiment. And then you fine tune. So, you train additionally. It's like extra steps of training that entire thing with your added classifier, in this case, for example, which is gonna update the weight. But it's not starting from scratch, meaning that you don't need that massive data and the skills because you don't need to change the architecture. You don't need to compute because it's not that much compute needed. So, that is certainly a huge step into democratizing these models. Right? So, that's, that's super important. And not only you can do that for fine tuning for specific tasks, you can also fine tune it for your domain. So, if you work in finance, or you work in health, or you are in any industry, and you want to find a law company … So, you want a law firm. You want to fine tune that model for the domain of your vertical. So, you don't need to train the whole thing. You just need to train for that particular domain. So, super, super important, but then what we're seeing is these models can go even beyond that. And that's a super interesting area. Right now, it's still in the beginnings. But what is the big difference with that approach? So, in this first approach, with fine tuning, you are training the model at some point. I mean- Sam Charrington: [00:28:51] Mm-hmm [affirmative]. David Carmona: [00:28:52] Not from scratch, but you're training it. You are changing the weight of, of the model. You're- Sam Charrington: [00:28:56] Mm-hmm [affirmative]. David Carmona: [00:28:56] You're updating that model. You need [inaudible 00:28:58] to train it. But then we have these other techniques. They are called like, zero-shot or few-shot, where you don't do that. So, the model can learn in [inaudible 00:29:08] time. So, you don't need to change the [inaudible 00:29:11] of the model. You have only a model. You don't change that model. 
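A minimal editorial sketch of the fine-tuning recipe David describes just above: keep the pretrained encoder, bolt on a small classification head, and run a few extra training steps on labeled sentiment data. The `ToyEncoder` below is a stand-in for a real pretrained model, included only so the sketch is self-contained and runnable:

```python
# Fine-tuning sketch: frozen pretrained encoder + small trainable classifier head.
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    """Stand-in for a pretrained encoder: averaged token embeddings."""
    def __init__(self, vocab=1000, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, hidden)
    def forward(self, ids):
        return self.emb(ids).mean(dim=1)          # (batch, hidden) sentence vectors

class SentimentHead(nn.Module):
    def __init__(self, encoder, hidden_size):
        super().__init__()
        self.encoder = encoder                     # pretrained, kept frozen below
        self.classifier = nn.Linear(hidden_size, 2)  # the small added head
    def forward(self, input_ids):
        return self.classifier(self.encoder(input_ids))

model = SentimentHead(ToyEncoder(), hidden_size=64)
for p in model.encoder.parameters():
    p.requires_grad = False                        # "not starting from scratch"

optimizer = torch.optim.AdamW(model.classifier.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

batch_ids = torch.randint(0, 1000, (8, 16))        # 8 labeled sentences, 16 tokens each
labels = torch.randint(0, 2, (8,))                 # positive / negative labels
loss = loss_fn(model(batch_ids), labels)
loss.backward()
optimizer.step()
print(float(loss))
```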
Now, in [inaudible 00:29:15] time, where you are doing the inference of the model, you can … If you are doing a few-shot, then what you do is just provide a few examples of the task that you want to do, and then directly, the one that you want to solve. And the model will do it, which is mind blowing [laughs] that it can do that. But then you have zero-shot, which is like, the mind blowing times three [laughs], which is that you don't even need to provide examples. So, you can ask one of these models, "Hey, I want to translate this to French." And you provide the sentence. And the model will know how to do that. It will identify patterns in the corpus data that it was trained on. And it will know what it means to be, to do a translation. And it will do that translation. So, those techniques, what they are really doing, from fine tuning to few-shot to zero-shot, is making it much easier to really use these models in your particular scenarios for your particular domain, your particular task, or your particular modality. Super cool. Sam Charrington: [00:30:18] Mm. Awesome, awesome. We've talked about different kinds of models. Uh, just a few quick words on applications. Like, you know, what do you think are the most exciting applications of language models generally or, or Turing in particular, you know, within and outside of Microsoft? David Carmona: [00:30:38] Yeah. So what, what I can do because it's a [laughs], it's a big one. We can, we can talk for a long time. I can give you an overview of how we are using it in Microsoft. And then you can get a sense of, of the usages that, that it can have. So, in Microsoft, the way we look at this is like … We always look at these things, any technology is a stack. So, our goal always is to deliver a full stack. So, you just … And that's our approach to any technology. So, we do the research. But then we want to make sure that that research is available for others to, to use. And then we want to make sure that we keep adding layers [inaudible 00:31:19]. for example, the first one would be releasing that as open source. Right? So, we add another layer. We want that to be part of Azure, so you can train those models yourselves, which is the AI supercomputer that we are, providing in Azure to train those models. But then we keep building on that. On top of that, we have things like Azure machine learning. So, you have another abstraction layer that can improve your productivity, fine tuning those models, like [inaudible 00:31:44] mentioned before. But then we put another layer on top of that, which is [inaudible 00:31:49] services, which are end to end out-of-the-box services that you can use as [inaudible 00:31:54] points. And you can infuse directly into your application without worrying about doing anything with, with those models. And then on top of that, we build applications. So, we make them part of our products, like, Office, Dynamics. Or we create new products that were impossible before. So, that's the [inaudible 00:32:11] approach. I think if we focus on the application side, just to give you some, some examples of things that are already available, that people can use that are powered by these massive models [inaudible 00:32:21] a lot in Office. A lot of things in Office are powered by these models. So, you can think of, for example, semantic search in Office [inaudible 00:32:30] you open a Word document, you search for something in that Word document. And that is not the traditional find and replace [laughs] that we had before. 
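To make the few-shot and zero-shot idea David described above a bit more concrete, here is a small editorial sketch of how the prompts differ. Nothing in the model changes; the examples, or the lack of them, live entirely in the text sent at inference time. The prompts are illustrative and no real model call is made here:

```python
# Few-shot vs. zero-shot prompting: the "learning" is in the prompt, not the weights.
few_shot_prompt = (
    "Translate English to French.\n\n"
    "sea otter => loutre de mer\n"
    "cheese => fromage\n"
    "peppermint => menthe poivrée\n"
    "plush giraffe =>"            # a few worked examples, then the task to solve
)

zero_shot_prompt = "Translate 'plush giraffe' into French:"  # no examples at all

# Fine-tuning would change the model; here the only thing that changes is the text
# you would send to a large generative model as-is.
for name, prompt in [("few-shot", few_shot_prompt), ("zero-shot", zero_shot_prompt)]:
    print(f"--- {name} ---\n{prompt}\n")
```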
This is semantic search. So, you can even ask questions to the document. And [laughs] the document will answer those, those questions. That is all powered by, by Turing. You have things like document summarization. So, you go to SharePoint, and you hover on a document. And you will see a summary of the document in there. That is a … It's an abstraction. So, it's not just taking parts of the document. That is generated with, with Turing. Things in Outlook, like Outlook auto-reply that I was mentioning before, or things like, … There's something in meetings, Meeting Insights, that before a meeting, it will give you all the relevant information about that meeting. So, those are like, … In the taxonomy that we were talking about before, those would be Horizon 1. It's about making those applications better. But then we have these Horizon 2 things that are [inaudible 00:33:24] new opportunities that these models can open. And I think a good example of that would be Project Cortex. So, Project Cortex is part of the Microsoft 365 family. And the goal of that project is super cool. So, what it does is that it's able to get all your internal knowledge in your organization by looking at both the structured and the unstructured data in your organization. So, think of documents, meetings, PowerPoints, anything that you have in there, even images 'cause it's able to scan and do OCR on, on images. So, it's able to crawl all that information for your company, and then to extract knowledge out of that. So, what we do is that we create this concept of a knowledge entity. Like, imagine that, … I, I don't know. You are in a law firm. Imagine international, whatever, commerce. I don't know. I have no idea of, of law. But it's like a topic- Sam Charrington: [00:34:23] [crosstalk 00:34:24]. David Carmona: [00:34:23] … that the AI system was able to extract from your information. And it can, it can help you a lot. So, it can give you … It can provide you with a summary. It can give you, what are the most relevant documents for that particular subject in the company, what are the experts, so, who you should talk with about, about those topics. So, it's mind blowing [inaudible 00:34:45] knowledge basis. Right? So that, that you can get … It's extracting the DNA of your company. So, you can really make it available for the, for the rest of the employees. And like, those, I mean, I can [inaudible 00:34:57]. So, every, any product that you can mention [inaudible 00:35:00] use Bing. So, it's another, of course, super important one. Things like question and answer in Bing [inaudible 00:35:05] even the universal search. So, we use this trick of universal language representation in Bing. And those are all available in there as well. Yeah. So, we use it [inaudible 00:35:16]. But more on the business side, I would mention, in Dynamics 365, we use these models for a lot of different things. Very obvious one, of course, is anything that has to do with customer service understanding or, you know, sentiment analysis. All of that in customer service that is- Sam Charrington: [00:35:33] Mm-hmm [affirmative]. David Carmona: [00:35:33] … powered by these models. But then things that are more visionary. So, think of, for example … In Dynamics 365, one of the things that we can provide is suggestions to sellers in your company by looking at any interaction with that customer before, like emails or documents, phone calls, whatever. Right? 
So, it's able to understand that and structure information, and give you … It's like language generation. But in this case, to take the next steps to your, to your customers. Sam Charrington: [00:36:01] Hmm. David Carmona: [00:36:02] So, yeah. Super, super broad. We could talk for a while. Yeah [laughs]. Sam Charrington: [00:36:04] [laughs]. So, you know, let's maybe jump into what's happening that's enabling all of this to take place now. One of things that … You know, when we think about kind of the scale and size of these models … You know, we've talked about the scale of the compute that has been required to enable it. You know, how do you thi- … And you mentioned AI supercomputers. Like, what's that all about? How do you think about, you know, building out the infrastructure to scale and train these models? David Carmona: [00:36:36] Yeah. Le- let's say that to train a model like this on your laptop will take probably thousands of centuries [laughs]. So, definitely, you need a lot of scale to train [crosstalk 00:36:48]. Sam Charrington: [00:36:48] Yeah. David Carmona: [00:36:48] And you need … I mean, it's amazing, the kind of challenges that you get when you grow a model like this. Like, fundamental challenges like, "Hey, the model doesn't fit in your GPU." [laughs] That's- Sam Charrington: [00:37:02] Mm-hmm [affirmative]. David Carmona: [00:37:03] … Something that we wouldn't see before. Right? So, I think it is like … If you pass 1.3 billion parameters, something like that, then the model is not gonna fit. So, you better find new ways. But then it's just the compute. So, the time- Sam Charrington: [00:37:15] [crosstalk 00:37:16]. David Carmona: [00:37:16] … required to train one of these models, you need like, ultra [inaudible 00:37:19]. I, and, and I think … So, that's the main reason why we focus on … And like, always, like I was saying, in the beginning, we try to have a platform approach to it. So, not thinking of fixing this problem for Turing, for our models, but fixing this problem for our customers, so they can use this infrastructure as well. Sam Charrington: [00:37:38] Mm-hmm [affirmative]. David Carmona: [00:37:38] So, the approach that we took was building this massive infrastructure in Azure. So, these are massive clusters that are, that you can spin up directly in Azure. And not only can you spin them up, then, of course, you have the complexity when you have … These are … I mean, imagine … For example, the one that we announced a year ago, that is a massive cluster of like, 10,000 GPUs. You have more than 200,000 CPUs. So, it's massive scale. So, how do you manage that? You need things that allow you to manage that in a distributed way. And then what is even more challenging is, "Okay. So, I have my infrastructure completely managed. I can [inaudible 00:38:15]." It is integrated with Azure machine learning. So, you can like, launch like, jobs in that massive infrastructure. But then how would you actually do it? So, you have a model that is by definition, huge. So, how do you train that thing? How do you divide this task, this super complex task, into individual [inaudible 00:38:36] in your, in your massive cluster? And that's, that's the other side of the coin, which is our work on these like, software systems that are meant to help you in that process. So, this was … At the same time that we announced the AI supercomputer, we also announced … It's called DeepSpeed. It's open source. So you can use it on, on top of anything. And it will help you do that for you. 
So, what it will do is that it will take this training. And it will distribute that training across a massive infrastructure. So, it will know how to do that in an efficient way. And it does it basically … It's like a three … We call a 3D distribution because it takes like three different [inaudible 00:39:18] to, let's say, chunk this task. Right? One, which is the most basic one, is the data distribution. So, you just [inaudible 00:39:27] your data in smaller chunks. And then you have [inaudible 00:39:30] each node is gonna take one of those chunks. But that is not enough. You need to go further than that. So, the other level of distribution that we use is [inaudible 00:39:39] distribution, which is [inaudible 00:39:41] because of the transformer architecture, that [inaudible 00:39:44] symmetry is [inaudible 00:39:46] to split the [inaudible 00:39:49] layers. So [inaudible 00:39:50] each node will take a different layer [inaudible 00:39:54] communication and optimization going on there that [inaudible 00:39:57] you need to take care. And then the last one is the [inaudible 00:40:00] which [inaudible 00:40:01] even for each of those layers, we can divide [inaudible 00:40:04] smaller chunk [inaudible 00:40:07] a different GPU. So [inaudible 00:40:09] what that allows you, it [inaudible 00:40:11] a lot of research involved [inaudible 00:40:13] this framework. [inaudible 00:40:14] you almost get like, a linear distribution, like, a linear growth in your model. So, you can [inaudible 00:40:20] number of parameters … And by the way, [inaudible 00:40:23] is able [inaudible 00:40:24] more than one [inaudible 00:40:25] parameters. So huh, you can train models that are not even [inaudible 00:40:29] existing today. And you see the line, and it's almost linear. So, it's exactly what you're, you are looking for in these systems. Sam Charrington: [00:40:35] Oh, wow. Wow. And what about on the hardware side? Microsoft announced this Brainwave Project some time ago to bring new hardware architectures to bear this problem. Can you share a little bit about that? David Carmona: [00:40:50] Yeah. So, yeah. We announced the [inaudible 00:40:53] maybe a little bit more ago. But it's fully available now. So, you go to Azure. And you go to Azure machine learning. And one of the options that you have to deploy your model is[inaudible 00:41:02]. And what, what that is gonna give you, especially [inaudible 00:41:05] inference time, is very low latency and a lot of, you know, efficiency in cost. Right? So, it's perfect for massive … I mean, I, I always use the same example. So, this feature in Word, one of the features powered in Word by Turing, is called predictive text. So, that means that, when you type, it's gonna give you suggestion, how the text will continue. Right? So [inaudible 00:41:29] think of [inaudible 00:41:30] intelligence, but, but for Word. 300 million users of Word. Imagine doing the inference of that model in every keystroke [laughs]. So, that's the- Sam Charrington: [00:41:39] Mm-hmm [affirmative]. David Carmona: [00:41:40] That's the scale that we're talking here. it's huge. So, you better optimize that a lot if you want to scale it to that, to that number. And we do that … I mean, you have to do it in, … Again, it's like a game that you have to tweak every single step. Of course, we don't go with this m- multi billion models on inference time. So, there's a lot of optimization to do there to reduce the number of parameters, to even using techniques to make it more efficient. 
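A toy editorial sketch of the "3D distribution" David describes earlier in this exchange: the batch is split across data-parallel replicas, the stack of layers is split into pipeline stages, and each layer's tensors are further split across devices. This only prints a hypothetical assignment of work to GPUs; it is not DeepSpeed itself:

```python
# Conceptual 3D parallelism: data x pipeline x tensor splits over a toy GPU grid.
from itertools import product

num_data, num_pipeline, num_tensor = 2, 4, 2            # 2 x 4 x 2 = 16 GPUs
samples = list(range(8))                                 # a tiny "batch" of training samples
layers = [f"layer_{i}" for i in range(16)]               # a stack of 16 transformer layers

data_shards = [samples[i::num_data] for i in range(num_data)]        # data parallelism
stage_size = len(layers) // num_pipeline
pipeline_stages = [layers[i * stage_size:(i + 1) * stage_size]        # pipeline parallelism
                   for i in range(num_pipeline)]

for rank, (d, p, t) in enumerate(product(range(num_data), range(num_pipeline), range(num_tensor))):
    print(f"GPU {rank:2d}: samples {data_shards[d]}, "
          f"stage {p} ({pipeline_stages[p][0]}..{pipeline_stages[p][-1]}), "
          f"tensor slice {t + 1}/{num_tensor} of each layer")          # intra-layer parallelism
```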
And then there's the hardware. Right? So, we use the ONNX Runtime thing in Microsoft. That can optimize not only for the CPU … So, it has optimization for CPUs, but also for [FPGA 00:42:21]. So, it's a way of [inaudible 00:42:23] from the hardware that you have, underneath. And it really allows you to bring all these things that are great to talk from the research point of view. But then putting [inaudible 00:42:33] in action, it requires all this level of detail that is a new level of complexity. Sam Charrington: [00:42:38] Mm. So, this is primarily focused on the inference side. Do you see any … Are there any particular innovations you're excited about on the hardware side for training? Or you, do you see it primarily being evolutions of today's GPUs? David Carmona: [00:42:55] I mean, when we see … I mean [inaudible 00:42:57] super evolving. So, we'll see … The reality right now is that you have to be flexible. So, we are not- Sam Charrington: [00:43:02] Mm-hmm [affirmative]. David Carmona: [00:43:02] … discarding any approach, any at all. Right? So, the reality is that FPGA for the inference was super efficient because it allows you to change it. Right? So, it's programmable. So, that was very, very efficient [inaudible 00:43:16] and very agile. The combination of agility and efficiency was, was the right thing. But that may change at, at any moment. And as these things get more stable, then ASIC may be the way to go. And, and, yeah, of course, we are, we are not discarding any, any of those approaches. Sam Charrington: [00:43:32] So, how do you see this level of scale that we're dealing with today impacting the world for kind of users of AI? What, what changes? David Carmona: [00:43:43] I think that the main thing maybe bringing, bringing all of this together is how this will change the way that you develop AI. So, how this will open new ways of developing AI that we can, that we can use right now. So, that whole concept of creating more general multitask, multi-domain, multi-modality models, that then you can customize for your particular task, that is, that has huge implications on how you can … One, how you can scale AI in your organization and how AI can scale to other organizations, like smaller organizations. Right? So, that for us, it's a, it's a huge aspect of, of all of this. And the way that I see it is, is that uh, it's kind of what we experienced in the last 20 years for software. And this is very similar. So- Sam Charrington: [00:44:38] Mm-hmm [affirmative]. David Carmona: [00:44:38] Software at some moment, we had the hard lesson that software has to be super connected to [laughs] the business. So, you have a team of software developers in a basement [laughs] not connected to the- Sam Charrington: [00:44:51] [laughs]. David Carmona: [00:44:51] … business, that is not gonna work. I think we are ki- … AI is in a basement right now, kind of. Right? So, it's- Sam Charrington: [00:44:57] [laughs]. David Carmona: [00:44:57] We are not fully connected to the business [inaudible 00:45:01] because it requires so much, like, skills, so many skills and expertise that, that it's a very technical domain right now. We need to change that. So, we need to make sure that the business and a- AI come together. And, we learned that with software. It's called DevOps. It's about bringing the two together, and then doing a small iteration [inaudible 00:45:22]. It's coming to AI. We are all talking about MLOps now. It's a huge area. 
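As a small editorial sketch of the ONNX Runtime serving path mentioned above: export a model to ONNX once, then run low-latency inference through an execution provider matched to the hardware. The tiny linear model here is a stand-in for a real exported model, and the FPGA-backed providers used in Azure are not shown:

```python
# Export a tiny stand-in model to ONNX, then serve it with ONNX Runtime.
import numpy as np
import torch
import torch.nn as nn
import onnxruntime as ort

tiny = nn.Linear(4, 2)   # placeholder for a distilled / optimized production model
torch.onnx.export(tiny, torch.randn(1, 4), "tiny_model.onnx",
                  input_names=["features"], output_names=["scores"])

# Execution providers map the same exported model onto different hardware
# (CPU here; GPU or other accelerators where available).
session = ort.InferenceSession("tiny_model.onnx", providers=["CPUExecutionProvider"])
features = np.random.randn(1, 4).astype(np.float32)
scores = session.run(None, {"features": features})[0]
print(scores.shape)   # (1, 2)
```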
It's our [inaudible 00:45:28] definitely in Microsoft to provide the platform to empower that collaboration and that continuous iteration, and trackability of everything that you do in your AI development cycle. [crosstalk 00:45:37] and that will be, massively be empowered by AI at scale. So, you have models that can really empower like, a more dynamic way, so you don't have to create from scratch, these models. You can iterate on them with the business and just focus on teaching your domain to the model instead of starting from scratch. That goes in that direction. We do think that there's one step beyond that. We are also seeing … We also saw it with software. That also needs to happen with AI, which is really going beyond the technology and the businesses, and getting to every employee. So, how every employee in an organization should be empowered with AI just like they can Excel right now to [inaudible 00:46:21] numbers [inaudible 00:46:21] that for AI. So, every employee can apply AI, and not only apply it, but also create, consume, mix and match [inaudible 00:46:31] of having some level of freedom to really apply AI to, to what they do. That's another huge area, like the augmented intelligence area. Sam Charrington: [00:46:41] Mm-hmm [affirmative]. David Carmona: [00:46:41] That [inaudible 00:46:42] models, we, we may see it happening sooner than later. Sam Charrington: [00:46:45] Awesome. Well, David, it's been wonderful to catch up with you and to dig into some of the work you're doing around AI at scale. Thanks so much for taking the time to chat with us. David Carmona: [00:46:58] Thank you so much, Sam. It was a pleasure. Sam Charrington: [00:47:00] My pleasure. David Carmona: [00:47:01] Thank you. Sam Charrington: [00:47:02] All right, everyone. That's our show for today. To learn more about today's guest or the topics mentioned in this interview, visit TWIMLAI.com of course, if you like what you hear on the podcast, please subscribe, rate, and review the show on your favorite podcatcher. Thank you so much for listening, and catch you next time.
Sam Charrington: Hey Everyone! Last week was the first week of our TWIMLcon: AI Platforms conference, and what a great first week it was! Following three days of informative sessions and workshops, we concluded the week with our inaugural TWIMLcon Executive Summit, a packed day featuring insightful and inspiring sessions with leaders from companies like BP, Walmart, Accenture, Qualcomm, Orangetheory Fitness, Cruise, and many more. If you’re not attending the conference and would like a sense of what’s been happening, check out twimlcon.com/blog for our daily recaps, and consider joining us for week two! Before we jump into today’s interview, I’d like to say thanks to our friends at Microsoft for their continued support of the podcast and their sponsorship of this series! Microsoft’s mission is to empower every single person on the planet to achieve more. We’re excited to partner with them on this series of shows, in which we share experiences at the intersection of AI and innovation to inspire customers to reimagine their businesses and the world. Learn more at Microsoft.com/ai and Microsoft.com/innovation. Sam Charrington: [00:01:29] All right, everyone. I am here with Gurdeep Pall. Gurdeep is a corporate vice president with Microsoft. Gurdeep, welcome to the podcast! Gurdeep Pall: [00:01:38] Thank you, Sam. Really excited to be here. Sam Charrington: [00:01:40] I’m super excited for our conversation today! As is our typical flow, I’d love to have you start by introducing yourself. You’ve had quite a career at Microsoft culminating in your work in AI and autonomous systems. Tell us a little bit about your background and how you came to work in this field. Gurdeep Pall: [00:02:02] Thanks Sam. I’ve had a really nice long run at Microsoft, as you mentioned. And in fact, today is my 31st anniversary at Microsoft. Sam Charrington: [00:02:11] Wow. Gurdeep Pall: [00:02:12] So, yeah, it’s been a long career, but I really had a great time. In fact I feel like I’ve been into the candy store like three times. So my career can be divided into three parts. I worked on networking and operating systems. So that was sort of my first gig at Microsoft. I was very fortunate to work on a lot of the internet technologies when they were first rolled out in operating systems. I worked on VPNs, I’ve worked on remote access. And then I worked up to Windows XP, I was the general manager for Windows networking, where we shipped Wi-Fi for the first time in a general purpose operating system. And then at that time I moved over to work on communications and I started Microsoft’s communications business. So these are products that you may remember from the past, things like Office Communications Server, which became Lync, which became Skype for Business, which is now Teams. So started that business from scratch, and all the way until we announced Teams, in fact, a few days before we announced Teams, I was involved with that business. Though I’d had a stint in the middle on AI and I came back to work on AI. So it’s been, I would say, roughly three parts to my career and the latest being AI. And I’ve had lots of fun in all of them. Sam Charrington: [00:03:30] That’s awesome. I talk to so many people at Microsoft who are working in AI and a lot of them started their careers working on Bing. You’re maybe one of the outliers in that regard. Gurdeep Pall: [00:03:43] Well, the funny thing is that first stint I had mentioned on AI was actually in the Bing team and I was running Microsoft speech. 
I was running some of our interesting explorations we were doing at Bing, recognizing objects. In fact, some of the image stabilization work that made it into HoloLens actually came out of that group. So yeah, I worked on maps and lots of interesting stuff. Sam Charrington: [00:04:08] That's awesome. So tell us a little bit about autonomous systems and some of the work you're doing in that area. Gurdeep Pall: [00:04:14] Yeah. So, for the last four years or so, I've been focused on emerging technology and how it can be applied to interesting business problems. And, in that regard, I've worked on some interesting technology in the language space, the language understanding space. I worked on ambient intelligence, where you could actually make sense of a space, sort of make reality computable if you will. And then as I was exploring interesting emerging AI which can solve business problems, we started focusing on autonomous systems. That was interesting to us, not just as a very interesting aspect of what AI was enabling, but also because Microsoft didn't have a lot of focus in that area before. So, when I talked to Satya and, at the time, Harry Shum was here, we decided this was an area we were going to go invest in. Sam Charrington: [00:05:04] Interesting. And one of those investments was the acquisition of a company called Bonsai. This is a company that I know well. I interviewed one of the founders, Mark Hammond. This was back in 2017. It's hard to believe it was that long ago. And the company had a really interesting take on using technologies that are still difficult for folks to put to productive use, namely reinforcement learning. Their take on it was this idea of machine teaching. Maybe you can tell us a little bit about that acquisition, the role that it plays in the way Microsoft thinks about autonomous systems, and elaborate on this idea of machine teaching and some of the things that Bonsai brings to the table. Gurdeep Pall: [00:05:49] Sure. Absolutely. So, when we started focusing on autonomous systems, we were trying to get our hands around this thing. People interpret autonomous systems many different ways. Some people think it's only about autonomous driving, so let's build a vertical stack. Some people think about robots, these humanoid robots with arms and joints and so on. And we were thinking, what is our point of view? And, at the end of the day, we looked at our own capabilities. We're a software company; what is a software interpretation of the space? And it was with this sort of point of view that we started thinking about it. There was some work going on in Microsoft Research at the time, which I'll talk more about. And that's when I first met Mark and team, and we had a really good discussion, and as we finished the first meeting, I remember this thing going through my head, that this is such a great approach. And it really fits into how we are starting to think about this space and makes sense to us. And then I also thought, God, this feels like just the wrong thing for a startup to do, building platforms and tools. It's a tough thing. And Mark is such an incredible guy. I think you've talked to him, so you know that. So when we first finished the acquisition, he shared that with me too. He says, every VC I talked to asked, why are you doing this? This is the kind of thing Microsoft should be doing. So it was a marriage sort of made in heaven, as it were, and we acquired that company.
And it's been really great, actually, working with Mark and picking up from some incredible thinking that he and Keen had done, and the team that was there, and then really expanding on that and helping it realize its potential, and also making it much more of an enterprise-ready sort of an offering, because this space is as mission critical and as important as it gets. So that's been a very fun journey for the last two and a half years. Sam Charrington: [00:07:52] One of the ways I've heard you describe the way you're approaching autonomous systems, or that world broadly, is two words, and I still may butcher one of them, but it's like this marriage of bits, and is it atoms that you say? Or molecules, or something else? But the idea is that, and this was something that was core to the way Bonsai articulated what they called then industrial AI, it's a different problem when you're applying AI solely in a software world, recommendations on a website or looking at customer churn, to when you're actually trying to move physical goods or devices or systems. Elaborate on what you've seen in terms of the different requirements that come up in that world. Gurdeep Pall: [00:08:43] Absolutely. This is a very important point. When we started focusing on autonomous systems, I had people asking me about half the time, "oh, you're talking about RPA, right?" No, I'm not talking about RPA. Of course it doesn't help when some of the RPA companies were calling their tech robots that could take action and so on. So in some ways, it was just a way for us to be clear about what we are doing. And we said, no, we're actually focused on atoms, not just bits. Of course, to digitize anything, you have to go from atoms to bits and then reason over it. But that became sort of the mainstay for us. The biggest difference, I would say, between those two worlds is that the physical world is governed by things like physics. There's Newtonian physics, of course, and then you get into some of the multi-joint movements and you get into fluids, and that's a whole different kind of physics which comes in. So you have to really think about modeling the real world and how you can then apply the tech towards that. The second thing I would say is that most of the scenarios in autonomous systems pertain to taking action in the real world. And when you're taking action in the real world, every time you take an action, the real world changes. And this is where reinforcement learning becomes a very natural mate as an AI technology for the problems that really apply to the real world, which is great because we have no other science which allows us to take a really sort of unbounded state space and actually reason within it. And reinforcement learning becomes this really important piece in it. Lastly, I would say that every problem that we've looked at in the autonomous systems space typically is one where there are experts who exist already. So far we haven't been called to a problem where this is completely new and completely different and "oh, let's solve it for the first time," you know?
And so tapping into the human expertise became a very important piece of this equation as well, which sometimes you don't need to worry about, [inaudible] the data, you throw things at it and then maybe there is judging, certainly, if you want to sort of fine tune the models and so on, but that was another interesting aspect of this. Sam Charrington: [00:11:11] So we'll be digging a little bit deeper into some of the technology that makes all this happen, but you started to mention some of the use case scenarios. Can you dig a little bit deeper into some specific scenarios that you've been working on? Gurdeep Pall: [00:11:27] Absolutely. And that's one of the things which makes this very, very interesting to me, because literally everything you see in the world around you can be a target for some of the technology that we're building. Everything from smart climate control. HVAC control is a field where, for the last 70 years, there's been very incremental improvement. Things like fuzzy logic and stuff like that have been used, and things have plateaued in performance. We've seen incredible results using our approach; we were able to bring much better performance, so energy savings or better climate control. We've seen oil drilling, horizontal drilling, from companies like Shell, where you have these incredibly big machines that look like bazookas, and you're drilling with them. And these machines need a pretty high level of precision, so great human experts can do it, but sometimes there's more work than you can get that many trained experts on. So being able to guide the drill bits through that. Cheeto extrusion is a very interesting, complicated process. You know, it's very easy to eat, very hard to make. I always say, I know there are professional chefs out there, but certainly I cannot make the same kind of eggs every morning. Because even that simple task of heating the oil and getting it just right and putting the eggs in, you cannot replicate it every time. But if you're Pepsi and you're making Cheetos, that has to be consistent every time. When you open a bag of Cheetos, everybody's familiar with the fluffiness and the crispness, and so everybody's a judge and you have to win that every time. So it's a very hard problem, because you have this corn meal which is mixed with water. It's impacted by the age of the machine which is extruding, sometimes impacted by humidity, temperature, all these things. So it's a highly dynamical system, and experts today, they sample and then they tweak, and then sample and then tweak, and they have really very stressful jobs of trying to keep that quality right. Otherwise the quality folks will come in and reject the material. So this is a problem we've been able to apply our tools to, and basically consistently keep tweaking the parameters of this process so that you can have consistent Cheetos coming out on the other side. Chemical process control and polymer manufacturing: very, very hard problems. Some of these problems take six months to design the process for producing polymer for a particular grade. And we've been able to apply this to both the design and the actual manufacturing process itself. Our favorite thing is flying things. Bell Flight is an incredible company; they have all kinds of commercial as well as military applications for their vertical liftoff vehicles and so on. They're trying to bring autonomous capability to those things.
So we've been able to apply this towards that as well. So as you can see, anything which has control in the real world, where you're sensing and you're picking an action, and you're taking that action and sensing again, this kind of a loop exists, this technology can be applied. Sam Charrington: [00:14:53] It's been interesting over the past few years, just reflecting on some of the early conversations I had with Mark and the team at Bonsai. There's kind of this pendulum in the industry where we started out with rules, like physics and how things work. And early on in applying AI, we threw all those rules away and leaned heavily on data and statistics. And over the past few years, there have been efforts, both in academia as well as what you're doing, to incorporate the rules and the human expertise back into the equation, without tossing everything that we've gained in applying data. One of the interesting challenges when you layer on the physical world here is simulation, and how do you let an agent explore and learn without destroying helicopters and lots of Cheetos? Share a little bit about the challenge of simulation and how that's evolved to help make some of these problems more tenable. Gurdeep Pall: [00:16:01] Yeah. Yeah. I think that's such an important piece of this equation. Reinforcement learning is great, but reinforcement learning requires many, many, many steps, literally just to get a policy to be robust. You can be 60 million cranks in before you start to see your policy develop to the appropriate level. So the question is, how do you go do that in the real world? And this is one of the big insights I think the Bonsai folks came up with, and then this was some work that was happening at Microsoft Research coming at it from a very different direction, but they sort of merged together. This is AirSim, and I can talk more about that, but the ability to model the appropriate aspects of the real world so that you can actually take action against them, get the right input back, and use that to train the model has been one of the biggest insights here. Because really, what it says is you're taking the physical world and you're creating a mapping of it in the digital world, which then allows you to train the models quickly. And that's where these simulators come in. Now simulators, depending on what they're trying to simulate, can be very computationally intensive. If you think of Navier-Stokes equations and things like that, CFD, these are pretty long running simulations, and some are, of course, faster. Now because we are using simulators for training AI, we want to crank this very, very quickly. So sometimes you end up with this problem where the physics, or at least how that physics is approached using these mathematical equations, actually becomes a big piece of the problem. And so this is an area of how to take simulation and how do you mate it with the training of the AI in a way that you can do it fast, you can do it cheap, and you can frankly do it in parallel. Because that is one of the things we have with some of the RL algorithms now: you can take the last best known policy, you can explore on thousands of machines at the same time, you can take the samples and come back and update the policy. And then you take that, and again, you fan it out, and you've got learners which are learning very quickly.
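To make that fan-out-and-update pattern concrete, here is a toy sketch of the loop being described: copy the last known policy out to many workers, collect rollouts in parallel, then pull the samples back for a central update. The environment, the linear policy, and the crude update rule below are invented purely for illustration; production systems use distributed RL frameworks on far larger clusters.

```python
# Toy illustration of the parallel rollout / central update loop described above.
# Everything here (environment, policy, update rule) is a stand-in for illustration.
import random
from concurrent.futures import ProcessPoolExecutor

def rollout(policy_weight, episode_len=50, seed=0):
    """Run one episode in a trivial 1-D environment; return (state, action, reward) samples."""
    rng = random.Random(seed)
    state, samples = 1.0, []
    for _ in range(episode_len):
        action = policy_weight * state + rng.gauss(0, 0.1)   # noisy linear policy
        reward = -abs(state)                                  # goal: stay near the origin
        samples.append((state, action, reward))
        state = state + action + rng.gauss(0, 0.05)           # environment transition
    return samples

def update_policy(policy_weight, samples, lr=1e-3):
    """Crude illustrative update: nudge the weight using the collected batch."""
    grad = sum(s * a * r for s, a, r in samples) / max(len(samples), 1)
    return policy_weight + lr * grad

if __name__ == "__main__":
    weight = -0.1
    for iteration in range(10):
        with ProcessPoolExecutor() as pool:                   # fan out rollouts to workers
            futures = [pool.submit(rollout, weight, 50, seed) for seed in range(8)]
            batch = [s for f in futures for s in f.result()]  # fan samples back in
        weight = update_policy(weight, batch)                 # central policy update
        print(f"iteration {iteration}: weight = {weight:.4f}")
```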
Getting all that figured out is actually one of the big things we managed to get done after the acquisition as well. And it's all running on Azure and really allows us to do stuff efficiently. Sam Charrington: [00:18:33] You mentioned AirSim. What is that, and what's the role that it plays? Gurdeep Pall: [00:18:36] Yeah, so AirSim was a project in Microsoft Research which started off in a team that was exploring drones and how you bring autonomy to drones. And they had a very similar experience. This was, I think they started in 2015. They would go out with their drone in the morning and they would come back with a broken drone in the evening, and they would have very, very little data. And it's like, how are we ever going to get enough data to actually get this thing to fly, to do even the basic tasks? So that's when they looked at some of the work that was happening in, frankly, the gaming world. And they looked at some of the incredible scenes that could be rendered with Unreal and Unity and those kinds of things, which, if you've seen Forza and stuff like that, I mean, these things start to look pretty real. And they said, let's create a simulator for perception oriented tasks, where you can create a scene and you can integrate physics into that scene for the different objects that are involved. There could be a flying object, it could be something with wheels which is driving, et cetera. And so you integrate the physics, and now you've created an environment in which you can train AI. Now it could be reinforcement learning, where you're sensing, so you model the actual sensors inside this virtual environment and you are able to use that for reinforcement learning and taking actions. Or you can use these sensors that are modeled inside of AirSim itself, and you can just generate lots of data on which you can do supervised learning offline. It works for both these purposes. So AirSim, they created this tool for themselves and they realized it's so powerful, so they put it out as an open source utility. So today it has more than 10,000 stars on GitHub. It is really one of the most popular tools, because others are realizing that this idea of being able to simulate reality is a very powerful approach. Sam Charrington: [00:20:35] So, can you maybe talk us through, for any of the use cases you described, when you go into an environment with a real customer with real problems, what's the process to actually get something up and running and demonstrate value that they can build on, meaning concrete value as opposed to theoretical POC value? What does it take to really do that? Gurdeep Pall: [00:21:02] I think, and this is something that we've been working on and we will continue to work on, our goal is to get this to a point where people are able to identify that this is a great tool for the problem that they have. It's not some sort of a speculative, exploratory exercise. They know that they'll definitely get the results if they adopt this tool chain, and going from there to actually training the policy, being able to export the brain, and actually starting to use it in the real world, that period is pretty short. So this is a journey for us; it started off fairly long. And now we are at a point where we are focusing on these so-called solution accelerators, these areas where the problem is very clear, what we are solving and how to solve it is very clear.
And then some of the things that you need, like simulators: sometimes folks already have simulators, in other cases they need a simulator. And then the entire thing is stitched together, and all they need to do is come in and create the variations for their problem, create the policy, and then go ahead and use it. That is what's needed to take a customer from "Hey, I've got a problem. I don't know what this thing does. Maybe I'll understand it" to "Okay, now I know the kind of problem, and I know whether the problem can be solved with this or not." So this is what we've been targeting. And as we've gotten our solution accelerators to be very crisp, so has how we talk to customers, because, as you're alluding to, there's an education thing here, there is a confidence thing here. So we have to address all those pieces, and we're bringing the customers along the journey. The great thing is, with customers like Pepsi, the moment one thing is successful, they look around the factory and say, I can put this approach on many things, and that's the conversation we're having right now. The same thing with Shell, the same thing at Dell. So, this is the journey. Sam Charrington: [00:23:01] I appreciate in that the idea that, contrary to what you might think if you read popular reporting about AI, it's not a silver bullet, particularly in this domain; it's not like you've got some tool chain and it applies to every problem that any customer might have. And it sounds like you're being strategic and selective, and building expertise and supporting tools around specific areas, so that, to your point, when you are engaging with someone, they can have a high degree of confidence that you've done this before, you know how it's going to work and what the process is. Gurdeep Pall: [00:23:37] Exactly. And the other interesting thing that we found, which is I think a little unique compared to some of the other things we've done with AI, is that the experts that we end up talking to in the different industries and these application areas have never encountered AI before. Folks who went to engineering discipline schools, real engineers, not fake engineers like software engineers, like us. I mean, these are mechanical, chemical, what have you. And when they went through college, they learned Matlab and Simulink and so on. And they have relied on a set of tools that have given them employment, given them career success, and stood the test of time. And here these five guys walk in with swagger and say, hey, we've got AI for you and it's called reinforcement learning, it's really awesome, you've got to try it. I mean, that just doesn't work. You have to really bring them along. And then they have some real, real things that we've had to go and take in, like safety. Even if this thing worked, they want to be able to assert that this thing isn't going to do something crazy. I mean, when you have that horizontal drilling machine from Shell, this thing can drill through anything. It's this huge thing. There was a Wall Street Journal article about it, about three years ago when we first did this project with them; two years ago, we did the challenge. For them, they want to make sure that this thing is actually going to be safe and isn't going to create another new problem while it solves one. Yeah. So it's been a learning thing for us, but it's the need for the education, the need for bringing these folks along.
And this is one of the reasons we did this Project Moab, which is this very interesting device. It's like a toy, basically. It has three robotic arms, if you will, and there's a clear plate on top. And the task is to balance a ping pong ball on this device, on this plate. Now with this problem, of course, the engineers will go to PID, right? I mean, PID control is something they learned in college. And guess what? So we said, first, let's start with PID. It does a pretty good job. But then we said, okay, well, I'm going to toss the ball onto the plate and see if it catches it. Well, it turns out it doesn't catch it. So then we said, I'm going to add more complexity: how about we try and make the ball go around the edge of the plate? So as the problem progresses in complexity, you realize that the only way you can solve it is if you have something like our tool chain, which we have with Bonsai: you create a simulator and you have a policy that you're training, and then you're able to get to that level of performance. So we did this solely to bring engineers who are used to a particular way of working along, to start to believe and to start to get excited about this. So we created a sort of metaphor in which we could connect with them. Sam Charrington: [00:26:37] Interesting. Interesting. It reminds me of this idea of why deep learning is so important, and software 2.0, and where it's particularly powerful: in solving problems that we didn't know how to write the rules for, like in computer vision. How do you identify a cat versus a dog? The rules for that, who knows how to write those, but the neural network can figure that out. And similarly, there's a range of problems that PID is easily applied to, but there's also a level of complexity that it is difficult to apply it to, and that is where you're finding the value in applying RL. Gurdeep Pall: [00:27:18] Exactly, exactly. And we've seen that too: either there were just too many moving parts, so the folks had achieved automation but they had not achieved autonomy, so it's that class of problems where we're getting traction, or, with the existing methods, they've plateaued in performance. There is more performance to be had, and this is incredible. You would think we've figured everything out, right? I mean, as a society and with all the advancements that have happened, but with HVAC control in buildings we've been able to get startling results. I mean, this is millions of dollars, like on a campus, that you can save, and then also the green benefits that you get from that. So there's just tremendous opportunity. Sam Charrington: [00:28:07] So maybe let's drill into that example more, because I do want to get to a more concrete understanding of what the process looks like. I've got a data center or physical plant or something, and my HVAC costs are through the roof, and someone told me about this AI thing on an airplane. And I call Gurdeep, like, what's the first thing that I do, and how do I get from there to some cost reduction or greater efficiency or whatever my goal is in applying some of this? Gurdeep Pall: [00:28:40] So in this particular case, we're focusing one of our solution accelerators just on this use case. And so we are able to say with very high confidence that, if you can give us this information,
which is typically data that you might have collected, because a lot of these are now IoT sorts of devices, we're able to go from that data, we ingest it, and in this case, and this is sort of another double click on the simulation thing, we're able to actually create a data-driven simulator, and we are able to then start creating a policy. Now they do need to specify, and this is where machine teaching comes in, they need to specify to us what behavior they are desiring. That specification is fairly flexible. So you could say things like, I want this behavior between these times of the day. Or you could say, if the outside temperature, which becomes one of the state variables which go into creating the brain, if that variable is outside of this range, then I want this kind of behavior; in summer I want it to be cooler and in winter I want it to be warmer. All those inputs that are there now create a policy for me which automatically controls the HVAC system, which means turning on the fan or turning on the heat or turning on the cooling, and doing it dynamically, because once the brain is built, all you have to do is connect the inputs and the actions. So inputs are where we are sampling the state, and actions are what you're saying: okay, increase heat, decrease heat, speed up the fan, turn off the fan, et cetera. And by the way, it's not just temperature in this case. It's also the carbon dioxide and nitrogen levels and so on; all of those are being sensed, and then the actions will be taken based on that. So that is the proposition we would have. And we're, again, trying to make it as turnkey as possible, but recognize that every building is different. So every building has its own climate sort of fingerprint, and so there is work required in creating the brains. You could take a brain off the shelf and use it; I can't say whether that would work better. It might have better energy consumption, but then maybe the people are not as comfortable. So you have to sort of tweak it, and the more efficient we can make this end-to-end thing, the sooner folks can realize the value. Sam Charrington: And a brain in this case is essentially a model or an agent or something like that, is that fair? Gurdeep Pall: Great question. I have had lots of folks ask me, including Bill Gates, why do you call it a brain? And I think it's a really good question. So the way we talk about it is, it's actually a collection of models. Autonomous system tasks can sometimes be decomposed into different parts. For example, if a sort of robotic hand had to pick up an object and stack it: reach can be one action, pickup can be another action, then move, and then stack. These are all distinct actions. Now, some are pretty easy, you can almost program them; reaching nowadays is obviously commonly programmed, depending on the device you have, but some need to be trained. So now this whole collection of things has to be orchestrated, and the right piece has to be invoked at the right time. And each one of them either is programmed or it's a model, a deep learning model that you've trained, and putting all of it together becomes the brain. In fact, that's how the human brain works, so the name is actually quite apt: the visual cortex has a particular purpose, then there's another piece which does reasoning, and then you want to take
the action, and that invokes a different part of the brain. So that's why we call it a brain. Sam Charrington: [00:32:33] Okay. Going back to the HVAC example, you mentioned a data-driven simulation. So I'm imagining you coming to my company, I guess, since this is my scenario and I've got the data center. I probably don't have a simulation that exists for my data center and HVAC, and so that's immediately a big challenge if I need that to train a brain, but you've got a way to generate that just from the data that I've collected. Gurdeep Pall: [00:33:01] Yes. And this was something that we are having to do a lot more of as we go out and talk to customers; some have a simulator. Interestingly, simulators have been used for designing, modeling, and testing; they've existed. But typically there's been a human on one side of the simulator, driving the simulator for whatever purpose they want. If it's a flight simulator, you're flying it. But in our case, it's the AI which is being trained that's sitting on the other end of the simulator. And so in some cases we were able to take their existing simulators, actually change the use case, and still make it work. In some cases that worked great. Now, in some cases it didn't work great, because their simulator was designed for a really different purpose. Like if you do CFD, the purpose is to model this thing and you have to model it to high precision. I mean, this is going to be a plane flying through rain, so it has to be very precisely done. They typically have HPC setups for CFD simulation, but each crank can take so long that we can't crank it fast enough to learn from it. So we said, well, that doesn't work. Or they just don't have a simulator at all, like your case. So that's where our next step is: can you give us data? And for many folks, they have the data. If they have the data, then we say, okay, let's see how we can take that data and how we can actually make it into something that we can mate with our system. That worked for a certain class of problems. And then, as the complexity of problems started increasing, we realized that we needed a new trick up our sleeve. There's a research group as part of my team, and we started looking at how we can apply deep learning to learn from this data to create simulators. There we ran into the first insight, which is that deep learning is designed for sort of inference, right? You run one crank, you get a prediction, and you're done. Well, it turns out the real world is not like that. The real world is modeled with differential equations. Basically, you've got time and you've got this thing which continues to change its behavior with time, depending on the previous state and the actions being taken. So there's some great work that is being done right now, and we are publishing it right now. In fact, some of it is already out, on deep simulation networks. Basically it's like a neural computational fabric where, kind of like an ODE, with every crank you take the output and sort of feed it back into the next time cycle. Of course, the sampling of time can actually be variable, so that neural computational fabric has to deal with that, which is a pretty big thing in itself, but it also allows you to have many different components inside the simulation, each of which is learning in a different way.
For example, if you're tossing a ball, the ball has its physics, and then there's the environment that has physics, which is Newtonian physics, and the Newtonian physics doesn't change whether you toss a ball or you toss a bottle of water. So if you are training those components, it gives you some of these pre-trained components, if you will, and then you can maybe tweak them, because the object will have different physics. But now, with this neural computational fabric which plays out in time, you are able to have multiple components and you train this thing. This new architecture, we believe, is a pretty transformative thing in simulation, because it now allows us to model any complex simulation space, which basically has lots of differential equations sort of running around inside of it, and we can train it reasonably quickly. It's kind of like a graph neural network, because you have time and you have space, if you look at the components that actually make up the space. So there's message passing which is happening between every stage, and that allows the learning to happen, and there's backpropagation which happens within each of the components, so eventually you're able to get a trained model which can run like a simulator. You start at some state, you take an action, the state changes, and you're able to crank it. So we're really excited about it. We think this will be a big accelerant in the approach that we have. Again, we get the data, we use it, and we can go at it. And these can similarly also learn from other simulators. So if you have something that is quite inefficient in terms of computation and stuff like that, this thing can learn off of it, and then it can execute very fast, because once it learns the fundamental differential equations that are underlying it, this is just inference. It's not doing any kind of big computation once it's trained. So that is an area that we're really excited about right now. Sam Charrington: [00:38:09] Awesome. So first step is capture some data. Next step, use that to train a simulator, using this idea of deep simulation networks, potentially. Then you mentioned using that to create a brain. It sounds like part of that, you corrected me when I said it's a model, so part of that, I'm imagining, is figuring out the right level of abstraction for these different components or pieces. And then, I guess one of the questions that I had around that was, when we talk about reinforcement learning in kind of an academic sense and how difficult it is to put it to use in real world situations, a lot of it has to do with carefully crafting this objective function or cost function and all of the issues associated with that. You described what the customer has to do as less about crafting this objective function and more about describing and maybe constraining what the solution looks like. Am I reading that correctly? Maybe you can elaborate on that and help us understand. Gurdeep Pall: [00:39:17] Absolutely. And you've hit the nail on the head. With reinforcement learning, the reward specification, the specification of the reward function, becomes the next problem. In fact, we have a very famous researcher at Microsoft Research, John Langford, who will tell you that if you have a problem and you've modeled it as a reinforcement learning problem, getting the reward function right really gets to the core of it.
And there's lots of funny stories about bad reward functions and unintended consequences, and we ran into that. We still allow it in our tool chain, you can specify the reward function, but now, in machine teaching, we're exploring what other ways there are for an expert to describe what they want done, and we've come to the concept of a goal. So they specify the goal using a particular approach, the semantics of which are contained within the problem and the environment, and we will automatically generate the reward function under the covers based on the goal. And we found this to be a much more approachable thing for our customers. In fact, in a lot of our new engagements with customers, most of the time we end up using goals. So that's been, like I said, we're on this learning thing ourselves, and we're seeing what's working, what's not working, how to enhance it, and moving from there. Sam Charrington: [00:40:45] And so some of these classical challenges with reward functions, like delayed attribution and things like that, that you see in reinforcement learning, does goals as an approach sidestep those in some ways, or are those still issues that you see in the autonomous systems world? Gurdeep Pall: [00:41:06] Yeah, those are still issues we see, and separately, the algorithms are getting pretty good too. It's an active area of research and better algorithms keep coming up. We stay on top of that and we're incorporating more and more algorithms into our tool chain, because some algorithms are better suited for a certain class of problems and others are better suited for other types of problems, which then of course moves the problem to the next layer, which is: which one do you select for which kind of problem? And you obviously don't want folks who've never done programming or AI to be asked, oh, you tell me, do you want SAC or do you want this? They'd have no idea, right? So we are also trying to put in that intelligence, so that it's a meta reasoning thing which says, given this kind of a goal, given this kind of a problem, this sampling rate and state space, let's automatically select the best algorithm, and we will use that for training. So nobody ever has to know what craziness is at work under the covers, but staying on top of this has been a really important piece for us. We use this framework called Ray, which has come out of a lot of the work at Berkeley; it's open source. We are one of the big users of it and contributors to it now; in fact, the Ray team, which is building it, and my team in Berkeley are literally in the same building, one floor apart. So there's a lot of good intermingling there as well. Because we're using that framework, RLlib is how people are adding more and more algorithms, so we're able to really tap into that. And what we find, of course, is that sometimes people will write an algorithm to publish a paper, but it's not really production grade. So then we come back and do our own implementation of it and contribute that. Sam Charrington: [00:42:54] So, in this journey, we started with data, we built a simulation, we built a brain out of that simulation, and then that brain is able to help me control my data center HVAC. I'm imagining in this scenario that I still care about the safety issue that you mentioned.
Maybe it's not a drill that's going to destroy my data center, but I wouldn't want the policy that you recommend to decrease the life of my coolers or chillers. And then there are also maybe explainability issues that arise. Like, why are you telling me this? My HVAC engineer has always set XYZ at six and you're saying it should be at eight. Why is that? Gurdeep Pall: [00:43:40] Yeah, this is such a great topic, and I've talked to my team about it, given my experience at Microsoft. I remember when we were building Windows NT and putting networking into it and so on, we had no idea how stuff was going to be attacked when the internet was starting out. In fact, I was the development manager for the TCP/IP stack for Windows from '95 to 2000. I still managed to keep some of my sanity, but I can tell you, there were folks on my team who were pushing 20 updates a week, because we were starting to get attacked at every layer, from the bottom of the network moving its way up, all the way up into sockets, all the teardrop attacks and all that. And then when they got to the top layer, that's where the most sophisticated attacks really started. I don't know if you remember, back after Windows XP shipped, the entire team took one year to harden the system, because it was no longer just my problem as the networking guy, it was everybody's problem. People would do buffer overruns and insert code and all that, so literally every component had to deal with it. So the reason I'm telling this story is that I think safety is a problem like that. When we came into it, it was, hey, we've got really good control and I can show you better performance, but then there's all this hidden stuff that you have to deal with. That's been a big realization for us. It's a multifaceted approach. So the first thing is, you talked about the wear and tear of the machine or breaking it down. A bunch of our use cases right now with customers have those factored in, and actually they're factored in at the time of the teaching. So when you talk about the state space, that's something that has to be specified so that the policy is taking it into account, so that component gets handled. The hardest safety things are when the brain is operating: are we really at the mercy of a deep learning model which is going to say, take this action, when the consequences of that are actually out of scope for what it's doing? And this is where, you know, this is going to be ongoing work. This is never done, kind of like cybersecurity right now; we're learning it's never going to be done, but we want to take some pretty concrete steps. So one really important piece of work, and there was a new paper published on this, is that you develop a policy and the policy suggests an action, and what you do is introduce another layer after that to decide if the action is a safe action or not. Now, what goes into deciding whether it's a safe action or not can be many things. It can be predicate logic, it can be temporal logic, so you can pretty much assert no or yes because it is outside some range, or it actually can be trained things itself, like imagine adversarially trained models which go into that component.
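A minimal sketch of that kind of post-policy safety check might look like the following. The robot-arm state, the radius limit, and all of the names here are hypothetical; the point is only to show the pattern of a separate layer that can veto the action the trained policy proposes.

```python
# Illustrative sketch of a safety layer that sits after the trained policy.
# All names, the workspace radius, and the fallback action are hypothetical.
from dataclasses import dataclass

@dataclass
class ArmState:
    x: float
    y: float
    human_in_zone: bool

SAFE_RADIUS = 1.5             # assumed workspace limit (the "dotted red line")
FALLBACK_ACTION = (0.0, 0.0)  # safe default: do not move

def is_safe(state: ArmState, action: tuple) -> bool:
    """Predicate-logic style checks; in practice this layer could also include
    temporal-logic rules or separately trained (e.g. adversarial) models."""
    next_x, next_y = state.x + action[0], state.y + action[1]
    within_radius = (next_x ** 2 + next_y ** 2) ** 0.5 <= SAFE_RADIUS
    return within_radius and not state.human_in_zone

def safe_step(policy, state: ArmState) -> tuple:
    proposed = policy(state)  # action suggested by the trained brain
    return proposed if is_safe(state, proposed) else FALLBACK_ACTION

# Example: a dummy policy that always pushes outward gets overridden near the boundary.
dummy_policy = lambda s: (0.5, 0.5)
print(safe_step(dummy_policy, ArmState(x=1.2, y=0.9, human_in_zone=False)))  # (0.0, 0.0)
```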
So now, when you are specifying in machine teaching right up front, you can start to insert ways in which safety can be specified, and that actually follows a very different path. Some of it will follow the path of the policy building itself, because some things can be caught there, but other things are actually more brought to bear at operation time. And that is very important, because you've probably heard some of the discussions about how level five autonomy is going to be rolled out in cities, and they're saying, you know, these bus lanes and stuff like that. I think it's a wonderful idea, because you're solving the other side of the equation, which is the part you can control. So imagine, I always talk about this example and my team just sort of looks at me strange, imagine you have a robot arm and it is working in a space where humans are also working. It is very common, you see this with machines in factories: they will have a red line or dotted red line around the perimeter, and the humans know they're not going to go there. And now you've created a rule which says, regardless of what action the policy tells you to take, if it is outside of a radius, whatever distance that is, you will not take that action. So you've created an environment in which humans and this robot arm swinging around can actually co-exist in the same place. So it's a very pragmatic approach, but it has to be part of your solution. Otherwise the engineers are right: these crazy people are showing up with reinforcement learning and it's going to create all kinds of issues for us, safety issues and so on. Sam Charrington: [00:48:33] Yeah. I love that analogy, and just taking it one step further, it would be a lot more difficult to build into your motion trajectories, for example, a way for this arm to avoid a human that stepped into the zone, than to build something that determines that a human has stepped into the zone and just shuts everything down. And I think what I'm taking away from what you're saying here is that safety is a multi-layered problem, and it's not all about making the neural net responsible for everything; it's about identifying how you can enforce safety at these different levels, and thinking about it as a system, like an engineering person would. Right? Gurdeep Pall: [00:49:16] Exactly. I think that has been a big learning for us as well, that it's not just solve the hardest AI problem and suddenly everything will come, right? No, you have to really think about it that way. And I think the safety layer, which evaluates after every action is recommended, is where a lot of the new capabilities will come in in the future, the adversarial stuff. But you can imagine a completely separate model which is basically going to give you a one or a zero: if any human has stepped over the red line, it is going to give you a one and it shuts off, right? And that keeps improving with the perception and things like that. So it is a system thing, as you say; that's a very good way to think of it. Sam Charrington: [00:50:03] Right, right. So maybe to help us wrap up: it's the very beginning of 2021, and autonomous systems is kind of a broad area. Where do you see things going over the next few years? How does this all evolve? Gurdeep Pall: [00:50:18] Yeah.
You know, we believe that we're entering the era of autonomous systems, and it's always hard to predict, right? There's that famous line: prediction is hard, especially about the future. But I remember, with Windows NT, with networking, with the internet, these things just explode, and the right elements have to be there for this explosion to happen. And I think with the breakthroughs in AI, with the focus on solving business problems in a complete way, like we talked about with safety, and with the industry coming along, we've been spending a lot of time on data-driven simulators, but we believe in the simulation industry that's out there, and we really want to partner with them. We've got great partners like MathWorks, and we want to bring them along so that together we can create an end-to-end tool chain in which these autonomous systems can be created without requiring the high level of expertise that, for example, is going into a lot of autonomous driving. I mean, the teams that are building these autonomous driving stacks are just super deep, super experts, and they're building it all in a sort of siloed, very vertical way. We want there to be horizontal components, and then you'll have vendors of autonomous systems where anybody can come in, describe their problems, create the brain, and deploy it. That's going to explode the number of autonomous systems that are out there. And I think this is great for many different things, including our climate, including the resilience that we've seen during COVID, where logistics and these things just have to continue, production has to continue. So I think now's the time, and I think it's going to happen. Sam Charrington: [00:52:05] Awesome. Awesome. Well, good deal. Thanks so much for taking the time to chat and sharing a bit about what you're up to there. Gurdeep Pall: [00:52:13] Totally my pleasure. And you know, you have a great podcast, so it's great to be here talking to you about my stuff. Sam Charrington: [00:52:25] Awesome. Thank you. Thank you. Take care. All right, everyone. That's our show for today. To learn more about today's guest or the topics mentioned in this interview, visit twimlai.com. Of course, if you like what you hear on the podcast, please subscribe, rate, and review the show on your favorite podcatcher. Thanks so much for listening, and catch you next time.
Sam Charrington: Today we're excited to continue the AI for the Benefit of Society series that we've partnered with Microsoft to bring to you. Today we're joined by Hanna Wallach, principal researcher at Microsoft Research. Hanna and I really dig into how bias and a lack of interpretability and transparency show up across machine learning. We discuss the role that human biases, even those that are inadvertent, play in tainting data, whether deployment of fair ML algorithms can actually be achieved in practice, and much more. Along the way, Hanna points us to a ton of papers and resources to further explore the topic of fairness in ML. You'll definitely want to check out the show notes page for this episode, which you'll find at twimlai.com/talk/232. Before diving in, I'd like to thank Microsoft for their support of the show and their sponsorship of this series. Microsoft is committed to ensuring the responsible development and use of AI and is empowering people around the world with this intelligent technology to help solve previously intractable societal challenges, spanning sustainability, accessibility and humanitarian action. Learn more about their plan at Microsoft.ai. Enjoy. Sam Charrington: [00:02:18] All right everyone, I am on the line with Hanna Wallach. Hanna is a principal researcher at Microsoft Research in New York City. Hanna, welcome to This Week in Machine Learning and AI. Hanna Wallach: [00:00:11] Thanks, Sam. It's really awesome to be here. Sam Charrington: [00:00:14] It is a pleasure to have you on the show, and I'm really looking forward to this conversation. You are clearly very well known in the machine learning and AI space. Last year, you were the program chair at one of the largest conferences in the field, NeurIPS. In 2019, you'll be its general chair. But for those who don't know about your background, tell us a little bit about how you got involved and started in ML and AI. Hanna Wallach: [00:00:48] Sure. Absolutely. So I am a machine learning researcher by training, as you might expect. I've been doing machine learning for about 17 years now, so since way before this stuff was even remotely fashionable, or popular, or cool, or whatever it is nowadays. In that time, we've really seen machine learning change a lot. It's sort of gone from this weirdo academic discipline only of interest to nerds like me, to something that's so mainstream that it's on billboards, it's in TV shows, and so on and so forth. It's been pretty incredible to see that shift over that time. I got into machine learning sort of by accident; I think that's often what happens. I had taken some undergrad classes on information theory and stuff like that, found that to be really interesting, but thought that I was probably going to go into human computer interaction research. But through a research assistantship during the summer between my undergrad degree and my Master's degree, I ended up discovering machine learning, and was completely blown away by it. I realized that this is what I wanted to do. I've been focusing on machine learning in various different forms since then. My PhD was specifically on Bayesian latent variable methods, typically for analyzing text and documents. So topic models, that kind of thing. But during my PhD, I really began to realize that I'm not particularly interested in analyzing documents for the sake of analyzing documents, I'm interested in analyzing documents because humans write documents to communicate with one another.
It's really that underlying social process that I'm most interested in. So then during my postdoc, I started to shift direction from primarily looking at text and documents to thinking really about those social processes. So not just what are people saying, but also who's interacting with whom, and thinking about machine learning methods for analyzing the structure and content of social processes in combination. I then dove into this much more when I got a faculty job, because I was hired as part of UMass Amherst's Computational Social Science Initiative. So at that point I started focusing really in depth on this idea of using machine learning to study society. I established collaborations with a number of different social scientists, focusing on a number of different topics. Over the years, I've mostly ended up working with political scientists, and often study questions relating to government transparency, still looking at this whole idea that a social process consists of individuals, or groups of individuals, interacting with one another, information that might be used in or arising from these interactions, and then the fact that these things might change over time. I often use one or two of these modalities, so structure, content, or dynamics, to learn about one or more of the other ones as well. As I continued to work in this space, I started to think more, not just about how we can use machine learning to study society, but about the fact that machine learning is becoming much more prevalent within society. About four years ago, I started really thinking more about these issues of fairness, accountability, transparency, and ethics. It was a pretty natural fit for me to start moving in this direction. Not only was I already thinking about questions to do with people, but I've done a lot of diversity and inclusion work in my non research life. So I'm one of the co-founders of the Women in Machine Learning workshop, and I also co-founded two organizations to get more women involved in free and open source software development. So issues related to fairness and stuff like that are really something that I tend to think about a lot in general. So I ended up making sort of this shift a little bit in my research focus. That's not to say that I don't still work on things to do with core computational social science, but increasingly my research is focusing on the ways that machine learning impacts society. So fairness, accountability, transparency, and ethics. Sam Charrington: [00:05:53] We will certainly dive deep into those topics. But before we do, you've mentioned a couple of times the term computational social science. That's not a term that I've heard before, I don't believe. I guess I'm curious how established that is as a field, or is it something that is specific to that institution that you were working at? Hanna Wallach: [00:06:19] Sure. So this is really a discipline that started emerging in maybe sort of 2009, 2008, that kind of time. By 2010, which is when I was hired at UMass, it really was sort of its own little emerging field with a bunch of different computer scientists and social scientists really committed to pushing this forward as a discipline. The basic idea, of course, is that social scientists study society and social processes, and they've been doing this for decades, but often using qualitative methods.
But of course, as more of society moves towards digitized interaction methods, and online platforms, and other kinds of things like that, we're beginning to see much more of this sort of digital data. At the same time, we've seen this massive increase, as I've said, in the popularity of machine learning and machine learning methods that are really suitable for analyzing data about social processes in society. So computational social science is really the sort of emerging discipline at the intersection of computer science, the social sciences, and statistics as well. The real goal is to develop and use computational and statistical methods, so machine learning methods, for example, to understand society, social processes, and answer questions that are substantively interesting to social scientists. At this point, there are people at a number of different institutions focusing on computational social science. So yes, of course, UMass, as I've mentioned before, but also Northwestern, Northeastern, the University of Washington, in fact, have been doing this for years, and of course, Microsoft Research is no exception in this regard. Part of the reason why I joined Microsoft Research was that we have a truly exceptional group of researchers in computational social science here. That was really very appealing to me. Sam Charrington: [00:08:31] Oh, awesome, awesome. So you talked about your transition to focusing on fairness, accountability, transparency, and ethics in machine learning and AI. Can you talk a little bit about what those terms mean to you, and your broader research? Hanna Wallach: [00:08:54] Yeah, absolutely. So I think the bulk of my own research in that sort of broad umbrella falls within two categories. The first is fairness, and the second is what I would describe as interpretability of machine learning. So in that fairness bucket, really, much of my research is focused on studying the ways in which machine learning can inadvertently harm or disadvantage groups of people or individual people in various different, usually unintended, ways. I'm interested in understanding not only why this occurs, but what we can do to mitigate it, and what we can do to really develop fairer machine learning systems. So systems that don't inadvertently harm individuals or groups of people. In the intelligibility bucket, I'm really interested in how we can make machine learning methods that are interpretable to humans in different roles for particular purposes. There has been a lot of research in this area over the past few years, focusing oftentimes on developing simple machine learning models that can be easily understood by humans simply by exposing their internals, and also on developing methods that can generate explanations for either entire models or the predictions of models, where those models might be potentially very complex. My own work typically focuses more on the human side of intelligibility: what is it that might make a system intelligible or interpretable to a human trying to carry out some particular task? I do a lot of human subjects experiments to really try and understand some of those questions with a variety of different folks here at Microsoft Research. Sam Charrington: [00:11:01] On the topic of fairness and avoiding inadvertent harm, there are a lot of examples that I think many of our audience would be familiar with, the ProPublica work into the use of machine learning systems in the justice process, and others.
Are there examples that come to mind for you that are maybe less well known, but that illustrate for you the importance of that type of work? Hanna Wallach: [00:11:36] Yes. So when I typically think about this space, I tend to think about it in terms of the different types of harms that can occur. I have some work with Aaron Shapiro, Solon Barocas, and Kate Crawford on the different types of harms that can occur. Kate Crawford actually did a fantastic job of talking about this work in her invited talk at the NeurIPS conference in 2017. But to give you some concrete examples, many of the examples that people are most familiar with are these scenarios, as you mentioned, where machine learning systems are being used to allocate or withhold resources, opportunities, or information. So one example would be the COMPAS recidivism prediction system being used to make decisions about whether people should be released on bail. Another example would be from a news story in November, where Amazon revealed that it had abandoned an automated hiring tool because of fears that the tool would reinforce existing gender imbalances in the workplace. So there you're looking at these existing gender imbalances, and seeing that this tool is perhaps withholding opportunities from women in the tech industry in an undesirable way. There was a lot of coverage about this very sensible decision that Amazon made to abandon that tool. Some other examples would be more related to quality-of-service issues, even when no resources or opportunities are being allocated or withheld. So a great example there would be the work that Joy Buolamwini and Timnit Gebru did focusing on the ways that commercial gender classification systems might perform less well, so less accurately, for certain groups of people. Another example you might think of, let's say, is speech recognition systems. You can imagine systems that work really well for people with certain types of accents, or for people with voices at certain pitches, but less well for other people, certainly for me. I'm British, and I have a lisp. I know that oftentimes speech recognition systems don't do a great job of understanding what I'm saying. This is much less of an issue nowadays, but you know, five or so years ago, this was really frustrating for me. Some other examples are things like stereotyping. So here the most famous example of stereotyping in machine learning is Latanya Sweeney's work from 2013, where she showed that advertisements shown alongside web searches for different people's names would more typically be advertisements that reinforced stereotypes about black criminality when people searched for stereotypically black-sounding names than when people searched for stereotypically white-sounding names. So there the issue is the reinforcement of these negative stereotypes within society by the placement of particular ads for particular types of searches. Another example of stereotyping in machine learning would be the work done by Joanna Bryson and others at Princeton University on stereotypes in word embeddings. There has also been some similar work done by my colleague, Adam Kalai, here at Microsoft Research.
Both of these groups of researchers showed that if you train word embedding methods, so things like Word2Vec, that try to identify a low-dimensional embedding for word types based on the surrounding words that are typically used in conjunction with them in sentences, you end up seeing that these word embeddings reinforce existing gender stereotypes. For example, the word man ends up being embedded much closer to programmer, and similarly woman ends up being embedded much closer to homemaker, than vice versa. So that would be another kind of example. Then we see other kinds of examples of unfairness and harms within machine learning as well. So, for example, over- and under-representation. Matthew Kay and some others at the University of Washington have this really nice paper where they show that for professions with an equal or higher percentage of men than women, the image search results are much more heavily skewed towards images of men than reality. So that would be another kind of example. What you'll see from all of these examples that I've mentioned is that they affect a really wide range of systems and types of machine learning applications. The types of harms or unfairness that might occur are also pretty wide ranging as well, going from, yes, sure, allocation or withholding of resources, opportunities, or information, but moving beyond that to stereotyping and representation and so on. Sam Charrington: [00:17:02] So often when thinking about fairness and bias in machine learning and the types of harm that can come about when unfair systems are developed, kind of all roads lead back to the data itself, and the biases that are inherent in that data. Given that machine learning and AI are so dependent on data, and often much of the data that we have is biased, what can we do about that, and what are the kinds of things that your research is exploring to help us address these issues? Hanna Wallach: [00:17:41] Absolutely. Yeah, so you've hit on a really important point there, which is that in a lot of the public discourse about fairness in machine learning, you have people making comments about algorithms being unfair, or algorithms being biased. Really, I think this misses some of the most fundamental points about why this is such a challenging landscape. So I want to just emphasize a couple of those here in response to your question. The first thing is that machine learning is all about taking data, finding patterns in that data, and then often training systems to mimic the decisions that are represented within that data. Of course, we know that the society we live in is not fair. It is biased. There are structural disadvantages and discrimination all over the place. So it's pretty inevitable that if you take data from a society like that, and then train machine learning systems to find patterns expressed in that data, and to mimic the decisions made within that society, you will necessarily reproduce those structural disadvantages, that bias, that discrimination, and so on. So you're absolutely right that a lot of this does indeed come from data. But the other point that I want to make is that it's not just from data, and it's not from algorithms per se. The issue is really, as I see it, and as my colleagues here at Microsoft Research see it, about people and people's decisions at every point in that machine learning life cycle.
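The word-embedding stereotypes described earlier in this conversation are easy to probe directly. Below is a minimal sketch, not taken from any of the papers mentioned, assuming the gensim library and a pretrained word2vec-format embedding; the filename is a placeholder.

```python
# A minimal sketch of probing a word embedding for gendered associations.
# The embedding file is a placeholder; any word2vec-format vectors would work.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("word_vectors.bin", binary=True)

# Analogy-style probe: "man is to programmer as woman is to ?"
print(vectors.most_similar(positive=["woman", "programmer"],
                           negative=["man"], topn=5))

# Direct similarity comparisons for a pair of occupation terms.
for word in ["programmer", "homemaker"]:
    print(word, vectors.similarity("man", word), vectors.similarity("woman", word))
```

Probes like these are descriptive rather than conclusive, but they make the kind of association the research describes visible in a few lines.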
So I've done some work on this with a number of people here at Microsoft; most recently I put together a tutorial on machine learning and fairness in collaboration with my colleague Jenn Wortman Vaughan. The way we really think about this is that you have to prioritize fairness at every stage of that machine learning lifecycle. You can't think about it as an afterthought. The reason why is that decisions that we make at every stage can fundamentally impact whether or not a system treats people fairly. So I think it's really important when we're thinking about fairness in machine learning to not just make general statements about algorithms being unfair, or systems being unfair, but really to go back to those particular points and think about how unfairness can creep in at any one of those stages. That might be as early as the task definition stage: when you're sitting down to develop some machine learning system, it's really important to ask the question of who does this take power from, and who does this give power to? The answers to that question often reveal a lot about whether or not that technology should even be built in the first place. Sometimes the answer to addressing fairness in machine learning is simply, no, we should not be building that technology. But there are all kinds of other decisions and assumptions at other points in that machine learning life cycle as well. So the way we typically like to think about it is that a machine learning model, or method, is effectively an abstraction of the world. In making that abstraction, you necessarily have to make a bunch of assumptions about the world. Some of these assumptions will be more or less justified, and some of these assumptions will be a better fit for reality than others. But if you're not thinking really carefully about what those assumptions are when you are developing your machine learning system, this is one of the most obvious places where you can inadvertently end up introducing bias or unfairness. Sam Charrington: [00:21:42] Can you give us some concrete examples there? Hanna Wallach: [00:21:45] Yeah. Absolutely. One common example of this form would be stuff to do with teacher evaluation. So there have been a couple of high-profile lawsuits about this kind of thing. But I think it illustrates the point nicely. So it's common for teachers to be evaluated based on a number of different factors, including their students' test scores. Indeed, many of the methods that have been developed to analyze teacher quality using machine learning systems have really focused predominantly on students' test scores. But this assumes that students' test scores are in fact an accurate predictor of teacher quality. This isn't actually always the case. A good teacher should obviously do more than test prep. So any system that really looks just at test scores when trying to predict teacher quality is going to do a bad job of capturing these other properties. So that would be one example. Another example involves predictive policing. So a predictive policing system might make predictions about where crimes will be committed based on historic arrest data. But an implicit assumption here is that the number of arrests in an area is an accurate proxy for the amount of crime. It doesn't take into account the fact that policing practices can be racially biased, or that there might be historic over-policing in less affluent neighborhoods. I'll give you another example as well.
So many machine learning methods work by defining some objective function, and then learning the parameters of the model so as to optimize that objective function. So for example, if you define an objective function in the context of, let's say, a search engine, that prioritizes user clicks, you may end up with search results that don't necessarily reflect what you want them to. This is because users may click on certain types of search results over other search results, and that might not be reflective of what you want to be showing when you show users a page of search results. As a concrete example, on many search engines, if you search for the word boy, you see a bunch of pictures of male children. But if you search for the word girl, you see a bunch of pictures of grown-up women. These are pretty different to each other. This probably comes from the fact that search engines typically optimize for clicks, among other metrics. This really shows how hard it can be to even address these kinds of fairness issues, because in different circumstances the word girl may be referring to a child or a woman, and users search for this term with different intentions. In this particular example, as you can probably imagine, one of these intentions might be more prevalent than the other. Sam Charrington: [00:24:57] You've identified lots of opportunities for pitfalls in the process of fielding systems, going all the way back to the way you define your system, state your intentions, and formulate the problem that you're going after. Beyond simply being mindful of the potential for bias and unfairness, and I realize that being mindful isn't simple, that it takes work, what does your research offer in terms of how to overcome these kinds of issues? Hanna Wallach: [00:25:43] Yeah, this is a really good question. It's a question that I get a lot from people: what can we actually do in practice? There are a number of things that can be done in practice. Not all of them are easy things to do, as you say. So one of the most important things is that issues relating to fairness in machine learning are fundamentally socio-technical. They're not going to be addressed by computer scientists or developers alone. It's really important to involve a range of diverse stakeholders in these conversations when we're developing machine learning systems so that we have a bunch of different perspectives represented. So moving beyond just involving computer scientists and developers on teams, it's really important that we involve social scientists, lawyers, policy makers, end users, people who are going to be affected or impacted by these systems down the line, and so on and so forth. That's one really concrete thing you can do. There is a project that came out of the University of Washington called the Diverse Voices project. It provides a way of getting feedback from stakeholders on tech policy documents. It's really good; they have a great how-to guide that I definitely recommend checking out. But many of the things that they recommend doing there, you can also think about when you're trying to get feedback from stakeholders on, let's say, the definition of a machine learning system. So that task definition stage. Some of these could even potentially be expanded to consider other stages of that machine learning pipeline as well. So there are a number of things that you can do at every single stage of the machine learning pipeline.
In fact, this tutorial that I mentioned earlier, that I worked on with my colleague Jenn Wortman Vaughan, actually has guidelines for every single step of the pipeline. But to give you examples, here are some things, for instance, that you can do when you're selecting a data source. So for example, it's really important to think critically before even collecting any data. It's often very tempting to say, oh, there is already some dataset that I can probably repurpose for this. But it's really important to take a step back, before immediately acting based on availability, to actually think about whether that data source is appropriate for the task you want to use it for. There are a number of reasons why it might not be; it could be to do with biases in the data source selection process. There might be societal biases present in the data source itself. It might be that the data source doesn't match the deployment context; that's a really important one that people really should be taking into account. Where are you thinking about deploying your machine learning system, and does the data you have available for training and development match that context? As another example, still related to data, it's really important to think about biases in the technology used to collect data. So as an example here, there was an app released in the city of Boston back in 2011; I think it was called Street Bump. The way it worked is it used iPhone data, specifically the positional movement of iPhones as people were driving around, to gather data on where there were potholes that should be repaired by the city. But pretty quickly, the city of Boston figured out that this actually wasn't a great way to get that kind of data, because back in 2011, the people who had iPhones were typically quite affluent and only lived in certain neighborhoods. So that would be an example of thinking carefully about the technology even used to collect data. It's also really important to make sure that there is sufficient representation of different subpopulations who might ultimately be using or affected by your machine learning system, so that you really do have good representation overall. Moving on to things like the model, there are a number of different things that you can do there as well. So in the case of a model, I mentioned a bit about assumptions being really important. It's great to really clearly define all of your assumptions about the model, and then to question whether there might be any explicit or implicit biases present in those assumptions. That's a really important thing to do when you're thinking about choosing any particular model or model structure. You could even, in some scenarios, include some quantitative notion of parity, for instance, in your model objective function as well. There have been a number of academic papers that take that approach in the literature over the past few years. Sam Charrington: [00:30:43] Can you give an example of that last point? Hanna Wallach: [00:30:46] Yeah, sure. So imagine you have some kind of a machine learning classifier that's going to make decisions of the form, let's say, loan, no loan; hire, no hire; bail, no bail; and so on. The way we normally develop these classifiers is to take a bunch of labeled data, so data points labeled with, let's say, loan, no loan, and then we train a model, a machine learning model, a classifier, to optimize accuracy on that training data.
So you end up setting the parameters of that model such that it does a good job of accurately predicting those labels from the training data. So the objective function that's typically used is one that usually considers only accuracy. But something else you can do is define some quantitative definition of fairness, some quantitative fairness metric, and then try to simultaneously optimize both of these objectives. So classifier accuracy and whatever your chosen fairness metric is. There are a number of these different quantitative metrics that have been proposed out there, and they all typically look at parity across groups of some sort. So I think it's really important to remember that even though these are often referred to as fairness metrics, they're really parity metrics. They neglect many of the really important other aspects of fairness, like justice, and due process, and so on and so forth. But it is absolutely possible to take these parity metrics and to incorporate them into the objective function of, say, a classifier, and then to try and prioritize satisfying and optimizing that fairness metric at the same time as optimizing classifier accuracy. There have been a number of papers that focus on this kind of approach; many of them focus on one particular type of classifier, like SVMs, or neural networks, or something like that, and one particular fairness metric. There are a bunch of standard fairness metrics that people like to look at. I actually have some work with some colleagues here at Microsoft where we have a slightly more general way of doing this that will work with many different types of classifiers, and many different types of fairness metrics. So there is no reason to start again from scratch if you want to switch to a different classifier or a different fairness metric. We actually have some open source Python code available on GitHub that implements our approach. Sam Charrington: [00:33:27] So you've talked about the idea that people are fundamentally the root of the issue, that these are societal issues, that they're not going to be solved by technological advancements or processes alone. At the same time, there has been a ton of new research happening in this area by folks in your group and elsewhere. Does that lead to a mismatch between what's happening in academia and on the technical side with the way this stuff actually gets put into practice? Hanna Wallach: [00:34:11] That's an awesome question. The simple answer is yes. This actually relates to one of my most recent research projects, which I'm really, really excited about. So last summer, some of my colleagues and I, specifically Jenn Wortman Vaughan, Miro Dudík, and Hal Daumé, along with our incredible intern, Ken Holstein from CMU, conducted the first systematic investigation of industry practitioners' challenges and needs for support relating to developing fairer machine learning systems. This work actually came about because we were thinking about ways of developing interfaces for that fair classification work that I mentioned a minute ago. Through a number of conversations with people in different product groups here at Microsoft and people at other companies, we realized that these kinds of classification tasks, while they're incredibly well studied within the fairness and machine learning literature, are maybe less common than we had thought in practice within industry.
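To make the joint optimization described above concrete, here is a minimal sketch of training a classifier whose loss combines accuracy with a demographic-parity penalty. This is an illustrative toy, not the open-source code referenced in the conversation; the synthetic data and the penalty weight are invented for the example.

```python
# A minimal sketch: logistic loss plus a demographic-parity gap penalty,
# optimized jointly. Toy data and the penalty weight are made up.
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def penalized_loss(w, X, y, a, lam):
    """Accuracy objective (log loss) plus lam * |parity gap across groups|."""
    p = sigmoid(X @ w)
    eps = 1e-9
    log_loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    parity_gap = abs(p[a == 0].mean() - p[a == 1].mean())
    return log_loss + lam * parity_gap

# Synthetic data: a binary sensitive attribute with different base rates.
rng = np.random.default_rng(0)
n = 1000
a = rng.integers(0, 2, n)                       # sensitive attribute (0/1)
X = np.c_[rng.normal(size=(n, 3)), np.ones(n)]  # features plus a bias column
y = (X[:, 0] + 0.8 * a + rng.normal(scale=0.5, size=n) > 0).astype(float)

w0 = np.zeros(X.shape[1])
result = minimize(penalized_loss, w0, args=(X, y, a, 2.0), method="L-BFGS-B")

p_hat = sigmoid(X @ result.x)
print("parity gap after training:", abs(p_hat[a == 0].mean() - p_hat[a == 1].mean()))
```

Raising the penalty weight shrinks the parity gap at some cost in raw accuracy, which is exactly the trade-off the conversation is describing; production approaches handle this more carefully, for example via the reduction-style methods in the literature.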
So that got us thinking about whether there might, actually, be a mismatch between the academic literature on fairness and machine learning, and practitioners' actual needs. What we ended up doing was this super interesting research project that was a pretty different style of research for me and for my colleagues. So I am a machine learning researcher, so is Jenn, so is Hal, and so is Miro. Ken, our intern, is an HCI researcher. What we ended up doing was this qualitative HCI work to really understand what it is that practitioners are facing in reality when they try to develop fairer machine learning systems. To do this, we conducted semi-structured interviews with 35 people, spanning 25 different teams, in 10 different companies. These people were in a number of different roles, ranging from social scientist, data labeler, product manager, and program manager, to data scientist and researcher. Where possible, we tried to interview multiple people from the same team in order to get a variety of perspectives on that team's challenges and needs for support. We then took our findings from these interviews and developed a survey, which was then completed by another 267 industry practitioners, again in a variety of different companies and a variety of different roles. What we found, at a high level, was that yes, there is a mismatch between the academic literature on fairness in machine learning and industry practitioners' actual challenges and needs for support on the ground. So firstly, much of the machine learning literature on fairness focuses on classification, and on supervised machine learning methods. In fact, what we found is that industry practitioners are grappling with fairness issues in a much wider range of applications beyond classification or prediction scenarios. Many times the systems they're dealing with involve these really rich, complex interactions between users and the system. So for example, chat bots, or adaptive tutoring, or personalized retail, and so on and so forth. So as a result, they often struggle to use existing fairness research from the literature, because the things that they're facing are much less amenable to these quantitative fairness metrics. Indeed, very few teams have fairness KPIs or automated tests that they can use within their domain. One of the other things that we found is that the machine learning literature typically assumes access to sensitive attributes like race or gender for the purpose of auditing systems for fairness. But in practice, many teams have no access to these kinds of attributes, and certainly not at the level of individuals. So they expressed needs for support in detecting biases and unfairness with access only to coarse-grained, partial, or indirect information. This is something that we've seen much less focus on in the academic literature. Sam Charrington: [00:38:41] That last point is an interesting one, and one that I've brought up on the podcast previously. In many of the places you might want to use an approach like that, it's forbidden, from a regulatory perspective, to use the information that you want to use in your classifier to achieve fairness in any part of the decisioning process. Hanna Wallach: [00:39:04] Exactly. This sets up a really difficult tension between doing the right thing in practice from a machine learning perspective, and what is legally allowed.
I'm actually working on a paper at the moment with Zack Conard, a law student at Stanford University, on exactly this issue: this challenge between what you want to do from a machine learning perspective, and what you are required to do from a legal perspective, based on humans and how humans behave, and hundreds of years of law in that realm. It's really challenging, and there is a complicated trade-off there that we really need to be thinking about. Sam Charrington: [00:39:48] It does make me wonder if techniques like, or analogous to, differential privacy could be used to provide a regulatorily acceptable way to access protected attributes, so that they can be incorporated into algorithms like this. Hanna Wallach: [00:40:07] Yeah, so there was some work on exactly this kind of topic at the FAT/ML workshop co-located with ICML last year. This work was proposing the use of encryption and suchlike in order to collect and make available such information, but in a way that users would feel as if their privacy was being respected, and so that people who wanted to use that information would be able to use it for purposes such as auditing. I think that's a really promising approach, although there are obviously a bunch of non-trivial challenges involved in thinking about how you might make that a reality. It's a really complicated landscape. But definitely one that's worth thinking about. Sam Charrington: [00:40:54] Was there a third area that you were about to mention? Hanna Wallach: [00:40:58] Yeah, so one of the main themes that we found in our work studying industry practitioners is a real mismatch in the focus on different points in the machine learning life cycle. So the machine learning literature typically assumes no agency over data collection. This makes sense, right? If you're a machine learning academic, you typically work with standard data sets that have been collected and made available for years. You don't typically think about having agency over that data collection process. But of course, in industry, that's exactly where practitioners often do have the most control. They are in charge of that data collection or data curation process, and in contrast, they often have much less control over the methods or models themselves, which often are embedded within much bigger systems. So it's much harder to intervene from a perspective of fairness with the models than it is with the data. We found that really interesting, this difference in emphasis between models versus data in these different groups of people. Of course, many practitioners voiced needs for support in figuring out how to leverage that agency over data collection to create fairer data sets for use in developing their systems. Sam Charrington: [00:42:20] So you mentioned the FAT/ML workshop. I'm wondering, as we come to a close, if there are any resources, events, pointers, I'm sure there are tons of things that you'd love to point people at. But what are your top three or four things that you would suggest people take a look at as they're trying to wrap their heads around this area, and how to either have an impact as a researcher, or how to make good use of it as a practitioner? Hanna Wallach: [00:42:55] Yeah. Absolutely. So there are a number of different places with resources to learn more about this kind of stuff.
So first, I've mentioned a couple of times this tutorial that I put together with Jenn Wortman Vaughan, which will be available publicly online very soon. It is in fact being broadcast next week, so it should be up by the time this podcast goes live. So I would definitely recommend that people check that out to really get a sense of how we, at Microsoft, are thinking about fairness in machine learning. Then moving beyond that, and thinking specifically about more of the academic literature, the FAT/ML workshop maintains a list of resources on the workshop website. That's, again, another really, really great place to look for things to read about this topic. The FAT* conference is a relatively newly created conference on fairness, accountability, and transparency, not just in machine learning, but across all of computer science and computational systems. Again, there, I recommend checking out the website to see the publications that were there last year, and also the publications that will be there this year. There are a number of really interesting papers that I haven't read yet, but am super excited to read, being presented at this year's conference. That conference also has tutorials on a range of different subjects. So it's also worth looking at the various different tutorials there. At last year's conference, Arvind Narayanan presented this amazing tutorial on quantitative fairness metrics, and why they're not a one-size-fits-all solution, why there are trade-offs between them, why you can't just take one of these definitions, optimize for it, and call it quits. So I definitely recommend checking that out. Some other places that are worth looking for resources on this: the AI Now Institute, which was co-founded by Kate Crawford, who is also here at Microsoft Research, and Meredith Whittaker, who is at Google, has some incredibly awesome resources. They've put out a number of white papers and reports over the past couple of years that really get at the crux of why these are complicated socio-technical issues. So I strongly recommend reading pretty much everything that they put out. I would also recommend checking out some of the material put out by Data & Society, which is also an organization here in New York, led by Danah Boyd, and they too have a number of really interesting things that you can read about these different topics. Then the final thing I want to emphasize is the Partnership on AI, which was formed a couple of years ago by Microsoft and a bunch of other companies working in the AI space to really foster cross-company collaboration when thinking about these complicated societal issues that relate to AI and machine learning. The partnership has been really ramping up over the past couple of years, and they also have some good resources that are worth checking out. Sam Charrington: [00:46:22] Oh, that's great. That is a great list that will keep us busy for a while. Hanna, thank you so much for taking the time to chat with us. It was really a great conversation, and I appreciate it. Hanna Wallach: [00:46:34] No problem. Thank you for having me. This has been really great. Sam Charrington: [00:46:38] Awesome, thank you.
In my recent podcast with Facebook AI research scientist Moustapha Cissé, Cissé shared the insightful quote, "you are what you eat and right now we feed our models junk food." Well, just like you can't eat better if you don't know what's in your food, you can't train less biased models if you don't know what's in your training data. That's why the recent paper, Datasheets for Datasets, by Timnit Gebru (see her TWIML podcast and meetup) and her co-authors from Microsoft Research and elsewhere, is so interesting. In this paper, Timnit and company propose the equivalent of food nutrition labeling for datasets. Given that many machine learning and deep learning model development efforts use public datasets such as ImageNet or COCO–or private datasets produced by others–it's important to be able to convey the context, biases, and other material aspects of a training dataset to those interested in using it. The Datasheets for Datasets paper explores the idea of using standardized datasheets to communicate this information to users of datasets, commercialized APIs, and pre-trained models. In addition to helping to communicate data biases, the authors propose that such datasheets can improve transparency and provide a source of accountability. Beyond potential ethical issues, hidden data biases can cause unpredictability or failures in deployed systems when models trained on third-party data fail to generalize adequately to different contexts. Of course, the best option is to collect first-party data and use models built and trained by experts with deep domain knowledge. But widely available public datasets, more approachable machine learning tools, and readily accessible AI APIs and pre-built models are democratizing AI and enabling a broader group of developers to incorporate AI into their applications. The authors suggest that datasheets for AI datasets and tools could go a long way in providing essential information to engineers who might not have domain expertise, and in doing so help mitigate some of the issues associated with dataset misuse. This perspective echoes similar thoughts from Clare Gollnick in our discussion on the reproducibility crisis in science and AI. She expressed her concern about developers turning first to deeper, more complex models to solve their problems, noting that they often run into generalization issues when those models are moved into production. Rather, she finds that when AI problems are solved by capitalizing on some discovery found through a strong understanding of the domain at hand, the results are much more robust. Timnit and her co-authors suggest in the paper that AI has yet to undergo the safety regulations of emergent industries of the past, like the automobile, medical, and electrical industries. The paper points out that, "When cars first became available in the United States, there were no speed limits, stop signs, traffic lights, driver education, or regulations pertaining to seat belts or drunk driving. Thus, the early 1900s saw many deaths and injuries due to collisions, speeding, and reckless driving." Over the course of decades, the automobile industry and others iteratively developed regulations meant to protect the public good, while still allowing for innovation. The paper suggests that it's not too early to start considering these types of regulations for AI, especially as it begins to be used in high-stakes applications like the health and public sectors.
Such regulation will likely first apply to issues of privacy, bias, ethics, and transparency, and in fact, Europe's impending General Data Protection Regulation (GDPR) takes on just these issues. The proposed datasheets take cues from those associated with electrical components. Every electrical component sold has an accompanying datasheet that lists the component's function, features, operating voltages, physical details, and more. These datasheets have become expected in the industry due to the need to understand a part's behavior before purchase, as well as the liability issues that arise from a part's misuse. The authors suggest that those offering datasets or APIs should provide a datasheet that addresses a set of standardized questions covering the following topics: the motivation for dataset creation, the composition of the dataset, the data collection process, the preprocessing of the data, how the dataset is distributed, how the dataset is being maintained, and the legal and ethical considerations. (A rough sketch of recording these answers in code follows at the end of this post.) For the full breakdown of all of the questions, check out the paper; it goes into a bunch of additional detail and provides an example datasheet for the UMass Labeled Faces in the Wild dataset. It's a thorough and easy-to-use model that has the potential for big impact. Datasheets such as this will allow users to understand the strengths and limitations of the data that they're using and guard against issues such as bias and overfitting. It can also be argued that simply having datasheets at all forces both dataset producers and consumers to think differently about their data sources and to understand that the data is not a de facto source of truth but rather a living, breathing resource that requires careful consideration and maintenance. Maybe it's the electrical engineer in me, but I think this is a really interesting idea. What do you think? Do you think datasheets could help address the issues of bias and accountability in AI? Are there instances where you would have found this useful in your own work? Let me know via email or via the TWIML slack channel. Sign up for our Newsletter to receive this weekly to your inbox.
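As a rough illustration of what capturing those datasheet answers alongside a dataset might look like, here is a minimal sketch; the field names mirror the topic headings above, but the structure, prompts, and file name are invented for illustration and are not from the paper.

```python
# A minimal, hypothetical datasheet record stored next to the dataset it
# describes. The prompts paraphrase the paper's topic areas; a real datasheet
# answers the paper's full question list in prose.
import json

datasheet = {
    "motivation": "Why was the dataset created, by whom, and who funded it?",
    "composition": "What do the instances represent, and how many are there?",
    "collection_process": "How was the data acquired and sampled?",
    "preprocessing": "What cleaning, labeling, or filtering was applied?",
    "distribution": "How is the dataset shared, and under what license?",
    "maintenance": "Who maintains it, and how are errata and updates handled?",
    "legal_and_ethical_considerations": "Consent, privacy, and known biases.",
}

with open("datasheet.json", "w") as f:
    json.dump(datasheet, f, indent=2)
```

Even a lightweight record like this, versioned alongside the data, forces the collection and bias questions to be asked before the dataset is reused.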
My travel comes in waves centered around the spring and fall conference seasons. A couple of weeks ago, in spite of there being no signs of a true springtime here in St. Louis, things shifted into high gear with me attending the Scaled ML conference at Stanford and Nvidia GTC over the course of a few days. Following me on Twitter is the best way to stay on top of the action as it happens, but for those who missed my live-tweeting, I thought I'd reflect a bit on Nvidia and GTC. (You'll need to check out my #scaledmlconf tweets for my fleeting thoughts on that one.) In many ways, Nvidia is the beneficiary of having been in the right place at the right time with regards to AI. It just so happened that (a) a confluence of advances in computing, data, and algorithms led to explosive progress and interest in deep neural networks, and (b) that our current approach to training these depends pretty heavily on mathematical operations that Nvidia's graphics cards happened to be really efficient at. That's not to say that Nvidia hasn't executed extremely well once the opportunity presented itself. To their credit, they recognized the trend early and invested heavily in it, before it really made sense for them to do so, besting the "innovator's dilemma" that's caused many a great (or formerly great) company to miss out. Nvidia has really excelled in developing software and ecosystems that take advantage of their hardware and are deeply tailored to the different domains in which it's being used. This was evidenced in full at GTC 2018, with the company rolling out a number of interesting new hardware, software, application, and ecosystem announcements for its deep learning customers. A few of the announcements I found most interesting were:

New DGX-2 deep learning supercomputer
After announcing the doubling of the V100 GPU memory to 32GB, Nvidia unveiled the DGX-2, a deep-learning optimized server containing 16 V100s and a new high-performance interconnect called NVSwitch. The DGX-2 delivers 2 petaFLOPS of compute power and offers significant cost and energy savings relative to traditional server architectures. For a challenging representative task like training a FAIRSeq neural machine translation (NMT) model, the DGX-2 completed the task in a day and a half, versus the previous generation DGX-1's 15 days.

Deep learning inference and TensorRT 4
Inference (using DL models, versus training them) was a big focus area for Nvidia CEO Jensen Huang. During his keynote, Jensen spoke to the rapid increase in complexity of AI models and offered a mnemonic for thinking about the needs of inference systems both in the datacenter and at the edge–PLASTER, for Programmability, Latency, Accuracy, Size, Throughput, Energy Efficiency, and Rate of Learning. To meet these needs, he announced the release of TensorRT 4, the latest version of the company's software for optimizing inference performance on Nvidia GPUs. The new version of TensorRT has been integrated with TensorFlow and also includes support for the ONNX deep learning interoperability framework, allowing it to be used with models developed with the PyTorch, Caffe2, MxNet, CNTK, and Chainer frameworks. The new version's performance was highlighted, including an 8x increase in TensorFlow performance when used with TensorRT 4 vs TensorFlow alone and 45x higher throughput vs. CPUs for certain network architectures.

New Kubernetes support
Kubernetes (K8s) is an open source platform for orchestrating workloads on public and private clouds.
It came out of Google and is growing very rapidly. While the majority of Kubernetes deployments are focused on web application workloads, the software has been gaining popularity among deep learning users. (Check out my interviews with Matroid's Reza Zadeh and OpenAI's Jonas Schneider for more.) To date, working with GPUs in Kubernetes has been pretty frustrating. According to the official K8s docs, "support for NVIDIA GPUs was added in v1.6 and has gone through multiple backwards incompatible iterations." Yikes! Nvidia hopes its new GPU Device Plugin (confusingly referred to as "Kubernetes on GPUs" in Jensen's keynote) will allow workloads to more easily target GPUs in a Kubernetes cluster.

New applications: Project Clara and DRIVE Sim
Combining its strengths in both graphics and deep learning, Nvidia shared a couple of interesting new applications it has developed. Project Clara is able to create rich cinematic renderings of medical imagery, allowing doctors to more easily diagnose medical conditions. Amazingly, it does this in the cloud using deep neural networks to enhance traditional images, without requiring updates to the three million imaging instruments currently installed at medical facilities. DRIVE Sim is a simulation platform for self-driving cars. There have been many efforts to train deep learning models for self-driving cars using simulation, including using commercial games like Grand Theft Auto. (In fact, the GTA publisher has shut several of these efforts down for copyright reasons.) Training a learning algorithm on synthetic roads and cityscapes hasn't been the big problem, though. Rather, the challenge has been that models trained on synthetic roads haven't generalized well to the real world. I spoke to Nvidia chief scientist Bill Dally about this, and he says they've seen good generalization by incorporating a couple of techniques proven out in their research, namely by combining real and simulated data in the training set and by using domain adaptation techniques, including this one from NIPS 2017 based on coupled GANs. (See also the discussion around a related Apple paper presented at the very first TWIML Online meetup.) Impressively, for as much as Nvidia announced for the deep learning user, the conference and keynote also had a ton to offer their graphics, robotics, and self-driving car users, as well as users from industries like healthcare, financial services, oil and gas, and others. Nvidia is not without challengers in the deep learning hardware space, as I've previously written, but the company seems to be doing all the right things. I'm already looking forward to next year's GTC and seeing what the company is able to pull off in the next twelve months. Sign up for our Newsletter to receive this weekly to your inbox.
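For readers wondering what targeting GPUs in Kubernetes actually looks like, here is a minimal sketch using the official Kubernetes Python client to request a GPU via the nvidia.com/gpu resource that the device plugin exposes. The pod name, image tag, and namespace are placeholders, and a working kubeconfig plus a cluster with the plugin installed are assumed.

```python
# A minimal sketch: schedule a CUDA container onto a GPU node by requesting
# the nvidia.com/gpu resource. Names and image tags are placeholders.
from kubernetes import client, config

config.load_kube_config()  # assumes a working kubeconfig for the cluster

container = client.V1Container(
    name="cuda-smoke-test",
    image="nvidia/cuda:9.0-base",
    command=["nvidia-smi"],
    resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
)
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-smoke-test"),
    spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

The key detail is that the GPU is requested like any other extended resource, so the scheduler, rather than the application, decides which GPU node runs the workload.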
The Importance of Diversity in AI Hi there! Many have explored the link between corporate diversity and performance, notably including 2015 and 2017 studies by McKinsey. These studies, which looked at 366 and 1,000 public companies, respectively, across a range of countries and industries, found that gender, ethnic, and cultural diversity correlates to greater company profitability. Now, I’m not aware of any formal studies exploring the diversity of data science teams and the performance of their models, but I’d expect to see similar results, particularly in cases where models must be robust to societal and dataset biases. The importance of diversity in the machine learning and AI community came up in a recent conversation with Moustapha Cissé, a research scientist at Facebook AI Research. According to him, “you are what you eat, and right now we’re feeding our models junk food.” To address this, companies working in the field should strive for their teams to be as diverse as possible, “because only in this way will we notice the problems.” More broadly, Cissé emphasizes that “it’s important that we become more open and more diverse so that every [community] has the tools and techniques required to solve its own problems.” My interview with Moustapha kicked off a series of podcasts we debuted earlier this month called Black in AI. Many of the guests in this series are folks I met at the inaugural event by the same name at last year’s NIPS conference. Moustapha helped organize Black in AI along with Timnit Gebru, and others, to support and increase the representation of blacks in machine learning and artificial intelligence. Groups like Black in AI, Women in Machine Learning, and numerous others, several of which Moustapha mentioned in our conversation, are working to increase the diversity of the field as a whole, and for this are worthy of our support. Sign up for our Newsletter to receive this weekly to your inbox.
This is a recap of the TWIML Online Meetup held on Jan 16, 2018, in which we focus on the paper "Using deep learning and Google Street View to estimate the demographic makeup of neighborhoods across the United States" by Microsoft Research postdoctoral researcher Timnit Gebru. We recap some of the major ML/AI Resolutions for 2018, community predictions for 2018, our favorite TWIML podcast episodes of 2017, and more. Thanks again to our presenter Timnit Gebru! Make sure you Like this video, and Subscribe to our channel below! https://youtu.be/pLdNx3SSxYo Full paper: "Using Deep Learning and Google Street View to Estimate the Demographic Makeup of Neighborhoods Across the United States" To register for the next meetup, visit twimlai.com/meetup
Bits & Bytes Google's new TensorFlow Lite targets mobile and embedded. Designed to be lightweight, cross-platform, and fast, TensorFlow Lite is an architecture for converting TF models to run on resource-constrained hardware. Ultimately it is intended to replace the existing TensorFlow Mobile framework, but it's currently much more limited in the number of operators it supports. (A minimal conversion sketch follows at the end of this issue.) The MobileNet and Inception v3 image recognition models have been ported to TFL, along with a new model called Smart Reply for on-device conversational predictions a la Google Inbox. One scary note I saw in the coverage of the TFL launch was a reference to AIoT, or the "artificial intelligence of things." Somehow I'd been lucky enough to have missed this one. Ugh. AWS announces ONNX import for MXNet. Back in September, Microsoft and Facebook announced ONNX, a platform-independent format for representing deep neural networks. With the announcement of ONNX-MXNet, Amazon adds their support for the format and makes it possible for developers to import ONNX models into Apache MXNet. The company promises increased operator coverage and support for exporting MXNet to ONNX in the future, and hopes to merge the ONNX-MXNet functionality into core MXNet APIs. Microsoft introduces Visual Studio Tools for AI beta. Microsoft has released Visual Studio Tools for AI, an extension for Visual Studio 2017 that helps developers and data scientists train deep learning models and embed them into applications. Visual Studio Tools for AI supports a variety of popular deep learning frameworks and also integrates with Azure for training against larger data sets. New Apple research on face detection and autonomous driving. Apple's latest Machine Learning Journal entry covers face detection and the iOS Vision framework. The article discusses Apple's transition to deep learning for face detection in iCloud and iOS, and includes an interesting discussion of optimizing for on-device considerations. Apple researchers also published a paper on VoxelNet, a 3D detection network that unifies feature extraction and bounding box prediction into a single-stage, end-to-end trainable deep neural network. South Korean industrial giants partner with Element AI to create $45m AI fund. SK Telecom, Hyundai Motor Company, and Hanwha Group will create a joint fund to invest in AI startups, which will be advised by Yoshua Bengio's Element AI. Sign up for our Newsletter to receive the Bits & Bytes weekly to your inbox.
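As a rough sense of the TensorFlow Lite workflow mentioned at the top of this issue, a trained model is converted offline into the TFLite format and the resulting file is shipped to the device. A minimal sketch, assuming the current tf.lite converter API (the converter lived in a different module in the TensorFlow releases contemporary with this announcement) and a placeholder SavedModel directory:

```python
# A minimal sketch of converting a trained SavedModel to the TensorFlow Lite
# flatbuffer format for on-device inference. Paths are placeholders.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("exported_model/")
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```

The resulting model.tflite file is what the on-device TFLite interpreter loads, which is why the limited operator coverage noted above matters: any op the converter cannot map is a blocker.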
Bits & Bytes End of an era. With the increasing traction of PyTorch and the resulting renewed vigor in the deep learning framework wars, Yoshua Bengio and the team at the University of Montreal's MILA are throwing in the towel and terminating Theano development with the upcoming 1.0 release. Unifying reinforcement learning. Game platform developer Unity Technologies recently introduced Unity Machine Learning Agents, an SDK targeting both researchers and game developers for training intelligent agents using reinforcement learning and other techniques in game environments and virtual worlds. For background on this, check out TWIML Talk #24, where I spoke to Danny Lange, VP for ML & AI at Unity, about the role of RL in gaming and the huge opportunities available to game platforms. Training on a panoply of panoramic views. Interesting new dataset from 3D camera maker Matterport. The Matterport 3D dataset consists of 10,800 aligned 3D panoramic views (RGB + depth per pixel) from 194,400 RGB + depth images of 90 building-scale scenes, all hand-labeled with instance-level object segmentation. Trolling AI Twitter. AI luminary Pedro Domingos spent a few days trolling AI Twitter last week, yours truly included, on the topic of algorithmic ethics and discrimination. Personally, I get what he's trying to say but think his comments amount to arguing semantics, and are fairly irresponsible for someone of his stature in the field. Does anyone out there know him and/or can get him on the podcast? Chip chat. Intel is touting a forthcoming "neuromorphic" chip design that takes inspiration from the human brain. NVIDIA announced the Deep Learning Accelerator, an open source hardware architecture for deep learning inference acceleration. CNBC is spreading rumors that Tesla has tapped AMD to collaborate on an AI chip. Meanwhile, Imagination Technologies will offer an AI chip design that chip makers can embed within their own designs, focused on image and signal processing. Show me the money. Baidu announced a $1.5 billion fund for investing in self-driving car startups. Video object detection startup Matroid raises a $10 million series A. Catch my conversation with founder Reza Zadeh on the pod. TalkIQ raises $14 million to help enterprises analyze recorded voice conversations. Sign up for our Newsletter to receive the Bits & Bytes weekly to your inbox.
Bits & Bytes Wats-on, wats-off. IBM has been making waves this week with the news that they've committed $240 million to establish a joint AI lab with MIT. IBM Watson has taken a number of PR hits recently, and unfortunately for IBM the MIT announcement comes just days after a story chronicling Watson's failure to live up to its promise in the oncology space. AI world war of words. A couple of weeks ago I highlighted a few articles mentioning the intensifying AI "arms race" between the US and China. Well, not to be left out, Russian President Vladimir Putin has joined the war of words, noting that "Whoever becomes the leader in [AI] will become the ruler of the world." AI's not a game, except when it is. Games have had a long history as a proving ground for AI. Recently deep learning researchers have continued this tradition, focusing on the use of video games for training, testing, and demonstrating their models. The survey of research presented in the new paper Deep Learning for Video Game Playing is a great way to learn more about what's happening in this field. Pristine platform. A great complement to this week's podcast is the recent post on Uber's engineering blog about their Michelangelo platform. In it they describe the environment they've created to host and manage their production machine learning models. AI + bitcoin for good. An interesting use case by a Berkeley PhD student explores the use of machine learning to help map cybercriminal markets and thwart human trafficking [PDF]. Blue river runs green. John Deere announced the $305 million acquisition of Blue River Technology, a company perhaps best known for its LettuceBot system, which uses cameras mounted on a tractor rig and deep learning to identify and target weeds in a bed of lettuce for extermination via precision doses of robotically-applied herbicide. Funding the drone army. Autonomous drone maker Airobotics announced the close of a $33 million series C round to fund its expansion into the defense industry. Where's that UN resolution when you need it? Sign up for our Newsletter to receive the Bits & Bytes weekly to your inbox.
Bits & Bytes AI getting stylish. Amazon's Lab126 has developed (using GANs) an algorithm that learns fashion styles from images and can generate new items in similar styles from scratch. Zalando has released Fashion-MNIST, a dataset consisting of 70k 28x28 images of clothing and accessories designed to be a drop-in replacement for the ubiquitous MNIST database of handwritten digits (a quick loading sketch follows at the end of this issue). Ride the Brainwave. Microsoft announced a deep learning hardware platform called Brainwave. Brainwave includes a neural network processing unit (DPU) based on FPGAs, an architecture for building distributed systems around the DPUs, and the requisite software toolchain for using the system, which supports Microsoft's CNTK as well as TensorFlow. Killer robot ban. Elon Musk and a group of researchers and AI experts from 26 countries have called for a UN ban on armed autonomous robots. It's not the first time, but sounds like a good idea to me. Reinforcing RL. Didn't get enough RL in our Industrial AI series? Check out this newly published paper, which reviews research in the field: A Brief Survey of Deep Reinforcement Learning. Sign up for our Newsletter to receive the Bits & Bytes weekly to your inbox.
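Since Fashion-MNIST is billed as a drop-in replacement for MNIST, trying it typically means changing a single loader call. A minimal sketch, assuming the copy bundled with tf.keras (the dataset can also be downloaded directly from Zalando's repository):

```python
# A minimal sketch: Fashion-MNIST has the same image shape and label layout
# as MNIST, so it can be swapped in wherever MNIST is loaded.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
print(x_train.shape, y_train.shape)  # (60000, 28, 28) (60000,)
print(x_test.shape, y_test.shape)    # (10000, 28, 28) (10000,)
```

Any model architecture built for MNIST runs unchanged on these arrays, which is what makes the dataset useful as a slightly harder benchmark.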
Bits & Bytes Researchers from the University of Pennsylvania’s Institute for Biomedical Informatics benchmark 13 state of the art ML algorithms on 165 publicly available classification problems, and present the results in Data-Driven Advice for Applying Machine Learning to Bioinformatics Problems. In the end they learn what every Kaggler knows… Gradient Boosted Decision Trees work really well. AI gets really interesting when we can write software that creates better models faster for us, which is why Neural Optimizer Search with Reinforcement Learning [PDF], a Google Brain paper presented at the recent ICML conference, is so interesting. The team used reinforcement learning to train an RNN to search for better ways to train small convolutional networks. There’s been a bunch of chatter in the press of late about who’s winning the “AI war.” See America Can't Afford to Lose the Artificial Intelligence War, How Baidu Will Win China’s Ai Race—And, Maybe, The World’s, and China’s Plan for World Domination in AI Isn’t So Crazy After All for more. As I’ve said before, proficiency in AI will bode well for individuals and companies; the same holds true for nations. Sony’s getting in on the deep learning framework action with its Neural Network Console, a tool for training, evaluating and designing neural nets. I’ve not played with it, but it’s got an interesting GUI for designing deep neural networks. A follow-on to the Facebook AI language-creating-bots kerfuffle: Natural Language Does Not Emerge ‘Naturally’ in Multi-Agent Dialog. Do yourself a favor and follow @KaggleDatasets on Twitter. You don’t want to miss out on goodies like this 360k favicon dataset. A new post on Apple’s ML research blog talks about how they’ve used Deep Learning to give Siri a more natural, smoother voice in iOS 10 and 11: Deep Learning for Siri’s Voice: On-device Deep Mixture Density Networks for Hybrid Unit Selection Synthesis. I’ve mentioned Fast.ai’s Deep Learning for Coders online course before, and they also offer an in-person version. Last week they announced the availability of a diversity scholarship to help address the diversity crisis in AI. Sign up for our Newsletter to receive the Bits & Bytes weekly to your inbox.
Bits & Bytes AI luminary Andrew Ng launched a successor to his famous Stanford online machine learning course. The new course—actually a sequence of courses—is focused on deep learning and hosted over on Coursera. I'm planning to work through it. Let me know if you are too! According to an SEC filing, Ng is also planning to launch a $150 million VC fund called AI Fund, L.P. It'll be interesting to see the deals he funds through it. An OpenAI bot trained using reinforcement learning beats professional Dota 2 video game players in 1v1 matches. There's some discussion in the community as to whether the bot's use of hardcoded rules and the Dota bot API take away from the victory. There's also good discussion on the significance of the feat, e.g. as compared to AlphaGo. Google presented a paper at the recent KDD conference on Vizier, a system for hyperparameter optimization, i.e. model tuning. A nice piece in the NY Times highlights the work of researchers from OpenAI, Berkeley, and elsewhere on AI safety. To dig deeper into the topic, consider the materials cited in 80,000 Hours' AI safety syllabus. MIT CSAIL researchers developed Pensieve, a system that uses reinforcement learning to optimize streaming video delivery and reduce buffering. Pensieve will be presented at next week's SIGCOMM conference. Sign up for our Newsletter to receive the Bits & Bytes weekly to your inbox.
Bits & Bytes

Microsoft made headlines last week when it formally added AI to its corporate vision statement, dropping references to "mobile-first": "Our strategic vision is to compete and grow by building best-in-class platforms and productivity services for an intelligent cloud and an intelligent edge infused with AI."

BuzzFeed trained a random forest ML algorithm to identify instances of spy planes in flight tracking data in "BuzzFeed News Trained A Computer To Search For Hidden Spy Planes. This Is What We Found." Very interesting read, despite the characteristically clickbaity title.

The FastText team at Facebook AI Research released a new set of pre-trained word vectors trained on Wikipedia, news and web crawl data (a quick sketch of exploring vectors like these follows below). Facebook also announced that they are transitioning entirely to neural networks for language translation, moving away from a phrase-based statistical system. The new system uses sequence-to-sequence LSTMs with attention. Stay tuned for a deep dive into LSTMs coming later this month!

OpenAI released RL-Teacher, an open-source interface for training reinforcement-learning-based AIs via occasional human feedback rather than mathematically expressed reward functions.

As a New Yorker, I consider sarcasm to be somewhat of an art form. A group of researchers from MIT Media Lab and elsewhere published a paper on DeepMoji [PDF], a deep learning model trained on emoji usage that can detect sarcasm, sentiment, and emotion in text.

Sign up for our Newsletter to receive the Bits & Bytes weekly to your inbox.
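If you'd like to poke at pre-trained word vectors like the new FastText release, a minimal sketch with gensim looks something like the following. The file name "wiki.en.vec" is a placeholder; substitute whichever language file you download, and note these text-format files can be several gigabytes.

```python
# Minimal sketch: exploring pre-trained word vectors with gensim.
# "wiki.en.vec" is a placeholder for whichever FastText .vec file you download.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("wiki.en.vec", binary=False)

# Nearest neighbours in the embedding space.
print(vectors.most_similar("translation", topn=5))

# Classic analogy arithmetic: king - man + woman ≈ queen.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```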
Bits & Bytes

A Facebook research program into multi-agent negotiations resulted in bots creating their own streamlined ways to communicate. The truth behind the ballyhoo about those language-inventing Facebook bots.

Meanwhile, Google DeepMind researchers are working on developing agents that imagine the future and plan their tasks in two new papers focused on "imagination-based planning." If all this freaks you out, researchers from Cornell, Univ. of Montreal and Univ. of Louisville published "Guidelines for Artificial Intelligence Containment," in which they propose guidelines for helping AI researchers develop reliable sandboxing software for intelligent programs.

In January of this year, Google Cloud hired Fei-Fei Li, director of the Stanford AI & Vision labs, as its chief scientist for AI & ML. Dr. Li is perhaps best known for her role in creating the ImageNet dataset upon which much of the recent progress in object recognition has been based. A couple of months later, in March, at their Next conference, Google Cloud announced its acquisition of Kaggle. With this context, it's not too surprising that Kaggle just announced that it'll be hosting all 3 ImageNet challenges for the first time ever.

Fast.ai announced the availability of Part 2 of their highly regarded (and FREE) online course, Deep Learning for Coders. Topics include TensorFlow and style transfer, generative models and GANs, memory networks, attentional models and more.

Things continue to heat up in the AI startup ecosystem: Google announced Launchpad Studio, a new accelerator program for AI & ML startups, while autonomous vehicle startup Momenta closed a $46M series B financing, and AI-for-robots company Vicarious closed a $50M series C.

Sign up for our Newsletter to receive the Bits & Bytes weekly to your inbox.
Bits & Bytes

Apple launched their Machine Learning Journal site to publish their research. The first article covers a technique, similar to GANs, for improving the realism of synthetic images.

Google updated their 9-million-image Open Images dataset, adding some 2 million bounding boxes and several million more labels.

OpenAI published research on a new approach to reinforcement learning called Proximal Policy Optimization (PPO). PPO aims to outperform the current state-of-the-art methods while being simpler to implement (a small sketch of its core objective follows below).

In a paper published in Neuron, Google DeepMind co-founder Demis Hassabis and co-authors argue that understanding human intelligence is the key to creating artificial intelligence.

An interesting discussion of some of the ways technical debt is accumulated in machine learning projects.

Harvard Business Review features a nice profile of Facebook's Applied Machine Learning group.

The IEEE Computer Vision and Pattern Recognition (CVPR) conference just ended. Best Paper winners were Densely Connected Convolutional Networks and Learning from Simulated and Unsupervised Images through Adversarial Training. Perhaps someone reading this would like to present one of these papers at an upcoming meetup?

Sign up for our Newsletter to receive the Bits & Bytes weekly to your inbox.
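For a sense of what makes PPO "simpler to implement," its core is the clipped surrogate objective from the paper. Below is a small NumPy sketch of that objective term; the probability ratios and advantages are stand-in inputs, since in a real agent they come from the new and old policies and an advantage estimator.

```python
# Sketch of PPO's clipped surrogate objective (the quantity the policy maximizes).
# Inputs are stand-ins: in practice `ratio` is new-policy / old-policy action
# probabilities and `advantage` comes from an estimator such as GAE.
import numpy as np

def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """L^CLIP = E[min(r * A, clip(r, 1 - eps, 1 + eps) * A)]."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    return np.mean(np.minimum(unclipped, clipped))

# Toy example: one action with a favorable ratio, one with an unfavorable one.
ratio = np.array([1.3, 0.7])
advantage = np.array([1.0, -0.5])
print(ppo_clip_objective(ratio, advantage))
```

The clipping removes the incentive to move the new policy too far from the old one in a single update, which is what lets PPO skip the more involved constrained optimization used by methods like TRPO.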
A number of you have expressed interest in participating in a TWIML paper reading group, and I'm excited to share the details of the inaugural TWIML & AI meetup! The focus of the meetup will be discussing academic papers and other texts in the machine learning and AI space, though I hope we get to see some implementation demos from time to time as well.

Our first presenter will be Joshua Manela, who has also stepped up along with a couple of other community members (thanks Duncan, Joshua and Nikola!) to help organize the meetup in general. Joshua will be presenting one of the Best Paper award winners from this year's CVPR conference!

Topic: Learning from Simulated and Unsupervised Images through Adversarial Training, by authors from Apple. (See also Improving the Realism of Synthetic Images.)

Wednesday, August 16th, 2017, 11:00 AM US Pacific Time / 2:00 PM Eastern Time. Check HERE for your timezone.