My research lies at the intersection of machine learning and optimization, and targets applications in computer vision and signal processing. I work at the boundary between theory and practice, leveraging mathematical foundations, complex models, and efficient hardware to build practical, high-performance systems. I design optimization methods for platforms ranging from powerful cluster/cloud computing environments to resource-limited integrated circuits and FPGAs. Before joining the faculty at Maryland, I completed my PhD in Mathematics at UCLA, and was a research scientist at Rice University and Stanford University. I have been the recipient of several awards, including SIAM's DiPrima Prize, a DARPA Young Faculty Award, and a Sloan Fellowship.
Nick has been a data scientist since the early 2000s. After obtaining an undergraduate degree in geology at Cambridge University in England (2000), he completed Master's (2001) and PhD (2004) degrees in Astronomy at the University of Sussex, then moved to North America, completing postdoctoral positions in Astronomy at the University of Illinois at Urbana-Champaign (2004-9, joint with the National Center for Supercomputing Applications) and at the Herzberg Institute of Astrophysics in Victoria, BC, Canada (2009-2013). He joined Skytree, a startup company specializing in machine learning, in 2012, and in 2017 the Skytree technology and team were acquired by Infosys. Machine learning has been part of his work since 2000, first applied to large astronomical datasets, followed by a wide range of applications as a generalist data scientist at Skytree, Infosys, Oracle, and now Dotscience.
Genevera Allen is an Associate Professor of Electrical and Computer Engineering, Statistics, and Computer Science at Rice University and an investigator at the Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital and Baylor College of Medicine. She is also the Founder and Faculty Director of the Center for Transforming Data to Knowledge, informally called the Rice D2K Lab. Dr. Allen's research focuses on developing statistical machine learning tools to help scientists make reproducible data-driven discoveries. Her work lies in the areas of interpretable machine learning, optimization, data integration, modern multivariate analysis, and graphical models, with applications in neuroscience and bioinformatics.
Selected Awards and Honors:
- Elected Member, International Statistical Institute, 2021.
- Charles W. Duncan Achievement Award for Outstanding Faculty, Rice University, 2021.
- Curriculum Innovation Award, George R. Brown School of Engineering, Rice University, 2020.
- Research and Teaching Excellence Award, George R. Brown School of Engineering, Rice University, 2017.
- NSF CAREER Award, 2016.
- Forbes 30 Under 30: Science & Healthcare, 2014.
Publication List: https://scholar.google.com/citations?user=gIUd12QAAAAJ&hl=en
I studied physics in Munich at the University of Technology, Munich, at the Università degli Studi di Pavia, and at AT&T Research in Holmdel. During this time I was at the Maximilianeum München and the Collegio Ghislieri in Pavia. In 1996 I received the Master's degree at the University of Technology, Munich, and in 1998 the Doctoral Degree in computer science at the University of Technology Berlin. Until 1999 I was a researcher at the IDA Group of the GMD Institute for Software Engineering and Computer Architecture in Berlin (now part of the Fraunhofer Gesellschaft). After that, I worked as a Researcher and Group Leader at the Research School for Information Sciences and Engineering of the Australian National University. From 2004 onwards I worked as a Senior Principal Researcher and Program Leader of the Statistical Machine Learning Program at NICTA. From 2008 to 2012 I worked at Yahoo Research. In spring of 2012 I moved to Google Research to spend a wonderful year in Mountain View, and I continued working there until the end of 2014. From 2013 to 2017 I was a professor at Carnegie Mellon University. I co-founded Marianas Labs in early 2015. In July 2016 I moved to Amazon Web Services to help build AI and Machine Learning tools for everyone.
There are few things I love more than cuddling up with an exciting new book. There are always more things I want to learn than time I have in the day, and I think books are such a fun, long-form way of engaging (one where I won't be tempted to check Twitter partway through). This book roundup is a selection from the last few years of TWIML guests, counting only the ones related to ML/AI published in the past 10 years. We hope that some of their insights are useful to you! If you liked their book or want to hear more about them before taking the leap into longform writing, check out the accompanying podcast episode (linked on the guest's name). (Note: These links are affiliate links, which means that ordering through them helps support our show!)

Adversarial ML
- Generative Adversarial Learning: Architectures and Applications (2022), Jürgen Schmidhuber

AI Ethics
- Sex, Race, and Robots: How to Be Human in the Age of AI (2019), Ayanna Howard
- Ethics and Data Science (2018), Hilary Mason

AI Sci-Fi
- AI 2041: Ten Visions for Our Future (2021), Kai-Fu Lee

AI Analysis
- AI Superpowers: China, Silicon Valley, And The New World Order (2018), Kai-Fu Lee
- Rebooting AI: Building Artificial Intelligence We Can Trust (2019), Gary Marcus
- Artificial Unintelligence: How Computers Misunderstand the World (The MIT Press) (2019), Meredith Broussard
- Complexity: A Guided Tour (2011), Melanie Mitchell
- Artificial Intelligence: A Guide for Thinking Humans (2019), Melanie Mitchell

Career Insights
- My Journey into AI (2018), Kai-Fu Lee
- Build a Career in Data Science (2020), Jacqueline Nolis

Computational Neuroscience
- The Computational Brain (2016), Terrence Sejnowski

Computer Vision
- Large-Scale Visual Geo-Localization (Advances in Computer Vision and Pattern Recognition) (2016), Amir Zamir
- Image Understanding using Sparse Representations (2014), Pavan Turaga
- Visual Attributes (Advances in Computer Vision and Pattern Recognition) (2017), Devi Parikh
- Crowdsourcing in Computer Vision (Foundations and Trends® in Computer Graphics and Vision) (2016), Adriana Kovashka
- Riemannian Computing in Computer Vision (2015), Pavan Turaga

Databases
- Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases (2021), Xin Luna Dong
- Big Data Integration (Synthesis Lectures on Data Management) (2015), Xin Luna Dong

Deep Learning
- The Deep Learning Revolution (2016), Terrence Sejnowski
- Dive into Deep Learning (2021), Zachary Lipton

Introduction to Machine Learning
- A Course in Machine Learning (2020), Hal Daume III
- Approaching (Almost) Any Machine Learning Problem (2020), Abhishek Thakur
- Building Machine Learning Powered Applications: Going from Idea to Product (2020), Emmanuel Ameisen

ML Organization
- Data Driven (2015), Hilary Mason
- The AI Organization: Learn from Real Companies and Microsoft's Journey How to Redefine Your Organization with AI (2019), David Carmona

MLOps
- Effective Data Science Infrastructure: How to make data scientists productive (2022), Ville Tuulos

Model Specifics
- An Introduction to Variational Autoencoders (Foundations and Trends® in Machine Learning) (2019), Max Welling

NLP
- Linguistic Fundamentals for Natural Language Processing II: 100 Essentials from Semantics and Pragmatics (2013), Emily M. Bender

Robotics
- What to Expect When You're Expecting Robots (2021), Julie Shah
- The New Breed: What Our History with Animals Reveals about Our Future with Robots (2021), Kate Darling

Software How To
- Kernel-based Approximation Methods Using Matlab (2015), Michael McCourt
Who is Edward Raff?
Edward Raff works as a head scientist at the consulting firm Booz Allen Hamilton (BAH). As Edward describes it, their business model is "renting out people's brains" to business and government organizations. Edward sees BAH research as both a way to establish expertise in the field and a way to train staff to solve interesting problems.

Introduction to Malware and Machine Learning
The interesting thing about using machine learning to detect malware is that it's a totally different problem from other forms of ML. Edward describes other ML applications, like computer vision or language processing, as "things near each other are related," meaning that a model is trained to recognize similar things and distinguish things that are different. Malware is a totally different ball game. With malware, an algorithm isn't learning to distinguish pixels or words; it's learning to recognize code, which can come in a number of different forms. In addition, the scale of the data used to train malware detectors is far larger than in other ML applications. To give you an idea of the scale at play, a single data point a model is trained on can be an entire application 30MB in size. Since each case is so unique and the data is so large and varied, the cybersecurity field is still figuring out best practices for data collection and data sets. "The malware author, they don't need to abide by the rules. That's part of the whole point, is they're trying to break the rules. If there's a spec that says, 'Oh, you don't set this flag in the executable because it will behave poorly.' Well, if that helps the malware author, they're going to do it."

Fighting Malware with Adversarial Machine Learning
Recent work in the malware detection field has focused on the use of machine learning models to detect malicious code. ML has the advantage of being more robust than the static rulesets traditionally used to detect malware, but it is still susceptible to the game of cat-and-mouse that plagues the field, wherein bad actors aren't static, but continually modify their code to try to evade detection. In addition, machine learning approaches open up a new vector of evasion, namely adversarial attacks against the models themselves, in which noise or other patterns are crafted and injected into the input data with the goal of causing the model to make an incorrect classification. Generally, the best defense against adversarial attacks is a technique called adversarial training; since 2017, it has been established in the literature as the most effective way to harden models against such attacks. A common approach to evaluating the effectiveness of adversarial attacks in cybersecurity research papers is to assume that an attacker has access to the same data and classes as the software they're aiming to attack. Edward takes issue with this approach because, in practice, there's no reason malware authors would have access to the same training data and labels as the "victim" model. In Edward's most recent paper, "Adversarial Transfer Attacks With Unknown Data and Class Overlap," he and his team set out to see whether attack outcomes differ with varying degrees of data and class overlap. Overall, the team found that less class overlap and less data overlap decreased the attack success rate. However, the relationship wasn't perfectly linear: the results varied as the percentage of data overlap changed, so the finding can't necessarily be generalized.
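To make the idea of adversarial training a bit more concrete, here is a minimal PyTorch-style sketch of a training loop that mixes clean and FGSM-perturbed examples. It is a generic, continuous-input illustration of the technique, not Edward's setup or anything from the paper (malware inputs are discrete binaries, so real malware attacks and defenses work differently); `model`, `loader`, and `epsilon` are hypothetical placeholders.

```python
# Minimal sketch of adversarial training with FGSM-style perturbations.
# Generic illustration only; `model`, `loader`, and `epsilon` are placeholders.
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon):
    """Craft an L-infinity perturbation of size epsilon with one gradient step."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

def adversarial_training_epoch(model, loader, optimizer, epsilon=0.03):
    model.train()
    for x, y in loader:
        x_adv = fgsm_perturb(model, x, y, epsilon)   # attack the current model
        optimizer.zero_grad()                        # clear grads from the attack step
        # Train on a mix of clean and adversarial examples.
        loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```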
Figure 4 from "Adversarial Transfer Attacks With Unknown Data and Class Overlap"

Real-World Factors of Malware
One of the downsides of these experiments is that the models are very expensive to run. Edward is ideating on ways to optimize the process, like trying to classify a region instead of a single data point. When consulting for BAH clients, Edward's job is to narrow down the nitty-gritty specifics of malware authors in order to improve his models' performance. The first step is identifying specific models that could be attacked, like a fraud detection model at a bank, and focusing only on protecting those models. This then allows Edward to build specific models that best fit the problem at hand. "The way that I help our clients approach these kinds of problems is to really focus on that scope and narrow it down. Where do we actually need to do this? Let's not panic and think that everything's under attack all the time because that's not realistic and you're going to give yourself a heart attack."

Graph Neural Networks and Adversarial Malware
Edward is especially excited about applying graph neural networks to cybersecurity problems. While they're not yet fast or scalable enough to handle the scale of malware use cases, Edward sees potential in their ability to connect executable features on the nodes and edges of a graph in ways that could be meaningful later. To learn more about malware and adversarial ML, you can listen to the full episode here!
Sam Charrington: [00:00:00] Welcome to the TWIML AI podcast. I'm your host, Sam Charrington. Hey, what's up, everyone. Before we jump into today's interview, I'd like to give a huge thanks to our friends at Microsoft for their continued support of the podcast. Microsoft's mission is to empower every single person on the planet to achieve more, to inspire customers to reimagine their businesses and the world. Learn more at Microsoft.com/AI and Microsoft.com/innovation. And now, onto the show. All right, everyone. I am here with David Carmona. David is the general manager of artificial intelligence and innovation at Microsoft. David, welcome to the TWIML AI podcast. David Carmona: [00:01:01] Thank you, Sam. Pleasure to be here with you. Sam Charrington: [00:01:04] It is great to have you on the show. And I'm looking forward to digging into our conversation, which will focus on AI at scale and large scale language models, and a bunch of really interesting things you're doing there. Before we jump into the topic, though, I'd love to have you share a little bit about your background and how you came to work on all this cool stuff. David Carmona: [00:01:25] Yeah. Well, I've been in Microsoft for almost 20 years, 19 and a half. Sam Charrington: [00:01:30] Wow. David Carmona: [00:01:30] So, almost getting to that magical [laughs], magical moment. And it's funny because my beginning with Microsoft, I was [inaudible 00:01:37] to Microsoft. That was 20 years ago. So, that was the big Windows moment. Right? But actually, I didn't come to Microsoft because of Windows. I came to Microsoft because of, … At that time, my favorite product, which was Visual Studio. So, I was a developer. I still am a developer. I will always be a developer no matter what I am. Sam Charrington: [00:01:57] [laughs]. David Carmona: [00:01:58] And for me, working in Visual Studio has been like my entire career. So, [inaudible 00:02:04] I started with AI and, and VR probably way too early [laughs]. That didn't end well. So, I ended in traditional development. And I had a ton of fun with that. And I, when I move … I'm originally from Spain. When I moved here to the US [inaudible 00:02:17], I worked in, in, in Visual Studio. So, I ended managing the business for Visual Studio and all our tools like .NET and, and all of that. It was a super fun time because it was that big transition in Microsoft to open development. So, I was lucky to do things like launching TypeScript. Right? Or- Sam Charrington: [00:02:36] Oh, wow. David Carmona: [00:02:36] … open-sourcing .NET or making it cross-platform, or releasing Visual Studio code. Right? So, super fun stuff. But then like five years ago, this AI thing started to become super real. So, [laughs] I was, I was offered to lead a new team in Microsoft, focused on the business, on creating a new business for AI. And I, I didn't think about it twice. So, yeah, that's where I am. So, it's interesting … So, as you can see, my career is always like, between technology and businesses. I think … I, I mean, knock on wood, but I think I'm in, in that great balance right now [laughs]. So, I have both. I'm super fortunate to have both because I work, connecting with Microsoft research and, and the entire organization of technology and research in, Microsoft. My goal, my team's goal is really to connect that with the business. So, we work on … We define it as themes, like bigger themes of innovation in Microsoft. 
And then we connect those themes to actual real products and technologies that we can take to market. it's super cool. And one of those things … We have many, but one of them … I think like, probably the start of the themes is, is AI at scale. Sam Charrington: [00:03:46] Okay. And so is the role primarily focused on taking innovations that are happening in research to existing Microsoft products? Or is it more focused on creating new business opportunities? Or is there some balance between the two? David Carmona: [00:04:01] Yeah. It's a balance. So, we have … The way that we work in Microsoft on our framework for innovation is based on Horizon. So, we have … We refer to them as the three [inaudible 00:04:10] Horizon. Right? So, we have Horizon 1, two, and three. Three, Horizon 3 are the like, the moonshots, right? Like, longer-term new business creation, new category creation for Microsoft. A lot of that is, driven by curiosity, in most cases, in research. So, we leave a lot of room for researchers to work on those themes. But then we go all the way to Horizon 2, which are things that are really about opening new opportunities or creating new opportunities for existing products. And you can go to Horizon 1 even, which is extending existing products. Right? So, making them better. So, we work in that, in that balance, between the three. Sam Charrington: [00:04:52] Nice. And so you mentioned AI at scale as being one of your big focus areas. What exactly does that mean at Microsoft? David Carmona: [00:05:00] Yeah. So, AI at scale, I mean, we, we named that as a new category. So, it's not that it's a product or anything like that. So, it's how we refer to what we believe is a huge change in the way that we are going to see people developing AI. And it's driven by m- many different things, many different trends and technology breakthroughs. But I think the most important one is this concept of massive models and, and what they mean. Right? So, this, this ability to create now, like, this huge [laughs], massive models with billions of, of parameters. And beyond the technical achievement, the reality is that those massive models are opening new opportunities that go beyond the technology and get into the business. Right? So, we can discuss it today. So, [inaudible 00:05:47] … So, we can spend a lot of time on the technology behind it. And then- Sam Charrington: [00:05:47] Mm-hmm [affirmative]. David Carmona: [00:05:47] … we can, we can focus a little bit on, "Hey, but what does it really mean?" So, how is this going to change the way that any company can develop AI? Right? And, and [inaudible 00:05:59] it's really interesting. And then there's a whole ecosystem around this concept like, that, that you need to, for example, train these models, you need an AI supercomputer. So, that's another piece of the puzzle, right, for AI at scale. Sam Charrington: [00:06:14] So, we talk a lot about the increasing size of models and, you know, particularly in the context of NLP and language models. But help us contextualize that. You know, we throw around, you know, millions of parameters and, you know, hundreds of layers, and things like that. How is it shaking out? Or how do you think of this progression towards larger-size models? David Carmona: [00:06:41] Yeah. I think in, in a sense, you probably remember [laughs] [inaudible 00:06:45] ImageNet moment for, [laughs]- Sam Charrington: [00:06:46] [laughs]. David Carmona: [00:06:47] … for [inaudible 00:06:48] learning. Right? 
So eh- Sam Charrington: [00:06:49] Uh-huh [affirmative]. David Carmona: [00:06:49] That was, … I mean, [inaudible 00:06:51] many people referring to this moment, like the ImageNet moment for NLP. Right? So, because we get to a point that there's something that allows us to increase the size of the model. So, we go for it. And then we see, "Hey, wait a second. This is getting better. So, the more parameters that I add, the better that this is getting." Right? So, that was the moment in ImageNet with ResNet, for example. Right? That we added so many layers, and, "Hey, this, this image classifier is, is working so much better." So, we are kind of in the same place, but at a totally different scale, right, or order of magnitude. Right? For example, that model, the ResNet model for ImageNet, I think had like 60 million parameters. I mean, a completely different domain. That was computer vision. Now, we're talking about billions of parameters. And, and, and when we see progression, it's being like, very [laughs], very quick. So, [crosstalk 00:07:44]- Sam Charrington: [00:07:46] Mm-hmm [affirmative]. David Carmona: [00:07:46] I don't know. GPT-2. So, the first version was like 100 million parameters. Then, I think BERT was like 300. Then you have Turing NLR. I think it, at that time, was like 1.2 billion. Then you have GPT-2, 1.5. Then you have Turing NLG. That was 17 billion parameters. That was last year [laughs]. We're not talking months ago. That, … We're not talking about, about years ago. And then we had just, just a couple of months after that, GPT-3 with 175 billion [laughs] parameters. Right? So- Sam Charrington: [00:08:18] Yeah. David Carmona: [00:08:18] Every step is 10 times [laughs] [inaudible 00:08:21]. It's a new order of magnitude [crosstalk 00:08:22]- Sam Charrington: [00:08:22] Mm-hmm [affirmative]. David Carmona: [00:08:22] … which is super impressive [laughs]. Sam Charrington: [00:08:24] So, we've kind of transitioned from … In the domain of Vision, you know, we would always talk about the number of layers as an indication of the size and complexity of the model. And now, when we talk about these language models, we tend to talk about parameters. What is that? And how does that tie to the architecture of these models? David Carmona: [00:08:45] Yeah. I mean, behind … It's not that we didn't want to build these massive models before. It's that we couldn't [laughs]. That's the reality. Sam Charrington: [00:08:52] Mm-hmm [affirmative]. David Carmona: [00:08:52] And I think the big breakthrough to really enable these, these sizes of the model is the transformer architecture. And yeah, definitely a lot of say about that. But, yeah, the transformer architecture, it has … I mean, it's also based in layers. In this case, they are symmetric. So, it scales very well because it always has the same number of inputs and outputs. So, you can stack up all the layers. And, and it was a huge change because that broadened the blocker that we had before with scaling these NLP models, is that we were using techniques as, as you know, as recurrent neural networks. Right? Like, LSTM and things like those. And those things are great because it allows you to connect, for example, in a text, the words between words. You can have some kind of memory. So, a word right now can be impacted by words in the text before. Right? And, and you keep that memory. The problem is that the way that we were doing that was very sequential. 
So, and I mean, by definition, a recurrent neural network taking the previous step as an input. So, you need to finish that step to go to the next one. So, that impacted the scalability of the models. So, I think with the transformer architecture, we kind of broke that ceiling because now, suddenly, we don't have an architecture that is [inaudible 00:10:05]. So now, in this case, it's all in parallel. We take the, all the inputs in parallel and with some techniques, in particular, … I think the most important ones [inaudible 00:10:16] I would highlight two. But definitely, for that to work, two things have to happen. One, it's the concept of the positional embedding, so how every word needs to get an input in the, in the model, the position somehow, a flag or an indication of where that word is because that's [laughs], of course, important [laughs]. It's very important- Sam Charrington: [00:10:36] Mm-hmm [affirmative]. David Carmona: [00:10:37] … Where a word is in a sentence to understand the sentence. But then the second thing is this concept of attention or, in this case, self attention, which is a way to kind of replicate that concept of connecting or changing the meaning of words, depending on the words that were happening before, or even in the case of bidirectional [inaudible 00:10:56] words are happening after that. Right? And that's, that's a whole new construct applied to NLP that is proving to be, not only super scalable, but even, performing even better [inaudible 00:11:08] the traditional approach to NLP. Sam Charrington: [00:10:43] Hmm. And so how should we think about how attention works in these kinds of models? David Carmona: [00:10:43] So, I, I, I mean, it's a very simplistic view, but I like to think of it … Because attention is not new. So, we've been using attention- Sam Charrington: [00:10:44] Mm-hmm [affirmative]. David Carmona: [00:11:23] … in, in others … Even in other domains. Right? Like, vision or i- image generation, or … I mean, the most simple example that I use all the time is movie recommendation. Right? So, how do you know if, if a user is gonna like a movie or not? So, the way that you do that is that you take a vector defining the movie in, you know, in any dimensional space. And then you take another vector defining the taste of the user. And then you multiply those vectors, right, to get the distance, the, like, the cosine distance or similarity between those two vectors. And that's an indication of how much the user will like the movie. That's, that's attention, but in the case of two different entities. Right? My taste and the movie. In this case, self attention is like doing something similar, but with a sentence with itself or with a text with itself. Right? So, but in this case, the w- the attention that we want to measure is the connection between the words. So, how one word is related or connected to the rest of the words. And at the end, you're gonna have like, a heat map, right, so, where every word is connected in some manner with other words. So, if you're saying, "The kid hit the ball, and he was happy." So, "he" will be super connected with "the kid." Right? So, I mean, super simple because at the end, you have multi [inaudible 00:12:42] attention blocks. And, and then you have all these different layers. It's like trying to understand [inaudible 00:12:49] networks. After three layers, you're lost [laughs]. You are completely lost on [crosstalk 00:12:53]. Sam Charrington: [00:12:53] [laughs].
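For readers who want to see the mechanics behind the "heat map" intuition David describes, here is a minimal NumPy sketch of scaled dot-product self-attention over a toy sentence. The embeddings, weight matrices, and dimensions are made up for illustration and are not taken from any Turing model.

```python
# Minimal NumPy sketch of scaled dot-product self-attention over a toy sentence.
# All embeddings, weights, and sizes are arbitrary, for illustration only.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model). Returns attended values and the attention weights."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # similarity between every pair of words
    weights = softmax(scores, axis=-1)        # each row sums to 1: the "heat map"
    return weights @ V, weights

rng = np.random.default_rng(0)
tokens = ["the", "kid", "hit", "the", "ball", "and", "he", "was", "happy"]
d_model = 16
X = rng.normal(size=(len(tokens), d_model))          # toy token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
attended, weights = self_attention(X, Wq, Wk, Wv)
print(weights.shape)   # (9, 9): how much each word attends to every other word
```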
David Carmona: [00:12:53] But I mean, that's the core principle of it. Sam Charrington: [00:12:56] Mm-hmm [affirmative]. Part of what's interesting here is that, you know, we've transitioned from an approach to NLP that was, like you mentioned … Prior to capturing positionality, you know, we'd take a bag of words of things that was at document level, didn't capture where those words were, didn't really do a good job of capturing the relationships, but we're just looking at the statistical properties of a document or sentence or- David Carmona: [00:13:22] Yeah. Sam Charrington: [00:13:23] … corpus to now looking at the relationships between all of these entities that make up language. Is that part of the power of this [crosstalk 00:13:31]? David Carmona: [00:13:32] Yeah. Yeah. E- exactly. I would say that and then the concept of, of training these models with self supervised algorithms. Right? So- Sam Charrington: [00:13:42] Mm-hmm [affirmative]. David Carmona: [00:13:42] [inaudible 00:13:43] supervised training. I think that's the other thing that, that … It was the explosion in all these models, is how now, … Because this scales amazingly well, now, you can afford training these things with huge amounts of data. Like, for example, the entire internet [inaudible 00:14:00] kind of. Right? Which is kind of what we're doing with this model. So, we take the text on the internet. And then depending on the model we can go in, in a little more detail in there if it's a [inaudible 00:14:10] model or representation model. With smart techniques, you take that. You take … You mask that text, so the, so the model can try to guess either the missing words or the words that are happening after a given text. And by training that with that input, that you are almost not touching at all. Right? So, it's all self supervised, [inaudible 00:14:31] and, and all of that. The model can actually learn very complex concepts and relationships. Sam Charrington: [00:14:37] Mm-hmm [affirmative]. You mentioned different types of models. Elaborate on that a bit. David Carmona: [00:14:41] Yeah. So, I think, the way that … And, and we can talk about that more because at the end, these same concepts can apply beyond NLP. But if we focus just on NLP, there are two main families of models. One is the one that I think people are super excited about also because of Turing NLG and because of GPT-3. Those models are generation models. So, they are natural language generation models, so NLG. And in that case, what … The way that that model is trained, they are called autoregressive models because you train the model with a lot of text. But then you train it to guess what is gonna happen, what text goes after a particular text. Right? So, they generate … They are super good at generating text, like guessing the end of a sentence or guessing an entire document, or guessing how a movie will, will end, or whatever [laughs] we want to, to guess or [inaudible 00:15:37] text, things, things like those. And that's one big family of models. You have em … Again, like, GPT-3 is an example of that. Turing NLG is an example of that. And then you have another family, which is more about representation, so natural language representation models. And the goal of those is more like, representing the text. So, in that case, the architecture that is, that is used, instead of trying to guess … Or the way that it's trained. Instead of trying to guess what's next, what we do is that you mask some words in the text. And then the model will try to guess it.
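Before David explains why these representation models are called bidirectional, here is a toy sketch contrasting the two self-supervised objectives he describes: next-token prediction for generation (autoregressive) models and masked-token prediction for representation models. The whitespace tokenization and the 15% masking rate are arbitrary assumptions for illustration.

```python
# Toy illustration of the two self-supervised objectives discussed here:
# an autoregressive (next-token) target and a masked-token target.
# The tokenizer and the 15% masking rate are arbitrary assumptions.
import random

text = "the kid hit the ball and he was happy"
tokens = text.split()

# Autoregressive objective (GPT / Turing NLG style): predict each next token.
autoregressive_pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
# e.g. (['the', 'kid'], 'hit') -> given the prefix, guess the next word.

# Masked objective (BERT / Turing NLR style): hide some words, then predict them.
random.seed(0)
masked, targets = [], {}
for i, tok in enumerate(tokens):
    if random.random() < 0.15:          # mask roughly 15% of tokens
        masked.append("[MASK]")
        targets[i] = tok                # the model must recover these
    else:
        masked.append(tok)

print(autoregressive_pairs[1])
print(" ".join(masked), targets)
```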
And they are called bidirectional because in that case, not only they look at what happened before a certain moment, but also after that. So, they will look at the words before and after a particular word to understand the context there. Right? So, those are really good to map like, text to representation, then I fine tune to do whatever I want. Right? So, from super basic sentiment analysis to question answering, or whatever I want to fine tune the model. So, those are like, the two big blocks. Then I like to go a little bit deeper 'cause for each of them, they are two other families that I think are very relevant to understand, which is how, … So, then there's more than one language in the world [laughs]. Right? So- Sam Charrington: [00:16:58] [crosstalk 00:16:59]. David Carmona: [00:16:59] You need to address that. Right? So, in particular, where you are creating real products. So, we are using these models in, in Office, for example. Office is working [inaudible 00:17:07], I feel like, 100 languages. So, imagine doing this for every language would be very [crosstalk 00:17:13]. Sam Charrington: [00:17:13] Mm-hmm [affirmative]. David Carmona: [00:17:13] And that would be the traditional approach of, of doing this. So, we, … And, and Microsoft has been a big believer on the need of doing this thing in an universal way. So, that creates a new family of models that are universal models, right, universal language models. And in the case of Turing, for example, we have both. We have a regular model. And then we have the universal language representation, ULR, so T, Turing ULR, universal language representation. And that is super powerful 'cause what allows us, for example, in, in Microsoft, is to implement features in Word using this, like, … I don't know. Em, semantic search. We don't need to train that feature or that model for every language. We just need to fine tune it for one language. And then you have the feature for free in 100 languages. Right? Sam Charrington: [00:18:03] [crosstalk 00:18:04]. David Carmona: [00:18:03] Which is super cool. So, very, very recommend them to use those models for that. Th- this was, by the way, for people who want to go deeper. There's a paper that I like a lot is [inaudible 00:18:14] 2017 where it explains this, this concept. And, the example that it uses is how you learn math. Right? So, you look at … Well, not me. I wouldn't consider me bilingual. I speak Spanish and a little bit of English, but [laughs] my kids are truly bilingual. And when they learn math, they don't need to learn that two plus is equal four in English, but then [Spanish 00:18:39] in Spanish. Right? So, they just need to learn math once. And then- Sam Charrington: [00:18:43] [crosstalk 00:18:44]. David Carmona: [00:18:43] … they can apply that in different languages. So- Sam Charrington: [00:18:46] Mm. David Carmona: [00:18:46] It's the same thing for models. So you can focus on teaching or training the core concepts, fine tuning for the concept. And then you have it for free in all the languages. Sam Charrington: [00:18:56] Mm-hmm [affirmative]. Yeah. [inaudible 00:18:57] I wanna dig into transfer learning and multitask. These are all things that are coming to mind as you're explaining this. But before we do that, we started out talking about language models as an example of these massive models that require a new way of thinking about, you know, AI at scale. And you mentioned, you know, the progression of the sizes of these models … And you know, it's 10X each time. 
GPT-3 is, you know, 10X Turing. And one question that occurs to me is, you know, is size the, you know, the most important or the only factor? You know, does it mean that each time we jump a generation, you know, "Let's just forget about the, you know … We shouldn't be using Turing anymore. Let's just use GPT-3 because it's 10X better." I think, you know, there are some obvious reasons why that might be the case, like if they're trained on, on different corpuses. Like, we know that GPT-3 has kind of a very broad public internet corpus. And at least with GPT-2, like, there was a lot of critique about, you know, Reddit, you know, and, and the biases that get introduced there. So, the training set is going to be an obvious differentiator that's separate from the size. But I'm wondering if there are other things that we need to be thinking about beyond just the size of the model. David Carmona: [00:20:24] Yeah. Yeah. No, you are right. And I think … So, it's a very simplistic thing to just discuss the models of … Or the parameters of a, of a model. [crosstalk 00:20:35]. Sam Charrington: [00:20:32] Mm-hmm [affirmative]. David Carmona: [00:20:33] There's way more. I have to say, though, that the one thing that we are, we are seeing is that the more parameters that you add … Right now, we are not seeing the ceiling of this. So, we keep improving the accuracy and the generality of the, of the model. So, hey, parameters are important. But then at the same time, it is true that it really … So, there's not one model for everything. So, different models are good for different things. Right? And in our case, for example, we, we … Turing, our family of models. It's actually a family because of that. So, we don't believe that one model will … At least right now, will be useful for every single scenario that you are targeting. Right? So, in, in our case, we created that, that family of models, which is inclusive of, of many things, including many different languages, like, this basic [inaudible 00:21:27] that I was providing before or, or this, these metrics- Sam Charrington: [00:21:30] Mm-hmm [affirmative]. David Carmona: [00:21:30] … of, of different models. You're gonna need a model for each of them, depending on what you want to accomplish. But then even beyond that, 'cause not everything that you do is NLP. So, in the family of Turing in Microsoft, we have models that are even multi-modal, that include image and text or that are focused on image. And that thing will keep growing. So, that's something important to keep in mind. The other thing is, of course, the eternal debate on the importance of the architectures, right, that, that you're using. So, I think there's a … And I don't have a super strong opinion. I think it's like everything. It will go through phases. It will get to a moment that just by adding brute force parameters, the thing will be very difficult to improve. And we'll need to be a little bit smarter on how we can improve those models. We can optimize those models in, in another different way. But again, I don't want to diminish the fact that we keep seeing that we add more parameters and, and we get more power. Right? One thing that you said, though, Sam, I, I want to, I want to double click on that 'cause it's super important. So, it's the responsible AI implications of the model.
I think that will be an area for models to differentiate and to keep in, in mind when you're using a model 'cause the reality is that, right now, these models, they have a lot of challenges from the bias, transparency, and, and, and others that, that we need to keep in mind. So, just as we innovate on the power, accuracy and, you know, the multitask aspect or generality of these models, we also need to innovate on the responsible side of them. And eh- Sam Charrington: [00:23:08] [crosstalk 00:23:09]. David Carmona: [00:23:09] As, as you said, the training corpus, that's important. I think right now, we are probably way too late in the pipeline to apply responsible AI principles to these models, meaning that we create things with these models. And then, just then, we apply those things like … I don't know. Like, you know, filtering or many, many other techniques that you can use there. I think we need to go earlier in the process, even at the point of the training, so we can make those models responsible by design. Sam Charrington: [00:23:41] Do you have a sense for how we can do that? A lot of the power of these models comes from, essentially, taking the entire internet and building a language model based on it or, you know, large parts of the internet. How do you apply the, you know, how … What are the techniques that we can use to build responsibility earlier at that scale? David Carmona: [00:24:08] So just as an example, but one example in Microsoft could be the Office or the Outlook auto reply. Right? So, what is … So, that is the typical example of a massive NLP model that is taking as an input, an email and, as an output, is creating a likely reply that you want to, that you want to send. Right? So- Sam Charrington: [00:24:28] Mm-hmm [affirmative]. David Carmona: [00:24:28] That scenario, on paper, it looks so simple [laughs], extremely simple. But when you get into the responsible side of [inaudible 00:24:37] extremely complex. And you need to, you need to pay a lot of attention. And it's not like a one-shot thing that you do, and done. You are, you are, you are golden. The reality is that you need to apply that across the entire lifecycle of the model from, as you said … So, you mentioned one that is important, which is the training data. So yes, of course, we need to get a subset of the training data to make sure that there's no toxic data that is training the model. But that is not, that is not enough. So, we need to keep in mind things like the privacy of the user. Right? So, think of, "How can we … " So, actually, for this feature, we use differential privacy to make sure that the instances that we use [inaudible 00:25:20] surface, they are not … They cannot identify a user or things like those. And you can also think of the input as something that we also manage, that we make sure that they are short answers, that they are not like, long emails [laughs], of course, things like those. So, it's something that you need to do at every stage. There's a ton of research, active research happening right now to really tackle this super complex challenge that we have with these models. Sam Charrington: [00:25:47] Mm-hmm [affirmative]. So, before we jump into how we achieve this kind of scale, you mentioned something in our pre-call that really stuck with me, is this idea that models are becoming a platform. And you know, transfer is a piece of that. Fine tuning is a piece of that. I'd love to hear you riff on, on that idea.
I think it's a really interesting way to think about models. David Carmona: [00:26:14] Yeah, yeah. It's not a new concept. So definitely, we've been, seeing … So, you see our services [inaudible 00:26:23] services in Azure. And they support the concept of transfer learning. So, you don't need to train a model from scratch. Right? So, it's … But the reality is that a lot of what we do in AI is training models from scratch for your particular scenario. So, we're doing everything that we can to try to simplify that process because if we don't simplify that process, it's gonna be very difficult to really scale AI in an organization, in a, in a company. So, there are definitely many techniques to do that. I think in the area of NLP, fine tuning is the most relevant now. And then we can talk about some emerging ones that are super interesting and cool. But with the fine tuning process, the idea is that you pre-train … You can use a model that is pre-trained, like our Turing model, pre-train on that [inaudible 00:27:10] information from the internet, multi domain, totally general. And then you fine tune that model. So, fine tuning, meaning adding something to it. Like, for example, you want to fine tune the model to do a sentiment analysis. So, you would add then like, a classifier or something like that, a binary classifier. And then you use label data. In this case, you use like, sentences that are, you know, positive, negative sentiment. And then you fine tune. So, you train additionally. It's like extra steps of training that entire thing with your added classifier, in this case, for example, which is gonna update the weight. But it's not starting from scratch, meaning that you don't need that massive data and the skills because you don't need to change the architecture. You don't need to compute because it's not that much compute needed. So, that is certainly a huge step into democratizing these models. Right? So, that's, that's super important. And not only you can do that for fine tuning for specific tasks, you can also fine tune it for your domain. So, if you work in finance, or you work in health, or you are in any industry, and you want to find a law company … So, you want a law firm. You want to fine tune that model for the domain of your vertical. So, you don't need to train the whole thing. You just need to train for that particular domain. So, super, super important, but then what we're seeing is these models can go even beyond that. And that's a super interesting area. Right now, it's still in the beginnings. But what is the big difference with that approach? So, in this first approach, with fine tuning, you are training the model at some point. I mean- Sam Charrington: [00:28:51] Mm-hmm [affirmative]. David Carmona: [00:28:52] Not from scratch, but you're training it. You are changing the weight of, of the model. You're- Sam Charrington: [00:28:56] Mm-hmm [affirmative]. David Carmona: [00:28:56] You're updating that model. You need [inaudible 00:28:58] to train it. But then we have these other techniques. They are called like, zero-shot or few-shot, where you don't do that. So, the model can learn in [inaudible 00:29:08] time. So, you don't need to change the [inaudible 00:29:11] of the model. You have only a model. You don't change that model. 
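Before the conversation turns to zero-shot and few-shot techniques, here is a minimal sketch of the fine-tuning recipe David just described: take a pretrained encoder, add a small task-specific classification head, and continue training on labeled examples. The `encoder` passed in is a hypothetical stand-in for a pretrained model like Turing or BERT, not a real API, and the hyperparameters are arbitrary.

```python
# Minimal sketch of fine-tuning: a pretrained encoder plus a small task head
# trained on labeled data. The `encoder` argument is a hypothetical stand-in
# for a real pretrained model; hyperparameters are arbitrary.
import torch
import torch.nn as nn

class SentimentClassifier(nn.Module):
    def __init__(self, encoder, hidden_size, num_labels=2):
        super().__init__()
        self.encoder = encoder                            # pretrained, general-purpose
        self.head = nn.Linear(hidden_size, num_labels)    # small task-specific layer

    def forward(self, input_ids):
        features = self.encoder(input_ids)                # (batch, hidden_size) sentence features
        return self.head(features)

def fine_tune(model, loader, epochs=3, lr=2e-5):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for input_ids, labels in loader:                  # labeled task examples
            optimizer.zero_grad()
            loss = loss_fn(model(input_ids), labels)
            loss.backward()                               # updates the head and the encoder weights
            optimizer.step()
```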
Now, in [inaudible 00:29:15] time, where you are doing the inference of the model, you can … If you are doing a few-shot, then what you do is just provide a few examples of the task that you want to do, and then directly, the one that you want to solve. And the model will do it, which is mind blowing [laughs] that it can do that. But then you have zero-shot, which is like, the mind blowing times three [laughs], which is that you don't even need to provide examples. So, you can ask one of these models, "Hey, I want to translate this to French." And you provide the sentence. And the model will know how to do that. It will identify patterns in the corpus data that it was trained on. And it will know what it means to be, to do a translation. And it will do that translation. So, those techniques, what they are really doing, from fine tuning to few-shot to zero-shot, is making it much easier to really use these models in your particular scenarios for your particular domain, your particular task, or your particular modality. Super cool. Sam Charrington: [00:30:18] Mm. Awesome, awesome. We've talked about different kinds of models. Uh, just a few quick words on applications. Like, you know, what do you think are the most exciting applications of language models generally or, or Turing in particular, you know, within and outside of Microsoft? David Carmona: [00:30:38] Yeah. So what, what I can do because it's a [laughs], it's a big one. We can, we can talk for a long time. I can give you an overview of how we are using it in Microsoft. And then you can get a sense of, of the usages that, that it can have. So, in Microsoft, the way we look at this is like … We always look at these things, any technology is a stack. So, our goal always is to deliver a full stack. So, you just … And that's our approach to any technology. So, we do the research. But then we want to make sure that that research is available for others to, to use. And then we want to make sure that we keep adding layers [inaudible 00:31:19]. for example, the first one would be releasing that as open source. Right? So, we add another layer. We want that to be part of Azure, so you can train those models yourselves, which is the AI supercomputer that we are, providing in Azure to train those models. But then we keep building on that. On top of that, we have things like Azure machine learning. So, you have another abstraction layer that can improve your productivity, fine tuning those models, like [inaudible 00:31:44] mentioned before. But then we put another layer on top of that, which is [inaudible 00:31:49] services, which are end to end out-of-the-box services that you can use as [inaudible 00:31:54] points. And you can infuse directly into your application without worrying about doing anything with, with those models. And then on top of that, we build applications. So, we make them part of our products, like, Office, Dynamics. Or we create new products that were impossible before. So, that's the [inaudible 00:32:11] approach. I think if we focus on the application side, just to give you some, some examples of things that are already available, that people can use that are powered by these massive models [inaudible 00:32:21] a lot in Office. A lot of things in Office are powered by these models. So, you can think of, for example, semantic search in Office [inaudible 00:32:30] you open a Word document, you search for something in that Word document. And that is not the traditional find and replace [laughs] that we had before. 
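As an aside before the discussion of applications continues, here is a rough sketch of the zero-shot and few-shot usage David describes, expressed as prompts handed to a generic text-completion function. `complete()` is a hypothetical stand-in for whatever generation model you have access to; the point is that the task is specified entirely at inference time and no weights are updated.

```python
# Rough sketch of zero-shot vs. few-shot prompting. `complete` is a hypothetical
# stand-in for any text-generation model; no weights are updated in either case.
def complete(prompt: str) -> str:
    raise NotImplementedError("call your generation model of choice here")

zero_shot_prompt = "Translate to French: The meeting is at noon.\nFrench:"

few_shot_prompt = (
    "Translate to French.\n"
    "English: Good morning.\nFrench: Bonjour.\n"
    "English: See you tomorrow.\nFrench: À demain.\n"
    "English: The meeting is at noon.\nFrench:"
)

# The task is described in the prompt itself, optionally with a few examples,
# rather than by fine-tuning the model's weights.
# print(complete(zero_shot_prompt))
# print(complete(few_shot_prompt))
```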
This is semantic search. So, you can even ask questions to the document. And [laughs] the document will answer those, those questions. That is all powered by, by Turing. You have things like document summarization. So, you go to SharePoint, and you hover on a document. And you will see a summary of the document in there. That is a … It's an abstraction. So, it's not just taking parts of the document. That is generated with, with Turing. Things in Outlook, like Outlook auto-reply that I was mentioning before, or things like, … There's something called Meeting Insights that, before a meeting, will give you all the relevant information about that meeting. So, those are like, … In the taxonomy that we were talking about before, those would be Horizon 1. It's about making those applications better. But then we have these Horizon 2 things that are [inaudible 00:33:24] new opportunities that these models can open. And I think a good example of that would be Project Cortex. So, Project Cortex is part of the Microsoft 365 family. And the goal of that project is super cool. So, what it does is that it's able to get all your internal knowledge in your organization by looking at both the structured and the unstructured data in your organization. So, think of documents, meetings, PowerPoints, anything that you have in there, even images 'cause it's able to scan and do OCR on, on images. So, it's able to crawl all that information for your company, and then to extract knowledge out of that. So, what we do is that we create this concept of a knowledge entity. Like, imagine that, … I, I don't know. You are in a law firm. Imagine international, whatever, commerce. I don't know. I have no idea of, of law. But it's like a topic- Sam Charrington: [00:34:23] [crosstalk 00:34:24]. David Carmona: [00:34:23] … that then the AI system was able to extract from your information. And it can, it can help you a lot. So, it can give you … It can provide you with a summary. It can give you, what are the most relevant documents for that particular subject in the company, what are the experts, so, who you should talk with about, about those topics. So, it's mind blowing [inaudible 00:34:45] knowledge bases. Right? So that, that you can get … It's extracting the DNA of your company. So, you can really make it available for the, for the rest of the employees. And like, those, I mean, I can [inaudible 00:34:57]. So, every, any product that you can mention [inaudible 00:35:00] use Bing. So, it's another, of course, super important one. Things like question and answer in Bing [inaudible 00:35:05] even the universal search. So, we use this trick of universal language representation in Bing. And those are all available in there as well. Yeah. So, we use it [inaudible 00:35:16]. But more on the business side, I would mention, in Dynamics 365, we use these models for a lot of different things. A very obvious one, of course, is anything that has to do with customer service understanding or, you know, sentiment analysis. All of that in customer service that is- Sam Charrington: [00:35:33] Mm-hmm [affirmative]. David Carmona: [00:35:33] … powered by these models. But then things that are more visionary. So, think of, for example … In Dynamics 365, one of the things that we can provide is suggestions to sellers in your company by looking at any interaction with that customer before, like emails or documents, phone calls, whatever. Right?
So, it's able to understand that unstructured information, and give you … It's like language generation. But in this case, to take the next steps with your, with your customers. Sam Charrington: [00:36:01] Hmm. David Carmona: [00:36:02] So, yeah. Super, super broad. We could talk for a while. Yeah [laughs]. Sam Charrington: [00:36:04] [laughs]. So, you know, let's maybe jump into what's happening that's enabling all of this to take place now. One of the things that … You know, when we think about kind of the scale and size of these models … You know, we've talked about the scale of the compute that has been required to enable it. You know, how do you thi- … And you mentioned AI supercomputers. Like, what's that all about? How do you think about, you know, building out the infrastructure to scale and train these models? David Carmona: [00:36:36] Yeah. Le- let's say that training a model like this on your laptop would probably take thousands of centuries [laughs]. So, definitely, you need a lot of scale to train [crosstalk 00:36:48]. Sam Charrington: [00:36:48] Yeah. David Carmona: [00:36:48] And you need … I mean, it's amazing, the kind of challenges that you get when you grow a model like this. Like, fundamental challenges like, "Hey, the model doesn't fit in your GPU." [laughs] That's- Sam Charrington: [00:37:02] Mm-hmm [affirmative]. David Carmona: [00:37:03] … Something that we wouldn't see before. Right? So, I think it is like … If you pass 1.3 billion parameters, something like that, then the model is not gonna fit. So, you better find new ways. But then it's just a computer. So, the time- Sam Charrington: [00:37:15] [crosstalk 00:37:16]. David Carmona: [00:37:16] … required to train one of these models, you need like, ultra [inaudible 00:37:19]. I, and, and I think … So, that's the main reason why we focus on … And like, always, like I was saying, in the beginning, we try to have a platform approach to it. So, not thinking of fixing this problem for Turing, for our models, but fixing this problem for our customers, so they can use this infrastructure as well. Sam Charrington: [00:37:38] Mm-hmm [affirmative]. David Carmona: [00:37:38] So, the approach that we took was building this massive infrastructure in Azure. So, these are massive clusters that are, that you can spin up directly in Azure. And not only can you spin them up, then, of course, you have the complexity when you have … These are … I mean, imagine … For example, the one that we announced a year ago, that is a massive cluster of like, 10,000 GPUs. You have more than 200,000 CPUs. So, it's massive scale. So, how do you manage that? You need things that allow you to manage that in a distributed way. And then what is even more challenging is, "Okay. So, I have my infrastructure completely managed. I can [inaudible 00:38:15]." It is integrated with Azure machine learning. So, you can like, launch like, jobs in that massive infrastructure. But then how would you actually do it? So, you have a model that is by definition, huge. So, how do you train that thing? How do you divide this task, this super complex task, into individual [inaudible 00:38:36] in your, in your massive cluster? And that's, that's the other side of the coin, which is our work on these like, software systems that are meant to help you in that process. So, this was … At the same time that we announced the AI supercomputer, we also announced … It's called DeepSpeed. It's open source. So you can use it on, on top of anything. And it will help you do that for you.
So, what it will do is that it will take this training. And it will distribute that training across a massive infrastructure. So, it will know how to do that in an efficient way. And it does it basically … It's like a three … We call it a 3D distribution because it takes like three different [inaudible 00:39:18] to, let's say, chunk this task. Right? One, which is the most basic one, is the data distribution. So, you just [inaudible 00:39:27] your data in smaller chunks. And then you have [inaudible 00:39:30] each node is gonna take one of those chunks. But that is not enough. You need to go further than that. So, the other level of distribution that we use is [inaudible 00:39:39] distribution, which is [inaudible 00:39:41] because of the transformer architecture, that [inaudible 00:39:44] symmetry is [inaudible 00:39:46] to split the [inaudible 00:39:49] layers. So [inaudible 00:39:50] each node will take a different layer [inaudible 00:39:54] communication and optimization going on there that [inaudible 00:39:57] you need to take care. And then the last one is the [inaudible 00:40:00] which [inaudible 00:40:01] even for each of those layers, we can divide [inaudible 00:40:04] smaller chunk [inaudible 00:40:07] a different GPU. So [inaudible 00:40:09] what that allows you, it [inaudible 00:40:11] a lot of research involved [inaudible 00:40:13] this framework. [inaudible 00:40:14] you almost get like, a linear distribution, like, a linear growth in your model. So, you can [inaudible 00:40:20] number of parameters … And by the way, [inaudible 00:40:23] is able [inaudible 00:40:24] more than one [inaudible 00:40:25] parameters. So huh, you can train models that are not even [inaudible 00:40:29] existing today. And you see the line, and it's almost linear. So, it's exactly what you're, you are looking for in these systems. Sam Charrington: [00:40:35] Oh, wow. Wow. And what about on the hardware side? Microsoft announced this Brainwave Project some time ago to bring new hardware architectures to bear on this problem. Can you share a little bit about that? David Carmona: [00:40:50] Yeah. So, yeah. We announced the [inaudible 00:40:53] maybe a little bit more ago. But it's fully available now. So, you go to Azure. And you go to Azure machine learning. And one of the options that you have to deploy your model is [inaudible 00:41:02]. And what, what that is gonna give you, especially [inaudible 00:41:05] inference time, is very low latency and a lot of, you know, efficiency in cost. Right? So, it's perfect for massive … I mean, I, I always use the same example. So, this feature in Word, one of the features powered in Word by Turing, is called predictive text. So, that means that, when you type, it's gonna give you suggestions for how the text will continue. Right? So [inaudible 00:41:29] think of [inaudible 00:41:30] intelligence, but, but for Word. 300 million users of Word. Imagine doing the inference of that model in every keystroke [laughs]. So, that's the- Sam Charrington: [00:41:39] Mm-hmm [affirmative]. David Carmona: [00:41:40] That's the scale that we're talking about here. It's huge. So, you better optimize that a lot if you want to scale it to that, to that number. And we do that … I mean, you have to do it in, … Again, it's like a game that you have to tweak every single step. Of course, we don't go with these m- multi-billion models at inference time. So, there's a lot of optimization to do there to reduce the number of parameters, to even using techniques to make it more efficient.
And then there's the hardware. Right? So, we use the ONNX Runtime at Microsoft. That can optimize not only for the CPU … It has optimizations for CPUs, but also for [FPGA 00:42:21]. So, it's a way of [inaudible 00:42:23] from the hardware that you have underneath. And it really allows you to bring all these things that are great to talk about from the research point of view. But then putting [inaudible 00:42:33] into action requires all this level of detail, which is a new level of complexity.
Sam Charrington: [00:42:38] Mm. So, this is primarily focused on the inference side. Are there any particular innovations you're excited about on the hardware side for training? Or do you see it primarily being evolutions of today's GPUs?
David Carmona: [00:42:55] I mean, [inaudible 00:42:57] super evolving. So, we'll see … The reality right now is that you have to be flexible. So, we are not-
Sam Charrington: [00:43:02] Mm-hmm [affirmative].
David Carmona: [00:43:02] … discarding any approach, any at all. Right? So, the reality is that FPGAs for inference were super efficient because they allow you to change them. Right? It's programmable. So, that was very, very efficient [inaudible 00:43:16] and very agile. The combination of agility and efficiency was the right thing. But that may change at any moment. And as these things get more stable, then ASICs may be the way to go. And, yeah, of course, we are not discarding any of those approaches.
Sam Charrington: [00:43:32] So, how do you see this level of scale that we're dealing with today impacting the world for kind of users of AI? What changes?
David Carmona: [00:43:43] I think the main thing, maybe bringing all of this together, is how this will change the way that you develop AI. So, how this will open new ways of developing AI that we can use right now. That whole concept of creating more general multitask, multi-domain, multi-modality models, which you can then customize for your particular task, has huge implications on how you can … One, how you can scale AI in your organization, and how AI can scale to other organizations, like smaller organizations. Right? So, that for us is a huge aspect of all of this. And the way that I see it is that it's kind of what we experienced in the last 20 years for software. This is very similar. So-
Sam Charrington: [00:44:38] Mm-hmm [affirmative].
David Carmona: [00:44:38] With software, at some moment we had the hard lesson that software has to be super connected to [laughs] the business. So, if you have a team of software developers in a basement [laughs] not connected to the-
Sam Charrington: [00:44:51] [laughs].
David Carmona: [00:44:51] … business, that is not gonna work. I think AI is in a basement right now, kind of. Right? So, it's-
Sam Charrington: [00:44:57] [laughs].
David Carmona: [00:44:57] We are not fully connected to the business [inaudible 00:45:01] because it requires so many skills and so much expertise that it's a very technical domain right now. We need to change that. We need to make sure that the business and AI come together. And we learned that with software. It's called DevOps. It's about bringing the two together, and then doing small iterations [inaudible 00:45:22]. It's coming to AI. We are all talking about MLOps now. It's a huge area.
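Coming back to the ONNX Runtime deployment path David mentions at the top of this exchange, here is a minimal inference sketch. The model path, input name, and dummy token values are placeholders, not the actual Turing/Word predictive-text model.

# Illustrative ONNX Runtime inference sketch (model and inputs are placeholders).
import numpy as np
import onnxruntime as ort

# Execution providers let the same exported model run on CPU, GPU, or other
# accelerators; ORT picks the first available provider in the list.
session = ort.InferenceSession(
    "text_predictor.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"])

input_name = session.get_inputs()[0].name
token_ids = np.array([[101, 2023, 2003, 102]], dtype=np.int64)  # dummy tokens

logits = session.run(None, {input_name: token_ids})[0]
print(logits.shape)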
It's our [inaudible 00:45:28] definitely in Microsoft to provide the platform to empower that collaboration and that continuous iteration, and trackability of everything that you do in your AI development cycle. [crosstalk 00:45:37] And that will be massively empowered by AI at scale. So, you have models that can really empower a more dynamic way of working, so you don't have to create these models from scratch. You can iterate on them with the business and just focus on teaching your domain to the model instead of starting from scratch. That goes in that direction. We do think that there's one step beyond that. We also saw it with software, and it also needs to happen with AI, which is really going beyond the technology and the businesses, and getting to every employee. Every employee in an organization should be empowered with AI, just like they can use Excel right now to [inaudible 00:46:21] numbers, [inaudible 00:46:21] that for AI. So, every employee can apply AI, and not only apply it, but also create, consume, mix and match, [inaudible 00:46:31] of having some level of freedom to really apply AI to what they do. That's another huge area, like the augmented intelligence area.
Sam Charrington: [00:46:41] Mm-hmm [affirmative].
David Carmona: [00:46:41] That [inaudible 00:46:42] models, we may see it happening sooner rather than later.
Sam Charrington: [00:46:45] Awesome. Well, David, it's been wonderful to catch up with you and to dig into some of the work you're doing around AI at scale. Thanks so much for taking the time to chat with us.
David Carmona: [00:46:58] Thank you so much, Sam. It was a pleasure.
Sam Charrington: [00:47:00] My pleasure.
David Carmona: [00:47:01] Thank you.
Sam Charrington: [00:47:02] All right, everyone. That's our show for today. To learn more about today's guest or the topics mentioned in this interview, visit TWIMLAI.com. Of course, if you like what you hear on the podcast, please subscribe, rate, and review the show on your favorite podcatcher. Thank you so much for listening, and catch you next time.
Sam Charrington: Hey Everyone! Last week was the first week of our TWIMLcon: AI Platforms conference, and what a great first week it was! Following three days of informative sessions and workshops, we concluded the week with our inaugural TWIMLcon Executive Summit, a packed day featuring insightful and inspiring sessions with leaders from companies like BP, Walmart, Accenture, Qualcomm, Orangetheory Fitness, Cruise, and many more. If you’re not attending the conference and would like a sense of what’s been happening, check out twimlcon.com/blog for our daily recaps, and consider joining us for week two! Before we jump into today’s interview, I’d like to say thanks to our friends at Microsoft for their continued support of the podcast and their sponsorship of this series! Microsoft’s mission is to empower every single person on the planet to achieve more. We’re excited to partner with them on this series of shows, in which we share experiences at the intersection of AI and innovation to inspire customers to reimagine their businesses and the world. Learn more at Microsoft.com/ai and Microsoft.com/innovation.
Sam Charrington: [00:01:29] All right, everyone. I am here with Gurdeep Pall. Gurdeep is a corporate vice president with Microsoft. Gurdeep, welcome to the podcast!
Gurdeep Pall: [00:01:38] Thank you, Sam. Really excited to be here.
Sam Charrington: [00:01:40] I’m super excited for our conversation today! As is our typical flow, I’d love to have you start by introducing yourself. You’ve had quite a career at Microsoft, culminating in your work in AI and autonomous systems. Tell us a little bit about your background and how you came to work in this field.
Gurdeep Pall: [00:02:02] Thanks, Sam. I’ve had a really nice long run at Microsoft, as you mentioned. In fact, today is my 31st anniversary at Microsoft.
Sam Charrington: [00:02:11] Wow.
Gurdeep Pall: [00:02:12] So, yeah, it’s been a long career, but I’ve really had a great time. In fact, I feel like I’ve been into the candy store like three times, so my career can be divided into three parts. I worked on networking and operating systems; that was sort of my first gig at Microsoft. I was very fortunate to work on a lot of the internet technologies when they were first rolled out in operating systems. I worked on VPNs, I worked on remote access. And then, up through Windows XP, I was the general manager for Windows networking, where we shipped WiFi for the first time in a general-purpose operating system. Then I moved over to work on communications and I started Microsoft’s communications business. These are products that you may remember from the past, things like Office Communications Server, which became Lync, which became Skype for Business, which is now Teams. So I started that business from scratch, and all the way until we announced Teams (in fact, until a few days before we announced Teams) I was involved with that business. Though I’d had a stint in the middle on AI, and I came back to work on AI. So it’s been, I would say, roughly three parts to my career, the latest being AI. And I’ve had lots of fun in all of them.
Sam Charrington: [00:03:30] That’s awesome. I talk to so many people at Microsoft who are working in AI, and a lot of them started their careers working on Bing. You’re maybe one of the outliers in that regard.
Gurdeep Pall: [00:03:43] Well, the funny thing is that that first stint I mentioned on AI was actually in the Bing team, and I was running Microsoft speech.
I was running some of the interesting explorations we were doing at Bing, recognizing objects. In fact, some of the image stabilization work that went into HoloLens actually came out of that group. So yeah, I worked on maps and lots of interesting stuff.
Sam Charrington: [00:04:08] That’s awesome. So tell us a little bit about autonomous systems and some of the work you’re doing in that area.
Gurdeep Pall: [00:04:14] Yeah. So, for the last four years or so, I’ve been focused on emerging technology and how it can be applied to interesting business problems. In that regard, I’ve worked on some interesting technology in the language space, the language understanding space. I worked on ambient intelligence, where you could actually make sense of a space, sort of make reality computable if you will. And then, as I was exploring interesting emerging AI which can solve business problems, we started focusing on autonomous systems. That was interesting to us, not just as a very interesting aspect of what AI was enabling, but also because Microsoft didn’t have a lot of focus in that area before. So, when I talked to Satya, and at the time Harry Shum was here, we decided this was an area we were going to go invest in.
Sam Charrington: [00:05:04] Interesting. And one of those investments was the acquisition of a company called Bonsai. This is a company that I know well. I interviewed one of the founders, Mark Hammond, back in 2017. It’s hard to believe it was that long ago. And the company had a really interesting take on using technologies that are still difficult for folks to put to productive use, namely reinforcement learning. Their take on it was this idea of machine teaching. Maybe you can tell us a little bit about that acquisition, the role that it plays in the way Microsoft thinks about autonomous systems, and elaborate on this idea of machine teaching and some of the things that Bonsai brings to the table.
Gurdeep Pall: [00:05:49] Sure. Absolutely. So, when we started focusing on autonomous systems, we were trying to get our hands around this thing. People interpret autonomous systems in many different ways. Some people think it’s only about autonomous driving, so let’s build a vertical stack. Some people think about robots, these humanoid robots with arms and joints and so on. And we were thinking, what is our point of view? At the end of the day, we look at our own capabilities. We’re a software company; what is a software interpretation of the space? It was with that point of view that we started thinking about it. There was some work going on in Microsoft Research at the time, which I’ll talk more about. And that’s when I first met Mark and the team, and we had a really good discussion. As we finished the first meeting, I remember this thing going through my head, that this is such a great approach. It really fits into how we are starting to think about this space and makes sense to us. And then I also thought, God, this feels like just the wrong thing for a startup to do; building platforms and tools is a tough thing. And Mark is such an incredible guy. I think you’ve talked to him, so you know that. So when we first finished the acquisition, he shared that with me too. He says, every VC I talked to asked, why are you doing this? This is the kind of thing Microsoft should be doing. So it was a marriage sort of made in heaven, as it were, and we acquired the company.
And it’s been really great, actually working with Mark and picking up from some incredible thinking that. You know, he and Keene had done and the team that was there, and then actually really expanding on that and really helping it realize its potential and also making it much more of an enterprise ready sort of an offering because this space is as mission critical and as important as it gets. So that’s been a very fun journey for the last two and a half years. Sam Charrington: [00:07:52] One of the ways I’ve heard you describe the way you’re approaching autonomous systems or that world broadly, and its two words and I still may butcher one of them, but it’s like this marriage of bits, and is it atoms that you say? Or molecules, or something else? But the idea is that,and this was something that was core to the way Bonsai Gurdeep Pall: [00:08:15] articulated what they Sam Charrington: [00:08:16] called then industrial AI. It’s a different problem when you’re applying AI solely in a software world, Gurdeep Pall: [00:08:23] recommendations on a website or looking at Sam Charrington: [00:08:27] customer churn, to when you’re actually trying to move physical goods or devices or systems. Elaborate on what you’ve seen in terms of the different requirements that come up in that world. Gurdeep Pall: [00:08:43] Absolutely. This is a very important point, when we start focusing on autonomous systems. I know people asking me about half the time, “oh, you’re talking about RPA, right?” No, I’m talking about RPA. Of course it doesn’t help when some of the RPA companies were calling their tech robots and, it could take action and so on. So it was in some ways, it was just a way for us to be clear about what we are doing. And we said, no, we’re actually focused on atoms, not things we just deal with bits. Of course, to digitize anything, you have to go from atoms to bits and then reason over it. But that became sort of the mainstay for us. The biggest difference, I would say, between those two worlds is that there is in the physical world, it is governed by some things like physics. The physical world, of course there’s Newtonian physics, and then you get into some of the multi-joint movements and you get into fluids, that’s a whole different kind of a physics which comes in. So you have to really think about modeling the real world and how then you can apply the tech towards that. The second thing I would say is that, most of the scenarios in autonomous systems pertain to taking action in the real world. And when you’re taking action in the real world, every time you take an action, the real world changes. And this is where reinforcement learning becomes a very natural mate as an AI technology for the problems that really apply to the real world, which is great because we have no other science which allows us to take a really sort of an unbounded state space and actually reason within it. And reinforcement learning becomes this really important piece in it. Lastly, I would say is that, every problem that we’ve looked at from an autonomous system space typically is one where there are experts who exist already. So far we haven’t been called to a problem where this is completely new and completely different and “oh, let’s solve it for the first time,” you know? 
And so tapping into the human expertise became a very important piece of this equation as well, which sometimes you don’t need to worry about, [inaudible] the data, you throw things at it and then maybe there is judging, certainly, if you want to fine-tune the models and so on, but that was another interesting aspect of this.
Sam Charrington: [00:11:11] So we’ll be digging a little bit deeper into some of the technology that makes all this happen, but you started to mention some of the use case scenarios. Can you dig a little bit deeper into some specific scenarios that you’ve been working on?
Gurdeep Pall: [00:11:27] Absolutely. And that’s one of the things which makes this very, very interesting to me, because literally everything you see in the world around you can be a target for some of the technology that we’re building. Everything from smart climate controls. HVAC control is a field that, for the last 70 years, has seen very incremental improvement. Things like fuzzy logic and so on have been used. And we’ve seen incredible results using our approach. Where things had plateaued out in performance, we were able to bring much better performance, so energy savings or better climate control. We’ve seen oil drilling, horizontal drilling from companies like Shell, where you have these incredibly big machines, and they look like these bazookas, and you’re drilling with them. These machines need a pretty high level of precision, so great human experts can do it, but sometimes you need more work than you can actually get that many trained experts on the problem. So being able to guide the drill bits through that. Cheeto extrusion is a very interesting, complicated process. You know, it’s very easy to eat, very hard to make. I always say, I know there are professional chefs out there, but certainly I cannot make the same kind of eggs every morning, because even that simple task of heating the oil and getting it just right and putting the eggs in, you cannot replicate it every time. But if you’re Pepsi and you’re making Cheetos, that has to be consistent every time. When you open a bag of Cheetos, everybody’s familiar with the fluffiness and the crispness, so everybody’s a judge, and you have to win that every time. It's a very hard problem, because you have this corn meal, which is mixed with water. It’s impacted by the age of the machine which is extruding, sometimes by humidity, temperature, all these things. So it’s a highly dynamical system, and experts today sample and then tweak, and then sample and then tweak, and they have really very stressful jobs of trying to keep that quality right. Otherwise the quality folks will come in and reject the material. This is a problem we’ve been able to apply our tools to: basically, consistently keep tweaking the parameters of this process so that you can have consistent Cheetos coming out on the other side. Chemical process control and polymer manufacturing: very, very hard problems. Some of these problems take six months to design the process for producing polymer of a particular grade. And we’ve been able to apply this, both in the design and in the actual manufacturing process itself. Our favorite thing is flying things. Bell Flight is an incredible company; they have all kinds of commercial as well as military applications for their vertical liftoff vehicles and so on. They’re trying to bring autonomous capability to those things.
So we’ve been able to apply this towards that as well. So as you can see, anything which has control in the real world where you’re sensing and you’re picking an action, and you’re taking that action sensing again, this kind of a loop exists, this technology can be applied. Sam Charrington: [00:14:53] It’s been interesting over the past few years, just reflecting on some of the early conversations I had with Mark and the team at Bonsai around. There’s kind of this pendulum in the industry where we started out with kind of, rules, like physics and how things work. And we’ve kind of early on in the, in applying AI, we throw all those rules away and kind of leaned heavily on data and statistics. And over the past few years, there have been efforts, both in academia as well as what you’re doing, to kind of incorporate the rules and the human expertise back into the equation, without tossing everything that we’ve gained in applying data. One of the interesting challenges, when you layer on the physical world here is simulation, and how do you let an agent explore and learn without destroying helicopters and lots of Cheetos? Share a little bit about the challenge of simulation and how that’s evolved to help make some of these problems more tenable. Gurdeep Pall: [00:16:01] Yeah. Yeah. I think that’s such an important piece of this equation. Reinforcement learning is great, but reinforcement learning requires many, many, many steps, literally just to get a policy to be robust. You can be six 60 million cranks in before you start to see your policy start to develop at the appropriate level. So the question is, how do you go do that in the real world. And this is, one of the big insights I think the Bonsai folks came up with, and then this was some work that was happening at Microsoft Research coming at it from a very different direction, but they sort of merge together.   This is AirSim, and I can talk more about that, but the ability to model the appropriate aspects of the real world so that you can actually take action against them, get the right input back, and use that to train the model has been sort of the biggest insights here. Because really, what it says is you’re taking the physical world and you’re creating a mapping of it in the digital world, which then allows you to train the models quickly. And that’s where these simulators come in. Now simulators can be, depending on what they’re trying to simulate, can be very computationally intensive. And if you are nervous towards equations and things like that, cFDs. These are pretty long running simulations and some are, of course, faster. Now because we are using simulators for training AI, we want to crank this very, very quickly. So sometimes you end up with this problem where the physics, or at least how that physics is approached using these mathematical equations, actually becomes like a big piece of the problem. And so this is an area on how to take simulation, and how do you mate it with the training of the AI in a way that you can do it fast, you can do it cheap and you can frankly do it in parallel because that is one of the things, we have with some of the RL algorithms now is that you can actually take a policy, the last best known policy, you can explore in thousands of machines at the same time, you can take the samples and come back and update the policy. And then you take that, and again, you fan it out and you’ve got learners which are learning very quickly.  
Getting all that figured out is actually one of the big things we managed to get done after the acquisition as well. It’s all running on Azure, and it really allows us to do this efficiently.
Sam Charrington: [00:18:33] You mentioned AirSim. What is that, and what’s the role that it plays?
Gurdeep Pall: [00:18:36] Yeah, so AirSim was a project in Microsoft Research which started off in a team that was exploring drones and how you bring autonomy to drones. And they had a very similar experience. I think they started in 2015. They would go out with their drone in the morning, come back with a broken drone in the evening, and they would have very, very little data. And it’s like, how are we ever going to get enough data to actually get this thing to fly, to do even the basic tasks? So that’s when they looked at some of the work that is happening in, frankly, the gaming world. They looked at some of the incredible scenes that could be rendered with Unreal and Unity and those kinds of things, which, if you’ve seen Forza and stuff like that, start to look pretty real. And they said, let’s create a simulator for perception-oriented tasks, where you can create a scene and you can integrate physics into that scene for the different objects that are involved. There could be a flying object, it could be something with wheels which is driving, et cetera. So you integrate the physics and now you’ve created an environment in which you can train AI. It could be reinforcement learning, where you’re sensing; you model the actual sensors inside this virtual environment, and you are able to use that for reinforcement learning and taking actions. Or you can use these sensors that are modeled inside of AirSim itself and just generate lots of data on which you can do supervised learning offline. It works for both of these purposes. So AirSim: they created this tool for themselves, and they realized it was so powerful that they put it out as an open source utility. Today it has more than 10,000 stars on GitHub. It is really one of the most popular tools, because others are realizing that this idea of being able to simulate reality is a very powerful approach.
Sam Charrington: [00:20:35] So, can you maybe talk us through, for any of the use cases you described, when you go into an environment with a real customer, with real problems, what’s the process to actually get something up and running and demonstrate value that they can build on, meaning concrete value as opposed to theoretical POC value? What does it take to really do that?
Gurdeep Pall: [00:21:02] I think, and this is something that we’ve been working on and will continue to work on, because our goal is to get this to a point where people are able to identify that this is a great tool for the problem that they have. It’s not some sort of speculative exploration exercise. They know that they’ll definitely get results if they adopt this tool chain, and going from there to actually training the policy, being able to export the brain, and actually starting to use it in the real world, that period is pretty short. So this is a journey for us; it started off fairly long. And now we are at a point where we are focusing on these so-called solution accelerators, these areas where the problem we are solving is very clear and how to solve it is very clear.
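For a sense of what driving AirSim from code looks like, here is a minimal sketch using the open source AirSim Python client mentioned above. It assumes a running AirSim instance with a multirotor vehicle, and uses AirSim's NED coordinate convention, where negative z is "up"; the waypoint values are arbitrary.

# Minimal AirSim sketch (assumes the AirSim simulator is already running).
import airsim

client = airsim.MultirotorClient()
client.confirmConnection()
client.enableApiControl(True)
client.armDisarm(True)

client.takeoffAsync().join()
client.moveToPositionAsync(10, 0, -5, 3).join()   # fly 10 m forward, 5 m up, at 3 m/s

# Grab a camera frame: the kind of synthetic sensor data used either for
# reinforcement learning in the loop or for offline supervised learning.
responses = client.simGetImages(
    [airsim.ImageRequest("0", airsim.ImageType.Scene, False, True)])
print(len(responses[0].image_data_uint8), "bytes of image data")

client.landAsync().join()
client.armDisarm(False)
client.enableApiControl(False)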
And then some of the things that you need, like what simulators you need (sometimes folks already have simulators, in other cases they need one), the entire thing is stitched together, and all they need to do is come in, create the variations for their problem, create the policy, and then go ahead and use it. That is what is needed to take a customer from, “Hey, I’ve got a problem. I don’t know what this thing does. Maybe I’ll understand it,” to, “Okay, now I know the kind of problem, but I don’t know if it can be solved with this or not.” So this is what we’ve been targeting. And as we’ve gotten our solution accelerators to be very crisp, so has how we talk to customers, because, as you’re alluding to, there’s an education thing here, there is a confidence thing here. We have to address all those pieces, and we’re bringing the customers along the journey. The great thing is, with customers like Pepsi, the moment they saw one thing be successful, they looked around the factory and said, I can put this approach on many things, and that’s the conversation we’re having right now. The same thing with Shell, same thing at Dell. So, this is the journey.
Sam Charrington: [00:23:01] I appreciate in that the idea that, contrary to what you might think if you read popular reporting about AI, it’s not a silver bullet, particularly in this domain, where you’ve got some tool chain and it applies to every problem that any customer might have. It sounds like you’re being strategic and selective and building expertise and supporting tools around specific areas, so that, to your point, when you are engaging with someone, they can have a high degree of confidence that you’ve done this before, you know how it’s going to work and what the process is.
Gurdeep Pall: [00:23:37] Exactly. And the other interesting thing that we found, which I think is a little unique compared to some of the other things we’ve done with AI, is that the experts we end up talking to in the different industries and application areas have never encountered AI before. Folks who went to engineering discipline schools, real engineers, not fake engineers like software engineers like us. I mean, these are mechanical, chemical, what have you. When they went through college, they did MATLAB and they learned Simulink and so on. They have relied on a set of tools that have given them employment, given them career success, and stood the test of time. And here these five guys walk in with a swagger and say, hey, we’ve got AI for you and it’s called reinforcement learning. It’s really awesome. You’ve got to try it. I mean, that just doesn’t work. You have to really bring them along. And then they have some real, real concerns that we’ve had to go and take in, like safety. Even if this thing worked, they want to be able to assert that it isn’t gonna do something crazy. I mean, when you have that horizontal drilling machine from Shell, this thing can drill through anything; it’s huge. There was a Wall Street Journal article about it around three years ago, when we first did this project with them; about two years ago, we did the challenge. For them, they want to make sure that this thing actually is going to be safe and isn’t going to create another new problem while it solves one. Yeah. So it’s been a learning thing for us, but it’s the need for the education, the need for bringing these folks along.
And this is one of the reasons we did this project Moab, which is this very interesting device. It’s like a toy, basically. It has three robotic arms, if you will, and there’s a clear plate on top. The task is to balance a ping pong ball on this device, on this plate. Now, with this problem, of course, you can imagine the engineers will go to PID, right? I mean, PID control is something they learned in college. And guess what? So we said, first, let’s start with PID. It does a pretty good job. But then we said, okay, well, I’m going to toss the ball onto the plate and see if it catches it. Well, it turns out it doesn’t catch it. Then we said, I’m going to add more complexity: how about we try and make the ball go around the edge of the plate? So as the problem progresses in complexity, you now realize that the only way you can solve it is if you have something like the tool chain we have with Bonsai: you create a simulator, you have a policy that you’re training, and then you’re able to get to that level of performance. We did this solely to bring engineers who are used to a particular way of working along, to get them to start to believe and to start to get excited about this. So we created a sort of metaphor in which we could connect with them.
Sam Charrington: [00:26:37] Interesting. Interesting. It reminds me of this idea of why deep learning is so important, and software 2.0, and how where it’s particularly powerful is in solving problems that we didn’t know how to write the rules for, like in computer vision. How do you identify a cat versus a dog? Who knows how to write the rules for that, but a neural network can figure it out. And similarly, there is a range of problems that PID is easily applied to, but there’s also a level of complexity that it is difficult to apply it to, and that is where you’re finding the value in applying RL.
Gurdeep Pall: [00:27:18] Exactly, exactly. And we’ve seen that it’s either that there were just too many moving parts, so folks had achieved automation but they had not achieved autonomy (that’s one class of problems where we're getting traction), or that, with the existing methods, they’ve plateaued out in performance. There is more performance to be had, and this is incredible. You would think we’ve figured everything out, right? I mean, as a society, with all the advancements that have happened. But with HVAC control in buildings, we’ve been able to get startling results. I mean, this is millions of dollars on a campus that you can save, and then also the green benefits that you get from that. So there’s just tremendous opportunity.
Sam Charrington: [00:28:07] So maybe let’s drill into that example more, because I do want to get to a more concrete understanding of what the process looks like. I’ve got a data center or physical plant or something, my HVAC costs are through the roof, someone told me about this AI thing on an airplane, and I called Gurdeep. What’s the first thing that I do, and how do I get from there to some cost reduction or greater efficiency or whatever my goal is in applying some of this?
Gurdeep Pall: [00:28:40] Yeah. So in this particular case, we’re focusing one of our solution accelerators just on this use case. Okay. And so we are able to say with very high confidence that, if you can give us this information,
which is typically data that you might have collected, because a lot of these are now sort of IoT devices, we’re able to ingest that data. And in this case, which is sort of another double-click on the simulation thing, we’re able to actually create a data-driven simulator, and we can now start creating a policy. Now, they do need to specify, and this is where machine teaching comes in, what behavior they are desiring. That specification is fairly flexible. You could say things like, I want it to run really efficiently between these times of the day. Or you could say, if the outside temperature, which becomes one of the state variables that goes into creating the brain, is outside of this range, then I want this kind of behavior: in summer I want it to be cooler, and in winter I want it to be warmer. All those inputs now create a policy for me which automatically controls the HVAC system, which means turning on the fan or the heat or the cooling, and doing it dynamically, because once the brain is built, all you have to do is connect the inputs and the actions. Inputs are where we are sampling the state, and actions are what you’re saying: okay, increase heat, decrease heat, speed up the fan, turn off the fan, et cetera. And by the way, it’s not just temperature in this case. It’s also the carbon dioxide and nitrogen levels and so on; all of those are being sensed, and then the actions are taken based on that. So that is the position we would be in. And again, we’re trying to make it as turnkey as possible, but recognize that every building is different. Every building has its own sort of climate fingerprint, so there is work required in creating the brains. You could take a brain off the shelf and use it, but I can’t say whether that would work better. It might have better energy consumption, but then maybe the people are not as comfortable. So you have to tweak it, and the more efficient we can make this end-to-end process, the sooner folks can realize the value.
Sam Charrington: And a brain in this case is essentially a model or an agent or something like that, is that fair?
Gurdeep Pall: Great question. I have had lots of folks ask me, including Bill Gates, why do you call it a brain? And I think it’s a really good question. The way we talk about it is that it’s actually a collection of models. Okay? Autonomous system tasks can sometimes be decomposed into different parts. For example, if a sort of robotic hand had to pick up an object and stack it: reach can be one action, pickup can be another action, then move, and then stack. These are all distinct actions. Now, some are pretty easy, you can almost program them; reaching nowadays you can often just program, depending on the device you have. But some need to be trained. So now this whole collection of things has to be orchestrated, and the right piece has to be invoked at the right time. Each one of them either is programmed, or it is a model, a deep learning model, a DNN. And putting all of it together becomes the brain. In fact, that’s how the human brain works, so the name is actually quite apt: the visual cortex has a particular purpose, then it hands off to another piece which does reasoning, and then you want to take the action, and that invokes a different part of the brain. So that’s why we call it a brain.
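Here is a gym-style sketch of the HVAC framing described above: state (zone temperature, CO2, outside temperature), discrete actions (idle, heat, cool, fan), and a comfort-minus-energy reward. It illustrates the state/action/reward framing only; it is not Microsoft's or Bonsai's simulator or machine-teaching (Inkling) specification, and all dynamics and constants are made up.

# Toy HVAC environment using the classic gym API (illustrative only).
import numpy as np
import gym
from gym import spaces

class ToyHVACEnv(gym.Env):
    """One zone: observe [temperature, CO2, outside temperature]; choose
    among idle/heat/cool/fan; reward trades comfort against energy use."""

    def __init__(self, target_temp=21.0):
        self.observation_space = spaces.Box(
            low=np.array([-10.0, 300.0, -30.0], dtype=np.float32),
            high=np.array([45.0, 2000.0, 45.0], dtype=np.float32))
        self.action_space = spaces.Discrete(4)  # 0=idle, 1=heat, 2=cool, 3=fan
        self.target = target_temp

    def reset(self):
        self.temp, self.co2, self.outside = 18.0, 800.0, 5.0
        return np.array([self.temp, self.co2, self.outside], dtype=np.float32)

    def step(self, action):
        delta = {0: 0.0, 1: 0.5, 2: -0.5, 3: 0.0}[action]
        self.temp += delta + 0.05 * (self.outside - self.temp)  # drift toward outside
        self.co2 += -50.0 if action == 3 else 10.0               # fan clears CO2
        energy = 0.0 if action == 0 else 1.0
        reward = (-abs(self.temp - self.target)
                  - 0.01 * max(self.co2 - 800.0, 0.0)
                  - 0.1 * energy)
        obs = np.array([self.temp, self.co2, self.outside], dtype=np.float32)
        return obs, reward, False, {}

Seasonal preferences of the kind described in the interview (cooler in summer, warmer in winter) would show up here as changes to the target and reward terms.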
Sam Charrington: [00:32:33] Okay. Going back to the HVAC example, you mentioned a data-driven simulation. So I’m imagining you coming to my company, I guess, since this is my scenario and I’ve got the data center, and I probably don’t have a simulation that exists for my data center and HVAC. So that’s immediately a big challenge if I need one to train a brain, but you’ve got a way to generate it just from the data that I’ve collected.
Gurdeep Pall: [00:33:01] Yes. And this is something that we are having to do a lot more of as we go out and talk to customers. Some have a simulator. Interestingly, simulators have been used for designing, modeling, and testing; they’ve existed. But typically there’s been a human on one side of the simulator, driving it for whatever purpose they want. If it’s a flight simulator, you’re flying it. In our case, it’s the AI that’s being trained that is sitting on the other end of the simulator. So in some cases we were able to take their existing simulators, actually change the use case, and still make it work okay. In some cases that worked great. In some cases it didn’t work great, because their simulator was designed for a really different purpose. Like if you do CFD, the purpose is to model this thing, and you have to model it to high precision; I mean, this is going to be a plane flying through rain, so it has to be very precisely done. They typically have HPC setups for CFD simulation, but each crank can take so long. There’s no way we can crank it fast enough that we could learn, right? So we said, well, that doesn’t work. Or they just don’t have a simulator at all, like in your case. So that’s where our next step is: can you give us data? For many folks, they have the data. If they have the data, then we say, okay, let’s see how we can take that data and actually make it into something that we can mate with our system. That worked for a certain class of problems. Then, as the complexity of problems started increasing, we realized that we needed a new trick up our sleeve. There’s a research group as part of my team, and we started looking at how we can apply deep learning to learn from this data to create simulators. We ran into the first insight, which is that deep learning is designed for sort of inference, right? You run one crank, you get a prediction, and you’re done. Well, it turns out the real world is not like that. The real world is modeled with differential equations. Basically, you’ve got time, and you’ve got this thing which continues to change its behavior with time, depending on the previous state and the actions being taken. So there’s some great work being done right now, and we are publishing it; in fact, some of it is already out, on deep simulation networks. Basically, it’s like a neural computational fabric. It’s kind of like ODEs, where with every crank you take the output and sort of feed it back into the next time cycle. Of course, the sampling of time can actually be variable, so that neural computational fabric has to deal with that, which is a pretty big thing in itself. But it also allows you to have many different components inside the simulation, each of which is learning in a different way.
For example, if you’re tossing a ball. The ball has it’s physics. And then there’s the environment that has physics, which is new for me in physics, but turns out the Newtonian physics doesn’t change. You can toss a ball, you can toss up a water. So if you are training those components, it’s give me some of these pre-trained components. If you will, that can be trained ones, then you can, maybe tweak it based on the, the object will have different physics. But now, so you did this noodle competition fabric, which plays out in time. You are now able to have multiple components and you train this thing. This new architecture we believe is a pretty transformative thing in simulation because it now allows us to offer any complex simulation space. Which basically has lots of differential equations that are sort of running around inside of it. And we can train it reasonably quickly. Really.  It’s kind of like a graph noodle network because you have time and you have space. If you look at the components that actually make space. So there’s message passing, which is happening between every stage and that allows the learning to happen. And this backpropagation, which happens in which each of the components, like eventually you’re able to get a trained model, which can run like a simulator. So you stopped at some state to take an action, distinct States changes and you’re able to crack it. So we’re really excited about it. We think this will be a big accelerant in the approach that we have. Again, we get the data, use it, we can go at it and this similarly, they can also learn from other simulators. So if you have something that is quite inefficient, in terms of competition and stuff like that, this thing can learn of it. And then it can execute very fast. Because once it learns the fundamental differential equations that are underlying, this is just inference. It’s not doing any kind of a big competition once a string. So that is an area that we’re really excited about right now. Sam Charrington: [00:38:09] Awesome. So first step is capture some data. Next step, use that to train a simulator using this idea of deep simulation networks, potentially. Then you mentioned kind of using that to create a brain. It sounds like part of that is you corrected me when I said it’s a model. So part of that I’m imagining is figuring out the right level of abstraction for these different components or pieces. And then individually, I guess one of the questions that I had around that was. And when we talk about reinforcement learning and kind of a academic sense and how difficult it is to put it to use in real world situations. A lot of it has to do with like carefully crafting this objective function or cost function and all of the issues associated with that. You described what the customer has to do as more, less about describing this objective function and maybe constraining what the solution looks like. Am I kind of reading that correctly? And maybe you can elaborate on that and help us understand. Gurdeep Pall: [00:39:17] Absolutely. And you’ve, you’ve hit the nail on head on with reinforcement learning the reward, specification, the reward function that he had, the specification of that becomes the next problem. In fact, we have a very famous researcher at Microsoft research. Blackford, he’ll tell you that. He says, if you have a problem, And you modeled it as a reinforcement learning problem. You don’t have to, it really gets to the core of it, this thing, which is that getting the reward function. Right. 
And there’s lots of funny stories about bad reward functions and unintended consequences, but we ran into that and they still allow that in our tool chain, you can specify the board function, but now we are actually. The machine teaching, we read exploring what are other ways for an expert to describe what they want done and we’ve come to the concert or goal. So they specify the goal, using a particular approach, the semantics of which are contained within the problem and the environment. And we will automatically generate the reward function. Under the covers based on the goal. And we found this to be a very, much more approachable thing for, for our customers. In fact, a lot of our new engagements with customers, most of the time we ended up using goals. So that’s been, you know, and like I said, you know, we’re on this learning thing ourselves. And, you know, we’re seeing what’s working, what’s not working how to enhance it and move from there. Sam Charrington: [00:40:45] And so some of these like classical challenges with reward functions, like delayed attribution and things like that, that you see in reinforcement learning does goals as an approach. Side skirt those in some ways, or are those still issues that you see in the autonomy systems world? Gurdeep Pall: [00:41:06] Yeah. I mean, those are still issues we see and separately the algorithms are getting pretty good too. So he, you know, there’s an active area of research and better algorithms coming up. we are, you know, we are, we stay on top of that and be an incorporating more and more algorithms now into our tool chain because there’s some albums. Better suited for certain class of problems. Others are better for suited for another other type of problems, which then of course moves the problem to the next layer, which is which one do you select for? Which kind of problem. And you don’t want, obviously folks who’ve never done programming or AI to say, Oh, you tell me, do you want SAC? Or do you want this. No idea. Right? So we are also trying to put in that intelligence, so that it’s a, it’s a meta reasoning thing, which says, you know, given this kind of a goal, given this kind of a problem, and this is a sampling rate. So state space let’s automatically select the best algorithm. And we will use that for training. So, you know, nobody ever has to know, like, you know what craziness you had walked under the covers, but staying on top of this has been a really important piece for us. You know, we use this framework called re which has come out of a lot of the book please. you know, still can source Facebook. We are one of the. Big users of it and contributors for it now, in fact, the rate team 13, which is building that my team in Berkeley are literally in the same building on one floor apart. So there’s a lot of good intermingling there as well. So because we using that framework V relive is how people are adding more and more algorithms, you know, being able to really tap into that and what we find, of course, sometimes, you know, people will write an algorithm to publish a paper, but it’s not really Production grade. So then these come back and do our own implementation of it and contribute that. Sam Charrington: [00:42:54] So, kind of in this journey, we started with data, we built a simulation, we built a brain out of that simulation. Then that brain is able to then help me control my data center. HVAC. I’m imagining in this scenario that, you know, I still care about the safety issue that you mentioned. 
Sam Charrington: [00:42:54] So, kind of in this journey, we started with data, we built a simulation, we built a brain out of that simulation, and then that brain is able to help me control my data center HVAC. I’m imagining in this scenario that I still care about the safety issue that you mentioned. Maybe it’s not a drill that’s going to destroy my data center, but I wouldn’t want the policy that you recommend to decrease the life of my coolers or chillers. And then there are also maybe explainability issues that arise. Like, why are you telling me to … you know, my HVAC engineer has always set the XYZ at six and you’re saying it should be at eight. Why is that?
Gurdeep Pall: [00:43:40] Yeah, this is such a great topic. And I’ve talked to my team about this, given my experience at Microsoft. I remember when we were building Windows NT and putting networking into it and so on, we had no idea how stuff was going to be attacked when the internet was starting out. In fact, I was the development manager for the TCP/IP stack for Windows from ’95 to 2000. I still managed to keep some of my sanity, but I can tell you, there were folks on my team who were really pushing 20 updates a week, because we were starting to get attacked at every layer, from the bottom of the network moving its way up, all the way up into sockets, all the teardrop attacks and all that. And when they got to the top layer, that’s when the most sophisticated attacks really started. I don’t know if you remember, back after Windows XP shipped, the entire team took one year to harden the system, because it was no longer just my problem as the networking guy, it was everybody’s problem. People would do buffer overruns and insert code and all that, so literally every component had to deal with it. The reason I’m telling this story is that I think safety is a problem like that. When we came into it, it was, hey, we’ve got really good control and I can show you better performance, but then there’s all this hidden stuff that you have to deal with. That’s been a big realization for us. It’s a multifaceted approach. So the first thing is, you talked about the wear and tear of the machine or breaking it down. A bunch of our use cases right now with customers have those factored in, and actually they’re factored in at the time of the teaching. When you talk about the state space, that's something that has to be specified so that the policy is taking it into account, so that component gets handled. The hardest safety things are when the brain is operating: are we really at the mercy of a deep learning model which is going to say, take this action, where the consequences of that are actually out of scope for what we’re doing? And this is where we started. This is going to be ongoing work; it’s never done, kind of like cybersecurity right now, we’re learning it’s never going to be done. But we want to take some pretty concrete steps. One really important piece of work, and there was a new paper published on this, is that you develop a policy, the policy suggests an action, and what you do is introduce another layer after that to decide whether the action is a safe action or not. Now, what goes into deciding whether it is a safe action can be many things. It can be predicate logic, it can be temporal logic, so you can pretty much assert no or yes because it is outside some range. Or it can actually be trained models itself; imagine adversarial models which go into that component.
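The "separate layer that decides whether the proposed action is safe" idea can be made concrete with a small sketch. The geometric predicate, radius, and action format below are illustrative stand-ins; in practice the check could be predicate logic, temporal logic, or a trained model, as Gurdeep notes.

# Sketch of a safety layer sitting after the learned policy.
import math

SAFE_RADIUS_M = 1.5          # the "dotted red line" around the robot arm (example value)

def human_too_close(human_positions, arm_base=(0.0, 0.0)):
    return any(math.dist(p, arm_base) < SAFE_RADIUS_M for p in human_positions)

def safe_step(policy, state, human_positions):
    proposed = policy(state)                  # action suggested by the RL policy
    if human_too_close(human_positions):
        return {"type": "stop"}               # override: freeze the arm
    return proposed

# Usage with a stand-in policy:
dummy_policy = lambda s: {"type": "move", "target": (0.4, 0.2)}
print(safe_step(dummy_policy, state={}, human_positions=[(2.0, 2.0)]))  # allowed to move
print(safe_step(dummy_policy, state={}, human_positions=[(0.5, 0.3)]))  # overridden: stop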
So now, when you are specifying in machine teaching, right upfront you can start to insert ways in which safety can be specified, and that actually follows a very different path. Some of it will follow the path of the policy building itself, because some things can be caught there, but other things are brought to bear at operation time. And that is very important because you’ve probably heard about some of the discussions on how level five autonomy is going to be rolled out in cities, and they’re talking about these dedicated lanes and so on. I think it’s a wonderful idea, because you’re solving the other side of the equation, which is what you can control. So imagine, and I always talk about this example and my team just sort of looks at me strangely, imagine you have this sort of arm robot and it is working in a space where humans are also working. It is very common; you see this with machines in factories, they will have a red line or dotted red line around the perimeter, and the humans know they’re not going to go there. And now you’ve created a rule which says, regardless of what action the policy tells you, if it is outside of some radial distance, whatever that distance is, you will not take that action. So you’ve created an environment in which humans and this arm robot swinging around can actually co-exist in the same place. It’s a very pragmatic approach, but it has to be part of your solution. Otherwise the engineers are right: these crazies are showing up with reinforcement learning, and it’s going to create all kinds of issues for us, safety issues and so on.
Sam Charrington: [00:48:33] Yeah. I love that analogy, and just taking it one step further: it would be a lot more difficult to build into your motion trajectories, for example, a way for this arm to avoid a human that stepped into the zone, than to build something that determines that a human has stepped into the zone and just shuts everything down. And I think what I’m taking away from what you’re saying here is that safety is a multi-layered problem. It’s not all about making the neural net responsible for everything; it’s about identifying how you can enforce safety at these different levels, and thinking about it as a system, like from an engineering perspective. Right?
Gurdeep Pall: [00:49:16] Exactly. I think that has been a big learning for us as well, that it’s not just “solve the hardest AI problem and suddenly everything will come,” right? No, you have to really think about it that way. And the safety layer, which evaluates every action after it is recommended, is where a lot of the new capabilities will come in in the future, the adversarial stuff. You can imagine a completely separate model which is basically going to give you a one or a zero: if any human has stepped over the red line, it is going to give you a one and things shut off. Right? And that keeps improving, the perception and things like that. So it is a system thing, as you say; that’s a very good way to think about it.
Sam Charrington: [00:50:03] Right, right. So maybe to help us wrap up: it’s the very beginning of 2021, and autonomous systems is kind of a broad area. Where do you see things going over the next few years? How does this all evolve?
Gurdeep Pall: [00:50:18] Yeah.
You know, we believe that we’re entering the era of autonomous systems. And, you know, it’s always hard to predict, right? There’s that famous quote: prediction is hard, especially about the future. But I remember, with Windows NT, the networking, the internet, these things just explode, and some right elements have to be there for this explosion to happen. And I think with the breakthroughs in AI, with the focus on solving business problems in a complete way, like we talked about with safety, with the industry coming along … You know, we’ve been spending a lot of time on data-driven simulators, but we believe in the simulation industry that is out there, and we really want to partner with them. We’ve got great partners like MathWorks, to bring them along, so that together we can create an end-to-end tool chain in which these autonomous systems can be created without requiring the high level of expertise that, for example, is going into a lot of autonomous driving. I mean, the teams that are building these autonomous driving stacks are just super deep; they’re super experts, and they’re building it all in a siloed, very vertical way. We want there to be horizontal components. Then you’ll have vendors of autonomous systems where anybody can come in, describe their problems, and be able to create the brain and deploy it. That’s going to explode the number of autonomous systems that are out there. And I think this is great for many different things, including our climate, including the resilience that we’ve seen during COVID, where logistics and these things just have to continue; production has to continue. So I think now’s the time, and I think it’s going to happen.
Sam Charrington: [00:52:05] Awesome. Awesome. Well, good deal. Thanks so much for taking the time to chat and sharing a bit about what you’re up to there.
Gurdeep Pall: [00:52:13] Totally my pleasure. And you know, you have a great podcast, so it’s great to be here talking to you about my stuff.
Sam Charrington: [00:52:25] Awesome. Thank you. Thank you. Take care. All right, everyone. That’s our show for today. To learn more about today’s guest or the topics mentioned in this interview, visit twimlai.com. Of course, if you like what you hear on the podcast, please subscribe, rate, and review the show on your favorite podcatcher. Thanks so much for listening, and catch you next time.
Today we're joined by Jürgen Schmidhuber, Co-Founder and Chief Scientist of NNAISENSE, the Scientific Director at IDSIA, as well as a Professor of AI at USI and SUPSI in Switzerland. Jürgen's lab is well known for creating the Long Short-Term Memory (LSTM) network, which has become a prevalent neural network architecture commonly used in devices such as smartphones, and which we discussed in detail in our first conversation with Jürgen back in 2017. In this conversation, we dive into some of Jürgen's more recent work, including his recent paper, Reinforcement Learning Upside Down: Don't Predict Rewards -- Just Map Them to Actions.
Today we're joined by Sergey Levine, an Assistant Professor in the Department of Electrical Engineering and Computer Science at UC Berkeley. We last heard from Sergey back in 2017, when we explored Deep Robotic Learning. We caught up with Sergey at NeurIPS 2019, where he and his team presented 12 different papers -- which means a lot of ground to cover! Sergey and his lab's recent efforts have been focused on contributing to a future where machines can be "out there in the real world, learning continuously through their own experience." Sergey shares how many of the papers presented at the most recent NeurIPS conference are working to make that happen. Some of the major developments have been in the research fields of model-free reinforcement learning, causality and imitation learning, and offline reinforcement learning.
In my recent podcast with Facebook AI research scientist Moustapha Cissé, Cissé shared the insightful quote, "you are what you eat and right now we feed our models junk food." Well, just like you can't eat better if you don't know what's in your food, you can't train less biased models if you don't know what's in your training data. That's why the recent paper, Datasheets for Datasets, by Timnit Gebru (see her TWIML podcast and meetup) and her co-authors from Microsoft Research and elsewhere is so interesting. In this paper, Timnit and company propose the equivalent of food nutrition labeling for datasets. Given that many machine learning and deep learning model development efforts use public datasets such as ImageNet or COCO–or private datasets produced by others–it's important to be able to convey the context, biases, and other material aspects of a training dataset to those interested in using it. The Datasheets for Datasets paper explores the idea of using standardized datasheets to communicate this information to users of datasets, commercialized APIs, and pre-trained models. In addition to helping to communicate data biases, the authors propose that such datasheets can improve transparency and provide a source of accountability. Beyond potential ethical issues, hidden data biases can cause unpredictability or failures in deployed systems when models trained on third-party data fail to generalize adequately to different contexts. Of course, the best option is to collect first-party data and use models built and trained by experts with deep domain knowledge. But widely available public datasets, more approachable machine learning tools, and readily accessible AI APIs and pre-built models are democratizing AI and enabling a broader group of developers to incorporate AI into their applications. The authors suggest that datasheets for AI datasets and tools could go a long way in providing essential information to engineers who might not have domain expertise, and in doing so help mitigate some of the issues associated with dataset misuse. This perspective echoes similar thoughts from Clare Gollnick in our discussion on the reproducibility crisis in science and AI. She expressed her concern for developers turning first to deeper, more complex models to solve their problems, noting that they often run into generalization issues when those models are moved into production. Rather, she finds that when AI problems are solved by capitalizing on some discovery found through a strong understanding of the domain at hand, the results are much more robust. Timnit and her co-authors suggest in the paper that AI has yet to undergo the safety regulations of emergent industries of the past, like the automobile, medicine, and electrical industries. The paper points out that, "When cars first became available in the United States, there were no speed limits, stop signs, traffic lights, driver education, or regulations pertaining to seat belts or drunk driving. Thus, the early 1900s saw many deaths and injuries due to collisions, speeding, and reckless driving." Over the course of decades, the automobile industry and others iteratively developed regulations meant to protect the public good, while still allowing for innovation. The paper suggests that it's not too early to start considering these types of regulations for AI, especially as it begins to be used in high-stakes applications like the health and public sectors.
Such regulation will likely first apply to issues of privacy, bias, ethics, and transparency, and in fact, Europe's impending General Data Protection Regulation (GDPR) takes on just these issues. The proposed datasheets take cues from those associated with electrical components. Every electrical component sold has an accompanying datasheet that lists the component's function, features, operating voltages, physical details and more. These datasheets have become expected in the industry due to the need to understand a part's behavior before purchase, as well as the liability issues that arise from a part's misuse. The authors suggest that those offering datasets or APIs should provide a datasheet that addresses a set of standardized questions covering the following topics (a sketch of what this might look like in machine-readable form follows this post):
- The motivation for dataset creation
- The composition of the dataset
- The data collection process
- The preprocessing of the data
- How the dataset is distributed
- How the dataset is being maintained
- The legal and ethical considerations
For the full breakdown of all of the questions check out the paper; it goes into a bunch of additional detail and provides an example datasheet for the UMass Labeled Faces in the Wild dataset. It's a thorough and easy-to-use model that has the potential for big impact. Datasheets such as this will allow users to understand the strengths and limitations of the data that they're using and guard against issues such as bias and overfitting. It can also be argued that simply having datasheets at all forces both dataset producers and consumers to think differently about their data sources and to understand that the data is not a de facto source of truth but rather a living, breathing resource that requires careful consideration and maintenance. Maybe it's the electrical engineer in me, but I think this is a really interesting idea. What do you think? Do you think datasheets could help address the issues of bias and accountability in AI? Are there instances where you would have found this useful in your own work? Let me know via email or via the TWIML slack channel. Sign up for our Newsletter to receive this weekly to your inbox.
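To make the datasheet topics above a bit more concrete, here's one way they could be captured in a machine-readable record. This is just an illustrative sketch: the field names are informal paraphrases of the topic headings, not the paper's exact questions, and the example values are made up.

```python
from dataclasses import dataclass, asdict
import json

# Illustrative only: fields paraphrase the datasheet topics discussed
# above; the example values below are invented.

@dataclass
class DatasetDatasheet:
    name: str
    motivation: str          # why the dataset was created, and by whom
    composition: str         # what the instances are, how many, known gaps
    collection_process: str  # how the data was gathered and sampled
    preprocessing: str       # cleaning, labeling, and filtering applied
    distribution: str        # how and under what license it is shared
    maintenance: str         # who maintains it and how errors are reported
    legal_and_ethical: str   # consent, privacy, and ethical considerations

datasheet = DatasetDatasheet(
    name="Example Faces Dataset",
    motivation="Benchmark face verification in unconstrained settings.",
    composition="Tens of thousands of face images of public figures.",
    collection_process="Images collected from news photos; faces detected automatically.",
    preprocessing="Faces cropped and aligned; near-duplicates removed.",
    distribution="Freely downloadable for research use.",
    maintenance="Maintained by the original lab; errata posted on the project page.",
    legal_and_ethical="Subjects are public figures; no individual consent was collected.",
)

print(json.dumps(asdict(datasheet), indent=2))
```

Even a lightweight record like this makes the "nutrition label" idea tangible: a consumer of the dataset can scan it before deciding whether the data fits their context.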
Last week on the podcast I interviewed Clare Gollnick, CTO of Terbium Labs, on the reproducibility crisis in science and its implications for data scientists. We also got into an interesting conversation about the philosophy of data, a topic I hadn’t previously thought much about. The interview seemed to really resonate with listeners, judging by the number of comments we’ve received via the show notes page and Twitter. I think there are several reasons for this. I’d recommend listening to the interview if you haven't already. It’s incredibly informative and Clare does an excellent job explaining some of the main points of the reproducibility crisis. The short of it though is that many researchers in the natural and social sciences report not being able to reproduce each other’s findings. A 2016 “Nature” survey demonstrated that more than 70% of researchers have tried and failed to reproduce another scientist’s experiments, and more than half have failed to reproduce their own experiments. This concerning finding has far-reaching implications for the way scientific studies are performed. Gollnick suggests that one contributing factor is the idea of “p-hacking”–that is, examining one’s experimental data until patterns are found that meet the criteria for statistical significance, before determining a specific hypothesis about the underlying causal relationship. P-hacking is also known as “data fishing” for a reason: You’re working backward from your data to a pattern, which breaks the assumptions upon which statistical significance is determined in the first place. Clare points out that data fishing is exactly what machine learning algorithms do though–they work backward from data to patterns or relationships. Data scientists can thus fall victim to the same errors made by natural scientists. P-hacking in the sciences, in particular, is similar to developing overfitted machine learning models. Fortunately for data scientists, it is well understood that cross-validation, by which a hypothesis is generated on a training dataset and then tested on a validation dataset, is a necessary practice. As Gollnick points out, testing on the validation set is a lot like making a very specific prediction that’s unlikely to occur unless your hypothesis is true, which is essentially the scientific method at its purest. Beyond the sciences, there’s growing concern about a reproducibility crisis in machine learning as well. A recent blog post by Pete Warden speaks to some of the core reproducibility challenges faced by data scientists and other practitioners. Warden refers to the iterative nature of current approaches to machine and deep learning and the fact that data scientists are not easily able to record their steps through each iteration. Furthermore, the data science stack for deep learning has a lot of moving parts, and changes in any of these layers–the deep learning framework, GPU drivers, or training or validation datasets–can all impact results. Finally, with opaque models like deep neural networks, it’s difficult to understand the root cause of differences between expected and observed results. These problems are further compounded by the fact that many published papers fail to explicitly mention many of their simplifying assumptions or implementation details, making it harder for others to reproduce their work. Efforts to reproduce deep learning results are further confounded by the fact that we really don’t know why, when or to what extent deep learning works. 
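A quick aside before continuing: the cross-validation discipline described above, generate your hypothesis on one split of the data and test it on a split the model has never seen, is easy to practice with standard tooling. Here's a minimal sketch using scikit-learn; the synthetic dataset and random forest are arbitrary stand-ins for your own features, labels, and model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score

# Synthetic data stands in for your real features and labels.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Hold out a test set up front; it plays the role of the "very specific
# prediction" your hypothesis must survive.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)

# k-fold cross-validation on the training set guards against tuning
# decisions that merely fit noise (the ML analogue of p-hacking).
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print("cross-val accuracy: %.3f +/- %.3f" % (cv_scores.mean(), cv_scores.std()))

# Only after all modeling choices are frozen do we look at the test set.
model.fit(X_train, y_train)
print("held-out test accuracy: %.3f" % model.score(X_test, y_test))
```

The discipline matters more than the library: every decision made while peeking at the test set quietly turns validation back into data fishing.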
During an award acceptance speech at the 2017 NIPS conference, Google's Ali Rahimi likened modern machine learning to alchemy for this reason. He explained that while alchemy gave us metallurgy, modern glass making, and medications, alchemists also believed they could cure illnesses with leeches and transmute base metals into gold. Similarly, while deep learning has given us incredible new ways to process data, Rahimi called for the systems responsible for critical decisions in healthcare and public policy to be "built on top of verifiable, rigorous, thorough knowledge." Gollnick and Rahimi are united in advocating for a deeper understanding of how and why the models we use work. Doing so might mean a trip back to basics–as far back as the foundations of the scientific method. Gollnick mentioned in our conversation that she's been fascinated recently with the "philosophy of data," that is, the philosophical exploration of scientific knowledge, what it means to be certain of something, and how data can support that certainty. It stands to reason that any thought exercise that forces us to face tough questions about issues like explainability, causation, and certainty could be of great value as we broaden our application of modern machine learning methods. Guided by the work of science philosophers like Karl Popper, Thomas Kuhn, and as far back as David Hume, this type of deep introspection into our methods could prove useful for the field of AI as a whole. What do you think? Does AI have a reproducibility crisis? Should we bother philosophizing about the new tools we've made, or just get to building with them? Sign up for our Newsletter to receive this weekly to your inbox.
Healthcare applications of machine learning and AI have been in the news a bit more than usual recently, concurrent with the recent HiMSS conference in Las Vegas. HiMSS is a 45,000+ attendee conference dedicated to healthcare IT. Surprising no one, AI was a major factor at this year's event. There was a whole subconference focused on ML & AI, plus a ton of AI-focused sessions in the regular conference and a good number of announcements by industry leaders and startups alike. I've only done a couple of healthcare-focused shows on the podcast so far, but I'm planning to dive into this area more deeply this year. Healthcare is arguably one of the most promising–not to mention important–areas of AI application. Progress is being made across a good many areas, including: Radiology. Image-based diagnostics like radiology lend themselves to the application of deep learning. There are large amounts of labeled image data to work from and a degree of uniformity that's unmatched in many other vision applications. There's been a raft of research papers on the application of deep learning to radiology and a lot of speculation about AI eventually replacing radiologists, but also strong arguments against this ever happening. Diagnostics. Radiology aside, machine learning and AI have the potential to help doctors make better diagnostic calls. One company that's been active in this space is the startup Enlitic. The company–which at one time was led by Fast.ai founder and former Kaggle president Jeremy Howard–wants to use deep learning to help make diagnostic calls, manage patient triage and screening programs, and identify high-level population health trends. Google Brain, Google DeepMind, and IBM Watson are all very active in this area as well, among others; the first of these recently published interesting research into the use of deep learning to predict cardiovascular diseases using retinal images, as opposed to more invasive blood tests. Health Monitoring. Machine learning is also driving health diagnostics and monitoring into the hands of consumers. Last year Apple unveiled a research study app that uses Apple Watch's built-in heart rate monitor to collect data on irregular heartbeats and alert patients who may be experiencing atrial fibrillation. FirstBeat, a supplier to other fitness watch makers, uses machine learning to predict wearers' stress levels and recovery times. I spoke with Ilkka Korhonen, the company's vice president of technology, about physiology-based models for fitness and training earlier this year. Personalized medicine. Personalized, or precision, medicine seeks to tailor medical interventions to the predicted response of the patient based on their genetics or other factors. Applications include selecting the best medicines for each patient and developing custom medications that target pathways based on an individual patient's genetics. My interview with Brendan Frey of Toronto-based Deep Genomics explored a few of the opportunities in this space. Deep Genomics is working on "biologically accurate" artificial intelligence for developing new therapies. Electronic Health Records. The major EHR vendors–including Allscripts, Athenahealth, Cerner, eClinicalWorks, and Epic–all made announcements at HiMSS about ways that they would be incorporating AI into their products. Allscripts announced a partnership with Microsoft to develop an Azure-powered EHR, while Epic unveiled a partnership with Nuance to integrate their AI-powered virtual assistant into the Epic EHR workflow.
Trump Administration advisor Jared Kushner even made an appearance advocating for greater EHR interoperability as a step towards applying AI, machine learning, and big data. Surgery. Researchers are beginning to incorporate AI into the planning and execution of a variety of surgical procedures. Scenarios explored include burn care, limb transplants, craniofacial surgery, cancer treatment, and aesthetic (plastic) surgery. Of course, significant obstacles remain before we see AI fully integrated into healthcare delivery. Naturally, the barrier to releasing new products in healthcare is much higher than in other industries, since even small mistakes can have life-threatening consequences for patients. The techniques being applied now in research must be made more robust, a clear chain of accountability must be present, and justification for why and how care decisions are made must be made clear. Improving robustness and performance will require time, a lot of data, and many rounds of testing. Increasing trust will further require new tools and techniques for explaining opaque algorithms like deep learning (the aforementioned Google research using retinal images provides a good example of this). We won't see the autonomous robo-doctors of science fiction anytime soon, but machine learning and AI will undoubtedly play a significant role in the experience of healthcare consumers and providers in the years to come. Sign up for our Newsletter to receive this weekly to your inbox.
#MyAI, Your AI A few weeks back, following my visit to CES, I asked you to share your thoughts on AI in our personal lives. We've seen some insightful responses so far and, as the contest comes to an end, I thought I'd share some of them. As a reminder, we asked listeners to record short video responses to the questions: How has AI impacted your personal and home life? And how do you see it impacting you in the future? A common theme in many of the responses was an appreciation for how AI facilitates small tasks in new and more efficient ways. Sometimes this new level of utility makes it hard to remember how you would've done it before. For example, for those who use it, features like photo search and text prediction offer fundamental shifts in usability. This calls to mind the idea that AI only seems like "AI" when it's magic; everything else recedes into the background and becomes expected. Some respondents agreed that the tech still had a long way to go with more complex tasks. At times, AI-powered actions can feel clunky and inefficient, forcing you to go out of your way to overcome their limitations. Having to reiterate spoken commands or correct recommendations are a couple of examples. So again, not always magic, but still a lot of goodness to be had. Let's take a quick look at a few of the AI products folks said were most useful:
Google Photos. The ubiquity of smartphones means we all have tons of photos, and making sense of them all is non-trivial. According to our community, AI has made a big difference in making photo libraries more accessible. From the Google Photos app, you can easily search your gallery with a myriad of phrases and get astoundingly accurate results. Want to pull up those vacation pictures you took a couple of years ago? Just search "beach." Looking for that picture of your neighbor's dog? Just search for "dog," and the pic is there. The integration is seamless, efficient, and intuitive--everything you'd want an AI to be. Peeling back the veil a bit, one presumes that Google Photos uses a combination of computer vision (CV) and natural language processing (NLP) techniques to label images so that they can be easily searched. Similar functionality is available via the Google Cloud Vision API, for you to embed into your own applications, or via the Cloud AutoML Vision API we discussed last month, should you need to train with your own images. The app also uses AI to pick and enhance your best pictures.
Duolingo. One listener, Chandana, mentioned Duolingo, one of my favorite apps, in her submission. Duolingo is a popular language learning app that I've used a ton in my own endeavors. Like the other apps mentioned here, Duolingo uses machine learning so seamlessly that it's not immediately appreciable just how much is involved. At the core of Duolingo is a model that tracks statistics about every word they've ever taught you--it's a database with billions of entries that's updated 3,000 times per second! The app uses an approach they call Half-Life Regression (HLR) to optimize when words are presented to users. HLR combines machine learning and data science with the psycholinguistic theory of "forgetting curves." After implementing this method, Duolingo saw ten percent increases in user retention and activity. You can read more about HLR and even check out the source code at Duolingo's blog post on the topic.
Gboard.
Typing on a glass surface is not quite as intuitive as having actual keys, so developing predictive text was essential to the rise of the modern smartphone. The first apps used predictive models based on dictionaries, but now products like Google's Gboard rely on neural nets. Gboard initially used a Gaussian model to quantify the probability of tapping neighboring keys, coupled with a rule-based model to represent cognitive and motor errors. More recently these were replaced with a single long short-term memory (LSTM) model trained with a connectionist temporal classification (CTC) criterion. Google shares a bunch of detail about the AI behind Gboard here on their research blog. These are but a few of the many great examples of simple, seamless, everyday AI. I have enjoyed hearing your thoughts on AI through your entries, and I'd love to hear from more of you. Don't be intimidated by the video format—we've made it super easy to record and upload your video right from the TWIML web page. Join the conversation by giving us your thoughts in 2 minutes or less. As a bonus, there are some pretty cool AI-powered prizes for folks who get the most likes on their video. Sign up for our Newsletter to receive this weekly to your inbox.
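For the curious, the core of the half-life regression idea mentioned above is compact enough to sketch. The model predicts recall probability with an exponential forgetting curve, p = 2^(-Δ/h), where Δ is the time since the word was last practiced and the half-life h is estimated from features of the learner's practice history. The feature choices and weights below are invented for illustration; Duolingo's paper and released code have the real feature set and training procedure.

```python
import numpy as np

# Toy illustration of half-life regression (HLR). Weights and features
# are made up; this is not Duolingo's production model.

def predicted_half_life(features, weights):
    # h = 2^(w . x): the half-life grows or shrinks exponentially with a
    # weighted combination of practice-history features.
    return 2.0 ** np.dot(weights, features)

def recall_probability(delta_days, half_life):
    # Forgetting curve: probability of recalling the word after
    # delta_days without practice.
    return 2.0 ** (-delta_days / half_life)

# Example: a word answered correctly 8 times and incorrectly 2 times.
features = np.array([1.0,          # bias term
                     np.sqrt(8),   # sqrt of correct answers
                     np.sqrt(2)])  # sqrt of incorrect answers
weights = np.array([1.0, 0.8, -0.5])   # invented weights

h = predicted_half_life(features, weights)
for delta in (1, 3, 7, 14):
    print(f"after {delta:2d} days: p(recall) = {recall_probability(delta, h):.2f}")
```

Scheduling then falls out naturally: show the word again around the time its predicted recall probability drops below some target threshold.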
Bits & Bytes A few interesting ML and AI-related tidbits from around the web over the past week or so: China is building a huge AI-focused research and business park. The state-backed $2.1 billion technology park is part of China's wider initiative to position themselves at the forefront of emerging markets. The 55-hectare park is expected to house 400 businesses and generate nearly $8 billion a year. Richmond-based AI startup, Notch, acquired by Capital One. Fifteen months after Capital One created an in-house "Center for Machine Learning," the company has reportedly acquired Notch, a data engineering and machine learning consulting firm. LG distributes in-house AI development tools throughout the company. A few weeks after LG introduced ThinQ, the umbrella brand for the company's smart home products, the company has announced availability of an in-house deep learning platform, called DeepThinQ, which is meant to facilitate AI technology development across platforms within the company. Google releases preemptible GPUs with a 50% discount incentive. Preemptible GPUs work well with large machine learning tasks and other batch computational jobs, and Google is making them cheaper for customers. CEVA unveils new family of AI processors designed for deep learning at the edge. As I mentioned previously, I'll be keeping an eye on AI acceleration hardware products this year. CEVA's new line is one such offering. A new derivative-free optimization toolbox for deep learning. ZOOpt/ZOOjl is an interesting toolset that allows for optimization across data that is inconsistent or non-continuous, where standard algorithms like gradient descent, which require differentiability, would fall short. Sign up for our Newsletter to receive the Bits & Bytes weekly to your inbox.
Bits & Bytes New chips at NIPS. Intel Nervana's forthcoming Neural Network Processor was shown publicly for the first time at this year's NIPS, and the company provided an update on the NNP architecture on its blog. Meanwhile, Nvidia announced the Titan V, its latest GPU based on the Volta architecture. The new and improved GPU boasts 110 teraflops of computing power, an impressive 9x boost over its predecessor, and clocks in at a 'mere' $3k. Nvidia makes healthcare imaging play. Nvidia is taking a crack at medical imaging, an area growing rapidly as healthcare researchers and professionals consider ways to take advantage of AI's efficiency and power. Nvidia has partnered with GE Healthcare to bring Nvidia's AI platform to its 500,000 imaging devices globally. The company has also partnered with Nuance, whose AI Marketplace for Diagnostic Imaging brings deep learning tools to radiologists. Increasing activity in the AI developer tools space, including an update from Apple's Turi. Interesting profile of Finnish startup Valohai, which offers not only infrastructure-as-a-service for machine learning but also a collaboration platform for data science team workflows. Apple recently released the Turi Create machine learning framework. The tool is designed to simplify the development of custom machine learning models and allow them to be easily exported to Apple's suite of OSs. The Open Neural Network Exchange format (ONNX) for interoperable neural nets has now reached version 1.0 and is ready for production use, at least according to its backers Microsoft, Facebook and AWS. Learning to play from Zero. Google's DeepMind recently posted a paper describing AlphaZero, AI software that is capable of learning to play any of three complex games: Chess, Go, or Shogi, achieving superhuman performance in each in under 24 hours of self-play. A single program capable of learning three very different and complex games demonstrates a versatility that is difficult to achieve with modern AI. Google released an AI that analyzes your genome. Genome sequencing has become exponentially more efficient in the 15 years since the first human genome was sequenced. However, analyzing the sequenced genomes is still a tedious process. Google's DeepVariant is a tool for researchers that can be used to identify regions of interest in a patient's DNA. AI startup funding roundup. Prognos announced it has raised $20.5 million; the startup uses AI algorithms to enable earlier identification of patients who can benefit from enhanced treatment. Legal software firm Luminance announced that it has received a $10 million investment to fund its expansion into the U.S. and speed the legal review of contracts and other documents with AI. Chinese tech giant Alibaba Holdings has reportedly invested $227M in Beijing-based facial and image recognition firm SenseTime, valuing the firm at $3 billion post-investment. Chattermill announced it raised £600k in funding; the company employs deep learning algorithms to help companies make sense of customer feedback. Interesting acquisitions. After getting off to a less than stellar start with the release of their virtual assistant, Bixby, Samsung has acquired Fluenty to help bolster its launch of Bixby 2.0. Siemens has bought Solido Design Automation, a company using ML algorithms to perfect complex chip designs and make sure they're optimized for power consumption. Sign up for our Newsletter to receive the Bits & Bytes weekly to your inbox.
Bits & Bytes In the run-up to re:Invent, Google cuts GPU prices. Google Cloud announced lower GPU pricing for NVIDIA Tesla and K80 GPUs. Price reductions are as high as 36% in some cases. Cray jumps on deep learning bandwagon. Supercomputer maker Cray announced a host of new deep learning initiatives, including Accel AI, a suite of offerings for deep learning training and inference ranging from starter systems to a production-level cluster supercomputer; the Accel AI Lab, an innovation center providing educational content and classes; and new software support for TensorFlow. China challenges Nvidia's hold on AI chips. China's Ministry of Science announced (anew) its intention to fund research to develop chips for accelerating AI workloads. Meanwhile, across the Yellow Sea, Samsung announced the formation of a new AI research center. No grad students? Consider Mechanical Turker Descent. A new paper by Facebook AI Research proposes a general framework and algorithm for interactively and collaboratively training ML models. The system presents humans (e.g. Mechanical Turkers) with a game which has them compete against each other to come up with useful training examples for an ML agent. At the end of each round each player's agent is evaluated on all the other players' data sets, and the player whose agent fares best wins the round. Deal desk AISense raises $10M Series A to power ambient voice intelligence (transcription on steroids). This is the third startup I've heard about in a month going after some variation of this idea. Nvidia-backed Chinese autonomous truck co TuSimple raises $55M Series C. Finance analytics startup Xenodata raises $2.2M from Japanese megabanks. The company uses NLP to extract financial data from reports. Sign up for our Newsletter to receive the Bits & Bytes weekly to your inbox.
Get ready to delve into newsletter number twelve!
De-mist-ifying AI and the Cloud The other day my friends Aaron Delp and Brian Gracely interviewed me for their cloud computing podcast, The CloudCast. I've been spending a lot of time of late thinking about the multi-faceted relationship between machine learning, AI and the cloud, for some upcoming research on the topic, and our discussion provided a nice opportunity to summarize some of my thoughts. (The podcast should post to their site in a couple of weeks; I'll keep you posted here.) Most media coverage of cloud computing and artificial intelligence treats these two topics, for the most part, as distinct and independent technology trends. However, the two are, and will remain, inexorably tied to one another. The leading cloud vendors are all investing heavily in AI. Google, Microsoft, Salesforce and Amazon represent two-thirds of the top six companies with the most AI acquisitions, according to Quid data published by Recode. A number of reasons are motivating these firms' investments, not the least of which are projections that AI training and inference workloads are huge potential growth markets for cloud vendors. In a recent investor presentation, NVIDIA projected that deep learning training and inference represent $11 billion and $15 billion markets respectively, both with steep growth curves.
Three types of cloud-based AI offerings When I think of AI in the cloud, three distinct delivery models come to mind:
- AI-optimized infrastructure. These are traditional cloud Infrastructure-as-a-Service (IaaS) offerings that have been optimized for machine learning and AI workloads. One obvious optimization comes in the form of GPU support, but network and memory architectures play a role here as well, as do tooling, developer experience and pricing models. Users get a lot of control and customizability over the infrastructure, but they must ultimately manage it themselves.
- Cloud-based data science platforms. These offerings represent the next step up in terms of abstraction. Users of these platforms, typically data scientists, don't have to think about the underlying infrastructure supporting their workloads. Rather, they get to work exclusively in the realm of higher level concepts such as code notebooks, ML models, and data flows. They trade off some control for speed and agility, and they're not responsible for managing the underlying infrastructure.
- Cloud-based AI (e.g. cognitive) APIs. Finally, these offerings represent the tip of the AI in the Cloud pyramid. They are simply cloud-hosted APIs backed by pre-trained models. A common example is image object detection. A developer calls the object detection API passing in an image, and the API returns an array of items that are detected in the image. These APIs make AI extremely accessible for developers who just want to build smarter apps, but you lose a lot of control. In many cases, you can't even train the models on your own data, so if you get too far afield from the basics you may be out of luck. (A minimal sketch of what calling such an API looks like follows at the end of this post.)
I'll be elaborating on each of these areas over the upcoming weeks, and am planning to publish a three-part "AI in the Cloud" research series that covers each in depth. Stay tuned.
Join Team TWIML! TWIML is growing and we're looking for an energetic and passionate Community Manager to help expand our programs. This full-time position can be remote, but if you happen to be in St. Louis, all the better. If you're interested, please reach out for additional details.
Sign up for our Newsletter to receive this weekly to your inbox.
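As promised above, here is what calling a cloud-hosted object detection API typically boils down to: send an image in a single HTTP request, get back a list of detected objects. The endpoint, authentication header, and response fields below are generic placeholders rather than any particular vendor's API; consult your provider's documentation for the real contract.

```python
import base64
import requests

# Generic sketch of calling a cloud "cognitive" API for object detection.
# URL, auth scheme, and response shape are hypothetical placeholders.

API_URL = "https://api.example-cloud.com/v1/vision/detect"
API_KEY = "YOUR_API_KEY"

# Read and base64-encode a local image file.
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"image": image_b64},
    timeout=30,
)
response.raise_for_status()

# Hypothetical response: a list of detected objects with confidence scores.
for obj in response.json().get("objects", []):
    print(obj["label"], obj["score"])
```

That simplicity is exactly the trade-off described above: a few lines of code and no model training, in exchange for very little control over what the model knows.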
After a successful first run, the TWIML paper reading group is back! I'm excited to share the details of the upcoming TWIML & AI meetup! The focus of the meetup will be discussing academic papers and other texts in the machine learning and AI space, though I hope we get to see some implementation demos from time to time as well. Our next presenter will be Nikola Kučerová, who has also stepped up along with a couple of other community members (thanks Duncan, Joshua and Nikola!) to help organize the meetup in general. Topic: Learning Long-Term Dependencies with Gradient Descent is Difficult, by authors Yoshua Bengio, Patrice Simard, and Paolo Frasconi. Tuesday, September 12th, 2017 3:00 PM US Pacific Time / 6:00 PM Eastern Time Check HERE for your timezone.
Bits & Bytes AI getting stylish. Amazon's Lab126 has developed (using GANs) an algorithm that learns fashion styles from images and can generate new items in similar styles from scratch. Zalando has released Fashion-MNIST, a dataset consisting of 70k 28x28 images of clothing and accessories designed to be a drop-in replacement for the ubiquitous MNIST database of handwritten digits. Ride the Brainwave. Microsoft announced a deep learning hardware platform called Brainwave. Brainwave includes a neural network processing unit (DPU) based on FPGAs, an architecture for building distributed systems around the DPUs, and the requisite software toolchain for using the system, which supports Microsoft's CNTK as well as TensorFlow. Killer robot ban. Elon Musk and a group of researchers and AI experts from 26 countries have called for a UN ban on armed autonomous robots. It's not the first time, but sounds like a good idea to me. Reinforcing RL. Didn't get enough RL in our Industrial AI series? Check out this newly published paper which reviews research in the field: A Brief Survey of Deep Reinforcement Learning. Sign up for our Newsletter to receive the Bits & Bytes weekly to your inbox.
Hi there, It was worth the wait, this newsletter number eight! Oh so Adversarial, Part I "Adversarial" is a hot term in deep learning right now. The word comes up in two main contexts: I'll call them adversarial training and adversarial attacks here to be clear. You'll also hear ambiguous terms like "adversarial machine learning" used, but no worries… after the brief explainer in this and a future newsletter, you'll easily be able to distinguish the two based on context. Let's start with adversarial training. Adversarial training is a pretty nifty idea. Basically what we're doing with adversarial training is pitting two neural networks against each other so that one helps train the other. The idea was popularized by a good fellow named Ian in what he called Generative Adversarial Networks (GANs), in 2014. (This week's podcast guest, Jürgen Schmidhuber, explored a similar idea in a 1992 paper, while at UC Boulder.) GANs apply the adversarial training idea to the creation of new stuff. In a GAN you have two neural nets: a generator that is trying to create real-looking stuff and a discriminator that can tell real-looking stuff from fake-looking stuff. The discriminator is trained using real, labeled data (i.e. supervised learning), and then it trains the generator (i.e. unsupervised learning). Adversarial learning was the topic of the first TWIML Online Meetup, which was held a couple of weeks ago and for which the video archive is now available. The focus of the meetup was discussing Apple's CVPR best-paper-award-winner "Learning From Simulated and Unsupervised Images through Adversarial Training." Consider a problem like eye gaze detection in the context of the GANs I just described. You've got a picture from, for example, a cell phone camera, and you want to determine which way the user is looking. Generating enough labeled eye gaze training data to train a robust detector is hard and expensive. Generating simulated eye gaze training data sets is much easier and cheaper, though. (For example, we can use something like a video game engine.) The problem is that the simulated eye gaze images don't look close enough to real images to train a model to work effectively on real data. This paper proposes using a Generative Adversarial Network to train a "refiner" network (i.e. a generator) that can make simulated eye gaze images look like real eye gaze images while preserving the gaze direction, with pretty good results. Cool stuff! Thanks again to meetup members Josh Manela, who did a great job presenting this paper, and Kevin Mader, for walking through a TensorFlow implementation of the model that he created! You guys are just awesome! Our next meetup will be held on Tuesday, September 12th at 3 pm Pacific Time. Our presenter will be Nikola Kučerová, who will be leading us in a discussion of one of the classic papers on recurrent neural nets, Learning Long-Term Dependencies with Gradient Descent is Difficult. Visit twimlai.com/meetup for more details, to register, or to access the archives. Sign up for our Newsletter to receive this weekly to your inbox.
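If you want to see the generator/discriminator dance in code, here is a heavily condensed sketch of a GAN training loop in PyTorch. The networks, data distribution, and hyperparameters are arbitrary toy choices, just enough to show the two optimization steps pulling against each other; it is not the setup from the Apple paper.

```python
import torch
import torch.nn as nn

# Toy GAN: a generator learns to mimic samples from a 1-D Gaussian.
# Architectures and hyperparameters are arbitrary toy choices.

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))                # generator
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())  # discriminator

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 2.0   # "real" data drawn from N(2, 0.5)
    fake = G(torch.randn(64, 8))            # generated samples

    # Discriminator step: label real samples 1 and generated samples 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: try to fool the discriminator into predicting 1.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

    if step % 500 == 0:
        print(f"step {step}: D loss {d_loss.item():.3f}, "
              f"G loss {g_loss.item():.3f}, fake mean {fake.mean().item():.2f}")
```

If training is working, the mean of the generated samples drifts toward 2.0 as the generator learns to satisfy the discriminator.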
This is a recap of the first monthly TWIML Online Meetup, held on Aug 16 2017. The focus of the meetup was the CVPR best-paper-award-winner "Learning From Simulated and Unsupervised Images through Adversarial Training" by researchers from Apple (link below). Thanks again to community members Josh Manela, who did a great job presenting this paper, and Kevin Mader, for walking through a TensorFlow implementation of the model that he created! Make sure you Like this video, and Subscribe to our channel above! https://youtu.be/h7tFos8MSt0 Full paper: Learning from Simulated and Unsupervised Images through Adversarial Training Apple blog post: Improving the Realism of Synthetic Images To register for the next meetup, visit twimlai.com/meetup
Here's your weekly fix, newsletter number six! We've been Cubed! What an exciting week for us here at TWIML HQ. On the heels of our one-year anniversary, and hitting 500k listens, I learned on Monday that your and my favorite Shark is an avid TWIML listener! Speaking to attendees at the Ozy Fest conference (NYC's answer to SXSW), Mark gave us a shout-out as one of the resources he turns to in order to keep up with advances in AI!!! Woohoo! This little morsel appeared in an article on CNBC.com under the unfortunately click-baity title "Billionaire tech titan Mark Cuban on A.I.: 'It scares the s--- out of me'." While I haven't been able to find a video of the full interview with Mark, and I find the mainstream press' fearmongering around AI generally frustrating and counter-productive, I otherwise agree with Mark's general sentiment. In particular, as I noted last week, AI will have a huge impact on the way we live and work over the next 20 years; more so than any prior tech revolution. On the individual and societal levels, and all in between, we need to be thinking about and preparing for its impact. While AI will undoubtedly improve our lives, it will also force many changes in the way we live, work and relate to one another. For many, these changes will be uncomfortable. There will be winners and losers. In the US and elsewhere, inequality continues to grow and social safety nets are being dismantled. These issues need to be addressed somehow. At some point though, Warren Buffet's 'Noah Rule' applies: "No credit for predicting rain, only for building arks." I don't have answers to all the big thorny societal questions that AI begs, but I do believe that by better understanding the capabilities and limitations of machine learning and AI we're better able to put it to good use, and better equipped to reason through the impact it's likely to have. I'm glad the podcast has allowed me to play a small role in building that particular ark. Sign up for our Newsletter to receive this weekly to your inbox.
Bits & Bytes Apple launched their Machine Learning Journal site to publish their research. The first article is on a technique for improving the realism of fake images using a technique similar to GANs. Google updated their 9-million-image Open Images dataset to add some 2 million bounding boxes and several million more labels. OpenAI published research on a new approach to reinforcement learning called Proximal Policy Optimization (PPO). PPO aims to outperform the current state-of-the-art methods while being simpler to implement. In a paper published in Neuron, Google DeepMind founder Demis Hassabis and co-authors argue that understanding human intelligence is the key to creating artificial intelligence. An interesting discussion of some ways technical debt is accumulated in machine learning projects. Harvard Business Review features a nice profile of Facebook's Applied Machine Learning group. The IEEE Computer Vision and Pattern Recognition (CVPR) conference just ended. Best Paper winners were Densely Connected Convolutional Networks and Learning from Simulated and Unsupervised Images through Adversarial Training. Perhaps someone reading this would like to present one of these papers at an upcoming meetup? Sign up for our Newsletter to receive the Bits & Bytes weekly to your inbox.
A number of you have expressed interest in participating in a TWIML paper reading group and I'm excited to share the details of the inaugural TWIML & AI meetup! The focus of the meetup will be discussing academic papers and other texts in the machine learning and AI space, though I hope we get to see some implementation demos from time to time as well. Our first presenter will be Joshua Manela, who has also stepped up along with a couple of other community members (thanks Duncan, Joshua and Nikola!) to help organize the meetup in general. Joshua will be presenting one of the Best Paper award winners from this year's CVPR conference! Topic: Learning from Simulated and Unsupervised Images through Adversarial Training, by authors from Apple. (See also Improving the Realism of Synthetic Images.) Wednesday, August 16th, 2017 11:00 AM US Pacific Time / 2:00 PM Eastern Time Check HERE for your timezone.
Bits & Bytes Interesting article in Science about explainability approaches for deep neural networks. Marco Ribeiro and Carlos Guestrin's LIME is discussed—check out my interview with Carlos for more detail on that project. This recent article on interpreting neurons in an LSTM network is related, and also very interesting. Impressive work by the EFF compiling a bunch of metrics of progress in ML/AI. Nice touch presenting it as a Jupyter notebook! Researchers from Stanford and iRhythm collaborate to develop a 34-layer CNN that can detect arrhythmias in single-lead ECG better than a cardiologist. Three new papers from Google DeepMind explore teaching complex movement to simulated humanoid forms. Baidu releases Apollo—a comprehensive open source platform for self-driving cars. Is this the future Android OS for the autonomous, connected vehicle? They announced it with a huge partner ecosystem, so it will be interesting to see where this goes. Sign up for our Newsletter to receive the Bits & Bytes weekly to your inbox.
In 1996, Bill Gates popularized the saying "content is king." Twenty years later it's data that's king, and those able to harness it for better insights, predictions and experiences are the new kingmakers. To help give you a view into the next twenty years of data, and how to take advantage of it today, I've partnered with the team at Interop ITX to create the Future of Data Summit. This 2-day event will bring together noted experts and practitioners to discuss the future of enterprise data from a variety of technology perspectives. We'll be exploring the innovation and opportunity being offered in areas such as, of course, ML, AI and cognitive services, but also IoT and edge computing, AR/VR, blockchain, algorithmic IT operations, data security and more. I've hand-picked the speakers to both inspire Summit attendees with a view into what's possible, as well as provide practical insights into how to get there. Here's our agenda for the Summit:
Day 1 - 5/15
- Opening Remarks
- Enterprise AI & the Future of Data (Sam Charrington, Principal Analyst, CloudPulse Strategies)
- Living on the Edge: Fog and Edge Computing for a World of Ubiquitous Devices (Janakiram MSV, Principal, Janakiram & Assoc.)
- Break
- Data Gravity and Archimedes' Lever for IoT, 10:45 - 11:45 (Dave McCrory, CTO, Basho)
- The Intersection of Big Data, Cloud, Mobility and IoT: Making the Connections (Bob Friend, Director, National Practice, BlueMetal)
- Lunch
- Cloud, IoT and Big Data Security (Diana Kelley, Global Executive Security Advisor, IBM)
- How the Future of Hardware Enables the Future of Data (Assaf Araki, Sr. Big Data Architect, Intel)
- Break
- Algorithmic IT Operations (AIOps) (Eric Sammer, CTO & Co-Founder, Rocana)
- Unlocking the Data Lake with AI (Ashley Fidler, Head of Product, Versive)
Day 2 - 5/16
- Opening Remarks
- Understanding Deep Learning (James McCaffrey, Research Engineer, Microsoft Research)
- Building AI Products (Josh Bloom, CTO, Chairman & Founder, Wise.io (GE))
- Break
- AI in Financial Services at Capital One, 10:45 - 11:22 (Zachary Hanif, Director of Machine Learning at Capital One)
- Marketing in the Age of AI, 11:22 - 12:00 (Srividya Kannan Ramachandran, Principal, Marketing & Data Science, Level 3 Communications)
- Lunch
- Big Data and the Advent of Data Mixology (Jennifer Prendki, Sr. Data Science Manager, WalmartLabs)
- AI in the Enterprise Panel Discussion (Sam Charrington, moderator; Zach Hanif, Capital One; Srividya Kannan Ramachandran, Level 3; Jennifer Prendki, Walmart Labs)
- Break
- Virtual and Augmented Reality in the Enterprise (Amy Peck, Founder & AR/VR Consultant, EndeavorVR)
- IoT and the Bitcoin Blockchain (Andre De Castro, CEO & Co-Founder, Blockchain of Things)
Summit Details:
- Date: May 15-16, 2017
- Venue: MGM Grand, Las Vegas
A brief word about the parent event, Interop ITX. Interop ITX is one of the largest and longest running conferences providing education and networking opportunities to enterprise technology leaders and practitioners. The conference hosts 3-4,000 attendees. In addition to my Future of Data Summit, which is part of the pre-conference program, the regular conference offers dedicated tracks on Data & Analytics, Cloud, Infrastructure, Security, DevOps and Leadership & Professional Development. I hope you can join me for the great presentations and discussion we'll be having at the Summit. Registration for the Future of Data Summit is done via the Interop ITX web site, and you'll need a package that includes the summits in order to attend. Please use my promo code CHARRINGTON for a 20% discount and to let the folks at Interop ITX know that you're coming for the Summit.
Today, I'm joined by Kenneth Stanley, Professor in the Department of Computer Science at the University of Central Florida and senior research scientist at Uber AI Labs. Kenneth studied under TWIML Talk #47 guest Risto Miikkulainen at UT Austin, and joined Uber AI Labs after Geometric Intelligence, the company he co-founded with Gary Marcus and others, was acquired in late 2016. Kenneth's research focus is what he calls Neuroevolution, which applies the idea of genetic algorithms to the challenge of evolving neural network architectures. In this conversation, we discuss the Neuroevolution of Augmenting Topologies (or NEAT) paper that Kenneth authored along with Risto, which won the 2017 International Society for Artificial Life's Award for Outstanding Paper of the Decade 2002 - 2012. We also cover some of the extensions to that approach he's created since, including HyperNEAT, which can efficiently evolve very large networks with connectivity patterns that look more like those of the human brain and that are generally much larger than what prior approaches to neural learning could produce, and novelty search, an approach which, unlike most evolutionary algorithms, has no defined objective, but rather simply searches for novel behaviors. We also cover concepts like "Complexification" and "Deception," biology vs. computation, including differences and similarities, and some of his other work, including his book and NERO, a video game complete with Real-time Neuroevolution. This is a meaty "Nerd Alert" interview that I think you'll really enjoy.
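For listeners curious how an objective-free search can work at all, here is a toy sketch of the novelty search idea: individuals are scored not against a task objective but by how different their behavior is from an archive of behaviors seen so far. The behavior characterization, mutation scheme, and thresholds below are arbitrary illustrative choices, not NEAT or Kenneth's actual algorithm.

```python
import numpy as np

# Toy novelty search over a population of parameter vectors.
# evaluate_behavior() is a stand-in: in practice it would run the
# individual (e.g. a neural net controller) and return a behavior
# descriptor, such as a robot's final (x, y) position.

def evaluate_behavior(genome):
    rng = np.random.default_rng(abs(hash(genome.tobytes())) % (2**32))
    return genome[:2] + 0.1 * rng.standard_normal(2)

def novelty(behavior, archive, k=5):
    # Novelty = mean distance to the k nearest behaviors seen so far.
    if not archive:
        return float("inf")
    dists = np.sort([np.linalg.norm(behavior - b) for b in archive])
    return float(np.mean(dists[:k]))

rng = np.random.default_rng(0)
population = [rng.standard_normal(10) for _ in range(20)]
archive = []

for generation in range(50):
    behaviors = [evaluate_behavior(g) for g in population]
    scores = [novelty(b, archive) for b in behaviors]

    # Archive sufficiently novel behaviors; note there is no task reward.
    for b, s in zip(behaviors, scores):
        if s > 1.0 or len(archive) < 5:
            archive.append(b)

    # Keep the most novel individuals and mutate them to form the next
    # generation (a bare-bones stand-in for NEAT-style reproduction).
    elite = [population[i] for i in np.argsort(scores)[-5:]]
    population = [e + 0.1 * rng.standard_normal(10)
                  for e in elite for _ in range(4)]

print(f"archive holds {len(archive)} distinct behaviors")
```

The counterintuitive payoff, as Kenneth discusses, is that rewarding behavioral difference alone can sometimes stumble onto solutions that a deceptive objective-driven search never reaches.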