Google's premier cloud computing and AI conference, Google Cloud Next 2023, took place the last week of August at Moscone Center in San Francisco. I attended the event and had the opportunity to spend several days in a variety of keynotes, briefings, and sessions, as well as explore the event's expo floor. Of course, I shared some of my real-time observations via X (formerly Twitter), which you can check out here. In this post, I'll share a few of my key takeaways from the event.

This was the first in-person Google Cloud Next event in three years. While it felt a lot smaller and more compact than the last one I attended, it was still large for a post-pandemic conference, with approximately 15,000 attendees present.

Generative AI in Focus

No surprise here, but generative AI was very much a key theme flowing throughout the event, though there was plenty of content for folks more interested in traditional cloud computing topics.

In addition to enabling new features and capabilities for the company's core AI stack (AI-oriented infrastructure and accelerators, AI/ML/DS platforms, and AI-powered applications), Google is weaving generative AI into non-AI products through Duet AI, which adds AI-based assistant technologies to a wide range of Google Cloud products.

A good indication of the breadth of work they've done to quickly build generative AI into their product base can be seen in the many AI-related announcements made during the event. Here's a summary of the most interesting AI-focused ones, out of the full list of 161 noted in Alison Wagonfeld's wrap-up post:

- Duet AI in Google Cloud is now in preview with new capabilities, with general availability coming later this year. There were a dozen more announcements covering Duet AI features for specific Google Cloud tools; check out the blog post for a summary.
- Vertex AI Search and Conversation, formerly Enterprise Search on Generative AI App Builder and Conversational AI on Generative AI App Builder, are both now generally available.
- Google Cloud added new models to the Vertex AI Model Garden, including Meta's Llama 2 and Code Llama and Technology Innovation Institute's Falcon LLM, and pre-announced Anthropic's Claude 2.
- The PaLM 2 foundation model now supports 38 languages and 32,000-token context windows that make it possible to process long documents in prompts.
- The Codey chat and code generation model offers up to a 25% quality improvement in major supported languages for code generation and code chat.
- The Imagen image generation model features improved visual appeal, image editing, captioning, a new tuning feature to align images to guidelines with 10 or fewer samples, and visual question answering, as well as digital watermarking functionality powered by Google DeepMind SynthID.
- Adapter tuning in Vertex AI is generally available for PaLM 2 for text. Reinforcement Learning from Human Feedback (RLHF) is now in public preview.
- New Vertex AI Extensions let models take actions and retrieve specific information in real time, acting on behalf of users across Google and third-party applications like DataStax, MongoDB, and Redis. New Vertex AI data connectors help ingest data from enterprise and third-party applications like Salesforce, Confluence, and Jira.
- Vertex AI now supports Ray, an open-source unified compute framework for scaling AI and Python workloads.
- Google Cloud announced Colab Enterprise, a managed service in public preview that combines the ease of use of Google's Colab notebooks with enterprise-level security and compliance capabilities.
- Next month Google will make Med-PaLM 2, a medically tuned version of PaLM 2, available as a preview to more customers in the healthcare and life sciences industry.
- New features enhance MLOps for generative AI, including Automatic Metrics in Vertex AI, which evaluates models based on a defined task and "ground truth" dataset; Automatic Side by Side in Vertex AI, which uses a large model to evaluate the output of multiple models being tested, helping to augment human evaluation at scale; and a new generation of Vertex AI Feature Store, now built on BigQuery, to help avoid data duplication and preserve data access policies.
- Vertex AI foundation models, including PaLM 2, can now be accessed directly from BigQuery. New model inference in BigQuery lets users run model inferences across formats like TensorFlow, ONNX, and XGBoost, and new capabilities for real-time inference can identify patterns and automatically generate alerts.
- Vector and semantic search for model tuning are now supported in BigQuery. You can also automatically synchronize vector embeddings in BigQuery with Vertex AI Feature Store for model grounding.
- A3 VMs, based on NVIDIA H100 GPUs and delivered as a GPU supercomputer, will be generally available next month.
- The new Google Cloud TPU v5e, in preview, offers up to 2x higher training performance per dollar and up to 2.5x higher inference performance per dollar for LLMs and generative AI models compared to Cloud TPU v4. New Multislice technology, also in preview, lets you scale AI models beyond the boundaries of physical TPU pods, to tens of thousands of Cloud TPU v5e or TPU v4 chips.
- Support for Cloud TPUs in GKE is now available for Cloud TPU v5e and Cloud TPU v4, and support for AI inference on Cloud TPUs is in preview. GKE now supports Cloud TPU v5e, A3 VMs with NVIDIA H100 GPUs, and Google Cloud Storage FUSE on GKE (GA).
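Many of the Vertex AI items above (the PaLM 2 text models, larger context windows, adapter tuning) are surfaced through the Vertex AI Python SDK. As a rough illustration of what a basic call looked like around the time of the event (the project ID is a placeholder, and model names and SDK surfaces have changed over time, so treat the identifiers below as assumptions and check the current documentation), a text generation request was roughly:

```python
# Hedged sketch of calling a PaLM 2 text model via the Vertex AI Python SDK
# (pip install google-cloud-aiplatform). Project, region, and model name are
# placeholders; consult Google Cloud's current docs for exact identifiers.
import vertexai
from vertexai.language_models import TextGenerationModel

vertexai.init(project="my-gcp-project", location="us-central1")  # placeholder project

# "text-bison" was the PaLM 2 for Text model name as of late 2023 (assumption).
model = TextGenerationModel.from_pretrained("text-bison")
response = model.predict(
    "Summarize the key generative AI announcements from Google Cloud Next 2023.",
    temperature=0.2,
    max_output_tokens=256,
)
print(response.text)
```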
Key Takeaways

My takeaways from Google Cloud Next are very much in the same vein as those from my attendance at Google's Cloud Executive Forum held earlier in the summer. I continued to be impressed with Google Cloud's velocity and focus when it comes to attacking the opportunity presented by generative AI. The company clearly sees gen AI as a way to leap ahead of competitors AWS and Microsoft and is taking an "all in" approach.

The company has also been very quick to rally customers around its new gen AI product offerings. In addition to the product announcements noted above, Google Cloud announced and highlighted new and expanded generative-AI-focused collaborations with a wide variety of customers and partners, including AdoreMe, Anthropic, Bayer Pharmaceuticals, Canoo, Deutsche Bank, Dun & Bradstreet, Fox Sports, GE Appliances, General Motors, Ginkgo Bioworks, Hackensack Meridian Health, HCA Healthcare, Huma, Infinitus, Meditech, MSCI, NVIDIA, Runway, Six Flags, eleven generative AI startups, DocuSign, SAP, and more.

"Interesting overview of @FOXSports use of Gen AI. Have 27 PB of video, ingest 10k hrs per month. Have custom models for things like celebrity detection, foul ball prediction, and more. Use the tech to allow analysts to more easily search archives. #GoogleCloudNext" (@samcharrington, August 29, 2023)

"AI-Driven Transformation panel at #googlecloudnext Analyst Summit featuring data leaders from @Snap and @Wayfair." (@samcharrington, August 29, 2023)
"For the first time, the business is really engaged in transformation... We will figure out hallucinations, omissions, etc., but the level of engagement is game changing." - Gil Perez, Chief Innovation Officer, Deutsche Bank

Additionally, Google Cloud continues to grow its generative AI ecosystem, announcing the availability of Anthropic's Claude 2 and Meta's Llama 2 and Code Llama models in the Vertex AI Model Garden.

"TK highlighting breadth of model catalog in Vertex AI, via new and existing model partners. Announcing support for @AnthropicAI Claude2 and @MetaAI Llama2 and CodeLlama models. #googlecloudnext" (@samcharrington, August 29, 2023)

Opportunities

Numerous opportunities remain for Google Cloud, most notably in managing complexity, both in their messaging and communication and in the products themselves.

From a messaging perspective, with so many new ideas to talk about, it is not always clear what is actually a new feature or product capability versus simply a trendy topic that the company wants to be able to talk about. For example, the company mentioned new grounding features for LLMs numerous times, but I've been unable to find any concrete detail about how new features enable this on the platform. The wrap-up blog post noted previously links to an older blog post on the broader topic of using embeddings to ground LLM output with first-party and third-party products. It's a nice resource, but not really related to any new product features.

Since the conference, I've spent some time exploring various Vertex AI features and APIs, and I generally still find the console and example notebooks a bit confusing to use and the documentation a bit inconsistent. To be fair, these complaints could be leveled at any of Google Cloud's major competitors as well, but coming from an underdog position in the cloud computing race, Google has the most to lose if product complexity makes switching costs too high.

Nonetheless, I'm looking forward to seeing how things evolve for Google Cloud over the next few months. In fact, we won't need to wait a full year for updates, since Google Cloud Next '24 will take place in the spring, April 9-11, in Las Vegas.
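For readers unfamiliar with the embedding-based grounding pattern mentioned above, here is a minimal, vendor-neutral sketch of the idea: embed a small document set, retrieve the passages most similar to the user's question, and prepend them to the prompt so the model answers from supplied context rather than from its parametric memory alone. The `embed` and `generate` callables are stand-ins for whatever provider you use; nothing here reflects Google's actual implementation.

```python
# Minimal, provider-agnostic sketch of grounding an LLM with embeddings.
# `embed` and `generate` are stand-ins for your embedding and text models.
from typing import Callable, List
import numpy as np

def ground_and_answer(
    question: str,
    documents: List[str],
    embed: Callable[[str], np.ndarray],   # returns a 1-D embedding vector
    generate: Callable[[str], str],       # returns model text for a prompt
    top_k: int = 3,
) -> str:
    # Embed the corpus and the question, then rank documents by cosine similarity.
    doc_vecs = np.stack([embed(d) for d in documents])
    q_vec = embed(question)
    sims = doc_vecs @ q_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-9
    )
    context = "\n\n".join(documents[i] for i in np.argsort(sims)[::-1][:top_k])

    # Ask the model to answer only from the retrieved context.
    prompt = (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return generate(prompt)
```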
Today we're joined by Su-In Lee, a professor at the Paul G. Allen School of Computer Science and Engineering at the University of Washington. In our conversation, Su-In details her talk from the ICML 2023 Workshop on Computational Biology, which focuses on developing explainable AI techniques for the computational biology and clinical medicine fields. Su-In discusses the importance of explainable AI contributing to feature collaboration, the robustness of different explainability approaches, and the need for interdisciplinary collaboration between the computer science, biology, and medical fields. We also explore her recent paper on the use of drug combination therapy, challenges with handling biomedical data, and how her team aims to make meaningful contributions to the healthcare industry by aiding in cause identification and treatments for cancer and Alzheimer's disease.
I recently had the opportunity to attend the Google Cloud Executive Forum, held at Google's impressive new Bay View campus in Mountain View, California. The Forum was an invitation-only event that brought together CIOs and CTOs of leading companies to discuss generative AI and showcase Google Cloud's latest advancements in the domain. I shared my real-time reactions to the event content via Twitter, some of which you can find here. (Some weren't hashtagged, but you can find most by navigating the threads.) In this post I'll add a few key takeaways and observations from the day I spent at the event.

Key Takeaways

Continued product velocity

Google Cloud has executed impressively against the generative AI opportunity, with a wide variety of product offerings announced at the Google Data Cloud & AI Summit in March and at Google I/O in May. These include new tools like Generative AI Studio and Generative AI App Builder; models like PaLM for Text and Chat, Chirp, Imagen, and Codey; Embeddings APIs for Text and Images; Duet AI for Google Workspace and Google Cloud; new hardware offerings; and more. The company took the opportunity of the Forum to announce the general availability of Generative AI Studio and Model Garden, both part of the Vertex AI platform, as well as the pre-order availability of Duet AI for Google Workspace. Nenshad Bardoliwalla, product director for Vertex AI, delivered an impressive demo showing one-click fine-tuning and deployment of foundation models on the platform.

Considering that the post-ChatGPT generative AI wave is only six months old, Google's ability to get Gen AI products out the door and into customer hands quickly has been noteworthy.

Customer and partner traction

Speaking of customers, this was another area where Google Cloud's performance has been impressive. The company announced several new generative AI customer case studies at the Forum, including Mayo Clinic, GA Telesis, Priceline, and PhotoRoom. Executives from Wendy's, Wayfair, Priceline, and Mayo participated in an engaging customer panel that was part of the opening keynote session. Several other customers were mentioned during various keynotes and sessions, as well as in private meetings I had with Google Cloud execs. See my Twitter thread for highlights and perspectives from the customer panel, which shared interesting insights about how those orgs are thinking about generative AI.

Strong positioning

While Models Aren't Everything™, in a generative AI competitive landscape in which Microsoft's strategy is strongly oriented around a single opaque model (ChatGPT via its OpenAI investment) and AWS's strategy is strongly oriented around models from partners and open source communities, Google Cloud is promoting itself as a one-stop shop with strong first-party models from Google AI, support for open source models via its Model Garden, and partnerships with external research labs like AI21, Anthropic, and Cohere. The company also demonstrates a strong understanding of enterprise customer requirements around generative AI, with particular emphasis on data and model privacy, security, and governance.
The company's strategy will continue to evolve and unfold in the upcoming months, and much more will be discussed at Google Cloud Next in August, but I liked what I heard from product leaders at the event about the direction they're heading. One hint: they have some strong ideas about how to address hallucination, which is one of the biggest drawbacks to enterprise use of large language models (LLMs). I don't believe that hallucinations by LLMs can ever be completely eliminated, but in the context of a complete system with access to a comprehensive map of the world's knowledge, there's a good chance that the issue can be sufficiently mitigated to make LLMs useful in a wide variety of customer-facing enterprise use cases.

Complex communication environment and need to educate

In his opening keynote to an audience of executives, TK (Thomas Kurian) introduced concepts like reinforcement learning from human feedback, low-rank adaptation, synthetic data generation, and more. While impressive, and to some degree an issue of TK's personal style, it's also a bit indicative of where we are in this market that we're talking to CIOs about LoRA and not ROI. This will certainly evolve as customers get more sophisticated and use cases stabilize, but it's indicative of the complex communication challenges Google faces in evangelizing highly technical products in a brand new space to a rapidly growing audience.

This also highlights the need for strong customer and market education efforts to help bring all the new entrants up to speed. To this end, Google Cloud announced new consulting offerings, learning journeys, and reference architectures at the Forum to help customers get up to speed, adding to the training courses announced at I/O. I also got to chat 1:1 with one of their "black belt ambassadors," part of a team they've put in place to help support the broader engineering, sales, and other internal teams at the company. Overall, I think the company's success will be in large part dependent on their effectiveness at helping to bring these external and internal communities up to speed on generative AI.

Broad range of attitudes

A broad range of attitudes about generative AI was present at the event. On the one hand, there was what I took as a very healthy "moderated enthusiasm" on the part of some. Wayfair CTO Fiona Tan exemplified this perspective both in her comments on the customer panel and in our lunch discussion. She talked about the need to manage "digital legacy" and the importance of platform investments, and was clear in noting that many of the company's early investments in generative AI were experiments (e.g., a Stable Diffusion-based room designer they're working on). On the other hand, there were comments clearly indicative of "inflated expectations," like those of another panelist who speculated that using code generation would allow enterprises to reduce the time it takes to build applications from six weeks to two days, or those of a fellow analyst who proclaimed that generative AI was the solution to healthcare in America. The quicker we get everyone past this stage the better. For its part, Google Cloud did a good job navigating this communication challenge by staying grounded in what real companies were doing with its products.

I'm grateful to the Google Cloud Analyst Relations team for bringing me out to attend the event. Disclosure: Google is a client.
Today we're joined by Joon Sung Park, a PhD student at Stanford University. Joon shares his passion for creating AI systems that can solve human problems and his work on the recent paper Generative Agents: Interactive Simulacra of Human Behavior, which showcases generative agents that exhibit believable human behavior. We discuss the use of empirical methods in studying these systems and the conflicting papers on whether AI models have a worldview and common sense. Joon talks about the importance of context and environment in creating believable agent behavior and shares his team's work on scaling emergent community behaviors. He also dives into the importance of a long-term memory module in agents and the use of knowledge graphs in retrieving associative information. The goal, Joon explains, is to create something that people can enjoy and that empowers them, solving existing problems and challenges in the traditional HCI and AI fields.
Today we kick off our coverage of the 2023 ICLR conference joined by Christos Louizos, an ML researcher at Qualcomm Technologies. In our conversation with Christos, we explore his paper Hyperparameter Optimization through Neural Network Partitioning and a few of his colleagues' works from the conference. We discuss methods for speeding up attention mechanisms in transformers, scheduling operations for computation graphs, estimating channels in indoor environments, and adapting to distribution shifts at test time with neural network modules. We also talk through the benefits and limitations of federated learning, exploring sparse models, optimizing communication between servers and devices, and much more.
Today we continue our CVPR series joined by Kate Saenko, an associate professor at Boston University and a consulting professor for the MIT-IBM Watson AI Lab. In our conversation with Kate, we explore her research in multimodal learning, which she spoke about at the Multimodal Learning and Applications Workshop, one of a whopping six workshops she spoke at. We discuss the emergence of multimodal learning, the current research frontier, and Kate's thoughts on the inherent bias in LLMs and how to deal with it. We also talk through some of the challenges that come up when building out applications, including the cost of labeling, and some of the methods she's had success with. Finally, we discuss Kate's perspective on the monopolization of compute resources for "foundational" models, and her paper Unsupervised Domain Generalization by Learning a Bridge Across Domains.
I am a Research Scientist at Google. Previously, I completed my Ph.D. at Boston University, advised by Professor and Dean of the College of Arts and Sciences Stan Sclaroff. My primary research focus is computer vision and machine learning. I interned at Amazon, working with Javier Romero, Timo Bolkart, Ming C. Lin, and Raja Bala, during the summer of 2021. I interned at Apple AI Research during the summers of 2019 and 2020, where I worked with Dr. Barry-John Theobald and Dr. Nicholas Apostoloff. In 2018 I was a spring/summer intern at the NEC-Labs Media Analytics Department, where I worked with Prof. Manmohan Chandraker and Dr. Samuel Schulter. I graduated from Georgia Tech in Fall 2017 with an M.Sc. in Computer Science specializing in Machine Learning, advised by Prof. James Rehg at the Center for Behavioral Imaging. Recently, our work DreamBooth was selected for a Student Best Paper Honorable Mention Award at CVPR 2023 (0.25% award rate). I have been selected as a Twitch Research Fellowship finalist for the year 2020 and as a second-round interviewee for the Open Phil AI Fellowship. I also appeared on the popular machine learning and AI podcast TWIML AI, talking about my recent work on defending against deepfakes. While on a five-year valedictorian scholarship, I obtained my B.Sc. and M.Sc. from Ecole Polytechnique in Paris, France. Additionally, I worked as an intern at MIT CSAIL with Dr. Kalyan Veeramachaneni and Dr. Lalana Kagal.
There are few things I love more than cuddling up with an exciting new book. There are always more things I want to learn than time I have in the day, and I think books are such a fun, long-form way of engaging (one where I won't be tempted to check Twitter partway through). This book roundup is a selection from the last few years of TWIML guests, counting only the ones related to ML/AI published in the past 10 years. We hope that some of their insights are useful to you! If you liked their book or want to hear more about them before taking the leap into longform writing, check out the accompanying podcast episode (linked on the guest's name). (Note: These links are affiliate links, which means that ordering through them helps support our show!)

Adversarial ML
- Generative Adversarial Learning: Architectures and Applications (2022), Jürgen Schmidhuber

AI Ethics
- Sex, Race, and Robots: How to Be Human in the Age of AI (2019), Ayanna Howard
- Ethics and Data Science (2018), Hilary Mason

AI Sci-Fi
- AI 2041: Ten Visions for Our Future (2021), Kai-Fu Lee

AI Analysis
- AI Superpowers: China, Silicon Valley, And The New World Order (2018), Kai-Fu Lee
- Rebooting AI: Building Artificial Intelligence We Can Trust (2019), Gary Marcus
- Artificial Unintelligence: How Computers Misunderstand the World (The MIT Press) (2019), Meredith Broussard
- Complexity: A Guided Tour (2011), Melanie Mitchell
- Artificial Intelligence: A Guide for Thinking Humans (2019), Melanie Mitchell

Career Insights
- My Journey into AI (2018), Kai-Fu Lee
- Build a Career in Data Science (2020), Jacqueline Nolis

Computational Neuroscience
- The Computational Brain (2016), Terrence Sejnowski

Computer Vision
- Large-Scale Visual Geo-Localization (Advances in Computer Vision and Pattern Recognition) (2016), Amir Zamir
- Image Understanding using Sparse Representations (2014), Pavan Turaga
- Visual Attributes (Advances in Computer Vision and Pattern Recognition) (2017), Devi Parikh
- Crowdsourcing in Computer Vision (Foundations and Trends(r) in Computer Graphics and Vision) (2016), Adriana Kovashka
- Riemannian Computing in Computer Vision (2015), Pavan Turaga

Databases
- Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases (2021), Xin Luna Dong
- Big Data Integration (Synthesis Lectures on Data Management) (2015), Xin Luna Dong

Deep Learning
- The Deep Learning Revolution (2016), Terrence Sejnowski
- Dive into Deep Learning (2021), Zachary Lipton

Introduction to Machine Learning
- A Course in Machine Learning (2020), Hal Daume III
- Approaching (Almost) Any Machine Learning Problem (2020), Abhishek Thakur
- Building Machine Learning Powered Applications: Going from Idea to Product (2020), Emmanuel Ameisen

ML Organization
- Data Driven (2015), Hilary Mason
- The AI Organization: Learn from Real Companies and Microsoft's Journey How to Redefine Your Organization with AI (2019), David Carmona

MLOps
- Effective Data Science Infrastructure: How to make data scientists productive (2022), Ville Tuulos

Model Specifics
- An Introduction to Variational Autoencoders (Foundations and Trends(r) in Machine Learning) (2019), Max Welling

NLP
- Linguistic Fundamentals for Natural Language Processing II: 100 Essentials from Semantics and Pragmatics (2013), Emily M. Bender

Robotics
- What to Expect When You're Expecting Robots (2021), Julie Shah
- The New Breed: What Our History with Animals Reveals about Our Future with Robots (2021), Kate Darling

Software How To
- Kernel-based Approximation Methods Using Matlab (2015), Michael McCourt
Check out our recent Causal Modeling in ML webinar with Robert Ness here.

We're collaborating with research scientist and instructor Robert Ness to bring his course sequence, Causal Modeling in Machine Learning, to the TWIML Community. Causality has become a very hot topic in the ML/AI space. In fact, it's come up in a good number of my recent conversations, like this one with Zach Lipton. One of the challenges facing those interested in learning about causality in ML is that most resources on the topic are geared towards the needs of statisticians or economists, versus those of data scientists and machine learning engineers.

Robert, an ML research scientist at startup Gamalon and an instructor at Northeastern University, has developed a series of six course modules on Causal Modeling in Machine Learning that are designed to be more practical and accessible for data scientists and engineers. He is teaching the course live to graduate students at Northeastern University, and through our new partnership Robert will also be hosting a study group via the TWIML platform, i.e., Zoom and Slack. The study group will provide TWIML enrollees some of the benefits of taking the course live. Robert will hold a weekly review session after each week of study in the sequence, will be available to answer questions via Slack, will personally grade submitted assignments, and will be available to assist with course homework and projects.

The previous cohort of this course received great feedback from students:

"I liked the course very much. Robert did a great job of reaching out to students to understand their background and interest in the course. It was great how he then continued to use what he learned about the students to make the course relevant and engaging to everyone enrolled. I also like how he made a connection to new paradigms. It was really nice to feel that the course is up to date. There are a lot of machine learning courses but this course was really special."

"I loved the course. I learned a ton and Robert was very available to students. When I think about how much I would have paid at my university for a similar course, TWIML is a great value."

To learn more about the courses and study group, please feel free to peruse the FAQ we've prepared below. Note, enrollment is open through Thursday, September 17th. When you're ready to enroll, you can do so at Robert's AltDeep.ai web site. Be sure to also join the TWIML Community and the #causality_course channel on our Slack.

Frequently Asked Questions

What are the courses?

The course sequence consists of six modules, as listed on the AltDeep web site. They are:

- Model-based Thinking in Machine Learning. Lay the foundation for causal models by deconstructing mental biases and acquiring new mental models for applied DS/ML.
- Do Causality like a Bayesian. Continue your "mental refactoring" by developing a Bayesian mental model for machine learning.
- How to Speak Graph; or DAG that's a Nice Model! Become fluent in directed graphs and graph algorithms as a language of probability.
- The Tao of Do; Modeling and Simulating Causal Interventions. Learn to build your first causal generative machine learning model using a deep learning framework.
- Applied Causal Inference; Identification and Estimation of Causal Effects from Data. Gain mastery of programmatic causal effect estimation.
- Counterfactual Machine Learning. Implement counterfactual reasoning algorithms in automated decision-making settings in industry.

Are the courses free? How much are they? How are they sold?
These are paid courses. Robert has put a ton of work into this sequence and will be providing TWIML learners with human support as they take the courses. Rather than selling the modules individually, Robert offers enrollment in the full Causal Modeling in Machine Learning Track for $1,199. This course sequence is designed to take you deeper into the practice of causal ML.

How long will each course run? What is the level of effort expected?

The course will run from September 10th to December 10th. On time commitment, if you just want to go through lectures and videos, then the time commitment is akin to a deep read of one paper a week. If you want to work through code examples and assignments, then more. The course is designed to give you a level of depth that suits you.

Is there a discount for TWIML participants?

Glad you asked. Yes, to kick off this partnership, Robert has agreed to extend a 15% discount to TWIML community members who register using the links above. I suspect this is the lowest price these courses will ever be offered for. Please use the discount codes TWIML2020FALL or TWIML2020FALL (for the monthly payment plan) to get the TWIML participant discount.

Is TWIML paid as part of this arrangement?

Yes, we are an AltDeep / Teachable affiliate and get a commission as part of the partnership. Whatever we earn through this relationship will help support our broader community and educational efforts. That said, we would never recommend a course we didn't think was a good use of your time and a good value.

How long will students have access to the course materials?

After you enroll, you will have access to the materials indefinitely.

How long will the course be open?

While the course itself is fundamentally designed for self-paced study, with Robert running a live weekly study group, enrollment will be closed on September 17th.

Will the weekly study group sessions be open to anyone?

Robert's weekly study group sessions are intended for enrollees and will assume that learners have at least gone over that week's lectures at a high level.

Is there a detailed syllabus?

Yes, the syllabus will roughly follow that of Robert's Northeastern course, which you can find here.

What programming languages/frameworks are used in the course?

The courses incorporate probabilistic programming concepts and use Pyro. From the Pyro web site: "Pyro is a universal probabilistic programming language (PPL) written in Python and supported by PyTorch on the backend. Pyro enables flexible and expressive deep probabilistic modeling, unifying the best of modern deep learning and Bayesian modeling." (For a small taste of what Pyro code looks like, see the sketch after this FAQ.)

Where can I learn more about Robert and the course?

Please check out my recent podcast, Causality 101, with Robert, here.

What can I expect from the weekly online sessions?

The weekly online sessions are live study group sessions presented by Robert, the instructor and author of the Causal Models in Machine Learning courses. At the sessions, Robert will present a summary of that week's lecture and open the floor for student Q&A.

What if I cannot participate in the weekly online sessions?

The weekly study group sessions will be recorded and will be available to TWIML enrollees.

What is the refund policy?

There is a 30-day refund policy on the courses.
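Since the FAQ mentions Pyro, here is a small, self-contained toy example of the kind of thing the "Modeling and Simulating Causal Interventions" module covers; this is an illustrative sketch only, not course material. It defines a three-variable causal model and uses Pyro's do-operator handler to simulate the intervention do(x = 1).

```python
# Toy causal model in Pyro (illustrative only): z -> x -> y, with z also -> y.
# pyro.poutine.do simulates the intervention do(x = 1.0), cutting the z -> x edge.
import torch
import pyro
import pyro.distributions as dist
from pyro import poutine

def model():
    z = pyro.sample("z", dist.Normal(0.0, 1.0))          # confounder
    x = pyro.sample("x", dist.Normal(0.5 * z, 1.0))      # treatment influenced by z
    y = pyro.sample("y", dist.Normal(2.0 * x + z, 1.0))  # outcome depends on x and z
    return y

# Intervene: clamp x to 1.0 regardless of z.
intervened = poutine.do(model, data={"x": torch.tensor(1.0)})

# Crude Monte Carlo estimate of E[y | do(x = 1)]; should be close to 2.0.
samples = torch.stack([intervened() for _ in range(5000)])
print("Estimated E[y | do(x=1)]:", samples.mean().item())
```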
Sam Charrington: [00:00:00] All right, everyone. I am here with Jabran Zahid, Senior Researcher with Microsoft Research. Jabran, welcome to the TWIML AI Podcast.

Jabran Zahid: [00:00:09] Thank you very much, Sam. It's a pleasure to be here.

Sam Charrington: [00:00:11] Great to have you on the show. I'm really looking forward to digging into our conversation. To get us started, I'd love to have you share a little bit about your background and how you came to work at the confluence of biology and artificial intelligence.

Jabran Zahid: [00:00:26] Oh, thank you very much for this opportunity to share with you what we've been working on here at Microsoft. By training, I'm an astrophysicist, and prior to coming to Microsoft a year and a half ago, I was working on understanding galaxy evolution and cosmology, largely trying to look at galaxies. The most recent stuff I was working on was looking at galaxies and trying to develop techniques to tie those galaxies to the dark matter distribution in the universe. I was interested in mapping the dark matter in the universe using the galaxies as little beacons of light in this sea of dark matter. It was a real privilege to be able to study astrophysics. It's a beautiful subject, but as I've gotten older, one of the things that started to become a higher priority for me, personally, was to be able to have a greater impact with the work I was doing, and by impact, I mean the ability to impact people's lives on the day to day. While astronomy is a beautiful subject, it's not the most practical in terms of people's day-to-day lives. It has important cultural impact, but it doesn't have that impact on everyone's lives from day to day, so I started to look for opportunities, and one place that made perfect sense to look towards was industry, where not only are there interesting projects and interesting things being done, but there's also the opportunity and ability to have reach if you work at the right place, a place that has the reach to individuals. When one of my former colleagues, also an astrophysicist, went to Microsoft Research, she told me about the position within the Immunomics group and told me a little bit about the details. It was just my bread and butter. It was a science project that, if successful, could potentially have a huge impact, could even change the world if we succeed at what we're doing in this project. That just really got me excited. Once I had learned more about the project and brought my skills to the table, it made sense. I was a good fit for the role, and I ended up at Microsoft Research at the end of January last year, six weeks before the pandemic hit.

Sam Charrington: [00:02:26] Wow. Did you say Immunomics?

Jabran Zahid: [00:02:30] That's what we call it. It's immunology mixed with genomics, basically. Our project essentially is, we're trying to map the immune system, and the way we do that is we genetically sequence the T-cells of the human immune system, which we'll go into details on what that means. We're essentially trying to learn how to read the immune system from the genes themselves.

Sam Charrington: [00:02:50] You mentioned that you started just before the pandemic. Did that influence the evolution of the project at all?

Jabran Zahid: [00:02:57] Absolutely. We have been engaged in helping Adaptive Biotechnologies. The project I worked on, The Antigen Map Project, is a collaboration between Microsoft Research and Adaptive Biotechnologies.
We've been helping them make diagnostics, and when COVID hit, it presented a unique opportunity for us to turn all of our efforts, or a big fraction of our efforts, towards trying to diagnose COVID, which we did successfully. Adaptive Biotechnologies has an FDA-authorized diagnostic on the market, which you could order today if you wanted to. COVID not only provided a very strong impetus, in regards to the fact that it was just one of the most pressing human problems that we were facing, but it also provided a unique opportunity to really bring together many, many aspects of our project. It's a great test case for understanding what we do in our project, what the antigen map is. It really accelerated our research. I anticipate that when we look back at last year, it will be seen as a watershed moment in our project, simply because of the accelerant that COVID was for our project.

Sam Charrington: [00:04:15] Awesome. We'll dig into the machine learning aspect of the project and how you apply ML, but I think I'd like to hear a bit more about the biology and understand the problem that you're trying to solve at a more fundamental level. Immunomics, how does it work? What specifically are you trying to do with The Antigen Map Project?

Jabran Zahid: [00:04:39] Yeah. Thank you for asking about that, Sam. I'm really happy to share this, and I should, first of all, say that what I'm discussing now is a representation of 50 or so people's work. It's not just me who's carrying this out. This is a large collaboration. It really is an effort that spans multiple companies and really builds on decades of research in immunology. The human immune system is an amazing system. The adaptive immune system specifically is something that started evolving about 300 million years ago. What the adaptive immune system is, is the part of the human immune system that has a memory. When you're a kid, you get sick with, let's say, measles or something. Your immune system will eventually respond to that, and the adaptive immune system will retain a memory of having seen the measles. You will not get sick with the measles again if you've had it in the past, because the second your body gets exposed to the measles, your adaptive immune system is ready to go. It remembers what it looks like, what the pathogens from measles look like, and it springs into action. A big part of that immune system is the T-cells. The T-cells essentially are floating around in your blood and in some of your organs. They have a little receptor on their surface, and that's actually what we sequence, the T-cell receptor. We get a genetic sequence of the T-cell receptor, and that genetic sequence encodes, more or less, the shape of that receptor. Like a key fitting into a lock, if that receptor's T-cell finds the lock that it fits into, if it finds the pathogen that it binds, it'll basically trigger an immune response. After that immune response, the virus or bacteria is cleared from the body, and it will remember. Those T-cells, that special T-cell, will stick around in your body for much longer than the rest of the T-cells. These T-cells, the adaptive immune system itself, are produced by a stochastic, quasi-random process in which different combinations of amino acids are put together, producing a huge number of possible shapes for the T-cell receptor. That's where the complexity of the problem comes in, and that's where machine learning is required.
The space of possible T-cells is something like 10 to the 15, and you yourself have hundreds of billions of these things in your body. We're trying to use sequencing, which is Adaptive Biotechnologies' secret sauce, their ability to genetically sequence a large number of T-cells. For an individual, I can tell you, from a vial of blood you can sequence something like 500,000 to a million T-cells, and then we can read those in our computer, and we have that for tens of thousands of individuals. You can imagine, now you have all these strings of letters floating around that represent T-cells. You want to read what those letters mean, because those T-cells encode the memory of all the things you've been exposed to in your past. If we can successfully read that book of your immune system, we will be able to tell you all the things you've been exposed to in the past, and things you may be actively fighting, which is the area we've been mostly focused on: building diagnostics for things you're actively fighting now.

Sam Charrington: [00:07:53] A couple of questions based on your explanation. The first is, you mentioned that T-cell production is, in many ways, random, the result of some stochastic process, so the 500,000 T-cells that you mentioned you might pull from a vial of my blood aren't some historical DNA record of 500,000 diseases. There's some number of diseases that have created T-cells, but then there's a lot of randomness built in. Am I getting that right?

Jabran Zahid: [00:08:23] That's a wonderful question. What it really is, is that the process by which these T-cells are produced is called VDJ recombination. Essentially, in your thymus, different groupings of amino acids are inserted to create the T-cell receptor. Now, those are naive T-cells; they don't know what their cognate pathogen is. You just have a huge number of them. This is the beauty of the adaptive immune system: it just creates a huge number of them. It's only when those random ones, the naive ones, encounter a pathogen to which they latch, that key fitting into the lock, that's when they proliferate. They clonally expand, they start reproducing themselves, and they retain a memory, and they become what are called memory cells. This is a very simplified version of it, but essentially what happens at that stage is those will stick around in your blood far longer than the ones that are naive. To your question specifically, when we draw the vial of blood, we have a huge number of these naive cells. The vast majority are naive cells, actually, but not all of them. One to some five percent are these memory cells, and discriminating between the memory and naive cells is one of the major challenges of our project, and that's something we're very actively engaged in.

Sam Charrington: [00:09:38] We'll come back to that in a second; I want to ask another question I had about this. Maybe it is the same question: when you're doing the sequencing, is the sequence of proteins directly telling you the receptor, or something about the receptor, or is there something more fundamental about a T-cell that is coming out of the sequencing?

Jabran Zahid: [00:10:02] That's a great question. What we sequence is what's known as the CDR3 region, which encodes the receptor itself. The sequence is just amino acids, 20 different possibilities, of A's, C's, T's, G's, whatever, but amino acids are encoding for proteins, which then make up the structure of the receptor. In your mind, the picture you should have is literally the lock and key picture:
there is a structure to this receptor. It has to physically fit the pathogen that it's trying to bind, in a way that it binds through a physical chemical bond, essentially. If the shape is right, then those two things will come together and it'll be a good fit, and that's when the immune response starts. Otherwise, nothing happens. Those cells just float around.

Sam Charrington: [00:10:51] When you're using machine learning to distinguish between the random T-cells and the ones that are activated and have identified their pathogen, it's not within that protein sequence, because the receptors are the same. Is there some other flag or characteristic that distinguishes the two?

Jabran Zahid: [00:11:12] Generally, if one really wanted to get the ground truth, you would go and you would look at surface markers on the T-cell, so not the receptor itself but the T-cell, that would help you distinguish between whether it's a memory or naive cell. The way we go about understanding that issue is by looking at other characteristics. One of the primary characteristics is what's known as the publicity of the T-cells. These T-cells have a range of generation probabilities, probabilities of occurring in any individual, which is referred to as a generation probability. The probability is generated by this random process of VDJ recombination, and for ones that have reasonably high generation probabilities, there's a good chance you'll see them in a number of individuals. One of the standard ways that we set up our experiments, the method by which we arrive at a collection of T-cells that are both memory and specific to a disease, is this: COVID's a great example. You have a thousand individuals that have COVID; we've drawn their blood. We've sampled their T-cells. We compare that against a thousand people, a control sample, that don't have COVID, and we simply ask the question: which T-cells appear at a statistically significantly higher frequency amongst the individuals that have COVID as compared to the individuals that don't? That gives you your set of T-cells that may potentially be T-cells that are actively fighting COVID, and then you do all your machine learning and things like that from there. That's the starting point of our diagnostic procedure.

Sam Charrington: [00:12:47] Got it. It sounds like a great application for some pattern matching.

Jabran Zahid: [00:12:51] Yeah, absolutely. You can really imagine some of the tools of natural language processing coming in here, because these are literally just strings, but you've got to throw in a little bit of physics too, because they're encoding for physical properties of a thing. It's a complicated problem, which we're just scratching the surface of right now, but we've made enough progress that it's clear to us this is going to be something that's going to yield very important techniques for understanding human health.

Sam Charrington: [00:13:18] Before we dig into the technology aspect, I just want to hit pause briefly and ask you: you talked about your background as an astrophysicist and cosmologist. I did not hear doctor, biologist, any of that, and yet you're speaking very fluently about the biology. I'm just curious about that process for you coming up to speed in this domain and how you approached it, and if there's anything interesting from your background that you brought to this problem area?
Jabran Zahid: [00:13:52] Starting out on this project, I had a high school biology understanding of the immune system, and then whatever Wikipedia told me. I didn't have any sophisticated knowledge. That was the primary challenge. The tools that I had learned along the way for studying galaxies and cosmology were very applicable and translated very straightforwardly to the problem, and the techniques, and the training, and the craft of doing research were something I had; I had been doing research for 20 years. I understood that and had great mentorship that really gave me those skills, but the domain-specific knowledge was the greatest challenge, and remains my greatest challenge to this day. You may say I speak of it fluently, but in my mind I feel that ignorance is outweighing the knowledge that I have on this subject. I appreciate you saying that, but the reality is that that's been the challenge. Basically, the way you approach a science problem is you've got to start playing with the data, but at the same time, you've got to contextualize that exploration of the data in what is known in the field. The way I've gone about doing that is, of course, reading a huge number of the papers that cover the 30 or 40 years of immunological research on the subject, and going to conferences when possible; that's been a little bit more difficult these days, but scientists have made huge strides in virtual conferences. One of the most important things is talking to my colleagues who are immunologists and just asking questions; sometimes it may seem like a stupid question or a dumb question, but it's really just a reflection of my own ignorance and trying to fill that in. That's what's gotten me this far, and I feel that filling in those gaps, combined with the techniques that we're developing as a team using tools of machine learning, are really the things that are going to be required to take this project to the next level.

Sam Charrington: [00:15:50] Let's talk about some of those techniques. You described the setup, at least at a high level, of this pattern matching problem. You've got your folks with an identified disease. You've got your control group. You take a bunch of T-cells from all of them, and you're trying to figure out which T-cells are more significantly evidenced in your exposed group. What machine learning approaches do you apply to a problem like that? Even the step before that, what does the data collection, and, since so many of these are supervised techniques, the labeling process look like for this kind of problem?

Jabran Zahid: [00:16:32] We can take COVID as an example; it varies from disease to disease, but COVID encapsulates much of the process, which is, in some sense, a process that's ubiquitous in any machine learning project. You collect your data, which is drawing vials of blood, and for COVID, the way we did that was through Adaptive's partners throughout both industry and academia, so the ground truth, oftentimes, not always, but most of the time, was taken as a PCR test. If someone had a positive PCR test, we know this person has the virus in their body, and therefore they were not only exposed but infected. Let's draw their blood; that's where the labels are typically coming from. There are other subtleties involved, which we don't need to go into. Then you get your labeled data, and now we have a huge number of...

Sam Charrington: [00:17:25] If I can jump in quickly there. These PCR tests aren't perfect. They have whatever the false positive rate is for the PCR test, false negative rate.
Do you try to adjust for that in the process, either by some kind of quorum technique, multiple tests, or mathematically somewhere?

Jabran Zahid: [00:17:48] Yeah. Different ways, depending on the circumstances, in which we address that issue. Oftentimes what we see is that these false negatives, which are somewhere at the level of 5% or so, I think that's typically the number, show up as outliers, but we have large enough samples and that's just part of the game. There's always going to be...

Sam Charrington: [00:18:06] Another source of noise.

Jabran Zahid: [00:18:08] Yeah. There's always noise and you just deal with it, and it depends on the circumstances and how it's affecting your system, so it's certainly an issue, but we are well equipped to handle that.

Sam Charrington: [00:18:16] Okay.

Jabran Zahid: [00:18:17] Yeah. Then we have our labeled data. In any machine learning project, one of the things you really want to do next, once you collect the data, is determine your features. At the highest level, our features are these public sequences, the sequences of these T-cells that appear in multiple individuals at a statistically higher frequency in the individuals who have whatever endpoint we care about. In the case of COVID, that's people who have COVID versus individuals in our control sample, and then we just count those sequences, how many of those are occurring in an individual, and then do a simple logistic regression model, and that gets you pretty far. It's impressive how far that can get you. Just like in any machine learning application, usually the simplest models get you 90% of the way there. You have to start with the simplest models because you have to have a baseline, and you can interpret them much more easily, so that's where we're at in terms of our diagnostic. We have the simple model that we can submit to the FDA, and it has been authorized by the FDA, but of course you want to extend on that. We have this enormous data set, and how do you push that further? We don't care about just whether you have COVID or not. We want to know other things that we can learn from this data. One interesting application is that, in addition to these tests where we just sequence what we call the repertoire, so the T-cells, there are laboratory experiments in which we take actual pieces of the COVID virus, put them in test tubes, throw a bunch of T-cells at them, and see what sticks to what. One of the issues with the diagnostic approach that I described is that you see these T-cells occurring at a higher statistical frequency in the cases versus the controls, but you don't really know for sure whether they're specifically attacking COVID. These laboratory experiments allow us to make that test. When the virus enters your body, the way your immune system responds is it chops up the virus and then presents it, essentially, on the surface of a cell for the T-cell to come along and grasp onto. There's a presentation step, and that presentation is usually about 10 or so amino acids of the virus. It gets chopped up. We chop up the virus, throw it in a test tube, throw a bunch of T-cells at it, figure out which ones stick, and then ask the question: of the ones that are sticking, how many do we see in our diagnostic, in the public cells that make up our diagnostic?
The upshot of all of this is that now we have the ability to know not only that the T-cells in our diagnostic are attacking COVID, but also what they are attacking in COVID, what part of the virus they are attacking.

Sam Charrington: [00:21:06] Meaning which 10-amino-acid sequence is the receptor latching onto in particular?

Jabran Zahid: [00:21:13] Exactly. 10-ish. That's just a rough number. One upshot of this is that we can now distinguish between whether a T-cell is hitting the spike protein, which is the protein that encodes the spikes on the surface of the coronavirus, or the envelope protein, which creates something else. If you follow the vaccine development, one thing you note is that almost all the vaccines, certainly all the ones that have been approved in the United States, target the spike protein. They don't introduce the whole coronavirus. They just cut out the spike protein, and whether it's an mRNA vaccine, where they just indirectly introduce that RNA into your body, or whether it's something like the Johnson & Johnson, where they attach it to a vector like a common cold virus, in any case, that's what your body is building your immune response up to. The fact that we can discriminate between what the T-cells are responding to means that our diagnostic has the power, and we're working on this very diligently, to discriminate whether you have had a vaccine or a natural infection. That has important implications for things like trying to understand people who get reinfected after a vaccine, for example, and vaccine manufacturers will really care about that. COVID, whether we like it or not, is going to be here for a while, so this is really providing an ability for us to begin to understand and dissect the disease at a level of resolution that hasn't previously been possible.

Sam Charrington: [00:22:49] I'm not sure I'm following that. How does this technique allow you to differentiate between folks that have T-cells because they were vaccinated versus the naturally occurring virus? Before you do that, I love that you refer to the set of T-cells that a person has as a repertoire, like it's a certain set of skills.

Jabran Zahid: [00:23:15] That's what the field refers to them as. That's a bit of jargon, but I love that too. I'm glad you picked up on that. That's cool, right? That's the technical term for it. Again, the diagnostic that we build works by counting up the T-cell response. You count up the different T-cells that we think are specific to COVID. Now what we can say is which subset of all the T-cells in our diagnostic these T-cells are specific to. Let's say we have 10,000 T-cells in our diagnostic. Some fraction of those are attacking the spike protein, and some fraction of those are not attacking spike, they're attacking the envelope, and the spike protein is a small fraction of the genome of the coronavirus. There's something like 10,000 amino acids, and the spike is only a few hundred to a few thousand; I don't remember the exact number. But we know which T-cell is attacking what, and in people who have had a vaccination, we only observe those T-cells that are targeting spike. It's actually amazing how robustly we can do that, whereas someone who has a natural infection will have a response that covers a much broader range of the T-cells.
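To make the counting-based approach Jabran describes above a bit more concrete, here is a rough, purely illustrative sketch on synthetic data (emphatically not Adaptive's or Microsoft's actual pipeline): flag "public" sequences that occur significantly more often in cases than in controls using Fisher's exact test, then train a logistic regression on each person's count of those enriched sequences.

```python
# Toy sketch of the enrichment + logistic-regression diagnostic idea described above.
# Synthetic data; real repertoires have hundreds of thousands of sequences per person.
import numpy as np
from scipy.stats import fisher_exact
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_cases, n_controls, n_seqs = 200, 200, 1000

# presence[i, j] = True if sequence j was observed in person i's repertoire.
presence_cases = rng.random((n_cases, n_seqs)) < 0.05
presence_controls = rng.random((n_controls, n_seqs)) < 0.05
presence_cases[:, :30] |= rng.random((n_cases, 30)) < 0.30  # 30 disease-associated clones

# 1) Find sequences enriched in cases vs. controls (Fisher's exact test per sequence).
enriched = []
for j in range(n_seqs):
    a, b = presence_cases[:, j].sum(), n_cases - presence_cases[:, j].sum()
    c, d = presence_controls[:, j].sum(), n_controls - presence_controls[:, j].sum()
    _, p = fisher_exact([[a, b], [c, d]], alternative="greater")
    if p < 1e-3:  # in practice you would correct for multiple testing
        enriched.append(j)

# 2) Feature = how many enriched sequences each person carries; fit logistic regression.
X = np.concatenate([presence_cases[:, enriched].sum(axis=1),
                    presence_controls[:, enriched].sum(axis=1)]).reshape(-1, 1)
y = np.array([1] * n_cases + [0] * n_controls)
clf = LogisticRegression().fit(X, y)
print(f"{len(enriched)} enriched sequences; training accuracy {clf.score(X, y):.2f}")
```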
Sam Charrington: [00:24:26] It's really speaking to both the granularity of the problem, and I'll elaborate on this in a second, but also the diversity of T-cells that you are speaking to. It's not the case that there is a Coronavirus T-cell and there's one and only one. It's that there's a family of T-cells that attack different aspects of the Coronavirus, and maybe even multiple that attack the spike, and the population that someone has of each of the, possibly many, in this family can tell you a lot about how they acquired the virus. Jabran Zahid: [00:25:04] Absolutely. That's partly where the machine learning comes in: figuring out how the immune response was triggered. Finding those deep, deep patterns encoded in those receptors. What makes these T-cells specific to COVID, and what's similar about these two that we know are hitting the spike protein, and things like that. That's really where the next step of the project comes in, and it requires this very sophisticated modeling. A problem we haven't cracked, by the way, despite many, many, many different attempts, so it's a very difficult problem and can only be addressed with the tools and sophistication of machine learning algorithms. Sam Charrington: [00:25:52] We started out talking about logistic regression and the supervised problem where you've got the test results as labels, and now you're starting to talk about things that sound like clustering and unsupervised types of problems. Is that the general direction that you're heading with this kind of analysis? Jabran Zahid: [00:26:11] Absolutely. The unsupervised techniques provide a means for clustering. For example, dimensionality reduction, the standard approaches that one would throw at any problem with very, very high dimensionality and a large parameter space, but that's only the first step. The real question, the heart of it all, is we want to read the immune system. What we call the antigen map is: I give you a T-cell and its receptor and you tell me what antigen that T-cell will bind to, because it's only then that we can read off your immune history. When we draw your blood, we may know this T-cell is a memory cell, but we won't know if it's a memory cell to the common cold or to Coronavirus or to some bacteria. We won't know that just from looking at it. We'll have to use the sequence and understand how that sequence encodes the information about what it has attached to in the past, what it's bound to in the past. That's where the machine learning really comes in, and you can imagine the complexity of the problem. We're literally trying to read the immune system in a way that allows us to read your immune history. It's just a bunch of strings when you look at it on this computer screen, and so the challenge is going from that bunch of strings on your computer screen to a physical mechanism and physical system and the physical properties of that T-cell that really give us the information about what it's binding. Sam Charrington: [00:27:47] You've tried a lot of things and have a list of things that haven't worked. What are some of those things? Jabran Zahid: [00:27:54] That's a great question. It's pretty interesting because a few researchers have come onto this problem since I have, and everyone treads the same path in some sense, which is, you come in and you say, logistic regression? How are you still using logistic regression to do this? That's that naivete that's required to really try some interesting, crazy things in science.
One of the obvious things is how far could we carry this analogy of we're trying to read the immune system. One of the things I tried was to take BERT, which is a well-known natural language processing model. It's called a transformer. It's a model that's essentially used in natural language processing tasks for questions and answers on a bot, or translation. It's a very multi-faceted tool. Natural language processing is a field in which machine learning has really matured, and they have techniques and approaches for what they call transfer learning, where you can take a model trained in one domain, this happens in image analysis as well, but you take the model trained in one domain, let's say all of the web pages of Wikipedia, and then apply it in another domain. You do this training on this huge data set, and then you fine tune it to your specific problem. It works to varying degrees depending on the nature of the problem, but that's beside the point. The question I asked is, can we just use this model, this transformer-type, natural language processing model, to read the sequences and see if we can get somewhere? It turns out it just doesn't work. It doesn't work, at least in the way that we set it up. It's not surprising. The analogy breaks down between natural language and biophysics and biochemistry with these sequences. Understanding where that breakdown happens is one of the most critical questions to really figuring out the right set of algorithms and the right set of constraints and the right data. In some ways, it's the right setup of the problem. That's one of the most difficult tasks in machine learning, setting up the problem appropriately. Hopefully these failures will help guide us to the path that's going to lead us to success. Sam Charrington: [00:30:14] Are there specific things that you can share that you learned in attempting to apply BERT to this problem, or specific places that it broke down? Jabran Zahid: [00:30:24] I didn't push it too far. I would say that the one thing that immediately stood out to me was it worked to a degree. At first, I was very excited. I was like, "Wow, this has predictive power on specific tasks," and so, "Hey, let's publish this or let's use it," but it turned out, BERT is something like a hundred-million-parameter model. It's a really, really huge model, which, unless you have a lot of data, is not really justified. The reason it was working is basically the way BERT is designed, as I understand it: typically you have an embedding, a layer that does all the embedding, and then you have this layer that you attach on the end that does essentially the decoding slash whatever task you care about, and more or less most of the interesting stuff was happening in those surface layers. You could really reduce the model down, take away the 700-odd hidden layers, and still get the same level of accuracy, and then in fact, what that led me to realize was there are actually even simpler models, like Random Forest, embarrassingly, that will get you that same level of accuracy that you got with BERT, and one of the lessons I honestly took away from that was don't rush to the most complicated models. Start with simple models and build up from there. That's what we've been doing.
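For readers who want a feel for the experiment described above, here is a rough sketch of treating TCR sequences like language: amino acids become tokens, random positions are masked, and a small transformer is trained to recover them. This is a toy stand-in using PyTorch, not the actual model, vocabulary, or data the team used.

```python
# Toy masked "amino acid language model": mask random residues and train a tiny
# transformer to recover them. Sequences here are made up for illustration.
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD, MASK = 20, 21                      # extra token ids
VOCAB = len(AMINO_ACIDS) + 2

def encode(seq, max_len=20):
    ids = [AMINO_ACIDS.index(a) for a in seq][:max_len]
    return ids + [PAD] * (max_len - len(ids))

class TinyTCRBert(nn.Module):
    def __init__(self, d=64, heads=4, layers=2, max_len=20):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, d)
        self.pos = nn.Embedding(max_len, d)          # learned positional embedding
        enc = nn.TransformerEncoderLayer(d_model=d, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=layers)
        self.head = nn.Linear(d, VOCAB)              # predict the masked amino acid

    def forward(self, x):
        pos = torch.arange(x.size(1), device=x.device).unsqueeze(0)
        h = self.encoder(self.tok(x) + self.pos(pos))
        return self.head(h)

def mask_batch(batch, p=0.15):
    x, y = batch.clone(), batch.clone()
    masked = (torch.rand_like(batch, dtype=torch.float) < p) & (batch != PAD)
    x[masked] = MASK
    y[~masked] = -100                                # ignore unmasked positions in the loss
    return x, y

seqs = ["CASSLGETQYF", "CASSQETQYF", "CASSPDRGAYEQYF", "CASRRDSYEQYF"]
batch = torch.tensor([encode(s) for s in seqs])
model = TinyTCRBert()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)

for step in range(50):                               # toy pretraining loop
    x, y = mask_batch(batch)
    logits = model(x)
    loss = loss_fn(logits.reshape(-1, VOCAB), y.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```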
One of the ways we've been approaching this problem, and one of the things we've learned by going to this approach, is that you have these strings of amino acids, and you cannot just substitute new amino acids in random positions and think that it will bind to the same thing. The places where substitutions can happen are very specific positions, and only changes from very specific amino acids to certain other amino acids, and this of course raises the question, why is that the case? We suspect this has to do with the physical properties of the amino acids themselves. Some are interchangeable. This is known because the physical, chemical properties of these amino acids have been measured in the laboratory. Putting that physical picture together, which came into sharp relief when we started by using complex models but understood that actually simpler models can get us there, has really guided us on the path of understanding the problem. When we're dealing with human health, it's not just enough to predict things. We need to understand why those predictions are happening the way they are, otherwise we run a serious risk of producing essentially a black box, and we found in human health, often you have confounding signals. You think you're seeing one thing, but it's actually being caused by something completely unrelated, and when you don't fully understand what your model is doing, you can fall into those types of traps. Sam Charrington: [00:33:15] With regard to BERT, it sounds like you were using, you mentioned transfer learning, sounds like you were using a pre-trained BERT model and trying to fine tune. Did you also try to train from the ground up? Jabran Zahid: [00:33:31] Yeah, we did. The thing that we took from BERT was the unsupervised training step, which was, what BERT does is it would take a sentence and it would mask out random words in that sentence and then try to reproduce what was masked out, and that's unsupervised because it... Sam Charrington: [00:33:47] It would seem to preserve some of the positionality that you require for proteins? Jabran Zahid: [00:33:54] Exactly. We would mask out random amino acids and then try to reproduce the sequence on the other side. You start with that unsupervised task. That's how you do the pre-training, so to speak, and then you slap on a layer for a classifier or whatever your specific task is. We definitely tried that and it was successful, as I said, but what we came to learn was something like a Random Forest is a lot easier to interpret. What is it that we're learning? Through that procedure we learned that, "Oh, it's actually positional information and very specific types of substitutions that are allowed." It was a lesson that I've learned many times doing machine learning, which is don't go to the complex models. Don't go to what's sexy necessarily right away, unless it's warranted. But we also follow our passions, and sometimes you see the shiny new model and you want to try it. BERT may make it easy, and the natural language processing community in general makes it very easy, to take models out of the box and use them. Something I think that the rest of the sciences, and certainly immunology, would benefit greatly from is making progress in that way as well. Sam Charrington: [00:35:01] Awesome. Tell us about where you are with this project relative to where you want to be, and what the future path is? Jabran Zahid: [00:35:11] Yeah.
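Before the conversation turns to where the project is headed, here is a small sketch of the simpler, more interpretable route just described: one-hot encode each position of toy aligned sequences, fit a random forest, and read the feature importances back as which position and which amino acid matter. The sequences and labels are fabricated for illustration.

```python
# Sketch of the "simpler, more interpretable model" point: positional one-hot
# features plus a random forest whose importances map back to (position, residue).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seq, length=8):
    vec = np.zeros(length * len(AMINO_ACIDS))
    for i, aa in enumerate(seq[:length]):
        vec[i * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)] = 1.0
    return vec

# Toy "binds / doesn't bind" dataset; position 3 is the informative one here.
seqs   = ["CASSLGET", "CASSLGQT", "CASRLGET", "CASRLGQT",
          "CASWLGET", "CASWLGQT", "CASKLGET", "CASKLGQT"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

X = np.stack([one_hot(s) for s in seqs])
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, labels)

# Map the most important features back to (position, amino acid).
top = np.argsort(forest.feature_importances_)[::-1][:3]
for idx in top:
    pos, aa = divmod(idx, len(AMINO_ACIDS))
    print(f"position {pos}, amino acid {AMINO_ACIDS[aa]}, "
          f"importance {forest.feature_importances_[idx]:.2f}")
```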
We have made significant progress in the last year, driven by COVID, not only the fact that it was, and remains, one of the greatest immediate challenges facing humanity, but also that it provided an accelerant for us to bring together all the techniques that we've been working on. I described, for example, these laboratory techniques where we throw a bunch of T-cells at the pieces of the virus. Bringing that together with our diagnostic approaches has demonstrated this application I was describing for discriminating between vaccine versus natural infection, et cetera. We really brought together a lot of the different techniques and demonstrated the power of these techniques, not only to ourselves, which is one of the most important things, but to the world, by having these diagnostics that are authorized by the FDA, and I may be wrong about this, but I'm pretty confident that these are some of the very first, if not the first, machine learning COVID diagnostics authorized by the FDA. That in and of itself is an amazing accomplishment, and there's a lot of back and forth on how you do that, how you get things validated, et cetera. That's an interesting side note. We made enormous progress. The ultimate goal is the antigen map, as I described in the beginning, which is this ability to take any T-cell and understand what it's meant to target. My hope is that five years from now, when we look back at this moment, we'll see it as a watershed moment. We will have arrived at a firm understanding of whether that is even possible, whether the antigen map is possible, because the reality is, we often refer to it internally as a moonshot. It's a high risk, high reward venture, but if we are able to succeed in this, we will have the ability to understand immune risk to human health in a way that humans have never had before. It will impact therapeutics, diagnostics, every aspect of how we treat human health. I'm excited to be a part of this. I hope we succeed. I hope we are able to provide this great benefit to the world, and we'll see if we can succeed or not. That's the question that we've set out to answer, and hopefully in five years, we'll have an answer to that question. Sam Charrington: [00:37:30] Awesome. Well, Jabran, thanks so much for doing the work, but also coming on the show to share a bit of it with us. Jabran Zahid: [00:37:38] Sam, thank you so much for this opportunity to share the amazing work we're doing on our team. Thank you. Sam Charrington: [00:37:43] Thank you. All right, everyone. That's our show for today. To learn more about today's guest or the topics mentioned in this interview, visit TWIMLAI.com. Of course, if you like what you hear on the podcast, please subscribe, rate and review the show on your favorite podcatcher. Thanks so much for listening and catch you next time.
Sam Charrington: [00:00:00] Welcome to the TWIML AI podcast. I'm your host, Sam Charrington. Hey, what's up, everyone. Before we jump into today's interview, I'd like to give a huge thanks to our friends at Microsoft for their continued support of the podcast. Microsoft's mission is to empower every single person on the planet to achieve more, to inspire customers to reimagine their businesses and the world. Learn more at Microsoft.com/AI and Microsoft.com/innovation. And now, onto the show. All right, everyone. I am here with David Carmona. David is the general manager of artificial intelligence and innovation at Microsoft. David, welcome to the TWIML AI podcast. David Carmona: [00:01:01] Thank you, Sam. Pleasure to be here with you. Sam Charrington: [00:01:04] It is great to have you on the show. And I'm looking forward to digging into our conversation, which will focus on AI at scale and large scale language models, and a bunch of really interesting things you're doing there. Before we jump into the topic, though, I'd love to have you share a little bit about your background and how you came to work on all this cool stuff. David Carmona: [00:01:25] Yeah. Well, I've been in Microsoft for almost 20 years, 19 and a half. Sam Charrington: [00:01:30] Wow. David Carmona: [00:01:30] So, almost getting to that magical [laughs], magical moment. And it's funny because my beginning with Microsoft, I was [inaudible 00:01:37] to Microsoft. That was 20 years ago. So, that was the big Windows moment. Right? But actually, I didn't come to Microsoft because of Windows. I came to Microsoft because of, at that time, my favorite product, which was Visual Studio. So, I was a developer. I still am a developer. I will always be a developer no matter what I am. Sam Charrington: [00:01:57] [laughs]. David Carmona: [00:01:58] And for me, working in Visual Studio has been like my entire career. So, [inaudible 00:02:04] I started with AI and, and VR probably way too early [laughs]. That didn't end well. So, I ended up in traditional development. And I had a ton of fun with that. And when I moved … I'm originally from Spain. When I moved here to the US [inaudible 00:02:17], I worked in, in Visual Studio. So, I ended up managing the business for Visual Studio and all our tools like .NET and, and all of that. It was a super fun time because it was that big transition in Microsoft to open development. So, I was lucky to do things like launching TypeScript. Right? Or- Sam Charrington: [00:02:36] Oh, wow. David Carmona: [00:02:36] … open-sourcing .NET or making it cross-platform, or releasing Visual Studio Code. Right? So, super fun stuff. But then like five years ago, this AI thing started to become super real. So, [laughs] I was, I was offered to lead a new team in Microsoft, focused on the business, on creating a new business for AI. And I, I didn't think about it twice. So, yeah, that's where I am. So, it's interesting … As you can see, my career has always been like, between technology and businesses. I think … I mean, knock on wood, but I think I'm in, in that great balance right now [laughs]. So, I have both. I'm super fortunate to have both because I work, connecting with Microsoft research and, and the entire organization of technology and research in Microsoft. My goal, my team's goal is really to connect that with the business. So, we work on … We define it as themes, like bigger themes of innovation in Microsoft.
And then we connect those themes to actual real products and technologies that we can take to market. It's super cool. And one of those things … We have many, but one of them … I think like, probably the start of the themes is, is AI at scale. Sam Charrington: [00:03:46] Okay. And so is the role primarily focused on taking innovations that are happening in research to existing Microsoft products? Or is it more focused on creating new business opportunities? Or is there some balance between the two? David Carmona: [00:04:01] Yeah. It's a balance. So, we have … The way that we work in Microsoft on our framework for innovation is based on Horizons. So, we have … We refer to them as the three [inaudible 00:04:10] Horizons. Right? So, we have Horizon 1, 2, and 3. Horizon 3 are like, the moonshots, right? Like, longer-term new business creation, new category creation for Microsoft. A lot of that is driven by curiosity, in most cases, in research. So, we leave a lot of room for researchers to work on those themes. But then we go all the way to Horizon 2, which are things that are really about opening new opportunities or creating new opportunities for existing products. And you can go to Horizon 1 even, which is extending existing products. Right? So, making them better. So, we work in that, in that balance, between the three. Sam Charrington: [00:04:52] Nice. And so you mentioned AI at scale as being one of your big focus areas. What exactly does that mean at Microsoft? David Carmona: [00:05:00] Yeah. So, AI at scale, I mean, we, we named that as a new category. So, it's not that it's a product or anything like that. So, it's how we refer to what we believe is a huge change in the way that we are going to see people developing AI. And it's driven by m- many different things, many different trends and technology breakthroughs. But I think the most important one is this concept of massive models and, and what they mean. Right? So, this, this ability to create now, like, these huge [laughs], massive models with billions of, of parameters. And beyond the technical achievement, the reality is that those massive models are opening new opportunities that go beyond the technology and get into the business. Right? So, we can discuss it today. So, [inaudible 00:05:47] … So, we can spend a lot of time on the technology behind it. And then- Sam Charrington: [00:05:47] Mm-hmm [affirmative]. David Carmona: [00:05:47] … we can, we can focus a little bit on, "Hey, but what does it really mean?" So, how is this going to change the way that any company can develop AI? Right? And, and [inaudible 00:05:59] it's really interesting. And then there's a whole ecosystem around this concept, like, that, that you need to, for example, train these models, you need an AI supercomputer. So, that's another piece of the puzzle, right, for AI at scale. Sam Charrington: [00:06:14] So, we talk a lot about the increasing size of models and, you know, particularly in the context of NLP and language models. But help us contextualize that. You know, we throw around, you know, millions of parameters and, you know, hundreds of layers, and things like that. How is it shaking out? Or how do you think of this progression towards larger-size models? David Carmona: [00:06:41] Yeah. I think in, in a sense, you probably remember [laughs] [inaudible 00:06:45] ImageNet moment for, [laughs]- Sam Charrington: [00:06:46] [laughs]. David Carmona: [00:06:47] … for [inaudible 00:06:48] learning. Right?
So, eh- Sam Charrington: [00:06:49] Uh-huh [affirmative]. David Carmona: [00:06:49] That was … I mean, [inaudible 00:06:51] many people refer to this moment as the ImageNet moment for NLP. Right? So, because we get to a point that there's something that allows us to increase the size of the model. So, we go for it. And then we see, "Hey, wait a second. This is getting better. So, the more parameters that I add, the better that this is getting." Right? So, that was the moment in ImageNet with ResNet, for example. Right? That we added so many layers, and, "Hey, this, this image classifier is, is working so much better." So, we are kind of in the same place, but at a totally different scale, right, or order of magnitude. Right? For example, that model, the ResNet model for ImageNet, I think had like 60 million parameters. I mean, a completely different domain. That was computer vision. Now, we're talking about billions of parameters. And, and, and when we see the progression, it's been like, very [laughs], very quick. So, [crosstalk 00:07:44]- Sam Charrington: [00:07:46] Mm-hmm [affirmative]. David Carmona: [00:07:46] I don't know. GPT-2. So, the first version was like 100 million parameters. Then, I think BERT was like 300. Then you have Turing NLR. I think it, at that time, was like 1.2 billion. Then you have GPT-2, 1.5. Then you have Turing NLG. That was 17 billion parameters. That was last year [laughs]. We're talking months ago. We're not talking about, about years ago. And then we had, just a couple of months after that, GPT-3 with 175 billion [laughs] parameters. Right? So- Sam Charrington: [00:08:18] Yeah. David Carmona: [00:08:18] Every step is 10 times [laughs] [inaudible 00:08:21]. It's a new order of magnitude [crosstalk 00:08:22]- Sam Charrington: [00:08:22] Mm-hmm [affirmative]. David Carmona: [00:08:22] … which is super impressive [laughs]. Sam Charrington: [00:08:24] So, we've kind of transitioned from … In the domain of vision, you know, we would always talk about the number of layers as an indication of the size and complexity of the model. And now, when we talk about these language models, we tend to talk about parameters. What is that? And how does that tie to the architecture of these models? David Carmona: [00:08:45] Yeah. I mean, behind … It's not that we didn't want to build these massive models before. It's that we couldn't [laughs]. That's the reality. Sam Charrington: [00:08:52] Mm-hmm [affirmative]. David Carmona: [00:08:52] And I think the big breakthrough to really enable these, these sizes of model is the transformer architecture. And yeah, definitely a lot to say about that. But, yeah, the transformer architecture, it has … I mean, it's also based on layers. In this case, they are symmetric. So, it scales very well because it always has the same number of inputs and outputs. So, you can stack up all the layers. And, and it was a huge change because that removed the blocker that we had before with scaling these NLP models, which is that we were using techniques such as, as you know, recurrent neural networks. Right? Like, LSTMs and things like those. And those things are great because they allow you to connect, for example, in a text, the words between words. You can have some kind of memory. So, a word right now can be impacted by words in the text before. Right? And, and you keep that memory. The problem is that the way that we were doing that was very sequential.
So, I mean, by definition, a recurrent neural network takes the previous step as an input. So, you need to finish that step to go to the next one. So, that impacted the scalability of the models. So, I think with the transformer architecture, we kind of broke that ceiling because now, suddenly, we don't have an architecture that is [inaudible]. So now, in this case, it's all in parallel. We take all the inputs in parallel and with some techniques, in particular … I think the most important ones [inaudible 00:10:16] I would highlight two. But definitely, for that to work, two things have to happen. One is the concept of the positional embedding, so how every word needs to get as an input in the, in the model its position somehow, a flag or an indication of where that word is, because that's [laughs], of course, important [laughs]. It's very important- Sam Charrington: [00:10:36] Mm-hmm [affirmative]. David Carmona: [00:10:37] … where a word is in a sentence to understand the sentence. But then the second thing is this concept of attention or, in this case, self-attention, which is a way to kind of replicate that concept of connecting or changing the meaning of words, depending on the words that were happening before, or even, in the case of bidirectional [inaudible 00:10:56], words that are happening after that. Right? And that's, that's a whole new construct applied to NLP that is proving to be not only super scalable, but even performing better than [inaudible 00:11:08] the traditional approach to NLP. Sam Charrington: [00:10:43] Hmm. And so how should we think about how attention works in these kinds of models? David Carmona: [00:10:43] So, I, I, I mean, it's a very simplistic view, but I like to think of it … Because attention is not new. So, we've been using attention- Sam Charrington: [00:10:44] Mm-hmm [affirmative]. David Carmona: [00:11:23] … in, in others … Even in other domains. Right? Like, vision or i- image generation, or … I mean, the most simple example that I use all the time is movie recommendation. Right? So, how do you know if, if a user is gonna like a movie or not? So, the way that you do that is that you take a vector defining the movie in, you know, in any dimensional space. And then you take another vector defining the taste of the user. And then you multiply those vectors, right, to get the distance, the, like, the cosine distance or similarity between those two vectors. And that's an indication of how much the user will like the movie. That's attention, but in the case of two different entities. Right? My taste and the movie. In this case, self-attention is like doing something similar, but with a sentence with itself or with a text with itself. Right? So, but in this case, the w- the attention that we want to measure is the connection between the words. So, how one word is related or connected to the rest of the words. And at the end, you're gonna have like, a heat map, right, so, where every word is connected in some manner with other words. So, if you're saying, "The kid hit the ball, and he was happy." So, "he" will be super connected with the kid. Right? So, I mean, super simple because at the end, you have multi [inaudible 00:12:42] attention blocks. And, and then you have all these different layers. It's like trying to understand [inaudible 00:12:49] networks. After three layers, you're lost [laughs]. You are completely lost on [crosstalk 00:12:53]. Sam Charrington: [00:12:53] [laughs].
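A tiny numerical version of the "heat map" intuition David describes: score how strongly each word attends to every other word with dot products and a softmax, the same basic operation as self-attention. The word vectors and the crude positional signal below are invented for illustration; real models learn both.

```python
# Toy "who attends to whom" table over the sentence used above. The vectors are
# fabricated; a real model learns embeddings, positions, and Q/K/V projections.
import numpy as np

words = ["the", "kid", "hit", "the", "ball", "and", "he", "was", "happy"]
np.random.seed(0)
E = {w: np.random.randn(4) for w in dict.fromkeys(words)}
E["he"] = E["kid"] + 0.1 * np.random.randn(4)      # make "he" and "kid" similar on purpose

X = np.stack([E[w] for w in words])                # (T, d) token vectors
pos = np.stack([np.sin(np.arange(4) + t) for t in range(len(words))])
X = X + pos                                        # crude positional signal

scores = X @ X.T / np.sqrt(X.shape[1])             # pairwise similarity, like Q.K^T
attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)   # softmax per row

he = words.index("he")
for w, a in sorted(zip(words, attn[he]), key=lambda p: -p[1])[:3]:
    print(f"'he' -> '{w}': {a:.2f}")               # strongest connections for "he"
```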
David Carmona: [00:12:53] But I mean, that's the core principle of it. Sam Charrington: [00:12:56] Mm-hmm [affirmative]. Part of what's interesting here is that, you know, we've transitioned from an approach to NLP that was, like you mentioned … Prior to capturing positionality, you know, we'd take a bag of words of things that was at document level, didn't capture where those words were, didn't really do a good job of capturing the relationships, but we're just looking at the statistical properties of a document or sentence or- David Carmona: [00:13:22] Yeah. Sam Charrington: [00:13:23] … corpus, to now looking at the relationships between all of these entities that make up language. Is that part of the power of this [crosstalk 00:13:31]? David Carmona: [00:13:32] Yeah. Yeah. E- exactly. I would say that, and then the concept of, of training these models with self-supervised algorithms. Right? So- Sam Charrington: [00:13:42] Mm-hmm [affirmative]. David Carmona: [00:13:42] [inaudible 00:13:43] supervised training. I think that's the other thing that, that … It was the explosion in all these models, is how now, … Because this scales amazingly well, now, you can afford training these things with huge amounts of data. Like, for example, the entire internet [inaudible 00:14:00] kind of. Right? Which is kind of what we're doing with this model. So, we take the text on the internet. And then depending on the model, we can go in, in a little more detail in there, if it's a [inaudible 00:14:10] model or a representation model. With smart techniques, you take that. You take … You mask that text, so the, so the model can try to guess either the missing words or the words that are happening after a given text. And by training with that input, that you are almost not touching at all, right, so it's all self-supervised, [inaudible 00:14:31] and, and all of that, the model can actually learn very complex concepts and relationships. Sam Charrington: [00:14:37] Mm-hmm [affirmative]. You mentioned different types of models. Elaborate on that a bit. David Carmona: [00:14:41] Yeah. So, I think, the way that … And, and we can talk more about that because at the end, these same concepts can apply beyond NLP. But if we focus just on NLP, there are two main families of models. One is the one that I think people are super excited about, also because of Turing NLG and because of GPT-3. Those models are generation models. So, they are natural language generation models, so NLG. And in that case, the way that that model is trained, they are called autoregressive models, because you train the model with a, a lot of text. But then you train it to guess what is gonna happen, what text goes after a particular text. Right? So, they generate … They are super good at generating text, like guessing the end of a sentence or guessing an entire document, or guessing how a movie will, will end, or whatever [laughs] we want to, to guess, things, things like those. And that's one big family of models. You have em … Again, like, GPT-3 is an example of that. Turing NLG is an example of that. And then you have another family, which is more about representation, so natural language representation models. And the goal of those is more like, representing the text. So, in that case, the architecture that is, that is used, instead of trying to guess … Or the way that it's trained. Instead of trying to guess what's next, what we do is that you mask some words in the text. And then the model will try to guess it.
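To make the two families concrete before the bidirectional detail continues below, here is a minimal sketch of how the same sentence yields training examples for an autoregressive (generation) model versus a masked (representation) model. The tokenization is deliberately naive and purely illustrative.

```python
# Same text, two self-supervised objectives: next-token prediction (autoregressive)
# versus fill-in-the-blank (masked), as described above.
import random

tokens = "the model learns language by reading lots of text".split()

# Autoregressive objective: (context so far) -> next token, left to right.
autoregressive_examples = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

# Masked objective: hide a random token, predict it from context on both sides.
random.seed(0)
i = random.randrange(len(tokens))
masked_input = tokens[:i] + ["[MASK]"] + tokens[i + 1:]
masked_example = (masked_input, tokens[i])

print(autoregressive_examples[3])   # (['the', 'model', 'learns', 'language'], 'by')
print(masked_example)
```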
And they are called bidirectional because, in that case, not only do they look at what happened before a certain moment, but also after that. So, they will look at the words before and after a particular word to understand the context there. Right? So, those are really good to map like, text to a representation, and then I fine tune to do whatever I want. Right? So, from super basic sentiment analysis to question answering, or whatever I want to fine tune the model for. So, those are like, the two big blocks. Then I like to go a little bit deeper 'cause for each of them, there are two other families that I think are very relevant to understand, which is how … So, there's more than one language in the world [laughs]. Right? So- Sam Charrington: [00:16:58] [crosstalk 00:16:59]. David Carmona: [00:16:59] You need to address that. Right? So, in particular, when you are creating real products. So, we are using these models in, in Office, for example. Office is working [inaudible 00:17:07], I feel like, 100 languages. So, imagine doing this for every language would be very [crosstalk 00:17:13]. Sam Charrington: [00:17:13] Mm-hmm [affirmative]. David Carmona: [00:17:13] And that would be the traditional approach of, of doing this. So, we … And, and Microsoft has been a big believer in the need of doing this thing in a universal way. So, that creates a new family of models that are universal models, right, universal language models. And in the case of Turing, for example, we have both. We have a regular model. And then we have the universal language representation, ULR, so Turing ULR, universal language representation. And that is super powerful 'cause what it allows us, for example, in, in Microsoft, is to implement features in Word using this, like … I don't know. Em, semantic search. We don't need to train that feature or that model for every language. We just need to fine tune it for one language. And then you have the feature for free in 100 languages. Right? Sam Charrington: [00:18:03] [crosstalk 00:18:04]. David Carmona: [00:18:03] Which is super cool. So, I very much recommend using those models for that. Th- this was, by the way, for people who want to go deeper, there's a paper that I like a lot, [inaudible 00:18:14] 2017, where it explains this, this concept. And the example that it uses is how you learn math. Right? So, you look at … Well, not me. I wouldn't consider me bilingual. I speak Spanish and a little bit of English, but [laughs] my kids are truly bilingual. And when they learn math, they don't need to learn that two plus two equals four in English, but then [Spanish 00:18:39] in Spanish. Right? So, they just need to learn math once. And then- Sam Charrington: [00:18:43] [crosstalk 00:18:44]. David Carmona: [00:18:43] … they can apply that in different languages. So- Sam Charrington: [00:18:46] Mm. David Carmona: [00:18:46] It's the same thing for models. So you can focus on teaching or training the core concepts, fine tuning for the concept. And then you have it for free in all the languages. Sam Charrington: [00:18:56] Mm-hmm [affirmative]. Yeah. [inaudible 00:18:57] I wanna dig into transfer learning and multitask. These are all things that are coming to mind as you're explaining this. But before we do that, we started out talking about language models as an example of these massive models that require a new way of thinking about, you know, AI at scale. And you mentioned, you know, the progression of the sizes of these models … And you know, it's 10X each time.
GPT-3 is, you know, 10X Turing. And one question that occurs to me is, you know, is size the, you know, the most important or the only factor? You know, does it mean that each time we jump a generation, you know, "Let's just forget about the, you know … We shouldn't be using Turing anymore. Let's just use GPT-3 because it's 10X better." I think, you know, there are some obvious reasons why that might be the case, like if they're trained on, on different corpuses. Like, we know that GPT-3 has kind of a very broad public internet. And at least with GPT-2, like, there was a lot of critique about, you know, Reddit, you know, and, and the biases that get introduced there. So, the training set is going to be an obvious differentiator that separates from the size. But I'm wondering if there are other things that we need to be thinking about beyond just the size of the model. David Carmona: [00:20:24] Yeah. Yeah. No, you are right. And I think … So, it's a very simplistic thing to just discuss the models of … Or the parameters of a, of a model. [crosstalk 00:20:35]. Sam Charrington: [00:20:32] Mm-hmm [affirmative]. David Carmona: [00:20:33] There's way more. I have to say, though, that the one thing that we are, we are seeing is that the more parameters that you add … Right now, we are not seeing the ceiling of this. So, we keep improving the accuracy and the generality of the, of the model. So, hey, parameters are important. But then at the same time, it is true that it really … So, there's not one model for everything. So, different models are good for different things. Right? And in our case, for example, we, we … Turing, our family of models. It's actually a family because of that. So, we don't believe that one model will … At least right now, will be useful for every single scenario that you are targeting. Right? So, in, in our case, we created that, that family of models, which are inclusive of, of many things, including many different languages, like, this basic [inaudible 00:21:27] that I was providing before or, or this, these metrics- Sam Charrington: [00:21:30] Mm-hmm [affirmative]. David Carmona: [00:21:30] … of, of different models. You're gonna need a model for each of them, depending on what you want to accomplish. But then even beyond that, 'cause not everything that you do is NLP. So, in the family of Turing in Microsoft, we have models that are even multi-modal, that include image and text or that are focused on image. And that thing will keep growing. So, that's something important to keep in mind. The other thing is, of course, the eternal debate on the importance of the architectures, right, that, that you're using. So, I think there's a … And I don't have a super strong opinion. I think it's like everything. It will go through phases. It will get to a moment that just by adding brute force parameters, the thing will be very difficult to improve. And we'll need to be a little bit smarter on how we can improve those models. We can optimize those models in, in another different way. But again, I don't want to diminish the fact that we keep seeing that we add more parameters and, and we get more power. Right? One thing that you said, though, Sam, I, I want to, I want to double click on that 'cause it's super important. So, it's the responsible AI implications of the model.
I think that will be an area for models to differentiate and to keep in, in mind when you're using a model, 'cause the reality is that, right now, these models, they have a lot of challenges, from bias, transparency, and, and, and others, that, that we need to keep in mind. So, as we innovate on the power, accuracy and, you know, the multitask aspect, the generality of these models, we also need to innovate on the responsible side of them. And eh- Sam Charrington: [00:23:08] [crosstalk 00:23:09]. David Carmona: [00:23:09] As, as you said, the training corpus, that's important. I think right now, we are probably way too late in the pipeline to apply responsible AI principles to these models, meaning that we create things with these models. And then, just then, we apply those things like … I don't know. Like, you know, filtering or many, many other techniques that you can use there. I think we need to go earlier in the process, even at the point of the training, so we can make those models responsible by design. Sam Charrington: [00:23:41] Do you have a sense for how we can do that? A lot of the power of these models comes from, essentially, taking the entire internet and building a language model based on it or, you know, large parts of the internet. How do you apply the, you know, how … What are the techniques that we can use to build responsibility earlier at that scale? David Carmona: [00:24:08] So just as an example, but one example in Microsoft could be the Office or the Outlook auto reply. Right? So, what is … So, that is the typical example of a massive NLP model that is taking as an input an email and, as an output, is creating a likely reply that you would want to send. Right? So- Sam Charrington: [00:24:28] Mm-hmm [affirmative]. David Carmona: [00:24:28] That scenario on paper, it looks so simple [laughs], il- extremely simple. But when you get into the responsible side of [inaudible 00:24:37], it is extremely complex. And you need to, you need to pay a lot of attention. And it's not like a one-shot thing that you do, and done, you are, you are, you are golden. The reality is that you need to apply that across the entire lifecycle of the model from, as you said … So, you mentioned one that is important, which is the training data. So yes, of course, we need to get a subset of the training data to make sure that there's no toxic data that is training the model. But that is not, that is not enough. So, we need to keep in mind things like the privacy of the user. Right? So, think of, "How can we … " So, actually, for this feature, we use differential privacy to make sure that the instances that we use [inaudible 00:25:20] surface, they are not … They cannot identify a user or things like those. And you can also think of the input as something that we also manage, that we make sure that they are short answers, that they are not like, long emails [laughs], of course, things like those. So, it's something that you need to do at every stage. There's a ton of research, active research happening right now to really tackle this super complex challenge that we have with these models. Sam Charrington: [00:25:47] Mm-hmm [affirmative]. So, before we jump into how we achieve this kind of scale, you mentioned something in our pre-call that really stuck with me, is this idea that models are becoming a platform. And you know, transfer is a piece of that. Fine tuning is a piece of that. I'd love to hear you riff on, on that idea.
I think it's a really interesting way to think about models. David Carmona: [00:26:14] Yeah, yeah. It's not a new concept. So definitely, we've been seeing … So, you see our services, [inaudible 00:26:23] services in Azure. And they support the concept of transfer learning. So, you don't need to train a model from scratch. Right? So, it's … But the reality is that a lot of what we do in AI is training models from scratch for your particular scenario. So, we're doing everything that we can to try to simplify that process because if we don't simplify that process, it's gonna be very difficult to really scale AI in an organization, in a, in a company. So, there are definitely many techniques to do that. I think in the area of NLP, fine tuning is the most relevant now. And then we can talk about some emerging ones that are super interesting and cool. But with the fine tuning process, the idea is that you pre-train … You can use a model that is pre-trained, like our Turing model, pre-trained on that [inaudible 00:27:10] information from the internet, multi-domain, totally general. And then you fine tune that model. So, fine tuning, meaning adding something to it. Like, for example, you want to fine tune the model to do sentiment analysis. So, you would add then like, a classifier or something like that, a binary classifier. And then you use labeled data. In this case, you use like, sentences that are, you know, positive, negative sentiment. And then you fine tune. So, you train additionally. It's like extra steps of training that entire thing with your added classifier, in this case, for example, which is gonna update the weights. But it's not starting from scratch, meaning that you don't need that massive data and the skills, because you don't need to change the architecture. You don't need the compute, because it's not that much compute needed. So, that is certainly a huge step into democratizing these models. Right? So, that's, that's super important. And not only can you do that, fine tuning for specific tasks, you can also fine tune it for your domain. So, if you work in finance, or you work in health, or you are in any industry, say you are a law company … So, you have a law firm. You want to fine tune that model for the domain of your vertical. So, you don't need to train the whole thing. You just need to train for that particular domain. So, super, super important. But then what we're seeing is these models can go even beyond that. And that's a super interesting area. Right now, it's still in the beginnings. But what is the big difference with that approach? So, in this first approach, with fine tuning, you are training the model at some point. I mean- Sam Charrington: [00:28:51] Mm-hmm [affirmative]. David Carmona: [00:28:52] Not from scratch, but you're training it. You are changing the weights of, of the model. You're- Sam Charrington: [00:28:56] Mm-hmm [affirmative]. David Carmona: [00:28:56] You're updating that model. You need [inaudible 00:28:58] to train it. But then we have these other techniques. They are called like, zero-shot or few-shot, where you don't do that. So, the model can learn in [inaudible 00:29:08] time. So, you don't need to change the [inaudible 00:29:11] of the model. You have only a model. You don't change that model.
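Before the zero-shot and few-shot discussion continues below, here is a minimal PyTorch sketch of the fine-tuning recipe just described: reuse a pre-trained encoder, attach a small task-specific head, and run a few extra training steps on labeled data rather than training from scratch. PretrainedEncoder is a toy stand-in, not Turing or any real checkpoint.

```python
# Fine-tuning sketch: frozen "pre-trained" encoder plus a new classifier head
# trained on a handful of labeled examples.
import torch
import torch.nn as nn

class PretrainedEncoder(nn.Module):
    """Stand-in for a pre-trained language model that outputs a sentence embedding."""
    def __init__(self, vocab=1000, d=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
    def forward(self, token_ids):                 # (batch, seq_len) -> (batch, d)
        return self.emb(token_ids).mean(dim=1)

encoder = PretrainedEncoder()                     # real weights would be loaded, not random
for p in encoder.parameters():                    # optionally freeze the big model
    p.requires_grad = False

classifier = nn.Linear(64, 2)                     # the new task-specific head
opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Toy labeled data: token ids for two "sentences", positive / negative sentiment.
tokens = torch.randint(0, 1000, (2, 10))
labels = torch.tensor([1, 0])

for step in range(20):                            # a few extra training steps, not from scratch
    logits = classifier(encoder(tokens))
    loss = loss_fn(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```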
Now, in [inaudible 00:29:15] time, where you are doing the inference of the model, you can … If you are doing few-shot, then what you do is just provide a few examples of the task that you want to do, and then directly, the one that you want to solve. And the model will do it, which is mind blowing [laughs] that it can do that. But then you have zero-shot, which is like, the mind blowing times three [laughs], which is that you don't even need to provide examples. So, you can ask one of these models, "Hey, I want to translate this to French." And you provide the sentence. And the model will know how to do that. It will identify patterns in the corpus data that it was trained on. And it will know what it means to do a translation. And it will do that translation. So, those techniques, what they are really doing, from fine tuning to few-shot to zero-shot, is making it much easier to really use these models in your particular scenarios, for your particular domain, your particular task, or your particular modality. Super cool. Sam Charrington: [00:30:18] Mm. Awesome, awesome. We've talked about different kinds of models. Uh, just a few quick words on applications. Like, you know, what do you think are the most exciting applications of language models generally or, or Turing in particular, you know, within and outside of Microsoft? David Carmona: [00:30:38] Yeah. So what, what I can do, because it's a [laughs], it's a big one, we can, we can talk for a long time, is give you an overview of how we are using it in Microsoft. And then you can get a sense of, of the usages that, that it can have. So, in Microsoft, the way we look at this is like … We always look at these things, any technology, as a stack. So, our goal always is to deliver a full stack. So, you just … And that's our approach to any technology. So, we do the research. But then we want to make sure that that research is available for others to, to use. And then we want to make sure that we keep adding layers [inaudible 00:31:19]. For example, the first one would be releasing that as open source. Right? So, we add another layer. We want that to be part of Azure, so you can train those models yourselves, which is the AI supercomputer that we are providing in Azure to train those models. But then we keep building on that. On top of that, we have things like Azure Machine Learning. So, you have another abstraction layer that can improve your productivity, fine tuning those models, like [inaudible 00:31:44] mentioned before. But then we put another layer on top of that, which is [inaudible 00:31:49] services, which are end-to-end, out-of-the-box services that you can use as [inaudible 00:31:54] points. And you can infuse them directly into your application without worrying about doing anything with, with those models. And then on top of that, we build applications. So, we make them part of our products, like Office, Dynamics. Or we create new products that were impossible before. So, that's the [inaudible 00:32:11] approach. I think if we focus on the application side, just to give you some, some examples of things that are already available, that people can use, that are powered by these massive models, [inaudible 00:32:21] a lot in Office. A lot of things in Office are powered by these models. So, you can think of, for example, semantic search in Office. [inaudible 00:32:30] you open a Word document, you search for something in that Word document. And that is not the traditional find and replace [laughs] that we had before.
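The few-shot and zero-shot patterns David describes amount to putting the task in the prompt instead of in the weights. The sketch below shows the shape of such prompts; complete is a hypothetical placeholder for a text-completion call, not a specific product API.

```python
# Few-shot vs zero-shot prompting: the task is specified in text at inference time,
# with no gradient updates. `complete` is a hypothetical placeholder function.
def complete(prompt: str) -> str:
    """Placeholder for calling a large generative model's completion endpoint."""
    raise NotImplementedError

few_shot_prompt = """Translate English to French.

English: Good morning
French: Bonjour

English: Thank you very much
French: Merci beaucoup

English: Where is the train station?
French:"""

zero_shot_prompt = "Translate to French: Where is the train station?"

# The model infers the task from the examples (few-shot) or from the instruction
# alone (zero-shot); either way, the model's weights stay untouched.
# print(complete(few_shot_prompt))
# print(complete(zero_shot_prompt))
```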
This is semantic search. So, you can even ask questions to the document. And [laughs] the document will answer those, those questions. That is all powered by, by Turing. You have things like document summarization. So, you go to SharePoint, and you hover on a document. And you will see a summary of the document in there. That is a … It's abstractive. So, it's not just taking parts of the document. That is generated with, with Turing. Things in Outlook, like the Outlook auto-reply that I was mentioning before, or things like … There's something called Meeting Insights that, before a meeting, will give you all the relevant information about that meeting. So, those are like … In the taxonomy that we were talking about before, those would be Horizon 1. It's about making those applications better. But then we have these Horizon 2 things that are [inaudible 00:33:24] new opportunities that these models can open. And I think a good example of that would be Project Cortex. So, Project Cortex is part of the Microsoft 365 family. And the goal of that project is super cool. So, what it does is that it's able to get all your internal knowledge in your organization by looking at both the structured and the unstructured data in your organization. So, think of documents, meetings, PowerPoints, anything that you have in there, even images, 'cause it's able to scan and do OCR on, on images. So, it's able to crawl all that information for your company, and then to extract knowledge out of that. So, what we do is that we create this concept of a knowledge entity. Like, imagine that … I, I don't know. You are in a law firm. Imagine international, whatever, commerce. I don't know. I have no idea of, of law. But it's like a topic- Sam Charrington: [00:34:23] [crosstalk 00:34:24]. David Carmona: [00:34:23] … that the AI system was able to extract from your information. And it can, it can help you a lot. So, it can give you … It can provide you with a summary. It can give you what are the most relevant documents for that particular subject in the company, who are the experts, so, who you should talk with about, about those topics. So, it's mind blowing [inaudible 00:34:45] knowledge bases. Right? So that, that you can get … It's extracting the DNA of your company. So, you can really make it available for the, for the rest of the employees. And like, those, I mean, I can [inaudible 00:34:57]. So, every, any product that you can mention [inaudible 00:35:00] use Bing. So, it's another, of course, super important one. Things like question and answer in Bing, [inaudible 00:35:05] even the universal search. So, we use this trick of universal language representation in Bing. And those are all available in there as well. Yeah. So, we use it [inaudible 00:35:16]. But more on the business side, I would mention, in Dynamics 365, we use these models for a lot of different things. A very obvious one, of course, is anything that has to do with customer service understanding or, you know, sentiment analysis. All of that in customer service is- Sam Charrington: [00:35:33] Mm-hmm [affirmative]. David Carmona: [00:35:33] … powered by these models. But then things that are more visionary. So, think of, for example … In Dynamics 365, one of the things that we can provide is suggestions to sellers in your company by looking at any interaction with that customer before, like emails or documents, phone calls, whatever. Right?
So, it's able to understand that unstructured information, and give you … It's like language generation. But in this case, to take the next steps with your, with your customers. Sam Charrington: [00:36:01] Hmm. David Carmona: [00:36:02] So, yeah. Super, super broad. We could talk for a while. Yeah [laughs]. Sam Charrington: [00:36:04] [laughs]. So, you know, let's maybe jump into what's happening that's enabling all of this to take place now. One of the things that … You know, when we think about kind of the scale and size of these models … You know, we've talked about the scale of the compute that has been required to enable it. You know, how do you thi- … And you mentioned AI supercomputers. Like, what's that all about? How do you think about, you know, building out the infrastructure to scale and train these models? David Carmona: [00:36:36] Yeah. Le- let's say that to train a model like this on your laptop would take probably thousands of centuries [laughs]. So, definitely, you need a lot of scale to train [crosstalk 00:36:48]. Sam Charrington: [00:36:48] Yeah. David Carmona: [00:36:48] And you need … I mean, it's amazing, the kind of challenges that you get when you grow a model like this. Like, fundamental challenges like, "Hey, the model doesn't fit in your GPU." [laughs] That's- Sam Charrington: [00:37:02] Mm-hmm [affirmative]. David Carmona: [00:37:03] … something that we wouldn't see before. Right? So, I think it is like … If you pass 1.3 billion parameters, something like that, then the model is not gonna fit. So, you better find new ways. But then it's just a computer. So, the time- Sam Charrington: [00:37:15] [crosstalk 00:37:16]. David Carmona: [00:37:16] … required to train one of these models, you need like, ultra [inaudible 00:37:19]. I, and, and I think … So, that's the main reason why we focus on … And like, always, like I was saying in the beginning, we try to have a platform approach to it. So, not thinking of fixing this problem for Turing, for our models, but fixing this problem for our customers, so they can use this infrastructure as well. Sam Charrington: [00:37:38] Mm-hmm [affirmative]. David Carmona: [00:37:38] So, the approach that we took was building this massive infrastructure in Azure. So, these are massive clusters that you can spin up directly in Azure. And not only can you spin them up, then, of course, you have the complexity when you have … These are … I mean, imagine … For example, the one that we announced a year ago, that is a massive cluster of like, 10,000 GPUs. You have more than 200,000 CPUs. So, it's massive scale. So, how do you manage that? You need things that allow you to manage that in a distributed way. And then what is even more challenging is, "Okay. So, I have my infrastructure completely managed. I can [inaudible 00:38:15]." It is integrated with Azure Machine Learning. So, you can like, launch like, jobs in that massive infrastructure. But then how would you actually do it? So, you have a model that is, by definition, huge. So, how do you train that thing? How do you divide this task, this super complex task, into individual [inaudible 00:38:36] in your, in your massive cluster? And that's, that's the other side of the coin, which is our work on these like, software systems that are meant to help you in that process. So, this was … At the same time that we announced the AI supercomputer, we also announced … It's called DeepSpeed. It's open source. So you can use it on, on top of anything. And it will help you do that for you.
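Before getting into how DeepSpeed splits that work (described next), here is a back-of-the-envelope version of the "past roughly 1.3 billion parameters it stops fitting" remark: with mixed-precision Adam training you typically hold fp16 weights and gradients plus fp32 master weights and two optimizer moments, on the order of 16 bytes per parameter before activations. The numbers below are rough rules of thumb, not exact figures.

```python
# Rough memory estimate for training a 1.3B-parameter model with mixed-precision Adam.
params = 1.3e9
bytes_per_param = 2 + 2 + 4 + 4 + 4      # fp16 weight + fp16 grad + fp32 weight + Adam m + v
training_state_gb = params * bytes_per_param / 1e9
print(f"~{training_state_gb:.0f} GB of training state")   # ~21 GB, before activations

gpu_memory_gb = 16                        # e.g., a 16 GB accelerator
print("fits on one GPU?", training_state_gb < gpu_memory_gb)
```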
So, what it will do is that it will take this training, and it will distribute that training across a massive infrastructure. So, it will know how to do that in an efficient way. And it does it basically … It's like a three … We call it 3D distribution, because it takes like, three different [inaudible 00:39:18] to, let's say, chunk this task. Right? One, which is the most basic one, is the data distribution. So, you just [inaudible 00:39:27] your data in smaller chunks. And then you have [inaudible 00:39:30] each node is gonna take one of those chunks. But that is not enough. You need to go further than that. So, the other level of distribution that we use is [inaudible 00:39:39] distribution, which is [inaudible 00:39:41] because of the transformer architecture, that [inaudible 00:39:44] symmetry is [inaudible 00:39:46] to split the [inaudible 00:39:49] layers. So [inaudible 00:39:50] each node will take a different layer [inaudible 00:39:54] communication and optimization going on there that [inaudible 00:39:57] you need to take care. And then the last one is the [inaudible 00:40:00] which [inaudible 00:40:01] even for each of those layers, we can divide [inaudible 00:40:04] smaller chunk [inaudible 00:40:07] a different GPU. So [inaudible 00:40:09] what that allows you, it [inaudible 00:40:11] a lot of research involved [inaudible 00:40:13] this framework. [inaudible 00:40:14] you almost get like, a linear distribution, like, a linear growth in your model. So, you can [inaudible 00:40:20] number of parameters … And by the way, [inaudible 00:40:23] is able [inaudible 00:40:24] more than one [inaudible 00:40:25] parameters. So huh, you can train models that are not even [inaudible 00:40:29] existing today. And you see the line, and it's almost linear. So, it's exactly what you're, you are looking for in these systems. Sam Charrington: [00:40:35] Oh, wow. Wow. And what about on the hardware side? Microsoft announced its Project Brainwave some time ago to bring new hardware architectures to bear on this problem. Can you share a little bit about that? David Carmona: [00:40:50] Yeah. So, yeah. We announced the [inaudible 00:40:53] maybe a little bit more ago. But it's fully available now. So, you go to Azure. And you go to Azure Machine Learning. And one of the options that you have to deploy your model is [inaudible 00:41:02]. And what, what that is gonna give you, especially [inaudible 00:41:05] inference time, is very low latency and a lot of, you know, efficiency in cost. Right? So, it's perfect for massive … I mean, I, I always use the same example. So, this feature in Word, one of the features powered in Word by Turing, is called predictive text. So, that means that, when you type, it's gonna give you a suggestion of how the text will continue. Right? So [inaudible 00:41:29] think of [inaudible 00:41:30] intelligence, but, but for Word. 300 million users of Word. Imagine doing the inference of that model on every keystroke [laughs]. So, that's the- Sam Charrington: [00:41:39] Mm-hmm [affirmative]. David Carmona: [00:41:40] That's the scale that we're talking about here. It's huge. So, you better optimize that a lot if you want to scale it to that, to that number. And we do that … I mean, you have to do it in … Again, it's like a game where you have to tweak every single step. Of course, we don't go with these m- multi-billion-parameter models at inference time. So, there's a lot of optimization to do there to reduce the number of parameters, even using techniques to make it more efficient.
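A conceptual sketch of the "3D distribution" idea (data, layer/pipeline, and within-layer splits), written as plain Python bookkeeping rather than DeepSpeed's actual implementation: every shard of work lands on one coordinate of a data-by-pipeline-by-tensor grid of devices. All sizes are arbitrary.

```python
# Conceptual 3D partitioning: data parallel x pipeline stages x tensor shards.
from itertools import product

data_parallel, pipeline_stages, tensor_shards = 2, 4, 2     # 16 "devices" in total
layers = [f"layer_{i}" for i in range(8)]

# Pipeline dimension: each stage owns a contiguous chunk of layers.
per_stage = len(layers) // pipeline_stages
stage_layers = {s: layers[s * per_stage:(s + 1) * per_stage] for s in range(pipeline_stages)}

placement = {}
for d, s, t in product(range(data_parallel), range(pipeline_stages), range(tensor_shards)):
    placement[(d, s, t)] = {
        "data_shard": f"batch chunk {d}",            # data parallelism
        "layers": stage_layers[s],                   # pipeline (layer) parallelism
        "tensor_slice": f"columns half {t}",         # tensor slicing inside each layer
    }

for coord in [(0, 0, 0), (1, 3, 1)]:
    print(coord, placement[coord])
```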
And then there's the hardware. Right? So, we use the ONNX Runtime thing in Microsoft. That can optimize not only for the CPU … So, it has optimization for CPUs, but also for [FPGA 00:42:21]. So, it's a way of [inaudible 00:42:23] from the hardware that you have, underneath. And it really allows you to bring all these things that are great to talk from the research point of view. But then putting [inaudible 00:42:33] in action, it requires all this level of detail that is a new level of complexity. Sam Charrington: [00:42:38] Mm. So, this is primarily focused on the inference side. Do you see any … Are there any particular innovations you're excited about on the hardware side for training? Or do you see it primarily being evolutions of today's GPUs? David Carmona: [00:42:55] I mean, when we see … I mean [inaudible 00:42:57] super evolving. So, we'll see … The reality right now is that you have to be flexible. So, we are not- Sam Charrington: [00:43:02] Mm-hmm [affirmative]. David Carmona: [00:43:02] … discarding any approach, any at all. Right? So, the reality is that FPGA for the inference was super efficient because it allows you to change it. Right? So, it's programmable. So, that was very, very efficient [inaudible 00:43:16] and very agile. The combination of agility and efficiency was, was the right thing. But that may change at, at any moment. And as these things get more stable, then ASICs may be the way to go. And, and, yeah, of course, we are, we are not discarding any, any of those approaches. Sam Charrington: [00:43:32] So, how do you see this level of scale that we're dealing with today impacting the world for kind of users of AI? What, what changes? David Carmona: [00:43:43] I think that the main thing maybe bringing, bringing all of this together is how this will change the way that you develop AI. So, how this will open new ways of developing AI that we can, that we can use right now. So, that whole concept of creating more general multitask, multi-domain, multi-modality models, that then you can customize for your particular task, that is, that has huge implications on how you can … One, how you can scale AI in your organization and how AI can scale to other organizations, like smaller organizations. Right? So, that for us, it's a, it's a huge aspect of, of all of this. And the way that I see it is, is that uh, it's kind of what we experienced in the last 20 years for software. And this is very similar. So- Sam Charrington: [00:44:38] Mm-hmm [affirmative]. David Carmona: [00:44:38] Software at some moment, we had the hard lesson that software has to be super connected to [laughs] the business. So, if you have a team of software developers in a basement [laughs] not connected to the- Sam Charrington: [00:44:51] [laughs]. David Carmona: [00:44:51] … business, that is not gonna work. I think we are ki- … AI is in a basement right now, kind of. Right? So, it's- Sam Charrington: [00:44:57] [laughs]. David Carmona: [00:44:57] We are not fully connected to the business [inaudible 00:45:01] because it requires so much like, skills so many skills and expertise that, that it's a very technical domain right now. We need to change that. So, we need to make sure that the business and a- AI come together. And, we learned that with software. It's called DevOps. It's about bringing the two together, and then doing a small iteration [inaudible 00:45:22]. It's coming to AI. We are all talking about MLOps now. It's a huge area.
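For the inference side, here is a minimal sketch of serving an exported model with the open-source ONNX Runtime on CPU. The model file name, input name, and input shape are placeholders, and this shows only the generic library usage, not the production Word/Turing deployment.

import numpy as np
import onnxruntime as ort

# Minimal CPU inference with ONNX Runtime; "model.onnx" and the input shape are placeholders.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
dummy_input = np.random.randint(0, 30000, size=(1, 16), dtype=np.int64)
outputs = session.run(None, {input_name: dummy_input})
print(outputs[0].shape)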
It's our [inaudible 00:45:28] definitely in Microsoft to provide the platform to empower that collaboration and that continuous iteration, and trackability of everything that you do in your AI development cycle. [crosstalk 00:45:37] and that will be, massively be empowered by AI at scale. So, you have models that can really empower like, a more dynamic way, so you don't have to create from scratch, these models. You can iterate on them with the business and just focus on teaching your domain to the model instead of starting from scratch. That goes in that direction. We do think that there's one step beyond that. We are also seeing … We also saw it with software. That also needs to happen with AI, which is really going beyond the technology and the businesses, and getting to every employee. So, how every employee in an organization should be empowered with AI just like they can Excel right now to [inaudible 00:46:21] numbers [inaudible 00:46:21] that for AI. So, every employee can apply AI, and not only apply it, but also create, consume, mix and match [inaudible 00:46:31] of having some level of freedom to really apply AI to, to what they do. That's another huge area, like the augmented intelligence area. Sam Charrington: [00:46:41] Mm-hmm [affirmative]. David Carmona: [00:46:41] That [inaudible 00:46:42] models, we, we may see it happening sooner than later. Sam Charrington: [00:46:45] Awesome. Well, David, it's been wonderful to catch up with you and to dig into some of the work you're doing around AI at scale. Thanks so much for taking the time to chat with us. David Carmona: [00:46:58] Thank you so much, Sam. It was a pleasure. Sam Charrington: [00:47:00] My pleasure. David Carmona: [00:47:01] Thank you. Sam Charrington: [00:47:02] All right, everyone. That's our show for today. To learn more about today's guest or the topics mentioned in this interview, visit TWIMLAI.com of course, if you like what you hear on the podcast, please subscribe, rate, and review the show on your favorite podcatcher. Thank you so much for listening, and catch you next time.
Sam Charrington: [00:00:00] Welcome to The TWIML AI Podcast. I’m your host, Sam Charrington. Before we jump into the interview, I’d like to take a moment to thank Microsoft for their support of the show and their sponsorship of this series of episodes highlighting just a few of the fundamental innovations behind Azure Cognitive Services. Cognitive Services is a portfolio of domain-specific capabilities that brings AI within the reach of every developer without requiring machine learning expertise. All it takes is an API call to embed the ability to see, hear, speak, search, understand and accelerate decision making into your apps. Visit aka.ms/cognitive to learn how customers like Volkswagen, Uber and the BBC have used Azure Cognitive Services to embed services like realtime translation, facial recognition, and natural language understanding to create robust and intelligent user experiences in their apps. While you’re there, you can take advantage of the $200 credit to start building your own intelligent applications when you open and Azure free account. That link again is aka.ms/cognitive. And now, on to the show. All right, everyone, I am here with Arul Menezes. Arul is a distinguished engineer at Microsoft. Arul, welcome to the TWIML AI podcast. Arul Menezes: [00:01:43] Thank you, Sam. I’m delighted to be here. Sam Charrington: [00:01:45] I’m really looking forward to our chat, which will focus on some of the work you’re doing in the machine translation space. To get us started, I’d love to have you introduce yourself and share a little bit about your background. How did you come to work in NLP and, and translation? And tell us a little bit about your story. Arul Menezes: [00:02:03] Yeah, so I’ve actually been at Microsoft 30 years at this point. Sam Charrington: [00:02:07] Wow. Arul Menezes: [00:02:07] I, yeah, I know. God, it’s a long time. I was actually in a PhD program. I came here for the summer, loved it so much I never went back. So I worked at Microsoft in the various engineering teams for a while, and then eventually I drifted back into research and I joined the natural language processing team in Microsoft Research, and I started the machine translation project, and I’ve been doing that ever since, so I’ve been doing machine translation for, like, 20 years now, and it’s been, it’s been a great ride because it’s just a fascinating field. So many interesting challenges and we have made so much progress from when we started, you know, and we’ve gone through so many evolutions of technology. It’s been, it’s been a great ride, yeah. Sam Charrington: [00:02:49] Yeah, there are some pretty famous examples of, you know, how the introduction of deep learning has changed machine translation. I’m assuming that your experience there i- is no different. Arul Menezes: [00:03:04] Yeah. Sam Charrington: [00:03:04] Can you share a little bit about how the, the evolution that you’ve seen over the years? Arul Menezes: [00:03:08] Sure. Sure. I mean, historically, you know, machine translation is something people s- tried to do, you know, in the ’50s. It was one of the first things they wanted to do with computers, you know, along with simulating sort of nuclear sort of bombs. 
But for the longest time, it was very, very hard to make progress, so all the way through, I would say, the late ’90s, early 2000s, we were still in sort of rule based and knowledge sort of engineered approaches, but then the first real breakthrough that came in the late ’90s, well actually starting a little earlier in terms of some papers published at IBM, but really taking off in the late ’90s and early 2000s was statistical machine translation, where for the first time, you know, we were able to take advantage of, like, large amounts of previously translated data, right? So you take documents and web pages and things that, that have previously been translated by people and you get these parallel texts, which is, let’s say, English and French, and you align documents and sentences, and then eventually words and phrases so you can learn these translations, and so with statistical machine translation, we were learning from data for the very first time, instead of having people hand code it. And it worked, actually, astonishingly well compared to what we were doing before. But eventually, we ran into the limits of the technology, because while we had the data, we didn’t have the techniques to do a good job of learning what that data was telling us because, you know, the machine learning techniques that we had back then just weren’t good enough at… They were good at memorizing, right? If you said something exactly the way they had seen in the data, they would do a good job of translating it. But they were terrible at generalizing from what they saw in the data, and that’s where neural models come in. Like, neural models are amazing at generalizing, you know. People always talk about how some of the latest models, you know, you can probe them to figure out what was in their training data and get them to reproduce what was in their training data. But what we forget is it takes work to actually make them do that, because most of the time, they’re generalizing. They’re paraphrasing. They’re not just replicating their training data, and that’s something we were not able to do before. So if you look at the evolution over the last 20 years of machine translation, we had our statistical machine translation, which did really well for a while, but then eventually plateaued. Then, you know, we had sort of the advent of neural networks, and the first thing that people tried to do was, you know, we did feedforward neural networks. We tried to shoehorn them into the framework we already had and combine feedforward networks and statistical techniques, and that worked okay. You got a few incremental improvements. But it wasn’t until we had the sort of pure neural LSTM models that we, for the first time, were really capturing the power of neural models, right? So what an LSTM model would do would be, you know, you have this encoder that you feed the source language sentence in, and it basically embeds the meaning of that entire sentence in the LSTM state. And then you feed that through a decoder that is now generating a fluent sentence, sort of based on this very abstracted embedded understanding of what the source language said. And so that’s very different from the way we were doing it, just sort of copying words and phrases that we’d memorized. So that was the first revolution, and, and it gave us amazing results, actually, compared to what we were doing before. 
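To make the encoder/decoder idea concrete, here is a bare-bones PyTorch sketch: the encoder compresses the source sentence into the LSTM's final state, and the decoder generates target tokens from that state. The vocabulary and hidden sizes are made up; this illustrates the general architecture, not Microsoft's system.

import torch
import torch.nn as nn

# Toy LSTM encoder-decoder: the encoder's final state summarizes the source
# sentence; the decoder generates target tokens conditioned on that state.
SRC_VOCAB, TGT_VOCAB, DIM = 1000, 1000, 64    # made-up sizes

src_embed = nn.Embedding(SRC_VOCAB, DIM)
tgt_embed = nn.Embedding(TGT_VOCAB, DIM)
encoder = nn.LSTM(DIM, DIM, batch_first=True)
decoder = nn.LSTM(DIM, DIM, batch_first=True)
proj = nn.Linear(DIM, TGT_VOCAB)

src = torch.randint(0, SRC_VOCAB, (1, 7))     # one source sentence, 7 tokens
tgt_in = torch.randint(0, TGT_VOCAB, (1, 5))  # shifted target prefix

_, state = encoder(src_embed(src))            # "meaning" of the whole source sentence
dec_out, _ = decoder(tgt_embed(tgt_in), state)
logits = proj(dec_out)                        # next-token scores at each decoder step
print(logits.shape)                           # torch.Size([1, 5, 1000])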
And then, of course, along, after that came transformers, which sort of take that whole encoder/decoder architecture, but take it to the next level. Instead of having the meaning of the entire source sentence be encoded into a single LSTM state, which may work well for short sentences but gets, you know, worse as you get to longer sentences. In a transformer, you know, we have the self attention that’s basically looking at every word in the source and every word in the target, and so you have, like, full context available to the model at any point in time. So that’s where we stand today is, you know, transformers are the state of the art, but of course there’s lots of really cool, interesting variations and things we’re doing, which I think we’re going to talk about at some point. Sam Charrington: [00:07:25] And, and when you talk about transformers being the state of the art, is that what is powering the current kind of production Azure machine translation service? Or is that the state of the art in research and, you know, there’s some combination of the various techniques you mentioned that is powering the live service? Arul Menezes: [00:07:46] So the live service is very much powered by transformers. We have, you know, all 180 language pairs or something that we support powered by transformers running in production. Now, one thing we do do is that we take advantage of what’s called knowledge distillation, right, to take the knowledge that’s embedded in these very large transformers that we train offline and then condense that or distill that into smaller, still transformers, but smaller, shallower, and narrower models that we use in production, right? So we typically go through multiple stages of these teacher models before we get to the student, so our pipeline’s actually fairly complex. We take the parallel data, which I mentioned earlier, which is sort of the lifeblood of machine translation. This is the previously translated human text. And we train, like, a first teacher based on that data. Then we typically do what’s called back translation, which is a technique in machine translation to take advantage of monolingual data, so data that’s not parallel, so it’s not translated source and target. It’s just in one language, typically the target language. And what we do there is we want to take advantage of this monolingual data to teach the model more about the syntax and the, you know, semantics of the target language so it gets more fluent. And the way we incorporate that data into a machine translation model is through something called back translation, where we take the, the target language data, we translate it back to the source using one of our models, and then we use it to train the model in the other direction. So this is a little complicated. So basically, if you’re training an English to French model… Sam Charrington: [00:09:28] Mm-hmm [affirmative]. Arul Menezes: [00:09:29] … in addition to the parallel English-French data, you also take some French monolingual data, you translate it back to English using your other direction translation system, the French to English system… Sam Charrington: [00:09:41] Mm-hmm [affirmative]. Arul Menezes: [00:09:41] … and then you put that synthetic data back into training your English-French system. Sam Charrington: [00:09:45] Okay. Arul Menezes: [00:09:45] So, so that’s- Sam Charrington: [00:09:47] [crosstalk 00:09:49] essentially a, a data augmentation technique? 
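Schematically, the back-translation recipe Arul describes looks like the sketch below; `translate_fr_to_en` and `train_en_fr` are placeholders for whatever reverse-direction model and training routine are available.

# Schematic back translation for an English->French system.
def back_translate(french_monolingual, translate_fr_to_en):
    synthetic_pairs = []
    for fr in french_monolingual:
        synthetic_en = translate_fr_to_en(fr)       # synthetic source side
        synthetic_pairs.append((synthetic_en, fr))  # (source, target) training pair
    return synthetic_pairs

# The synthetic pairs are mixed with the genuine parallel data, and the
# English->French model is then (re)trained on the combined set:
# train_en_fr(parallel_pairs + back_translate(french_mono, fr_en_model))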
Arul Menezes: [00:09:51] It is, yeah, it's a data augmentation technique, and it works, like, incredibly well, actually. Adds several points to our metric. The metric we use in machine translation is called a BLEU score. I mean, there are other metrics and I, I mean, I could talk about that at some point if we want to get into it, but, you know, we get several points of BLEU score out of the back translation. And then, so that's our final sort of teacher model, which is typically huge, and then what we do is we use that model to teach the student model. And the way we do that is essentially we run, like, a huge amount of text through this teacher model, and then we take the data generated by the teacher and we train the student on it. And the reason that works is because unlike sort of natural data that we train the teacher on, which can be confusing, contradictory, diverse, the, the data generated by the teacher is very uniform and it's very standardized, and so you can use a much simpler student model to learn all of that knowledge from the teacher, because it's a simpler learning problem. And having done that, that model runs, like, super fast and we can host it in production and translate, like, trillions of words, you know, so yeah. Sam Charrington: [00:10:59] And so the, the student teacher part of the process is kind of interesting to explore a little bit further. Are you essentially trying to do something your- the task that you're trying to, or goal that you're trying to achieve with that, is model compression? Arul Menezes: [00:11:13] Right. Sam Charrington: [00:11:14] Very different approach to it than, like, pruning or… Arul Menezes: [00:11:18] Right. Sam Charrington: [00:11:18] … you know, some of the other ways you might approach compression. Arul Menezes: [00:11:20] Yeah, right. So we do, like, we do a lot of different things for model compression, right? So one of the things we do is we, we do quantization, for example, within all our models in eight bits. We've experimented with less than eight bits. It's not quite as effective, but, you know we, we do that. We do some other, like, pruning techniques as well, but the biggest one is the knowledge distillation, and what you're trying to do there is get a smaller model to basically mimic the behavior of the big teacher model just running a lot cheaper. And by combining all the techniques, we published a paper last year on this at a workshop, and from our big teacher with all of the knowledge distillation, the compression, the quantization and, and so on, we're running something like 250 times faster… Sam Charrington: [00:12:06] Wow. Arul Menezes: [00:12:06] … on the student than the teacher with, I mean, there is a small loss in quality, right? But we lose maybe half a BLEU point, not too much, and in some cases not even any. We can, like, actually maintain the quality as is, so… Sam Charrington: [00:12:20] The, my next question for you, it, the way you describe the process, and in particular the idea that the teacher is outputting more consistent examples than what is found in the training data… Arul Menezes: [00:12:35] Mm-hmm [affirmative], right. Sam Charrington: [00:12:35] My next question was, or the intuition that I had was that that would cause the student to be far less effective at generalizing and would make it perform worse, but it sounds like that's not the case in practice.
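The sequence-level distillation step can be summarized with the schematic below: run a large source corpus through the big teacher, then train the small student on the teacher's outputs. `teacher.translate` and `train` are placeholders, not a real pipeline.

# Schematic sequence-level knowledge distillation.
def distill(teacher, student, source_corpus, train):
    # The student sees only the teacher's uniform, standardized outputs,
    # which is an easier learning problem than raw human-translated data.
    distilled_pairs = [(src, teacher.translate(src)) for src in source_corpus]
    train(student, distilled_pairs)
    return student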
Arul Menezes: [00:12:51] So the key to that is to make sure that the data that you feed through the teacher to teach the student is diverse enough to cover all the situations that you may encounter, right? So the students are a little weird, I mean, and I think you’re sort of hinting at that. We do, for example over-fit the student to the training data, which is something that you typically wouldn’t do in your teacher model, because you, in fact, are trying to make the teacher match the student as much as possible. Sam Charrington: [00:13:18] Mm-hmm [affirmative]. Arul Menezes: [00:13:18] So some of the things that you do to make the, to the teachers better at generalization, you don’t do in the student. And in fact, if you look at the student distributions, they’re much sharper than the teacher distributions, because they have overfit to the data that they’ve seen. But, you know, there’s a little evidence that you could get into some corner cases that are brittle, like you know there’s this problem of neural hallucination that all of the neural models are subject to where, you know, occasionally they’ll just output something that is completely off the wall, unrelated to anything that they’ve seen. And there’s some evidence that there’s a little bit of amplification of that going on. Like if it’s… You know, the teachers are also subject to hallucination, but maybe at a very, very low frequency, and that maybe that’s being amplified a little bit in the student. So w- we’re, you know, we’re working on managing that, but yeah, so there’s, there’s, you know, it’s a trade-off. Like, the students have lower capacity, but that’s what enables us to run them, and we, you know, we run them on CPU. We don’t, we, we don’t use GPUs in production inference. We use, of course, all models are trained and, and all the knowledge [inaudible 00:14:26] are done on GPUs, but in, but in production we’re just using CPUs. Sam Charrington: [00:14:26] And is, is that primarily based on the cost benefit analysis, or is it based on a latency envelope that you have to work with and not needing, not wanting a kind of batch… Arul Menezes: [00:14:38] Right. Sam Charrington: [00:14:38] … Inference requests? Arul Menezes: [00:14:39] Yeah, that, that’s exactly right. I mean, you know, latency is a big concern. Our API’s a real- real-time API, and so, you know, latency is the biggest driving factor. And honestly, if you do inference on GPUs, you know, you get some latency benefit, but the big benefit is on large batches. And so unless you have a matched batch translation API, you can’t really take advantage of the full capacity of your, of your GPU, so, you know, in a real-time API. Sam Charrington: [00:15:05] Mm-hmm [affirmative]. And are both the teacher and the student models transformers for you? Arul Menezes: [00:15:12] Yeah, they are. They are. Yeah, the, the students are transformer large or a little bit larger, and then the s- sorry, that’s the teachers, and then the students, they’re very highly optimized transformer. I mean, they, we start with transformer base, but then we do a lot of really strange stuff. I would refer you to the paper, actually. [laughs] Sam Charrington: [00:15:30] Okay. When you were describing the data augmentation technique that you use… Arul Menezes: [00:15:36] Right. 
Sam Charrington: [00:15:37] … it kind of called to mind ideas about incorporating a GAN type of approach where you’re doing the pass-back translation and then, you know, maybe there’s some GAN that is trying to… Arul Menezes: [00:15:47] Right. Sam Charrington: [00:15:47] … figure out if the results going backwards… Arul Menezes: [00:15:50] Right. Sam Charrington: [00:15:51] … is like a human translation. Is there a role for that kind of technique? Is that something that comes up in the research? Arul Menezes: [00:15:57] Yeah, so we’ve, we’ve looked at GANs. There were some exciting results, but but in the end, I mean, I think we have some okay research results. We haven’t seen much benefit, but more broadly, in terms of data augmentation, we’re using it all over the place, right? So it’s we have the back translation, but there are a lot of phenomenon that we want to address in machine translation that is maybe not well represented in the data, and so we use data augmentation pretty heavily to cover those cases, right? To give you a simple example, when you translate a sentence and you get a particular translation and then you go in and let’s say you remove the period at the end of the sentence, sometimes it changes the translation entirely. They may both be perfectly good translations, right? But they’re different. So one way to look at it is, well, they’re good, both good translations, but people don’t like that. So if you look at our customers, and we’re very sensitive to what our users, the feedback we get from our users. So one of the feedback we got was that, you know, we want a little more stability in our translation. So, you know, just because I lost a period at the end of the sentence, I shouldn’t get a drastically different translation. And so, you know, it’s very easy to augment the data and say, well, you know, stochastically I’m going to, like, delete the period on my sentences, and so then the model learns to basically be robust whether there’s a period or not. Now, of course, you know, that’s different than a question mark. You definitely want to leave the question mark in because that changes the meaning of the whole… Sam Charrington: [00:17:22] Mm-hmm [affirmative]. Arul Menezes: [00:17:23] … sentence. But, you know, things like that, punctuation, the period, commas, things like that. Maybe, you know, capitalization, for example. One of the the other examples would be like an all caps sentence. You know, you take the whole sentence and you change it to all caps. Well, you get a totally different translation, right? So we, again, generate some synthetic all caps data so that the model learns to do a good job of translating that as well. And then there’s, you know, there’s all these, like, I, I would call them, you know, long-tail phenomenon that and, you know we feel that data augmentation’s a good way to address some of these, yeah. Sam Charrington: [00:17:53] Your examples are really interesting to me because I’m refer- I’m comparing them to, like, your textbook NLP types of examples where the first thing you’re doing is making everything lowercase and getting rid of all of your punctuation. Arul Menezes: [00:18:05] Yeah. Sam Charrington: [00:18:05] Sounds like that does not work for translation. Arul Menezes: [00:18:08] No, because there’s a lot of information in casing and punctuation, right? Like, I mean, if you want to handle names, for example, you need to pay attention to the case of the input. 
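The period and all-caps examples translate into very small augmentation functions; the sketch below is illustrative only (the sampling rates, and the choice to uppercase both sides of the pair, are assumptions, not the production recipe).

import random

# Illustrative augmentations: stochastically drop a trailing period (never a
# question mark), and add a few all-caps copies of training pairs.
def drop_final_period(src, tgt, p=0.1):
    if src.endswith(".") and random.random() < p:
        return src[:-1], tgt              # the target translation stays the same
    return src, tgt

def add_all_caps_copies(pairs, p=0.01):
    extra = [(s.upper(), t.upper()) for s, t in pairs if random.random() < p]
    return pairs + extra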
Like, everything in the input has information, and so actually even the punctuation, right? Like, sometimes if you take the period off the end of the sentence, it should change things because it may be a noun phrase rather than an actual sentence, right? So it’s not so much we’re preprocessing the data and trying to be clever. It’s about exposing the model to different variations so that the model can figure things out for itself. Sam Charrington: [00:18:41] Mm-hmm [affirmative]. One of the questions this prompts is, like, the unit of, you know, work or the unit of thing that you’re trying to translate. Arul Menezes: [00:18:50] Right. Sam Charrington: [00:18:50] You know, translating a word being different from translating a sentence… Arul Menezes: [00:18:54] Sure. Sam Charrington: [00:18:54] … being different from translating a, an entire document. Arul Menezes: [00:18:57] Right. Sam Charrington: [00:18:57] Sounds like most of what we’ve been talking about is kind of phrase by phrase now relative to the word by word that, you know, we were doing 20 years ago. Are you also looking at the entire document? Are you able to get information from a broader context to impact the translations? Arul Menezes: [00:19:18] Yeah, so that’s, that’s a very good question, Sam. Yeah, so the context matters a lot, right? So one of the reasons why neural models are so great at translating now is because they, they are looking at the whole sentence context and they’re translating the entire conte- the sentence and every w- they, they’re basically sort of figuring out the meaning of every word and phrase in the context of the whole sentence, which is something we couldn’t do with statistical machine translation before. So now the next step is to expand that context to beyond the sentence, right? So there are a lot of phenomenon that it’s impossible to translate well without context beyond the sentence. Like, in many languages, unless you have document-level context or paragraph-level context, you can’t generate the right pronouns because you don’t actually know. The sentence doesn’t have enough clues to let you know what is the gender of the subject or the object or the person you’re talking about in that sentence. Beyond just the pronouns, it’s also like you know, the senses of words and you know, disambiguating those. So, we’re, we’re actually moving towards translating at the whole document level context, or at least, you know, very large, multi-sentence fragments. And then there, we’ll be able to use, you know, the, the, the, the context of the entire document to translate each individual sentence. And we actually have some really great research results based on translating at the document level. Yeah, so we’re pretty excited about that. That model is not in production yet. Sam Charrington: [00:20:48] Mm-hmm [affirmative]. Arul Menezes: [00:20:48] But it’s something that we’re working on. We did ship a document-level API. I think it’s in public preview right now. Which addresses sort of the other half of the problem, which is, you know, people have documents. They’ve got formatting. You know, it’s in PDF. It’s in Word. It’s in PowerPoint, whatever, and HTML, and it’s a hassle getting all the text out of that… Sam Charrington: [00:21:11] Yeah. Arul Menezes: [00:21:11] … getting it translated, and then worse still trying to reassemble the document and reconstruct the formatting of that document on the translated thing. So we’ve made that easy. We just shipped this API. Just give us your PDF. 
We’ll tear it apart, we’ll do the translation, we’ll put it back together, and we’ll preserve the format. And you know, especially for PDF, that is actually really hard. Doing the format preservation is tricky. But we’re pretty excited about that API. And so, then, that’s the place where our document level neural model would fit right in, right? Because now we have, the user’s giving us the whole document. We can not only handle all the stuff about the formatting and all that. We can go one better. We can actually use the whole document context to give you better quality translations. Sam Charrington: [00:21:53] Mm-hmm [affirmative]. Can you give us an overview of some of the techniques that go into looking at the entire document when building the, the model? Arul Menezes: [00:22:03] Yeah, so there’s, I mean, right now, as I said, we haven’t actually shipped this, so we’re looking at a, a bunch of variations. You know, there’s several things that people have looked at, like mo- you know, there are hierarchical models where you do the, you run transformers at the sentence level, and then you run a second level to sort of, like, collect the sentence level information into, like, a document level context vector, and then you feed that back into translating each sentence. We’re finding that actually, if you just make it super simple and treat the whole thing as, as if it were a giant sentence, in effect, you get really good results. You do have to deal with the performance issues, right, because transformers are n-squared in the size of the input and the output, so instead of, you know, handling, you know, a 25-word sentence, if we’re not translating a thousand-word para- you know, document or paragraph, then the, you know, you’ve got, like, an n-squared problem in terms of the performance, right? It’s going to be that much more expensive, but we have, we have things that we’re looking at to make that faster as well, so we’re pretty optimistic we can do that, and and I think we can do that with just letting the transformer figure it out for itself rather than trying to be very clever about all this hierarchical stuff. Sam Charrington: [00:23:10] Nice. Nice. Let’s talk a little bit about the role of different languages. You know, we, we’ve already talked about how you can use back translation to help augment the performance of your translation of a language in, in one direction or the translation between a couple of language pairs. Arul Menezes: [00:23:27] Right. Sam Charrington: [00:23:28] Are there ways to take advantage of the other 130 or so languages… Arul Menezes: [00:23:33] Sure. Sam Charrington: [00:23:33] … that y- that you support when you’re building the n-plus-1th model for a given language? Arul Menezes: [00:23:38] Absolutely. Absolutely. That’s been one of the most exciting things, I would say, that came out of sort of transformers and, and neural models in general is the ability to do this sort of transfer learning between languages, right? And the reason we can do that is because transformers or neural models in general are representing the meanings of words and sentences and phrases as embeddings in, you know, the space, and by training on multiple languages together, you can actually get the representations of these languages to merge and have the similar concepts be represented through relative points in spa- in that space, right? 
So, as a practical matter, we've basically found that if we group languages by family, right, and take, so for example we took all our Indian languages and we put them together and we trained one joint model a- across all of the languages and now we're talking, you know, languages where you have a very different amount of data. You have Hindi, where we have quite a lot of data, and then we have, like, Assamese, which was, I think, the last one that we shipped, that has probably, like, two orders of magnitude less data. And the, the wonderful thing is that by training them jointly, the Assamese model learns from the huge amount of data that we have for Hindi and does, like, dramatically better than if we had just trained on Assamese by itself. In fact, we have done those experiments and, you know, for the smaller languages, we can get, like, five, 10 BLEU points, which is, like, a crazy level of improvement just from the transfer learning and multilingual. We also do that with, like, Arabic and all of our Middle Eastern languages. So we're just, like, grouping more and more language families together and getting huge benefits out of this. Sam Charrington: [00:25:31] And when you're grouping the, the language families, have you ever, do you experiment with going across language families and seeing if there's some improvement, Arul Menezes: [00:25:41] yeah. Sam Charrington: [00:25:42] … improvement there? Arul Menezes: [00:25:43] Yeah, so we, you know, we've trained models that are, like, 50 or 100 languages in them. What you run into is, you know, as you add languages, you have to increase the size of your vocabulary to accommodate all of these languages, and you have to increase the size of the model, because at some point you run into model capacity a little bit. So you can have a model that does a ni- a really nice job of learning from 50 or 100 languages, but it gets to be a really huge model, and so in terms of cost effectiveness, we've found that, like, you get, like, almost all of the benefit of the transfer learning at, like, a much reduced cost by just grouping 10, 15 languages at a time. And if they're related, it's better. But actually, even if they're unrelated, it still works. [laughs] It's quite amazing how well it works even if the languages are not related, yeah. Sam Charrington: [00:26:32] We may think of it as, like, a computational test of Chomsky's universal grammar and, you know, these ideas that suggest that all languages have these common elements. Arul Menezes: [00:26:41] Yeah. Sam Charrington: [00:26:42] I- if you are able to train these models across languages and im- improve them, that would seem to support those kinds of theories. Arul Menezes: [00:26:48] I mean, definitely the models do a really good job of bringing related concepts together in the, in the embedding space, right? Sam Charrington: [00:26:56] Would you consider this, you, you referenced this as, like, multilingual transfer learning. Would you also think of it as a type of multitask learning as well, or is, is that not technically what you're doing in this task? Arul Menezes: [00:27:09] So we're also doing, in addition to multilingual just machine translation, we're also doing multilingual multitask learning, and what we're doing there is we are combining the sort of … so, let me back up a bit. There's been this whole line of research based on models like BERT, right?
Pretrained language models where, if you look at BERT, it's actually the encoder half of a machine translation model, but it's trained on monolingual data. It's trained on, on single-language data on this objective that's a reconstruction objective where, you know, you're given a, a sentence where you have a couple of words or phrases blanked out. You need to predict that, right? And then you have multilingual BERT where you take multiple separate monolingual corpora, right, so it's like a bunch of English text, a bunch of French text, and all of it, and you train them jointly in the same model. And it does a pretty good job of actually pulling the representations of those things together. So that's one line of research that's sort of really driven a revolution in, like, the whole natural language understanding field, right? So for example, today if you want to train a named entity tagger, you wouldn't start from scratch on your ta- on your named entity data. You would start with a pretrained model. So one of the things that we're very excited about is we have this project that we call [Z-Code 00:28:43] where we're bringing the machine translation work and this sort of pretrained language model, BERT-style work together, right? And we train, we're training this multitask, multilingual model that's, architecturally, it's just a machine translation model, right? But in addition to training it on the parallel data, let's say the English-French data and the English-German data and conversely the German-English data and the French-English data and, you know, 10 or 15 or 50 or 100 other languages. In addition, we have a separate task where we have the BERT tasks, where we take monolingual data and we, we have it reconstruct, you know, the, the missing words. And we also have what's called a denoising autoencoder task, which is where you give it a scrambled sentence, and then it has to output the unscrambled sentence through the decoder. And then now you have these three tasks, and we train them in rotation on the same model, so they're sharing parameters. So the model has to figure out how to use the same parameters to do a good job of the BERT task, to do a good job of the denoising autoencoder task, as well as to do a good job of the machine translation task. And this, we find, leads to, like, much better representations that work for better natural language understanding quality, but also better machine translation quality. Sam Charrington: [00:29:43] Nice. And the, the BERT task in this example is within the same language, as opposed to… Arul Menezes: [00:29:49] Right. Sam Charrington: [00:29:49] … across the target, to the target language? Arul Menezes: [00:29:53] Yeah, there's actually, like, a whole family of tasks, right? I mean, people have come up with, I mean, we've, we've experimented with, like, 20, 25 tasks. Like, so you can do a monolingual masked language model task, which is the BERT task, but you can do a cross-lingual masked language task as well, and you can do the denoising autoencoder task monolingually, where you have to reconstruct the same language, but you can also do that cross-lingually where you have to reconstruct sort of a scrambled foreign language task, so there's, like, a real, like, sort of stone soup approach where people are just throwing in all kinds of tasks, and they all help a little bit. But we need to figure out, like, what's the minimal set that you need? Because, you know, it's work.
It’s computational expense to train these huge models on all these tasks, so if we can find the minimal set that works, that would be ideal. And so far, what we’re working with is, like, a denoising autoencoder, a mass language model, and a machine translation task. Sam Charrington: [00:30:49] Very, very cool. Very cool. So I think one of the things that, you know, often users of this kind of machine translation services experiences that, you know, they weren’t great in the general case, but when you start to try to apply them to specific domains, it’s a lot more challenging, you know, and kind of the technical conversations or translating, you know, medical conversations or, you know, construction or what have you. Is there anything that you’re doing to make the domain-specific performance better for these kinds of systems? Arul Menezes: [00:31:26] Yeah, definitely you know, domain performance in specialized domains is a real challenge. We’re doing several things to get better there, right? So the, the first thing is that the quality is really determined by the availability of data, right? So in the domains like, let’s say news or web pages where we have a ton of data, you know, we’re doing really, really well. And then if you go into a more specialized domain like, let’s say, medical or legal where we don’t have as much data, we’re maybe not doing quite as well. And so one of the things we’re doing is we’re now taking the same neural models that are good at translation and we’re using them to identify parallel data in these domains that we can find on the web that we maybe weren’t finding before, and we can do that because these models, you know, because the representations are shared in the multilingual models, they are actually very good at identifying potential training data that, that is translations of each other. So that’s one thing we’re doing. The other thing we’re doing, of course, is the same kind of transfer learning approach that we’re using cross-lingually applies within domains, as well, right? So if you have a small amount of medical domain data, you don’t want to, like, just train a l- a model that’s based just on that you know, small data. What we’re doing instead is we’re taking, you know, our huge model that’s trained on a ton of, like, general data across a bunch of domains, and then you fine-tune it for the specific domains that you’re interested in. And we actually have a product called Customer Translator that we have, like, you know, thousands of customers using, where they are using this approach to customize the machine translation to their company or their application needs, right? So let’s say you’re a car company or something and you have a bunch of data that’s about, like, automotive manuals, right? So you come to our website, you log in, you create your account, etc., you upload this data, and then what we do is we take your small amount of domain-specific data, we take our large model, and then we fine-tune it to that data, and now you have the model that does, like, you know, sometimes dramatically, again, 10, 15, 20 blue points better than the baseline because, you know, we’ve learned the vocabulary and the specifics of your domain, but we’re still leveraging, we’re standing on this platform of, like, the broad general domain quality. So that’s been extremely popular and valuable, actually. We just shipped a new version of that based on transformers a couple of months ago. 
Sam Charrington: [00:33:45] And in that case, the user is presumably bringing translated documents so that, that you’re able to train or fine tune all, with both source and target translations? Arul Menezes: [00:33:56] Yeah, that’s exactly right. I mean, a lot of the companies that we work with have some data, right? Like, let’s say they had a previous version of their vehicle or, you know whatever and they had manuals that were translated. In Microsoft’s case, for example, you know, we have, let’s say the manuals for Microsoft Word going back, you know, a couple of decades, and this is the kind of data you can use to customize it so that anything, any new content that you want to translate can have, like, a very consistent, like, vocabulary and, and tone and so on, yeah. Sam Charrington: [00:34:28] Mm-hmm [affirmative]. And then in that first example or the first technique that you mentioned, that sounds really interesting. So you’ve got this index of the web and Bing… Arul Menezes: [00:34:37] Right. Sam Charrington: [00:34:37] … you know, for example, or maybe you have a separate one, but you go- have this mechanism to kind of crawl the web and… Arul Menezes: [00:34:43] Right. Sam Charrington: [00:34:44] It sounds like the idea is that you can use the model to identify, hey, I’ve got these two documents. Arul Menezes: [00:34:52] Right. Sam Charrington: [00:34:52] They look really similar, but there’s a, a high percentage of words that I don’t know… Arul Menezes: [00:34:57] Yeah, right. Sam Charrington: [00:34:57] … that occupy similar positions in the same documents. Arul Menezes: [00:35:00] Yeah. Sam Charrington: [00:35:00] And then you have someone translate- oh, well actually, then once you know that, you can just align them, so to speak, and you’ve got more domain-specific document to add to your training set? Is that the general idea? Arul Menezes: [00:35:13] Yeah. I mean, it, it’s like you’re trying to find two very similar-looking needles in a very, very, very large haystack, right? [laughs] Sam Charrington: [00:35:20] Mm-hmm [affirmative]. Arul Menezes: [00:35:21] And so, so you have to have a magnet that finds exactly those two needles and rejects everything else. So the, the cross-lingual embedding space is pretty key here, right? So you’re basically, in principle, if you embedded every single sentence or document on the web and then were able to look at every single document and find all of its very similarly close embeddings, you’d be done. But, you know, that’s [laughs] that’s, Sam Charrington: [00:35:47] easier said than done? Arul Menezes: [00:35:48] Easier said than done, right? So that’s the kind of thing that we’re trying to do at scale, right, is, like, you got these, you know, trillions of documents and, you know, we want to find the matching one, so you need to do it efficiently, and so there’s a lot of, like, clever engineering that goes into, like, indexing this stuff and, and, like, computing the embeddings efficiently. And, of course, also, you know, we’re not really trying to match every page in the web to every other page in the web, because you have, you know, a lot of clues that says whe- you know, if I have a document here, you know, is it likely I’d have a translated document somewhere? It’s going to be either in the same, like, top-level domain or, or related sites, things like that. So there are, there are ways to constrain that search. Sam Charrington: [00:36:27] Mm-hmm [affirmative]. Our conversation thus far has focused primarily on text translation. 
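The needle-in-a-haystack search can be pictured with a toy nearest-neighbor sketch: embed candidate sentences from both languages into a shared space and keep the pairs whose cosine similarity clears a threshold. The `embed` function is a placeholder for a cross-lingual sentence encoder (something in the spirit of LASER or LaBSE), and the brute-force similarity matrix stands in for the approximate-nearest-neighbor indexing and domain heuristics used at web scale.

import numpy as np

# Toy parallel-sentence mining; `embed` stands in for a cross-lingual encoder.
def mine_pairs(english_sents, french_sents, embed, threshold=0.9):
    en = np.array([embed(s) for s in english_sents], dtype=np.float32)
    fr = np.array([embed(s) for s in french_sents], dtype=np.float32)
    en /= np.linalg.norm(en, axis=1, keepdims=True)
    fr /= np.linalg.norm(fr, axis=1, keepdims=True)
    sims = en @ fr.T                       # cosine similarity matrix
    pairs = []
    for i, row in enumerate(sims):
        j = int(row.argmax())
        if row[j] >= threshold:            # keep only confident matches
            pairs.append((english_sents[i], french_sents[j], float(row[j])))
    return pairs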
Are you also involved in voice translation? Arul Menezes: [00:36:38] Yeah, so we actually have been doing speech translation for a while. Several years ago we shipped a feature for speech translation in Skype called Skype Translator. It was, you know, really well received, super exciting. A lot of people use it even today, right? Especially, you know, people talking to their relatives in another country, and, you know, there's a lot of interesting challenges in speech translation because it's not that you just take the output of a speech recognition system and then just pass it to machine translation, right? There's a, there's a real mismatch in what comes out of speech recognition and what is needed to do a good job of translation, because of course translation's expecting, like, you know, well-formatted text, capitalization, punctuation, br- sentence breaks, things like that. So we put a, we put a lot of effort into bridging that gap, you know, post-processing the output of speech recognition so that we have, you know, h- really accurate sentence boundaries. So that, that matters a lot. I mean, you break the sentence in the middle and you try to translate… Like, if you break a sentence in the middle, the speech recognition there is okay, because as a human reading it, you know, there's a period in there. You just ignore it and move on. But the machine doesn't know that, and so when you're trying to translate, you've got these two separate sentences and then it does a terrible job of it. So doing, getting the sentence breaks right, getting punctuation right and so on is really important, and so, so that's what we've been doing. We actually have a project going on now with the European Parliament where they are going to be using our technology, well, it's, it, there, there's three contestants or three bidders in this project, and so there's an evaluation that will happen in a few months, but we're hoping that they'll adopt our technology for live transcription and translation of the European Parliament sessions in all 24 languages of the European Parliament, which is super exciting. Sam Charrington: [00:38:26] Oh, wow. Wow. Arul Menezes: [00:38:27] Yeah. Sam Charrington: [00:38:28] So when you think about kind of where we are with, you know, transformers and some of the innovations that we've talked about and, you know, relative to your 20, 30 years in the space, I'm curious what you're most excited about and, and where you see it going. Arul Menezes: [00:38:44] Yeah, I mean, the pace of innovation has just been amazing. There's so many things that are happening that, like, you know, would have a really dramatic impact, right? So one is just much larger models, right? As we scale up the model, we see continual improvements. And so as the hardware and the, you know, our ability to serve up these larger models keeps growing, the quality will also keep growing, right? The architecture of these large models also matters, right? Like, it's not just a matter of taking the smaller model and scaling it up exactly as is, so there are things like mixture of experts models that, for example, allow you to scale the number of parameters without the cost scaling as linearly, right, because you have parts of the model that specialize in different parts of the problem. And then, you know, multilingual is definitely the future. Pretrained models is definitely the future, right?
So, so, like, if you put that all together, like pretrained, multilingual, multitask trained, maybe with mixture of experts, huge models, and then we would specialize them for individual language pairs or groups of languages and then distill them do- down to something we can ship. So that’s one area that there’s a lot of innovation happening. The other thing is that, you know, 10 years ago, people were just amazed that translation worked at all, right? Sam Charrington: [00:40:07] [laughs] Arul Menezes: [00:40:08] And now we’re doing a really good job and expectations have risen, so you get to the point where a lot of sort of smaller, let’s call them long-tail problems start to matter a lot, right? So if you look at translation of names, we probably get them 99% right, right? But a few years ago it would have been fine to say, “Hey, we’re 97% accurate on names.” But maybe now that’s not good enough, right? Like, screwing up 1% of the names is not acceptable, so, you know, how do we get that last 1% of names a- and, you know, I’m just making up the nu- it, it may be 99.9%. You’re still going to have upset customers if you get, you know, 0.1% of your names or your numbers, numbers are even worse, right? Sam Charrington: [00:40:47] Mm-hmm [affirmative]. Arul Menezes: [00:40:48] Like, if you misstate a number even, like, 0.1% of the time, it could have catastrophic consequences, right? So that’s an important area. I mentioned neural hallucination before. That’s something we see where, again, may happen only 0.1% of the time, but if you get, like, a completely unrelated sentence that has nothing to do with your input but is really fluent, it’s pretty deceptive, right? Like, because especially if I’m just putting my faith in this translation that, and I don’t understand the source language at all, you’d be like, “Well, sounds okay,” and move on. But maybe it says something completely different from what the source said, right? And so that’s, that’s a challenge. Sam Charrington: [00:41:25] Mm-hmm [affirmative]. Arul Menezes: [00:41:25] Yeah, I mean, there’s lots of really cool things happening in this space. Sam Charrington: [00:41:30] Awesome. Awesome. Well, Arul, thanks so much for taking some time to share a bit about what you’re up to. Very cool stuff. Arul Menezes: [00:41:38] Thank you. You’re welcome, Sam. Sam Charrington: [00:41:40] Thank you. Arul Menezes: [00:41:40] Happy to be on, on the show. Take care. Bye. Sam Charrington: [00:41:43] Thank you. All right, everyone, that’s our show for today. To learn more about today’s guest or the topics mentioned in this interview, visit TWIMLAI.com. Of course, if you like what you hear on the podcast, please subscribe, rate, and review the show on your favorite pod catcher. Thanks so much for listening, and catch you next time.
Sam Charrington: Hey Everyone! Last week was the first week of our TWIMLcon: AI Platforms conference, and what a great first week it was! Following three days of informative sessions and workshops, we concluded the week with our inaugural TWIMLcon Executive Summit, a packed day featuring insightful and inspiring sessions with leaders from companies like BP, Walmart, Accenture, Qualcomm, Orangetheory Fitness, Cruise, and many more. If you’re not attending the conference and would like a sense of what’s been happening, check out twimlcon.com/blog for our daily recaps, and consider joining us for week two! Before we jump into today’s interview, I’d like to say thanks to our friends at Microsoft for their continued support of the podcast and their sponsorship of this series! Microsoft’s mission is to empower every single person on the planet to achieve more. We’re excited to partner with them on this series of shows, in which we share experiences at the intersection of AI and innovation to inspire customers to reimagine their businesses and the world. Learn more at Microsoft.com/ai and Microsoft.com/innovation. Sam Charrington: [00:01:29] All right, everyone. I am here with Gurdeep Pall. Gurdeep is a corporate vice president with Microsoft. Gurdeep, welcome to the podcast! Gurdeep Pall: [00:01:38] Thank you, Sam. Really excited to be here. Sam Charrington: [00:01:40] I’m super excited for our conversation today! As is our typical flow, I’d love to have you start by introducing yourself. You’ve had quite a career at Microsoft culminating in your work in AI and autonomous systems. Tell us a little bit about your background and how you came to work in this field. Gurdeep Pall: [00:02:02] Thanks Sam. I’ve had a really nice long run at Microsoft, as you mentioned. And in fact, today is my 31st anniversary at Microsoft. Sam Charrington: [00:02:11] Wow. Gurdeep Pall: [00:02:12] So, yeah, it’s been a long career, but I really had a great time. In fact I feel like I’ve been into the candy store like three times. So my career can be divided into three parts. I worked on networking and operating systems. So that was sort of my first gig at Microsoft. I was very fortunate to work on a lot of the internet technologies when they were first rolled out in operating systems. I worked on VPNs, I’ve worked on remote access. And then, up through Windows XP, I was the general manager for Windows networking, where we shipped Wi-Fi for the first time in a general purpose operating system. And then at that time I moved over to work on communications and I started Microsoft’s communications business. So these are products that you may remember from the past, things like Office Communications Server, which became Lync, which became Skype for Business, which is now Teams. So started that business from scratch, and all the way until we announced Teams, in fact, a few days before we announced Teams, I was involved with that business. Though I’d had a stint in the middle on AI and I came back to work on AI. So it’s been, I would say, roughly three parts to my career and the latest being AI. And I’ve had lots of fun in all of them. Sam Charrington: [00:03:30] That’s awesome. I’ve talked to so many people at Microsoft who are working in AI, and a lot of them started their careers working on Bing. You’re maybe one of the outliers in that regard. Gurdeep Pall: [00:03:43] Well, the funny thing is that first stint I mentioned on AI was actually in the Bing team and I was running Microsoft speech.
I was running some of our interesting explorations we were doing at Bing, recognizing objects. In fact, some of the image stabilization work that went into HoloLens actually came out of that group. So yeah, I worked on maps and lots of interesting stuff. Sam Charrington: [00:04:08] That’s awesome. So tell us a little bit about autonomous systems and some of the work you’re doing in that area. Gurdeep Pall: [00:04:14] Yeah. So, for the last four years or so, I’ve been focused on emerging technology and how it can be applied to interesting business problems. And, in that regard, I’ve worked on some interesting technology in the language space, the language understanding space. Worked on ambient intelligence, where you could actually make sense of a space, sort of make reality computable, if you will. And then as I was exploring interesting emerging AI that can solve business problems, we started focusing on autonomous systems. That was interesting to us, not just as a very interesting aspect of which AI was enabling, but also Microsoft didn’t have a lot of focus in that area before. So, when I talked to Satya and, at the time Harry Shum was here, we decided this was an area we were going to go invest in. Sam Charrington: [00:05:04] Interesting. And one of those investments was the acquisition of a company called Bonsai. This is a company that I know well. I interviewed one of the founders, Mark Hammond. This was back in 2017. It’s hard to believe it was that long ago. And the company had a really interesting take on using technologies that are still difficult for folks to put to productive use, namely reinforcement learning. Their take on it was this idea of machine teaching. Maybe you can tell us a little bit about that acquisition, the role that it plays in the way Microsoft thinks about autonomous systems and elaborate on this idea of machine teaching and some of the things that Bonsai brings to the table. Gurdeep Pall: [00:05:49] Sure. Absolutely. So, when we started focusing on autonomous systems, we were like trying to get our hands around this thing. People interpret autonomous systems in many different ways. Some people think it’s only about autonomous driving, so let’s build a vertical stack. Some people think about robots, these humanoid robots with arms and joints and so on. And we’re thinking, what is our point of view? And, at the end of the day, we look at our own capabilities. We’re a software company, what is a software interpretation of the space? And it was with this sort of point of view that we started thinking about it. There was some work going on in Microsoft Research at the time, which I’ll talk more about. And that’s when I first met Mark and team and we had a really good discussion and, as we finished the first meeting, I remember this thing going through my head, that this is like such a great approach. And it really fits into how we are starting to think about this space and makes sense to us. And then I also thought, God, this feels like, just the wrong thing for a startup to do, building platforms and tools. It’s a tough thing. And Mark is such an incredible guy. I think you’ve talked to him, so you know that. So when we first finished the acquisition, he shared that with me too. He says, every VC I talked to, he says, why are you doing this? This is like the kind of thing Microsoft should be doing. So it was a marriage sort of made in heaven as it were, and so we acquired that company.
And it’s been really great, actually working with Mark and picking up from some incredible thinking that. You know, he and Keene had done and the team that was there, and then actually really expanding on that and really helping it realize its potential and also making it much more of an enterprise ready sort of an offering because this space is as mission critical and as important as it gets. So that’s been a very fun journey for the last two and a half years. Sam Charrington: [00:07:52] One of the ways I’ve heard you describe the way you’re approaching autonomous systems or that world broadly, and its two words and I still may butcher one of them, but it’s like this marriage of bits, and is it atoms that you say? Or molecules, or something else? But the idea is that,and this was something that was core to the way Bonsai Gurdeep Pall: [00:08:15] articulated what they Sam Charrington: [00:08:16] called then industrial AI. It’s a different problem when you’re applying AI solely in a software world, Gurdeep Pall: [00:08:23] recommendations on a website or looking at Sam Charrington: [00:08:27] customer churn, to when you’re actually trying to move physical goods or devices or systems. Elaborate on what you’ve seen in terms of the different requirements that come up in that world. Gurdeep Pall: [00:08:43] Absolutely. This is a very important point, when we start focusing on autonomous systems. I know people asking me about half the time, “oh, you’re talking about RPA, right?” No, I’m talking about RPA. Of course it doesn’t help when some of the RPA companies were calling their tech robots and, it could take action and so on. So it was in some ways, it was just a way for us to be clear about what we are doing. And we said, no, we’re actually focused on atoms, not things we just deal with bits. Of course, to digitize anything, you have to go from atoms to bits and then reason over it. But that became sort of the mainstay for us. The biggest difference, I would say, between those two worlds is that there is in the physical world, it is governed by some things like physics. The physical world, of course there’s Newtonian physics, and then you get into some of the multi-joint movements and you get into fluids, that’s a whole different kind of a physics which comes in. So you have to really think about modeling the real world and how then you can apply the tech towards that. The second thing I would say is that, most of the scenarios in autonomous systems pertain to taking action in the real world. And when you’re taking action in the real world, every time you take an action, the real world changes. And this is where reinforcement learning becomes a very natural mate as an AI technology for the problems that really apply to the real world, which is great because we have no other science which allows us to take a really sort of an unbounded state space and actually reason within it. And reinforcement learning becomes this really important piece in it. Lastly, I would say is that, every problem that we’ve looked at from an autonomous system space typically is one where there are experts who exist already. So far we haven’t been called to a problem where this is completely new and completely different and “oh, let’s solve it for the first time,” you know? 
And so tapping into the human expertise became a very important piece of this equation as well, which sometimes you don't need to worry about: [inaudible] the data, you throw things at it, and then maybe there is judging, certainly, if you want to fine-tune the models and so on. But that was another interesting aspect of this. Sam Charrington: [00:11:11] So we'll be digging a little bit deeper into some of the technology that makes all this happen, but you started to mention some of the use case scenarios. Can you dig a little bit deeper into some specific scenarios that you've been working on? Gurdeep Pall: [00:11:27] Absolutely. That's one of the things which makes this very, very interesting to me, because literally everything you see in the world around you can be a target for some of the technology that we're building. Everything from smart climate controls: HVAC control is a field where, for the last 70 years, there's been very incremental improvement. Things like fuzzy logic have been used, and performance has plateaued out. We've seen incredible results using our approach; we were able to bring much better performance, so energy savings or better climate control. We've seen oil drilling, horizontal drilling from companies like Shell, where you have these incredibly big machines that look like bazookas, and you're drilling with them. These machines need a pretty high level of precision, so great human experts can do it, but sometimes you need more capacity than you can get from that many trained experts on the problem. So we're able to help guide the drill bits. Cheeto extrusion is a very interesting, complicated process. It's very easy to eat, very hard to make. I always say, I know there are professional chefs out there, but certainly I cannot make the same kind of eggs every morning, because even that simple task of heating the oil and getting it just right and putting the eggs in, you cannot replicate it every time. But if you're Pepsi and you're making Cheetos, it has to be consistent every time. When you open a bag of Cheetos, everybody's familiar with the fluffiness and the crispness, so everybody's a judge and you have to win that every time. It's a very hard problem, because you have this cornmeal mixed with water, and it's impacted by the age of the machine doing the extruding, sometimes by humidity, temperature, all these things. It's a highly dynamical system, and the experts today sample and then tweak, and then sample and tweak again; they have really very stressful jobs trying to keep that quality right, otherwise the quality folks will come in and reject the material. So this is a problem we've been able to apply our tools to: basically, consistently keep tweaking the parameters of the process so that you have consistent Cheetos coming out on the other side. Chemical process control and polymer manufacturing: a very, very hard problem. Some of these processes take six months to design for producing polymer of a particular grade, and we've been able to apply this both to the design and to the actual manufacturing process itself. Our favorite thing is flying things. Bell Flight is an incredible company; they have all kinds of commercial as well as military applications for their vertical-liftoff vehicles and so on, and they're trying to bring autonomous capability to those things.
So we’ve been able to apply this towards that as well. So as you can see, anything which has control in the real world where you’re sensing and you’re picking an action, and you’re taking that action sensing again, this kind of a loop exists, this technology can be applied. Sam Charrington: [00:14:53] It’s been interesting over the past few years, just reflecting on some of the early conversations I had with Mark and the team at Bonsai around. There’s kind of this pendulum in the industry where we started out with kind of, rules, like physics and how things work. And we’ve kind of early on in the, in applying AI, we throw all those rules away and kind of leaned heavily on data and statistics. And over the past few years, there have been efforts, both in academia as well as what you’re doing, to kind of incorporate the rules and the human expertise back into the equation, without tossing everything that we’ve gained in applying data. One of the interesting challenges, when you layer on the physical world here is simulation, and how do you let an agent explore and learn without destroying helicopters and lots of Cheetos? Share a little bit about the challenge of simulation and how that’s evolved to help make some of these problems more tenable. Gurdeep Pall: [00:16:01] Yeah. Yeah. I think that’s such an important piece of this equation. Reinforcement learning is great, but reinforcement learning requires many, many, many steps, literally just to get a policy to be robust. You can be six 60 million cranks in before you start to see your policy start to develop at the appropriate level. So the question is, how do you go do that in the real world. And this is, one of the big insights I think the Bonsai folks came up with, and then this was some work that was happening at Microsoft Research coming at it from a very different direction, but they sort of merge together.   This is AirSim, and I can talk more about that, but the ability to model the appropriate aspects of the real world so that you can actually take action against them, get the right input back, and use that to train the model has been sort of the biggest insights here. Because really, what it says is you’re taking the physical world and you’re creating a mapping of it in the digital world, which then allows you to train the models quickly. And that’s where these simulators come in. Now simulators can be, depending on what they’re trying to simulate, can be very computationally intensive. And if you are nervous towards equations and things like that, cFDs. These are pretty long running simulations and some are, of course, faster. Now because we are using simulators for training AI, we want to crank this very, very quickly. So sometimes you end up with this problem where the physics, or at least how that physics is approached using these mathematical equations, actually becomes like a big piece of the problem. And so this is an area on how to take simulation, and how do you mate it with the training of the AI in a way that you can do it fast, you can do it cheap and you can frankly do it in parallel because that is one of the things, we have with some of the RL algorithms now is that you can actually take a policy, the last best known policy, you can explore in thousands of machines at the same time, you can take the samples and come back and update the policy. And then you take that, and again, you fan it out and you’ve got learners which are learning very quickly.  
Getting all that figured out is actually one of the big things we managed to get done after the acquisition as well. It's all running on Azure, and it really allows us to do this efficiently. Sam Charrington: [00:18:33] You mentioned AirSim. What is that, and what's the role that it plays? Gurdeep Pall: [00:18:36] Yeah, so AirSim was a project in Microsoft Research which started off in a team that was exploring drones and how you bring autonomy to drones. They had a very similar experience. I think they started in 2015. They would go out with their drone in the morning, they would come back with a broken drone in the evening, and they would have very, very little data. And it's like, how are we ever going to get enough data to get this thing to fly, to do even the basic tasks? So that's when they looked at some of the work that was happening in, frankly, the gaming world. They looked at some of the incredible scenes that could be rendered with Unreal and Unity and those kinds of things; if you've seen Forza and games like that, these things start to look pretty real. And they said, let's create a simulator for perception-oriented tasks, where you can create a scene and integrate physics into that scene for the different objects that are involved. It could be a flying object, it could be something with wheels which is driving, et cetera. So you integrate the physics and now you've created an environment in which you can train AI. It could be reinforcement learning, where you're sensing: you model the actual sensors inside this virtual environment, and you use that for reinforcement learning and taking actions. Or you can use the sensors modeled inside of AirSim to just generate lots of data on which you can do supervised learning offline. It works for both these purposes. So AirSim: they created this tool for themselves and then realized it's so powerful that they put it out as an open source utility. Today it has more than 10,000 stars on GitHub. It's really one of the most popular tools, because others are realizing that this idea of being able to simulate reality is a very powerful approach. Sam Charrington: [00:20:35] So can you maybe talk us through, for any of the use cases you described, when you go into an environment with a real customer with real problems, what's the process to actually get something up and running and demonstrate value that they can build on, meaning concrete value as opposed to theoretical POC value? What does it take to really do that? Gurdeep Pall: [00:21:02] This is something that we've been working on and will continue to work on, because our goal is to get this to a point where people are able to identify that this is a great tool for the problem that they have. It's not some sort of speculative exploration exercise; they know that they'll definitely get results if they adopt this tool chain, and going from there to actually training the policy, exporting the brain, and starting to use it in the real world, that period is pretty short. So this is a journey for us; it started off fairly long. And now we are at a point where we are focusing on these so-called solution accelerators: areas where the problem we are solving is very clear, and how to solve it is very clear.
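To give a flavor of the AirSim tool Gurdeep describes above, here is a short sketch of how a developer might drive it from Python: connect, fly a short path, and grab an image plus vehicle state that could be logged for offline training. It assumes the open-source `airsim` pip package and an AirSim scene (Unreal or Unity based) already running; the waypoint coordinates are arbitrary illustrative values.

```python
# Minimal sketch of driving AirSim from Python: connect, fly a short path,
# and grab a camera image plus kinematic state for offline training data.
# Assumes the `airsim` pip package and a running AirSim scene.
import airsim

client = airsim.MultirotorClient()        # connect to the simulator over RPC
client.confirmConnection()
client.enableApiControl(True)
client.armDisarm(True)

client.takeoffAsync().join()
client.moveToPositionAsync(10, 0, -5, 3).join()   # NED frame: z is negative up, 3 m/s

# Pull a scene image from the front camera and the vehicle's kinematics;
# pairs like (image, state) can be logged for supervised learning offline.
responses = client.simGetImages([
    airsim.ImageRequest("0", airsim.ImageType.Scene, False, False)
])
state = client.getMultirotorState()
print("image bytes:", len(responses[0].image_data_uint8),
      "position:", state.kinematics_estimated.position)

client.landAsync().join()
client.armDisarm(False)
client.enableApiControl(False)
```

The same client can also be wired up as the environment side of a reinforcement learning loop, which is the other use Gurdeep mentions.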
And then there are the things that you need, like simulators: sometimes folks already have simulators, in other cases they need one. The entire thing is stitched together, and all they need to do is come in, create the variations for their problem, create the policy, and then go ahead and use it. This is what is needed to take a customer from "Hey, I've got a problem, I don't know what this thing does, maybe I'll understand it" to "Okay, now I know the kind of problem I have; I don't know if it can be solved with this or not." That's what we've been targeting. And as we've gotten our solution accelerators to be very crisp, so has how we talk to customers, because, as you're alluding to, there's an education thing here and there's a confidence thing here. We have to address all those pieces, and we're bringing the customers along the journey. The great thing is, with customers like Pepsi, the moment they saw one thing be successful, they looked around the factory and said, I can put this approach on many things, and that's the conversation we're having right now. The same thing with Shell, the same thing at Dell. So this is the journey. Sam Charrington: [00:23:01] I appreciate in that the idea that, contrary to what you might think if you read popular reporting about AI, it's not a silver bullet, particularly in this domain, where you've got some tool chain and it applies to every problem that any customer might have. It sounds like you're being strategic and selective, and building expertise and supporting tools around specific areas, so that, to your point, when you are engaging with someone, they can have a high degree of confidence that you've done this before, you know how it's going to work and what the process is. Gurdeep Pall: [00:23:37] Exactly. And the other interesting thing that we found, which I think is a little unique compared to some of the other things we've done with AI, is that the experts we end up talking to in the different industries and application areas have never encountered AI before. These are folks who went to engineering schools, real engineers, not fake engineers like software engineers like us. I mean mechanical, chemical, what have you. When they went through college they learned MATLAB and Simulink and so on, and they have relied on a set of tools that have given them employment, given them career success, and stood the test of time. And here these five guys walk in with swagger: "Hey, we got AI for you and it's called reinforcement learning, it's really awesome, you've got to try it." That just doesn't work. You really have to bring them along. And then there are some real things that we've had to go and take in, like safety. Even if this thing works, they want to be able to assert that it isn't going to do something crazy. I mean, when you have that horizontal drilling machine from Shell, this thing can drill through anything; it's huge. There was a Wall Street Journal article about it around the time we first did this project with them a few years ago, when we did the challenge. For them, they want to make sure that this thing is actually going to be safe and isn't going to create another new problem while it solves one. So it's been a learning thing for us, but it's the need for the education, the need for bringing these folks along.
And this is one of the reasons we did this Project Moab, which is this very interesting device. It's like a toy, basically: three robotic arms, if you will, with a clear plate on top, and the task is to balance a ping pong ball on that plate. Now, for this problem, of course, the engineers will immediately go to PID, right? PID control is something they learned in college. And guess what, we said, first, let's start with PID. It does a pretty good job. But then we said, okay, I'm going to toss the ball onto the plate and see if it catches it. Well, it turns out it doesn't catch it. Then we add more complexity: how about we try to make the ball go around the edge of the plate? As the problem progresses in complexity, you realize that the only way you can solve it is if you have something like our tool chain, which we have with Bonsai: you create a simulator, you have a policy that you're training, and you're able to get to that level of performance. So we did this solely to bring engineers who are used to a particular way of working along, to get them to start to believe and to start to get excited about this. We created a sort of metaphor through which we could connect with them. Sam Charrington: [00:26:37] Interesting. Interesting. It reminds me of this idea of why deep learning is so important, and software 2.0, and where it's particularly powerful: in solving problems that we didn't know how to write the rules for, like in computer vision. How do you identify a cat versus a dog? Who knows how to write the rules for that, but a neural network can figure it out. And similarly, there is a range of problems that PID is easily applied to, but there's also a level of complexity that it is difficult to apply it to, and that is where you're finding the value in applying RL. Gurdeep Pall: [00:27:18] Exactly, exactly. And we've seen that either there were just too many moving parts, so folks had achieved automation but they had not achieved autonomy, and that's the class of problems where we're getting traction, or the existing methods have plateaued out in performance and there is more performance to be had. And this is incredible. You would think we've figured everything out, right, as a society, with all the advancements that have happened, but with HVAC control in buildings we've been able to get startling results. This is millions of dollars on a campus that you can save, and then also the green benefits that you get from that. So there's just tremendous opportunity. Sam Charrington: [00:28:07] So maybe let's drill into that example more, because I do want to get to a more concrete understanding of what the process looks like. I've got a data center or a physical plant or something, my HVAC costs are through the roof, and someone told me about this AI thing on an airplane. So I call Gurdeep: what's the first thing that I do, and how do I get from there to some cost reduction or greater efficiency or whatever my goal is in applying some of this? Gurdeep Pall: [00:28:40] So in this particular case, we're focusing one of our solution accelerators on exactly this use case, and so we are able to say with very high confidence what we can do if you give us the right information.
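For readers who have not written one since college, here is the kind of textbook PID loop Gurdeep says engineers naturally reach for first on the Moab ball-balancing demo. The controller is the standard proportional-integral-derivative form; the "plant" is a made-up one-dimensional ball-on-a-tilting-plate stand-in, not the real device or Microsoft's Moab code, and the gains are illustrative.

```python
# A generic textbook PID loop of the kind engineers reach for first on a
# ball-balancing task. The plant is a made-up 1-D stand-in, not Moab itself.
class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def control(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def simulate():
    dt = 0.02
    pid = PID(kp=4.0, ki=0.5, kd=1.5, dt=dt)
    position, velocity = 0.10, 0.0        # ball starts 10 cm off centre
    for step in range(250):
        tilt = pid.control(setpoint=0.0, measurement=position)  # plate tilt command
        tilt = max(-0.3, min(0.3, tilt))                         # actuator limits
        acceleration = 9.81 * tilt - 0.5 * velocity              # toy dynamics + damping
        velocity += acceleration * dt
        position += velocity * dt
        if step % 50 == 0:
            print(f"t={step * dt:4.2f}s  position={position:+.3f} m")

if __name__ == "__main__":
    simulate()
```

A loop like this holds a stationary ball just fine; the point of the Moab exercise is that once you start tossing the ball onto the plate or asking it to orbit the rim, hand-tuned controllers run out of road and a trained policy becomes attractive.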
The information we need is typically data you might have collected, because a lot of these are now IoT-type devices, and we're able to ingest that data. In this case, which is another double-click on the simulation topic, we're able to actually create a data-driven simulator, and then we can start creating a policy. Now, they do need to specify, and this is where machine teaching comes in, what behavior they are desiring. That specification is fairly flexible. You could say things like, I want it to behave a certain way between these times of the day. Or you could say, if the outside temperature, which becomes one of the state variables that goes into creating the brain, is outside of this range, then I want this kind of behavior: in summer I want it to be cooler and in winter I want it to be warmer. All those inputs then create a policy for me which automatically controls the HVAC system, which means turning on the fan or turning on the heat or turning on the cooling, and doing it dynamically, because once the brain is built, all you have to do is connect the inputs and the actions. The inputs are where we are sampling the state, and the actions are what you're commanding: increase heat, decrease heat, turn on the fan, turn off the fan, et cetera. And by the way, it's not just temperature in this case; it's also the carbon dioxide and nitrogen levels and so on. All of those are being sensed, and the actions are taken based on that. So that is the solution we would have. We're, again, trying to make it as turnkey as possible, but recognize that every building is different; every building has its own climate fingerprint, so there is work required in creating the brains. You could take a brain off the shelf and use it, and I can't say whether that would work better. It might have better energy consumption, but then maybe the people are not as comfortable. So you have to tweak it, and the more efficient we can make this end-to-end process, the sooner folks can realize the value. Sam Charrington: And a brain in this case is essentially a model or an agent or something like that, is that fair? Gurdeep Pall: Great question. I've had lots of folks ask me, including Bill Gates, why do you call it a brain? And I think it's a really good question. The way we talk about it is that it's actually a collection of models. Autonomous system tasks can sometimes be decomposed into different parts. For example, if a robotic hand has to pick up an object and stack it: reach can be one action, pick up can be another, then move, and then stack. These are all distinct actions. Now, some are pretty easy; you can almost program them. Reaching nowadays is often programmed, depending on the device you have, but some need to be trained. So now this whole collection of things has to be orchestrated, and the right piece has to be invoked at the right time. Each one of them is either programmed or is a model, a deep learning model, a DNN, and putting all of it together becomes the brain. In fact, that's how the human brain works, so the name is actually quite apt: the visual cortex has a particular purpose, then it feeds another piece which does the reasoning, and then taking the action invokes a different part of the brain. So that's why we call it a brain.
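As a toy illustration of the state/action/behavior interface Gurdeep describes for the HVAC case, here is a sketch of a gym-style environment with temperature and CO2 as sensed state, a handful of discrete actions, and a reward shaped from comfort bands of the kind an expert might specify as a "goal." All of the dynamics, bands, and penalties are invented for illustration; this is not the Bonsai platform's machine-teaching syntax.

```python
# Toy sketch of an HVAC control interface: sensed state, discrete actions,
# and a comfort-band "goal" turned into a reward. All values are made up.
import random

ACTIONS = ["heat", "cool", "fan_on", "fan_off", "idle"]

class ToyHvacEnv:
    def __init__(self):
        self.reset()

    def reset(self):
        # State the brain would sense: indoor/outdoor temperature (C), CO2 (ppm).
        self.indoor_temp = random.uniform(15, 30)
        self.outdoor_temp = random.uniform(-5, 35)
        self.co2 = random.uniform(400, 1200)
        self.fan = False
        return self._state()

    def _state(self):
        return (self.indoor_temp, self.outdoor_temp, self.co2, self.fan)

    def step(self, action):
        # Crude invented dynamics: heating/cooling nudges temperature; the fan
        # flushes CO2 but drags indoor temperature toward the outdoor one.
        if action == "heat":
            self.indoor_temp += 0.5
        elif action == "cool":
            self.indoor_temp -= 0.5
        elif action == "fan_on":
            self.fan = True
        elif action == "fan_off":
            self.fan = False
        if self.fan:
            self.indoor_temp += 0.1 * (self.outdoor_temp - self.indoor_temp)
            self.co2 = max(400, self.co2 - 50)
        else:
            self.co2 += 20
        return self._state(), self._reward(action)

    def _reward(self, action):
        # The "goal" an expert might state: keep 21-24 C and CO2 under 800 ppm,
        # while penalizing energy use. A reward like this would be generated
        # under the covers from that goal specification.
        r = 0.0
        r -= max(0.0, 21 - self.indoor_temp) + max(0.0, self.indoor_temp - 24)
        r -= max(0.0, (self.co2 - 800) / 400)
        if action in ("heat", "cool"):
            r -= 0.1                       # energy cost
        return r

if __name__ == "__main__":
    env = ToyHvacEnv()
    env.reset()
    for _ in range(5):
        action = random.choice(ACTIONS)    # a trained brain would pick this instead
        state, reward = env.step(action)
        print(action, [round(x, 1) if isinstance(x, float) else x for x in state],
              round(reward, 2))
```

The point of the sketch is only the shape of the interface: sensed inputs in, commanded actions out, with the expert describing desired behavior rather than writing control logic.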
Sam Charrington: [00:32:33] Okay. Going back to the HVAC example, you mentioned a data-driven simulation. So I'm imagining you coming to my company, since this is my scenario and I've got the data center: I probably don't have a simulation that exists for my data center and HVAC. That's immediately a big challenge if I need one to train a brain, but you've got a way to generate it just from the data that I've collected. Gurdeep Pall: [00:33:01] Yes. And this is something that we are having to do a lot more of as we go out and talk to customers; some have a simulator. Interestingly, simulators have existed for designing, modeling, and testing, but typically there's been a human on one side of the simulator, driving it for whatever purpose they want. If it's a flight simulator, you're flying it. In our case, it's the AI being trained that's sitting on the other end of the simulator. In some cases we were able to take their existing simulators, change the use case, and still make it work okay; in some cases that worked great. In other cases it didn't work great, because their simulator was designed for a really different purpose. Like if you do CFD, the purpose is to model something to high precision; this is going to be a plane flying through rain, so it has to be very precisely done. They typically have HPC setups for CFD simulation, but each crank can take so long that we can't crank it fast enough to learn from it, right? So we said, well, that doesn't work, or they just don't have a simulator at all, like your case. That's where our next step is: can you give us data? For many folks, they have the data. If they have the data, then we say, okay, let's see how we can take that data and make it into something that we can mate with our system. That worked for a certain class of problems. Then, as the complexity of problems started increasing, we realized we needed a new trick up our sleeve. There's a research group that is part of my team, and we started looking at how we can apply deep learning to learn from this data to create simulators. We ran into the first insight, which is that deep learning is designed for inference, right? You run one crank, you get a prediction, and you're done. Well, it turns out the real world is not like that. The real world is modeled with differential equations: basically, you've got time, and you've got this thing which continues to change its behavior with time, depending on the previous state and the actions being taken. So there's some great work being done right now, and we are publishing it; in fact, some of it is already out, on deep simulation networks. Basically, it's like a neural computational fabric. It's kind of like ODEs: with every crank, you take the output and feed it back in for the next time step. Of course, the sampling of time can actually be variable, so that neural computational fabric has to deal with that, which is a pretty big thing in itself. But it also allows you to have many different components inside the simulation, each of which is learning in a different way.
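Before Gurdeep's ball-tossing example below, here is a minimal sketch of the basic "learn a simulator from logged data, then roll it out autoregressively" idea: fit a small network to (state, action) to next-state transitions, then feed each prediction back in as the next input. The dynamics (a simple cooling law), the network size, and the hyperparameters are all invented for illustration; this is plain PyTorch, not Microsoft's deep simulation networks architecture.

```python
# Toy sketch: learn a simulator from logged transitions, then roll it out
# by feeding each prediction back in as the next input.
import torch
import torch.nn as nn

def true_step(temp, heater_power, dt=1.0):
    """Ground-truth dynamics used only to generate training data:
    Newtonian cooling toward 20 C plus a heater term."""
    return temp + dt * (-0.1 * (temp - 20.0) + 0.5 * heater_power)

torch.manual_seed(0)
temps = torch.rand(2000, 1) * 40             # temperatures between 0 and 40 C
powers = torch.rand(2000, 1)                 # heater power between 0 and 1
inputs = torch.cat([temps / 40.0, powers], dim=1)    # normalize for the network
targets = true_step(temps, powers) - temps           # learn the change in temperature

model = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for _ in range(1000):                        # full-batch training on the logged data
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()

# Autoregressive rollout: the learned simulator's output becomes its next input.
temp = torch.tensor([[35.0]])
for t in range(10):
    action = torch.tensor([[0.0]])           # heater off
    delta = model(torch.cat([temp / 40.0, action], dim=1)).detach()
    temp = temp + delta
    print(f"t={t + 1:2d}  predicted temp = {temp.item():5.2f} C")
```

Once trained, stepping the learned model is just a forward pass, which is the speed advantage Gurdeep points to relative to cranking a heavyweight physics solver.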
For example, if you’re tossing a ball. The ball has it’s physics. And then there’s the environment that has physics, which is new for me in physics, but turns out the Newtonian physics doesn’t change. You can toss a ball, you can toss up a water. So if you are training those components, it’s give me some of these pre-trained components. If you will, that can be trained ones, then you can, maybe tweak it based on the, the object will have different physics. But now, so you did this noodle competition fabric, which plays out in time. You are now able to have multiple components and you train this thing. This new architecture we believe is a pretty transformative thing in simulation because it now allows us to offer any complex simulation space. Which basically has lots of differential equations that are sort of running around inside of it. And we can train it reasonably quickly. Really.  It’s kind of like a graph noodle network because you have time and you have space. If you look at the components that actually make space. So there’s message passing, which is happening between every stage and that allows the learning to happen. And this backpropagation, which happens in which each of the components, like eventually you’re able to get a trained model, which can run like a simulator. So you stopped at some state to take an action, distinct States changes and you’re able to crack it. So we’re really excited about it. We think this will be a big accelerant in the approach that we have. Again, we get the data, use it, we can go at it and this similarly, they can also learn from other simulators. So if you have something that is quite inefficient, in terms of competition and stuff like that, this thing can learn of it. And then it can execute very fast. Because once it learns the fundamental differential equations that are underlying, this is just inference. It’s not doing any kind of a big competition once a string. So that is an area that we’re really excited about right now. Sam Charrington: [00:38:09] Awesome. So first step is capture some data. Next step, use that to train a simulator using this idea of deep simulation networks, potentially. Then you mentioned kind of using that to create a brain. It sounds like part of that is you corrected me when I said it’s a model. So part of that I’m imagining is figuring out the right level of abstraction for these different components or pieces. And then individually, I guess one of the questions that I had around that was. And when we talk about reinforcement learning and kind of a academic sense and how difficult it is to put it to use in real world situations. A lot of it has to do with like carefully crafting this objective function or cost function and all of the issues associated with that. You described what the customer has to do as more, less about describing this objective function and maybe constraining what the solution looks like. Am I kind of reading that correctly? And maybe you can elaborate on that and help us understand. Gurdeep Pall: [00:39:17] Absolutely. And you’ve, you’ve hit the nail on head on with reinforcement learning the reward, specification, the reward function that he had, the specification of that becomes the next problem. In fact, we have a very famous researcher at Microsoft research. Blackford, he’ll tell you that. He says, if you have a problem, And you modeled it as a reinforcement learning problem. You don’t have to, it really gets to the core of it, this thing, which is that getting the reward function. Right. 
And there’s lots of funny stories about bad reward functions and unintended consequences, but we ran into that and they still allow that in our tool chain, you can specify the board function, but now we are actually. The machine teaching, we read exploring what are other ways for an expert to describe what they want done and we’ve come to the concert or goal. So they specify the goal, using a particular approach, the semantics of which are contained within the problem and the environment. And we will automatically generate the reward function. Under the covers based on the goal. And we found this to be a very, much more approachable thing for, for our customers. In fact, a lot of our new engagements with customers, most of the time we ended up using goals. So that’s been, you know, and like I said, you know, we’re on this learning thing ourselves. And, you know, we’re seeing what’s working, what’s not working how to enhance it and move from there. Sam Charrington: [00:40:45] And so some of these like classical challenges with reward functions, like delayed attribution and things like that, that you see in reinforcement learning does goals as an approach. Side skirt those in some ways, or are those still issues that you see in the autonomy systems world? Gurdeep Pall: [00:41:06] Yeah. I mean, those are still issues we see and separately the algorithms are getting pretty good too. So he, you know, there’s an active area of research and better algorithms coming up. we are, you know, we are, we stay on top of that and be an incorporating more and more algorithms now into our tool chain because there’s some albums. Better suited for certain class of problems. Others are better for suited for another other type of problems, which then of course moves the problem to the next layer, which is which one do you select for? Which kind of problem. And you don’t want, obviously folks who’ve never done programming or AI to say, Oh, you tell me, do you want SAC? Or do you want this. No idea. Right? So we are also trying to put in that intelligence, so that it’s a, it’s a meta reasoning thing, which says, you know, given this kind of a goal, given this kind of a problem, and this is a sampling rate. So state space let’s automatically select the best algorithm. And we will use that for training. So, you know, nobody ever has to know, like, you know what craziness you had walked under the covers, but staying on top of this has been a really important piece for us. You know, we use this framework called re which has come out of a lot of the book please. you know, still can source Facebook. We are one of the. Big users of it and contributors for it now, in fact, the rate team 13, which is building that my team in Berkeley are literally in the same building on one floor apart. So there’s a lot of good intermingling there as well. So because we using that framework V relive is how people are adding more and more algorithms, you know, being able to really tap into that and what we find, of course, sometimes, you know, people will write an algorithm to publish a paper, but it’s not really Production grade. So then these come back and do our own implementation of it and contribute that. Sam Charrington: [00:42:54] So, kind of in this journey, we started with data, we built a simulation, we built a brain out of that simulation. Then that brain is able to then help me control my data center. HVAC. I’m imagining in this scenario that, you know, I still care about the safety issue that you mentioned. 
Maybe it's not a drill that's going to destroy my data center, but I wouldn't want the policy that you recommend to decrease the life of my coolers or chillers. And then there are also maybe explainability issues that arise, like, why are you telling me this? My HVAC engineer has always set XYZ at six and you're saying it should be at eight. Why is that? Gurdeep Pall: [00:43:40] Yeah, this is such a great topic, and I've talked to my team about it, given my experience at Microsoft. I remember when we were building Windows NT and putting networking into it and so on; we had no idea how stuff was going to be attacked when the internet was starting out. In fact, I was the development manager for the TCP/IP stack for Windows from '95 to 2000. I still managed to keep some of my sanity, but I can tell you there were folks on my team who were pushing 20 updates a week, because we were starting to get attacked at every layer, from the bottom of the network moving its way up, all the way up into sockets, the teardrop attacks and all that. And when they got to the top layer, that's when the most sophisticated attacks really started. I don't know if you remember, but after Windows XP shipped, the entire team took one year to harden the system, because it was no longer just my problem as the networking guy, it was everybody's problem. People would do buffer overruns and insert code and all that, so literally every component had issues. The reason I'm telling this story is that I think safety is a problem like that. When we came into it, it was "hey, we've got really good control and I can show you better performance," but then there's all this hidden stuff that you have to deal with. That's been a big realization for us. It's a multifaceted approach. The first thing is, you talked about the wear and tear of the machine, or breaking it down: a bunch of our use cases with customers right now have those factored in, and they're actually factored in at the time of teaching. When you talk about the state space, that's something that has to be specified so that the policy takes it into account, and then that component gets handled. The hardest safety questions are about when the brain is operating: are we really at the mercy of a deep learning model that says "take this action," where the consequences of that action are actually out of scope for what we're doing? This is where we started, and this is going to be ongoing work. It's never done, kind of like cybersecurity; we're learning it's never going to be done, but we want to take some pretty concrete steps. One really important piece of work, and there's a new paper published on this, is that you develop a policy, the policy suggests an action, and then you introduce another layer after that to decide if the action is a safe action or not. Now, what goes into deciding whether it's a safe action can be many things. It can be predicate logic, it can be temporal logic, so you can pretty much assert yes or no because something is outside some range, or it can actually be trained things itself: imagine adversarial models which go into that component.
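Here is a toy sketch of that kind of safety layer: every action the policy recommends is checked against simple predicate rules before it reaches the actuator, and replaced with a safe fallback if it fails. The rules, state fields, thresholds, and fallback are all made-up illustrations, not the approach in the paper Gurdeep mentions.

```python
# Toy sketch of a safety layer between a learned policy and the actuator:
# each recommended action is vetted by predicate rules and replaced with a
# safe fallback if it fails. All rules and fields here are invented.
from dataclasses import dataclass

@dataclass
class State:
    human_in_zone: bool       # e.g., from a separate perception model
    arm_speed: float          # current joint speed, rad/s
    temperature: float        # motor temperature, C

def proposed_action_is_safe(state: State, action: dict) -> bool:
    if state.human_in_zone:                       # hard rule: never move near people
        return action["speed"] == 0.0
    if abs(action["speed"]) > 1.5:                # speed limit
        return False
    if state.temperature > 80 and action["speed"] != 0.0:
        return False                              # don't run a hot motor
    return True

def safe_fallback(state: State) -> dict:
    return {"speed": 0.0}                         # stopping is always allowed

def act(state: State, policy_action: dict) -> dict:
    """Filter the policy's suggestion through the safety layer."""
    if proposed_action_is_safe(state, policy_action):
        return policy_action
    return safe_fallback(state)

if __name__ == "__main__":
    risky = State(human_in_zone=True, arm_speed=0.8, temperature=45.0)
    print(act(risky, {"speed": 1.2}))    # -> {'speed': 0.0}: a person is in the zone
    clear = State(human_in_zone=False, arm_speed=0.8, temperature=45.0)
    print(act(clear, {"speed": 1.2}))    # -> {'speed': 1.2}: passes the checks
```

As Gurdeep notes, the same slot can also hold learned checks (for instance an adversarially trained classifier) rather than hand-written predicates; the structural point is that the check sits outside the policy.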
So now, when you are specifying things in machine teaching right up front, you can start to insert ways in which safety can be specified, and that actually follows a very different path. Some of it will follow the path of the policy building itself, because some things can be caught there, but other things are really brought to bear at operation time. And that is very important, because you've probably heard some of the discussions on how level five autonomy is going to be rolled out in cities, and they're talking about dedicated bus lanes and things like that. I think it's a wonderful idea, because you're solving the other side of the equation, the part you can control. So imagine, and I always talk about this example and my team just looks at me strangely, imagine you have a robot arm and it is working in a space where humans are also working. It's very common; you see this with machines in factories: they will have a red line or a dotted red line around them for protection, and the humans know they're not going to go there. Now you've created a rule which says, regardless of what action the policy tells you to take, if it is outside of a certain radius, whatever distance that is, you will not take that action. So you've created an environment in which humans and this robot arm swinging around can actually coexist in the same place. It's a very pragmatic approach, but it has to be part of your solution. Otherwise, the engineers are right: these crazies are showing up with reinforcement learning and it's going to create all kinds of issues for us, safety issues and so on. Sam Charrington: [00:48:33] Yeah, I love that analogy. And just taking it one step further, it would be a lot more difficult to build into your motion trajectories, for example, a way for this arm to avoid a human that stepped into the zone than to build something that determines that a human has stepped into the zone and just shuts everything down. I think what I'm taking away from what you're saying here is that safety is a multi-layered problem. It's not all about making the neural net responsible for everything; it's about identifying how you can enforce safety at these different levels, and thinking about it as a system, like an engineering person would. Right? Gurdeep Pall: [00:49:16] Exactly. I think that has been a big learning for us as well: it's not just "solve the hardest AI problem and suddenly everything else will follow," right? You have to really think about it that way. And I think the safety layer, which evaluates every action after it's recommended, is where a lot of the new capabilities will come in in the future, the adversarial stuff. You can imagine a completely separate model which is basically going to give you a one or a zero: if any human has stepped over the red line, it's going to give you a one and the system shuts off, right? And that keeps improving with perception and things like that. So it is a system thing, as you say; that's a very good way to think of it. Sam Charrington: [00:50:03] Right, right. So maybe to help us wrap up: it's the very beginning of 2021, and autonomous systems is kind of a broad area. Where do you see things going over the next few years? How does this all evolve? Gurdeep Pall: [00:50:18] Yeah.
We believe that we're entering the era of autonomous systems. It's always hard to predict, right; there's that famous saying, prediction is hard, especially about the future. But I remember working on Windows NT, on networking, on the internet: these things just explode, and the right elements have to be there for that explosion to happen. I think with the breakthroughs in AI, with the focus on solving business problems in a complete way, like we talked about with safety, and with the industry coming along. We've been spending a lot of time on data-driven simulators, but we really want to partner with the simulation industry that's out there. We've got great partners like MathWorks, and we want to bring them along so that together we can create an end-to-end tool chain in which these autonomous systems can be created without requiring the high level of expertise that, for example, is going into a lot of autonomous driving. The teams building those autonomous driving stacks are just super deep, super experts, and they're building it all in a siloed, very vertical way. We want there to be horizontal components, and then you'll have vendors of autonomous systems where anybody can come in, describe their problem, create the brain, and deploy it. That's going to explode the number of autonomous systems that are out there. And I think this is great for many different things, including our climate, and including the resilience we've seen we need during COVID, where logistics have to continue and production has to continue. So I think now's the time, and I think it's going to happen. Sam Charrington: [00:52:05] Awesome. Awesome. Well, good deal. Thanks so much for taking the time to chat and sharing a bit about what you're up to there. Gurdeep Pall: [00:52:13] Totally my pleasure. And you have a great podcast, so it's great to be here talking to you about my stuff. Sam Charrington: [00:52:25] Awesome. Thank you. Thank you. Take care. All right, everyone. That's our show for today. To learn more about today's guest or the topics mentioned in this interview, visit twimlai.com. Of course, if you like what you hear on the podcast, please subscribe, rate, and review the show on your favorite pod catcher. Thanks so much for listening, and catch you next time.
Today we're joined by Subarna Sinha, Machine Learning Engineering Leader at 23andMe. 23andMe handles a massive amount of genomic data every year from its core ancestry business, but it also uses that data for disease prediction, which is the core use case we discuss in our conversation. Subarna talks us through an initial use case of evaluating polygenic scores, and how that led her team to build an ML pipeline and platform. We talk through the tools and tech stack used to operationalize their platform, the use of synthetic data, the internal pushback that came along with the changes being made, and what's next for her team and the platform.
Sam Charrington: Hey, what’s up everyone! We are just a week away from kicking off TWIMLfest, and I’m super excited to share a rundown of what we’ve got in store for week 1. On deck are the Codenames Bot Competition kickoff, an Accessibility and Computer Vision panel, the first of our Wellness Wednesdays sessions featuring meditation and yoga, as well as the first block of our Unconference Sessions proposed and delivered by folks like you. The leaderboard currently includes sessions on Sampling vs Profiling for Data Logging, Deep Learning for Time Series in Industry, and Machine Learning for Sustainable Agriculture. You can check out and vote on the current proposals or submit your own by visiting twimlai.com/twimlfest/vote/. And of course, we’ll have a couple of amazing keynote interviews that we’ll be unveiling shortly! As if great content isn’t reason enough to get registered for TWIMLfest, by popular demand we are extending our TWIMLfest SWAG BAG giveaway by just a few more days! Everyone who registers for TWIMLfest between now and Wednesday, October 7th, will be automatically entered into a drawing for one of five TWIMLfest SWAG BAGs, including a mug, t-shirt, and stickers. Registration and all the action takes place at twimlfest.com, so if you have not registered yet, be sure to jump over and do it now! We’ll wait here for you. Before we jump into the interview, I’d like to take a moment to thank Microsoft for their support for the show, and their sponsorship of this series of episodes highlighting just a few of the fundamental innovations behind Azure Cognitive Services. Cognitive Services is a portfolio of domain-specific capabilities that brings AI within the reach of every developer—without requiring machine-learning expertise. All it takes is an API call to embed the ability to see, hear, speak, search, understand, and accelerate decision-making into your apps. Visit aka.ms/cognitive to learn how customers like Volkswagen, Uber, and the BBC have used Azure Cognitive Services to embed services like real-time translation, facial recognition, and natural language understanding to create robust and intelligent user experiences in their apps. While you’re there, you can take advantage of the $200 credit to start building your own intelligent applications when you open an Azure Free Account. That link again is aka.ms/cognitive. And now, on to the show! Sam Charrington: [00:00:00] All right, everyone. I am here with Adina Trufinescu. Adina is a Principal Program Manager at Microsoft, working on Computer Vision. Adina, welcome to the TWIML AI podcast. Adina Trufinescu: [00:00:12] Thank you so much for having me here. Sam Charrington: [00:00:14] Absolutely. I'm really looking forward to digging into our chat. We'll be spending quite a bit of time talking about some of the interesting Computer Vision stuff you're working on, in particular the spatial analysis product that you work on, and some of the technical innovation that went into making that happen. But before we do that, I'd love for you to share a little bit about your background and how you came to work in Computer Vision. Adina Trufinescu: [00:00:40] Definitely. I joined Microsoft in 1998, so I'm a veteran here, and I started as an engineer. So I have an engineering background, not a [research] background. Then, after spending more than 10 years as an engineer working primarily on Windows OS, I switched to Program Management, and I worked on a bunch of products until eventually I started working on speech recognition in Windows.
At the time I was working on Cortana speech recognition, and then later on I worked on speech recognition for HoloLens, the mixed reality device. Then, for the past year and a half, I've transitioned to computer vision. So I'm a Program Manager working with both the engineering and the research teams on shipping spatial analysis, and spatial analysis is a feature of Computer Vision in Azure Cognitive Services. It just shipped this week at Ignite, in public preview. Sam Charrington: [00:01:37] Nice. In any other year, I'd ask you, what's it like down in Orlando? Because that's where Ignite is historically held. I've been to the last several, and I've done podcasts from Ignite, but this time we're doing it a little bit virtually, as Microsoft is with the event. But I'm super excited to bring our audience a little bit of this update from Ignite. Tell us a little bit about the spatial analysis work that you're doing there, and start from the top. What's the problem that spatial analysis is trying to solve? Adina Trufinescu: [00:02:13] So, before I talk about spatial analysis, let me give you a bit of background information about Azure Cognitive Services for Computer Vision, because it's important to highlight the difference and the novelty that spatial analysis brings. The existing Computer Vision services are image-based, meaning that the developer passes in one image at a time, and then the inference happens either in the cloud or in a container at the edge. Then the result of the inference, image by image, is sent back to the developer. Spatial analysis brings the innovation of actually running Computer Vision AI on video streams. So basically it analyzes live video. It can also be recorded, but primarily it was designed for live video streams and real-time analysis of those streams, and in this case for the purpose of understanding people's movement in physical space. When we talk about people's movement, we're talking primarily about four things. The first one is the more basic scenario of people counting. Basically, in a video stream we run people detection, and then either periodically or when the count of people changes, we provide the insights indicating how many people are present. Then we have social distancing, which is actually called people distance, but we call it social distancing for the obvious reason. You can configure the desired threshold at which you want to measure the distance between people; let's take the magic six-feet number, right? So the AI is going to detect the people in the video stream, and then every time people are closer than the minimal threshold, an event is generated indicating that the minimal distance has not been respected. So those are the first two, and then the next two are what we call entry and exit of physical spaces. To actually detect when people enter or leave a physical space, we have two operations. One is called person crossing a zone, in and out of a zone, and the other is person crossing a line. Let's take the example of person crossing a line. Let's say that you have a doorway. You can draw a directional line, and then every time the bounding box of the detected person crosses and intersects the line, we can generate an event telling you that the person entered the space or exited the space.
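To make the line-crossing idea concrete, here is a toy sketch: take the bottom-centre of a detected person's bounding box and report a crossing when that point moves from one side of a directional line to the other between frames. It is purely a geometric illustration with made-up coordinates, not the spatial analysis container's actual implementation.

```python
# Toy sketch of "person crossing a line": compare which side of a directed
# line the person's foot point is on in consecutive frames.

def side_of_line(point, line_start, line_end):
    """Sign of the cross product tells us which side of the directed line we're on."""
    (px, py), (ax, ay), (bx, by) = point, line_start, line_end
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def foot_point(bbox):
    """Bottom-centre of an (x1, y1, x2, y2) box, a proxy for where the person stands."""
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) / 2.0, y2)

def detect_crossing(prev_bbox, curr_bbox, line_start, line_end):
    before = side_of_line(foot_point(prev_bbox), line_start, line_end)
    after = side_of_line(foot_point(curr_bbox), line_start, line_end)
    if before < 0 <= after:
        return "entered"        # crossed in the line's "positive" direction
    if before >= 0 > after:
        return "exited"
    return None

if __name__ == "__main__":
    doorway = ((100, 400), (300, 400))       # a horizontal line in image coordinates
    prev_box, curr_box = (180, 250, 220, 390), (180, 270, 220, 410)
    print(detect_crossing(prev_box, curr_box, *doorway))   # -> "entered"
```

A production system layers tracking, smoothing, and confidence handling on top of a check like this, but the directional-line geometry is the core of the operation Adina describes.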
Sam Charrington: [00:04:43] Awesome. So the context in which this is being offered, as you mentioned, is the comparison to the image-based services. An image-based service might be something I'm using to do object detection or segmentation of an image: I'm passing that to an API and getting a result back, where the service is telling me what it thinks is in the image and the probabilities. And this is extending that same general idea to video, essentially. Adina Trufinescu: [00:05:17] That's right, and we started with spatial analysis for people movement. We're looking to extend this to other domains for other relevant scenarios in the future. Sam Charrington: [00:05:28] Can you give us an example of the other types of scenarios that folks might want to perform on video? Adina Trufinescu: [00:05:36] There are many industries where this is relevant. You can think about retail, which currently is targeted toward this person-movement analysis, but think about vehicle analysis: that would be another kind of object that, when detected in a video, can generate interesting AI insights and interesting scenarios. Sam Charrington: [00:06:02] So, yeah, even from that explanation, I get that unlike an image-based service, where generally these work along the lines of ImageNet, with many classes of things that can be detected, toys and fruit and oranges and things like that, in video you're starting with very specific classes. Can you talk a little bit about why that is? Is it use-case driven, in that counting people and vehicles and very specific things are more interesting in video than counting random objects? Or is it more a technical issue or limitation? Adina Trufinescu: [00:06:46] Oh, it's not a limitation. We started with understanding people movement because this is where the customer signal was. I've mentioned retail; we also have many scenarios in manufacturing and in real estate management, and current events were also informing our decision on where to start. But the way the detection models are inserted into the video pipeline is fairly generic, which is why we're looking at enabling other domains in the future. Basically, the detector model that we have for people today can easily be swapped with a different detector model for a different domain. Sam Charrington: [00:07:24] Okay. Okay. I'm thinking about the use cases. It sounds like the use cases you are envisioning are camera-based video streams, as opposed to, say, piping in a stream of commercial television and asking your service to find any time a particular can of Coca-Cola shows up, or something like that. That's another use case I see every once in a while, but clearly it's not one you're going after at this point. Adina Trufinescu: [00:07:59] Not for now, not for now. Speaking about the cameras, with the cameras that we work with, we don't actually require a particular model. Any camera that supports the RTSP protocol, which is, well, I shouldn't say universal, but it's a common protocol for video streaming. So you can have a camera or you can have an NVR, any video management system that is capable of streaming over the RTSP protocol, and we work with that.
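For readers unfamiliar with RTSP, here is a minimal sketch of how a developer might pull frames from such a camera and hand a subset of them to some detector. The URL is a placeholder, and `run_person_detector` is a hypothetical stand-in for whatever model you use; the spatial analysis container does its own ingestion, so this only illustrates the general idea.

```python
# Minimal sketch: read frames from an RTSP camera with OpenCV and sample a
# reduced frame rate for detection. URL and detector are placeholders.
import cv2

RTSP_URL = "rtsp://user:password@192.0.2.10:554/stream1"   # placeholder address

def run_person_detector(frame):
    """Hypothetical detector hook; replace with a real model."""
    return []   # pretend there are no detections

def main():
    capture = cv2.VideoCapture(RTSP_URL)
    if not capture.isOpened():
        raise RuntimeError("could not open RTSP stream")
    frame_index = 0
    sample_every = 15            # e.g., sample roughly 1 fps from a 15 fps stream
    while True:
        ok, frame = capture.read()
        if not ok:
            break                # stream ended or dropped
        if frame_index % sample_every == 0:
            detections = run_person_detector(frame)
            print(f"frame {frame_index}: {len(detections)} people")
        frame_index += 1
    capture.release()

if __name__ == "__main__":
    main()
```

The sampling step foreshadows a point Adina makes later about reducing the inference frame rate to fit more streams on one device.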
Sam Charrington: [00:08:31] Okay. NVR being network video recorder, a surveillance use case, or a technology used in that use case. Adina Trufinescu: [00:08:41] Yeah, that's right. So basically we're looking not only at greenfield sites where customers install new cameras, but also at existing cameras and existing video systems. Sam Charrington: [00:08:51] So when I think about this type of use case, it makes me think of something like a Ring camera, where maybe I can grab a Raspberry Pi or something like that, have it call out to the service, put a little USB camera on my Raspberry Pi and stick it by my door, and do a roll-your-own Ring camera and have it count people that go into some zone or something like that. Could I do that with this? Adina Trufinescu: [00:09:22] You can do something like that, but the device that we are supporting and have tested extensively for is actually more of a heavyweight device: an Azure Stack Edge. The idea here is that these are spaces where you can have dozens of cameras, or you can have hundreds of cameras. Imagine a warehouse where you could potentially have hundreds of cameras. We want you to have a way to deploy at scale and manage these cameras at scale. Then, because of the sensitivity around video, the privacy concerns and data control concerns, that's where Azure Stack Edge comes in: you can actually keep the video on your premises. All the processing happens on the Azure Stack Edge device, and then only the resulting de-identified data about the people movement is sent to the cloud, to your own service in your own tenant, and then you can build a solution in the cloud. I should be more specific that the Azure Stack Edge device that we are running with is actually the one that has the Nvidia T4 GPU. So it's even more of a departure from just a [Nano]. This is the initial release, the public preview, and we're looking at extending the range of devices and hardware acceleration capabilities to something lower, something less than an Azure Stack Edge. Sam Charrington: [00:10:55] Got it. Then for folks that aren't familiar, Azure Stack Edge is essentially a way, it's a pretty heavyweight hardware setup, where you're essentially running the Azure cloud in your data center. That's the general idea, right? Adina Trufinescu: [00:11:09] Yeah, that's right. And if you have a small space where you have, let's say, 20 or 50 cameras, you don't really need something of the extent of a data center. You need a room, a server closet with a reasonable temperature, where you can run these devices. Sam Charrington: [00:11:33] Okay. Okay. So I'm going to have to wait quite a while for this technology to be democratized, if you will, to the point where I'm running it on a Raspberry Pi with a USB camera. Adina Trufinescu: [00:11:48] I was hoping it's not quite a while, but not yet. Sam Charrington: [00:11:53] Not yet. And I think in this day and age, we have to talk about surveillance and the role of technologies like this in enabling different types of surveillance use cases, some of which are problematic and some of which are necessary in the course of doing business. What's the general take on making this kind of service available for those kinds of use cases? Adina Trufinescu: [00:12:24] So when we released spatial analysis, we had in mind what Microsoft calls responsible AI and innovation. This is where we recognize the potential for harmful use cases, and with this release we also released a set of responsible AI guidelines which had three things in mind.
The first one is protecting the privacy of the end user; providing transparency such that the end user and the customer understand the impact of the technology; and then, in the end, promoting trust. Then the idea there is that we want to pass this responsible AI guidance and these practices to our developers and the people that actually build the end-to-end solutions, such that the end users, the people actually impacted by the technology, can actually be protected, and the human dignity of these people is upheld. Sam Charrington: [00:13:18] So it sounds like even if I did have an Azure Stack Edge, I couldn't necessarily just turn on the service and do whatever I want with it. Adina Trufinescu: [00:13:26] So, we have a process for that, that we take our customers through, at least for this public preview, where you get access to the container. I'm not sure if I mentioned this, but we started not with an Azure service in the cloud but with a Docker container that you run on your premises on Azure Stack Edge, and basically the container, anybody can download it, but to actually access the functionality in the container, we want you to fill in this form. You describe the use cases that you are considering for your solution and your deployment, and then we will look together at whether these use cases align with the responsible AI guidance, and then, if they do, obviously you can proceed, and if they don't, we'll have that conversation to make sure that the responsible AI guidance is upheld. Sam Charrington: [00:14:15] Okay. Well, let's maybe shift gears and talk a little bit about some of the tech that went into enabling this. In order to do what you're doing, you're doing some kind of standard things like object detection. Is this fresh out of research papers, new techniques to do the detection and classification, or what are some of the things that you're doing there and the challenges that you ran into in productizing this? Adina Trufinescu: [00:14:44] So, I think the challenges vary depending on the four use cases. So let me try to break it down and then address each one. So for instance, we are running a DNN for people detection, and we started with something more heavyweight, and then we had to transition because of the performance concerns. I'm going to come back to that in a second, but basically we had to transition to a lighter model. Sam Charrington: [00:15:09] A big ResNet...? Adina Trufinescu: [00:15:11] Let's say a big ResNet or a smaller ResNet. Sam Charrington: [00:15:16] Okay. Adina Trufinescu: [00:15:17] I'm going to leave it at that. But the idea there is that for something like people counting, initially, for all operations, we started thinking that we can stream at 15 frames per second, and then we did that. Then we noticed that we want to get maximum usage out of that Azure Stack Edge, which is quite heavyweight, right? We want to run as many video streams as possible. So basically we tried to actually go as low as possible in terms of frame rate, and for something like person count, the person count from one second to another doesn't change dramatically. So for something like person count or person distance, we went from 15 frames per second to one frame per second. Then we were able to maximize the usage of the GPU because now the DNN runs at the lower frame rate, and this way you can fit in more video streams. The challenge we had, for instance, with social distancing and person count was around generating ground truth.
So [we create] a 10 minute video. Let's say you have a point in the video and you have to annotate the distance between the people; just looking at the video, you cannot figure out the physical distance between people. So that is where we use synthetic video data. So basically, we are using the same technology that our colleague teams in mixed reality for HoloLens are using, where we generate these game scenes where we can control the positioning of the people and their relative positioning. So that was the first challenge for person distancing. The second challenge is that the DNN is going to tell you whether there are people in a frame, but it's not gonna tell you the actual physical distance. So for that, you need the camera to be calibrated. So this is where the initial thinking was that we will ask the customer for the camera height, for the angle, for the focal distance, but that wasn't practical either. So this is where we had to actually come up with a calibration algorithm for the camera, such that before the actual operations, where the DNN runs for the purpose of the operation, the algorithm for calibration kicks in, such that we ask the customer to have at least two people in the camera field of view. Then the algorithm runs for detecting these people and makes assumptions about their positioning, and this way the camera height and the focal distance are actually calculated. Then we pass it back to the customer as output, and we want to make sure that that reflects the reality. But between ground truth and the camera calibration, these were the two challenges for person distancing. Sam Charrington: [00:18:06] All right. So just maybe taking a step back. We started out talking about counting people and, it sounds like there's some research or work that went into getting from this big heavyweight model to the smaller model. So that was one element of it, but also, just fine tuning the end to end process in terms of how quickly you're able to do it. In other words, the frame rate you're using for counting people. That was part of counting people? Adina Trufinescu: [00:18:43] Yes, that's right. Sam Charrington: [00:18:44] Was it just an iterative process? Keep reducing the frame rate until things start breaking and you're not able to count accurately, or was that something where you're building out models to tell you how low you can go, or something? What all went into that? Adina Trufinescu: [00:19:02] So, it was a little bit of both. It was like a constant measurement of performance and accuracy in terms of frame rate; we would go lower and lower to the point where we can maintain the accuracy and precision rates. Then you reach a breaking point and then that's how you know that you have to stop. Then, when you have to stop, and I wouldn't say that this was exactly how it happened, but when you talk about frame rate and doing all these tests, this is where the engineering comes in. Then when you come to the performance of the DNN and the models, this is where research teams are making progress in parallel. So basically, it was an iterative process where, between engineering and research, they both worked together to arrive at what seems to be the best balance between performance and accuracy. Sam Charrington: [00:20:01] As part of that counting people process, you've got two sub problems there. One is identifying the people in the frame, and then you also have to know from one frame to the next, which person is which. Is that a part of the challenge here?
Adina Trufinescu: [00:20:16] Yeah, that's right. So, especially for person crossing in and out of a zone and person crossing a line, that's where the tracking part of the algorithm comes in, and to be able to tell that it's the same person from one frame to another, in addition to the DNN model, we are running a combinatorial algorithm, such that the detection is telling you, I have these people. Then by extracting features, we can run the combinatorial algorithm to tell that from frame T minus one to frame T, we have the same set of people. Then, as people are detected across the frames, they are getting this anonymous identifier which tells you that it is the same person from frame one to frame ten. Something like that. Sam Charrington: [00:21:09] You mentioned extracting features to help the combinatorial algorithm. Are you pulling those out of the bowels of the DNN, or is this a separate pipeline or a separate flow that is identifying features in a kind of more traditional computer vision way? Adina Trufinescu: [00:21:28] So we actually pull it from the DNN, and we have the typical features that you would expect, like motion vector, velocity, and direction in the 2D space, and frame by frame, we're looking at all these attributes. Then we're making the decision whether the same person shows up across the various frames. Then I should say that each person gets an identifier and that is an anonymized identifier. There is no facial recognition or anything of this sort. Sam Charrington: [00:22:03] Okay. Adina Trufinescu: [00:22:04] Then I should say that in our pursuit of performance, we started this process running at 15 frames per second, because when you actually look closely at how people move in and out of a zone or cross a line, the action of crossing and the time the person crosses that line is fairly short. So we had to run it at more than one frame per second. This is where we initially started by running the DNN for the people detection every 15th frame, still keeping it at one frame per second, and running the association algorithm every frame. The problem that we had was that the accuracy and the performance had all the typical challenges where the identity of the people would be switched or the identities of two people would be merged. These are the typical fragmentation and merging challenges with association. So, if you don't actually run the detection on each frame, every time a person is occluded, or every time a person disappears from the frame or a new person appears, that's when you have all these association problems of merging and fragmentation. So that was another motivation for us to go to a lighter DNN for person detection, something that we can actually run on each frame at 15 frames per second. Sam Charrington: [00:23:31] Okay, but you mentioned that there are some parts of the problem that you do down at one frame per second? Adina Trufinescu: [00:23:36] Right. So, just to recap: person counting and social distancing we keep doing at one frame per second, and then person crossing a line and person crossing in and out of a zone we run at 15 frames per second. Sam Charrington: [00:23:51] Got it. The main idea there is that for counting people and counting distance, it's not an association problem. You're just looking at what's in the frame. Adina Trufinescu: [00:24:03] Right, right. Sam Charrington: [00:24:04] If someone bounces in or out between frames, if they're not in the frame, you don't count them.
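The frame-to-frame association just described is not spelled out in detail in the conversation, so the following is only an illustrative sketch: it builds a cost matrix from distances between per-person feature vectors in consecutive frames and solves the assignment with the Hungarian method, which is one common way to implement this kind of combinatorial matching. The feature vectors and threshold are assumptions.

```python
# Illustrative sketch, not Microsoft's exact algorithm: match people detected in
# frame t-1 to people detected in frame t using feature distances, so each
# person keeps an anonymous track ID across frames.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(prev_feats, curr_feats, max_cost=1.0):
    """prev_feats: (N, D) features from frame t-1; curr_feats: (M, D) from frame t."""
    # Pairwise distance between every previous person and every current person.
    cost = np.linalg.norm(prev_feats[:, None, :] - curr_feats[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)      # Hungarian assignment
    # Keep only plausible matches; unmatched detections would start new IDs.
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]
```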
But when you're talking about entering and exiting physical spaces, you want to keep track of who was already in the space versus who wasn't in the space in order for you to provide an accurate count. So there's a bit more accounting that has to happen, and then you get these challenges with people disappearing because they were at the edge or something like that. That's where you have to focus on these fragmentation and merging problems. Adina Trufinescu: [00:24:34] Yeah, that's right. So, imagine that for counting people or social distance, not a whole lot happens in a second. But imagine that you have a railway station and you have a doorway where a dozen people need to pass through. At that point, you have to run people detection at a higher frame rate, such that you do not lose the people, you do not lose them when they show up, and you do not lose them when they disappear. Sam Charrington: [00:25:00] Yeah. Yeah. Yeah. So you mentioned a bit about the training data challenge that you ran into there, and this is related to that last problem we talked about with entering and exiting physical spaces. Is that correct? Or is it--? Adina Trufinescu: [00:25:18] Yeah, that's right. So this is where ground truth was also challenging. Take videos; these videos can be 10 minutes to one hour. Depending on which space you are using, you could have a few people, or you could have a dozen, or you could have a hundred people, right? So annotating that data frame by frame at 15 frames per second, that's a lot of work. Not only that, you have to track that the same person from this frame, across all the 15 frames times this many minutes, is the same person. It's possible, but you don't want to do that. You don't want to ask any human to do that. So this is where-- Sam Charrington: [00:26:01] If I can just jump in. If the network isn't tracking the people, but it's a combinatorial type of algorithm, is that a non-learned algorithm where you don't need to train on associating people, or do you also move that-- Adina Trufinescu: [00:26:22] That is not a DNN. It's an algorithm and you don't have to train it. So what we are training is the people detection model, and then we are testing independently, first the people detection model, then we are testing the tracking aspect of it, and then we are testing the combinatorial algorithm. So that's where the ground truth needs to cover all the use cases. But then the most challenging one is the one where you have to generate ground truth that annotates each person, and the anonymized identity of each person, across the frames. Sam Charrington: [00:27:04] Okay. Yeah. I was trying to make sure that you actually had to track that, because that would seem to make the data collection process quite a bit more challenging when you're annotating the identity of folks. If we're talking about images that look like an overhead image of Grand Central Station or something, I would imagine that to be difficult for a human annotator. Adina Trufinescu: [00:27:26] Yeah, right. So this is where synthetics plays the same role as before. We are generating all these synthetic videos where not only do we want to make sure that it's the same person across the video, but you want to make sure that the paths of the people in physical spaces across the use cases are as realistic as possible, and then you want to annotate that. You have the different camera angles, you have the different heights, and you have the lighting conditions.
So trying to go into the real world to collect all that data, and then to annotate that data, that would be a real challenge. So this is where synthetics played a huge role and was a huge time saver. Sam Charrington: [00:28:12] Where does this synthetic data come from? Did you take an Xbox game that kind of looked like it had people in a crowd and try to use that, or did you develop a custom data generator for this problem? Adina Trufinescu: [00:28:27] It's pretty much the same technology that is being used for HoloLens and for mixed reality; the same kind of the technology that powers the [same] generation. We didn't take a game but the concept is very much game-like where you can overlay an image of actual physical space, and then you can start placing all these characters into the 3D space, and then generating the video streams out of that. Then, because you can play with the physics and then with the lighting, you can have a great variation. That is actually what we need to assure the high quality of the AI models and of the combinatorial algorithm. Sam Charrington: [00:29:14] Is that synthetic data approach also related to the camera placement approach that you mentioned? Are you varying the camera angle as part of this synthetic generation? Adina Trufinescu: [00:29:25] Yeah, that's right. So, Computer Vision has a custom vision and we want people to go and create custom vision models, but to the extent where they don't have to, and then we can actually save them time by creating these high quality models which perform great in all of these conditions. We want to do that. So the goal there was that when we train and when we test, we test with data from all these various conditions. So, part of the synthetic data was to-- Like the ceiling in a retail space is different than a ceiling in a manufacturing space. So this is where you need to bring in that variation. Sam Charrington: [00:30:08] Okay. From a customer perspective, are they sending you pictures from their camera  and there's a model that figures out where their camera might be? You said that you don't want them to have to send you measurements or anything like that. What's the input to that process? Adina Trufinescu: [00:30:27] So, we do not collect data from customers.  In the product, none of the video that is being processed is used for training. So the way we are approaching this is visiting customers, looking and learning about their environment, and learning about the parameters of the environment such that we can simulate it. Then we also create simulations of the real world scenarios. Obviously not manufacturing but you might use something like a store layout. That's something that you can emulate fairly easily, and then in that scenario, you have something where the camera is at 10 feet or camera is at 20 feet. Then you're  looking at the different angles and the different areas in the store where you want to apply the person crossing zone, person crossing line. That's how you  generate the synthetics data. Sam Charrington: [00:31:24] Got it.  Okay. Finally, you started to mention a kind of measurement and some of the challenges that measurements pose for this problem. Can you elaborate on the way you score these models and how you assess their accuracy? Adina Trufinescu: [00:31:45] So we applied the MOT Challenge, and then we used the data set to track the accuracy of the person detection and the person tracking model. We applied the MOT Accuracy and precision formulas. 
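The MOT accuracy formula just mentioned is the standard MOTA score from the MOT Challenge: it combines misses, false positives, and identity switches against the number of ground-truth objects. A minimal sketch, assuming per-frame error counts are already available:

```python
# MOTA (Multiple Object Tracking Accuracy) from the MOT Challenge benchmark.
def mota(false_negatives, false_positives, id_switches, ground_truth_objects):
    """All arguments are per-frame lists of counts; MOTA can be negative if errors exceed GT."""
    errors = sum(false_negatives) + sum(false_positives) + sum(id_switches)
    return 1.0 - errors / sum(ground_truth_objects)

# Example: three frames with a handful of errors against 30 ground-truth boxes.
print(mota([1, 0, 2], [0, 1, 0], [0, 0, 1], [10, 10, 10]))  # -> 0.8333...
```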
Sam Charrington: [00:32:04] MOT Challenge - Multi-Object Challenge, [inaudible]? Adina Trufinescu: [00:32:09] Multi-Object Tracking Challenge. So, we apply the industry standards to assess the precision and accuracy of the model. But the thing that we did a bit differently was that the actual output that goes to the customer is not actually the frame-by-frame result of the detection or the tracking. What we actually send to the customer is the count of people, the distance between people, the time they spent in a zone, or the entry and exit events in the zone, such that they can calculate the dwell time. So we looked at the use cases, and we came up with accuracy measures specific to the scenario, and then we generated ground truths such that we can test holistically, not only the tracking part of the algorithm but the entwined algorithm between tracking, association, and applying this logic, like person crossing in and out of the zone or person crossing in and out of the line. Sam Charrington: [00:33:11] So did you extend the challenge benchmark to your specific use cases in the higher level metrics that you're providing to customers, or did you have a separate parallel path that was more reflective of your specific kind of use case specific numbers? Adina Trufinescu: [00:33:30] It's pretty much specific to the use case. To give you an example, for the person entering and exiting the zone, we looked at what we call dwell time, which is a fairly common use case for what people want to measure. Then we looked at the timestamps for the ground truth. We created ground truth by looking at the timestamps of people entering and exiting the zone. Then we created measures for dwell time, entering, or exiting. It helped us assure that the accuracy of the end product, which is what the customer is consuming, is at a level that satisfies the customer requirements. Sam Charrington: [00:34:22] With these measurements in mind, did you give up a lot going from the huge DNNs to more compact DNNs and changing frame rates, and things like that? All these things that you needed to do to deliver a product that worked in the kind of environment that you were looking to serve, did you lose a lot in accuracy for the measurements that you're trying to provide? Adina Trufinescu: [00:34:49] Not really. The goal is to gain in accuracy. You have to make tradeoffs and then you have to balance. It's always like a tug of war between accuracy and performance, and working with customers is why we have these public previews. Before the public preview, we had the private preview. So, we worked closely with a set of customers to validate the accuracy of the entwined algorithm for their use cases. There were some learnings that we took away, and that's how we arrived at making the right trade-offs, such that the accuracy, the performance, and the cost of the end-to-end solutions all make sense. Sam Charrington: [00:35:31] Awesome. Awesome. You presented on this at Ignite this week when you unveiled the public preview release. Any takeaways from your presentation or the reception to it? Adina Trufinescu: [00:35:44] So, it was well received. I would say that we stayed so focused on performance and accuracy, and then the feedback that we got was very strong feedback. For instance, the measure between people, we provided only in feet. Obviously, you have to stay focused on everything that matters.
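A hedged sketch of the zone entry/exit and dwell-time outputs described above, assuming a simple rectangular zone and per-frame tracked positions keyed by anonymous IDs; the event format is illustrative, not the product's actual schema.

```python
# Derive enter/exit events and dwell time for a rectangular zone from tracked,
# anonymized IDs sampled at a fixed frame rate.
def in_zone(pt, zone):                      # zone = (x0, y0, x1, y1)
    x, y = pt
    return zone[0] <= x <= zone[2] and zone[1] <= y <= zone[3]

def zone_events(frames, zone, fps=15):
    """frames: list of dicts {track_id: (x, y)}, one dict per frame at `fps`."""
    inside, entered_at, events = set(), {}, []
    for i, people in enumerate(frames):
        t = i / fps
        for pid, pt in people.items():
            if in_zone(pt, zone) and pid not in inside:
                inside.add(pid)
                entered_at[pid] = t
                events.append(("enter", pid, t))
            elif not in_zone(pt, zone) and pid in inside:
                inside.remove(pid)
                events.append(("exit", pid, t, t - entered_at.pop(pid)))  # last value is dwell time
    return events
```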
I mean, we tried to move fast, and everything happened so fast, and this is something that we planned during the pandemic months. Then the six feet that you hear every day stuck with us. Then we realized that our customers need the metric system. So we had feedback like that. But at this point, we are very excited to have the customer [stride] and I'm pretty sure that there will be more learnings. Sam Charrington: [00:36:41] Awesome. Awesome. Well, we'll be sure to link out to the service where folks can find it in the show notes, but thanks so much for taking the time to share with us an update on this new service, and what you're up to. Adina Trufinescu: [00:36:58] Yeah, it was my pleasure. Thank you for having me. Sam Charrington: [00:37:00] Thanks, Adina.
Sam Charrington: Hey, what’s up everyone! We are just a week away from kicking off TWIMLfest, and I’m super excited to share a rundown of what we’ve got in store for week 1. On deck are the Codenames Bot Competition kickoff, an Accessibility and Computer Vision panel, the first of our Wellness Wednesdays sessions featuring meditation and yoga, as well as the first block of our Unconference Sessions proposed and delivered by folks like you. The leaderboard currently includes sessions on Sampling vs Profiling for Data Logging, Deep Learning for Time Series in Industry, and Machine Learning for Sustainable Agriculture. You can check out and vote on the current proposals or submit your own by visiting https://twimlai.com/twimlfest/vote/. And of course, we’ll have a couple of amazing keynote interviews that we’ll be unveiling shortly! As if great content isn’t reason enough to get registered for TWIMLcon, by popular demand we are extending our TWIMLfest SWAG BAG giveaway by just a few more days! Everyone who registers for TWIMLfest between now and Wednesday October 7th, will be automatically entered into a drawing for one of five TWIMLfest SWAG BAGs, including a mug, t-shirt, and stickers. Registration and all the action takes place at twimlfest.com, so if you have not registered yet, be sure to jump over and do it now! We’ll wait here for you. Before we jump into the interview, I’d like to take a moment to thank Microsoft for their support for the show, and their sponsorship of this series of episodes highlighting just a few of the fundamental innovations behind Azure Cognitive Services. Cognitive Services is a portfolio of domain-specific capabilities that brings AI within the reach of every developer—without requiring machine-learning expertise. All it takes is an API call to embed the ability to see, hear, speak, search, understand, and accelerate decision-making into your apps. Visit aka.ms/cognitive to learn how customers like Volkswagen, Uber, and the BBC have used Azure Cognitive Services to embed services like real-time translation, facial recognition, and natural language understanding to create robust and intelligent user experiences in their apps. While you’re there, you can take advantage of the $200 credit to start building your own intelligent applications when you open an Azure Free Account. That link again is aka.ms/cognitive. And now, on to the show! Sam Charrington: [00:03:14] All right, everyone. I am here with Cha Zhang. Cha is a partner Engineering Manager with Microsoft Cloud and AI. Cha, welcome to the TWIML AI podcast. Cha Zhang: [00:03:25] Thank you, Sam. Nice to meet you. Sam Charrington: [00:03:27] Great to meet you as well. Before we dive in, I’d love to learn a little bit about your background. Tell us how you came to work in computer vision. Cha Zhang: [00:03:38] Sure. Sure. I actually have been at Microsoft for 16 years. I joined Microsoft originally as a researcher at Microsoft Research. I was there for 12 years. My research was primarily applying machine learning to image, audio, video; all of these different applications. I started 2016. I joined the product side, and currently I’m working as an Engineering Manager, and my primary focus is on document understanding. Sam Charrington: [00:04:11] Awesome. Awesome. So, we will be focusing quite a bit on OCR and some of your work in that space, and, you know, I think people often think of OCR as a, you know, a solve problem, right? 
It’s, you know, we’ve been scanning documents and extracting text out of those documents for a long time. Obviously the advent of deep learning, you know, changes things, but I’d love to get the conversation started by having you share a little bit about, you know, what’s new and interesting in the space. How has it changed over the past few years? Cha Zhang: [00:04:50] Sure. Actually, it wasn’t very long ago, when people talked about OCR, what came to mind was firstly scanned documents. In many people’s eyes, OCR for scanned documents is sort of a solved problem. More recently, I think there have been two major developments. One is, with a mobile-first kind of world where everybody now has mobile phones and they take pictures everywhere, there’s a lot of demand to do text recognition out of images in the wild, and that certainly is a much more challenging problem than scanned documents. And then technically, because of the advances in deep learning, we have realized that with deep learning, we can do OCR at a different level. We can make it a lot more accurate than before, and we can solve the OCR problem in kind of image-in-the-wild scenarios. So I think it started in the early 2010s-ish. I think there’s a lot of big advances in this area, and now we’re seeing basically OCR become something that really works. You know, people don’t need to worry about quality, etcetera; it just mostly works. Sam Charrington: [00:06:08] Can you talk a little bit more about the challenges that arise when you’re trying to do OCR in the wild? Cha Zhang: [00:06:16] Of course. I think for documents, usually it’s white background and black text, but for images in the wild, essentially it’s a photo. So in the photo, there’s a lot of variation in the text. First there’s a huge scale variation: if you capture a picture of a street, there might be some store name that is super big, and then there is some tiny text that’s hard to see. So there’s a big variation in the scale of the text, and the aspect ratio of these texts can be really long, because a text string can be very long compared to regular objects, like a cat or a dog. Because of the mobile capture scenario, usually it’s difficult to tightly enclose these texts with an axis-aligned rectangle. For example, there might be perspective distortions of the text when the camera sees them. The background in the image in the wild is much more complicated than the typical white background you see in scanned documents, and some of these backgrounds, such as fences, bricks, and stripes, even though they appear quite simple for human beings, think of how fences can be, effectively, a bunch of ones, you know, sitting there on the street, and they look very similar to characters. So those create additional challenges, and I think one of the biggest ones, technically, for OCR that’s challenging is the localization accuracy. So, typically in object detection, the localization accuracy is measured by intersection over union, and if that criterion is bigger than 0.5, people think this is good enough, but for OCR, if the intersection is only half of the union, a lot of the characters will be missing. So, usually OCR will need a 0.9, 0.95 level kind of accuracy in order to recognize all the characters properly. So… Sam Charrington: [00:08:31] Can you explain that in more detail? What is intersection over union and how is that used in object detection?
Cha Zhang: [00:08:39] So, in order to measure the accuracy of a particular detection algorithm, you need to ground-truth label the data. Typically what people do is they create a bounding box of the object to be detected, and then you use an automatic algorithm to figure out where the object is, and that will also create a bounding box. Now you have two bounding boxes, and the question is how do you measure how well these two boxes align. A common measure is to take the intersection of these two bounding boxes and the union of these two bounding boxes, so you get two areas. You can imagine, if the two bounding boxes are very close to each other, overlapping a lot, then that intersection over union would be very high, but if they’re offset by quite a bit, then, you know, the number is low. So that’s kind of the academic standard for how people measure detection accuracy with this criterion. Sam Charrington: [00:09:46] Got it. And so, you were saying that the threshold that you need in the case of text is higher because of what? Cha Zhang: [00:09:58] Because of… Let’s just think about, you know, you have a ground truth text, let’s say, “Hello world,” and it’s an elongated rectangle, and say I have a text detection algorithm that also creates a bounding box, but with an intersection over union of, let’s say, roughly 0.5. What that means is that the intersection area divided by the union of the two bounding boxes is 50%. So very likely the detected bounding box will miss a few characters because, you know, the overlap is not there. So, you might miss a few characters, you might read a “d” as an “n”, and all this will cause the OCR to produce wrong results. And so that’s the main challenge here. Sam Charrington: [00:10:48] So in the case of a traditional object detection scenario, you may miss half of the face but you can tell that there’s a face there; in the case of OCR, you’re just missing letters and it makes it a lot more difficult for the algorithm to guess what was there. Cha Zhang: [00:11:07] Yes, exactly. Sam Charrington: [00:11:08] Got it, and maybe taking a step back just to the problem as a whole, granted mobile is driving, you know, this transition to these in-the-wild pictures and people trying to OCR them, but what are the high value use cases there? Like, I’m thinking of some interesting ones, like when it’s in conjunction with translation, you know, maybe I’m in another country, and I’ve done this, you know, taking pictures of words in another character set to try to read the menu or something like that. I’ve also done things like scan documents on a phone, and you want to OCR those, but that’s kind of back to the traditional OCR problem in a lot of ways. What are some of the other use cases that are common? Cha Zhang: [00:11:58] If you look at this kind of business opportunity, I still think the traditional documents, you know, scanned documents, some traditional kind of OCR problems like, for example, receipts, where people would scan in the old days, but nowadays people mostly do reimbursement by snapping a photo. So I think in terms of the market, the revenue, that’s still quite a big one. There are a few others. The one that you mentioned: if you have a phone, you go to a foreign country, you snap a photo and you want to translate it, that’s one. There’s also a lot of applications in digital asset management.
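To make the intersection-over-union measure just described concrete, here is the standard computation for two axis-aligned boxes; the example values are illustrative and show how even a modest horizontal offset clips characters at OCR-level thresholds.

```python
# Intersection over union (IoU) for two axis-aligned boxes (x0, y0, x1, y1).
# OCR typically needs IoU around 0.9+ so that no characters fall outside the box.
def iou(a, b):
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A 10-pixel shift on a 100x20 text line still scores ~0.82, yet clips characters.
print(iou((0, 0, 100, 20), (10, 0, 110, 20)))
```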
So this is when either you are a big company or you personally have some big storage of photos and you want to organize these photos. We have shown that with OCR capability, you can increase the accuracy of processing and retrieving these photos. As a matter of fact, you know, the big search engines like Google and Bing, when they search images, OCR is an integral part of that as well, because the OCR’d content can help a lot in getting the best images. Sam Charrington: [00:13:22] Okay. And so, you were mentioning some of the technical challenges, and localization of the text in these images is one of those challenges. How do you go about it? Is it the case that, you know, deep learning is so powerful that off-the-shelf deep learning techniques just solve it for you, or do you, you know, re-engineer the whole pipeline? How do you approach that? Cha Zhang: [00:13:53] So in text detection, usually the detection pipeline is different from traditional object detection. What’s been most popular for kind of OCR for images in the wild today is something called anchor-free detection. So the idea… Anchor free. In typical object detection, the most well known are anchor-based detectors, like Fast R-CNN and Faster R-CNN, etcetera. They basically create these anchors and then they regress the actual bounding box of the objects. The challenge of using that kind of approach is that these anchors need to be preset, and so typically for normal object detection, you set a certain density, and then you set a certain set of aspect ratios, like your anchor boxes are one to two, one to three, one to one. Typically you go about there, but text, some of the text can go like 20 to one, so it would be a huge computational cost to go with an anchor-based approach. So in modern days for OCR, we go anchor free, and the high level concept is essentially, by using convolutional neural networks, you almost make a kind of per-pixel level decision or classification saying, well, this region nearby this particular pixel looks like part of text. So there is a text/non-text classification at almost a per-pixel level. Then you rely on a few algorithms to group these into text lines, by looking at how similar two text regions are to each other, and you can decide, well, these two look like the same texture and color, and maybe they should be connected. In this regard, there are quite a few well known algorithms to do this connection. In earlier days, people used a relatively rule-based approach, like stable link, where they link based on some features, but it’s kind of rule-based. More recently, people have started looking into new networks like relation networks, which kind of estimate the relation of two regions’ features, and based on that decide, well, these two should be connected or not. So that way you start kind of bottom up; you start with per-pixel kind of classification, and then you do grouping, and you come out with these text lines. It’s a very powerful approach. It can not only detect kind of straight lines, but even curved lines; you can handle them pretty well with those approaches. Sam Charrington: [00:16:44] So it sounds like you’re describing a pipeline. That’s not like an end-to-end trained single neural network that you give images and train on labeled data, and it’s telling you what the text is, but rather a bunch of independent steps. Cha Zhang: [00:17:04] Yes, that’s a very good observation.
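A rough sketch of the anchor-free, bottom-up idea described above: a network produces a per-pixel text/non-text score map, and grouping turns it into candidate text regions. Production systems learn the grouping (for example with relation networks); plain connected components is used here purely for illustration.

```python
# Sketch only: threshold a per-pixel text probability map from a CNN and group
# adjacent text pixels into candidate regions with connected components.
import numpy as np
from scipy import ndimage

def text_regions(score_map: np.ndarray, threshold: float = 0.5):
    """score_map: HxW array of per-pixel text probabilities."""
    mask = score_map > threshold
    labels, _ = ndimage.label(mask)           # group adjacent "text" pixels
    boxes = ndimage.find_objects(labels)      # one (row_slice, col_slice) per region
    # Return boxes as (x0, y0, x1, y1).
    return [(s[1].start, s[0].start, s[1].stop, s[0].stop) for s in boxes]
```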
Actually, so for OCR, detection is only the first step, and after detection, we typically run a character model where you take the detected text lines, you normalize them into a straight line with a fixed height, and then you run a character model to actually decode the image into a list of characters. A lot of the approaches are actually similar to speech, where, you know, speech is going from an acoustic signal to text. Here we’re going from image to text. But a lot of the approaches that we use, like LSTMs, language modeling, these are very similar. Now your question is certainly valid, because in speech today, you know, people do end-to-end training; they start from audio so they can directly go to text. For OCR, we are not there yet. I think the main challenge, well, first is how much data you have. I think for speech, you can collect a lot more data compared with OCR. OCR data are usually very expensive to collect and label, and so going stage by stage at this point is more economically doable than, you know, doing end-to-end training. Sam Charrington: [00:18:25] Why is that? It seems that we have tons of pictures with words in them that we know. Particularly, is it just the in-the-wild examples where we don’t have the label data, or is it also the document use cases? Because I’m imagining Microsoft has probably labeled a ton of receipts and business cards and that kind of thing. Cha Zhang: [00:18:50] Yeah. I think certainly labeling is very, very expensive. For Microsoft, we are a company paying a lot of attention to privacy, you know, those kinds of issues, and collecting OCR data has been a major, I would say, blocking issue to go for this kind of end-to-end approach, because if you think about it, a lot of the documents that we actually care about, like, if you talk about invoices, talk about receipts, business cards, they all contain PII information. Those are data extremely difficult to obtain, and we follow very strict kind of guidelines for how we can collect them, how we can label them. So in some way we are limited by these privacy restrictions, but we do respect those a lot. So, as a result, you know, we are not going end to end at this point. Sam Charrington: [00:19:48] Got it, got it. It makes me think a little bit about some of the issues with neural networks remembering data. So for example, there are examples where you train a CNN and there are some attacks that you can do that will reproduce some of the images, to some degree or another, that the model was trained on. Likewise, with these very large language models, you can start to see some of the text that the models were trained on come out in the output. I would imagine if you were training end to end, at least then that becomes an issue as well, and maybe more so than in the case of images. What’s your intuition there? Would it be worse or better than images? Cha Zhang: [00:20:39] I would imagine it will be similar, I would say. So after all, you know, OCR, you come from image to text, but during the learning of this OCR process, the language model is actually very helpful to improve the OCR accuracy. So, for example, during decoding of these text lines into text, we use, like, LSTMs or, you know, basically these very popular language modeling schemes. Certainly it remembers the contextual information of the language in order to help the OCR recognize these texts properly.
So, I think when you go end to end, the amount of data that you use for training is humongous. It’s difficult to imagine, for me, you know, that we’ll have a similar level of data for training as BERT models or GPT models. Those are huge, huge amounts of data, but still, you will learn something from the text and it might leak into the model as well. Sam Charrington: [00:21:51] Along those lines, what enabled BERT and many of the recent innovations around language models is a shift from supervised to the semi-supervised way of framing the task. Is there a semi-supervised framing for the OCR task that makes sense? Cha Zhang: [00:22:13] Actually for OCR today, there is not, although I think it’s definitely a very interesting research problem. I think BERT is a super nice framework for transfer learning. You know, you go from a pre-trained model and then, you know, unsupervised, you can… In the image world, I think transfer learning probably existed earlier in images than in language. So in earlier days when we had ImageNet, we trained something like a ResNet; those were already being used for transfer learning. So, unsupervised kind of image learning is also, I think, still ongoing. There’s a lot of interesting projects going on. I think for OCR right now, we’re not there yet. One of the main issues for building a product like OCR using some of these pre-trained models is the computational cost. I think this happens in language as well: a BERT model, the GPT-3 model, like, you know, multi-billions of parameters, it’s very difficult to turn them into a product. For OCR, we have the same problem. Computational cost is very sensitive. We need to make it fast, and so we’re using relatively small models and normally we train from scratch. Transfer learning does show some benefit, but when the data reaches a certain amount, we found training from scratch is perfectly fine. Sam Charrington: [00:23:49] When you have a certain amount of data to train from? Cha Zhang: [00:23:53] Yeah. In the very early days when we started doing deep learning OCR, we actually relied a lot on distillation, that’s teacher-student learning, where we first train a big model, and then we gradually use teacher-student learning to create a small model so that it can run efficiently. Nowadays, we have figured out that you can train these models from scratch. The amount of data that we have, on the order of, you know, hundreds of thousands and millions of images, is sufficient to train a smaller model from scratch and reach about the same accuracy. Sam Charrington: [00:24:31] Can you elaborate a little bit on that? Are you saying that you need more data to train smaller models? Cha Zhang: [00:24:37] No, I’m saying that… Take BERT as an example. BERT is super beneficial for transfer learning because it has seen so many documents. So given any new language task, presumably there’s not much data that you have to train this new task, and therefore, leveraging BERT, where it has seen so many documents, will help through transfer learning to transfer some of the knowledge that BERT has learned from this huge set of documents to the small task, so that it can reduce the amount of documents required to train the smaller task. The same thing happens in ImageNet transfer learning where, you know, if it’s a ResNet trained on ImageNet, you learn a lot of visual information from the ImageNet dataset.
Then if you have a tiny detection task, like detecting a helmet, let’s say, and you can do the transfer learning and you can use a very small amount of dataset to actually train a very good helmet detector. What I was saying just now was that for the problem of OCR where, you know, it is certainly a very important computer vision problem. Every company who invest in OCR tend to collect quite a bit of data, not to the level of, you know, billions, but hundreds or thousands, millions to that level, that amount of data is sufficient that you do not need to go transfer learning. You can train the model from scratch and you get very good results. Sam Charrington: [00:26:19] Got it. Got it. So when you were using transfer learning where you’re using models based on ImageNet, you know, along the lines of ResNet and others, or whether… Okay. Lets see… so the smaller models that you’re training are they, you know, some of the traditional architectures that we’ve already brought up or are you building out new architectures for the models themselves for this specific problem? Cha Zhang: [00:26:53] Right now we’re using some of the traditional models. There are some active research going on regarding searching the best effective architecture for OCR. We haven’t seen convincing results yet, but I think that’s a very active research area that we’re still kind of looking into, particularly when we try to make it smaller and smaller, you know, faster and faster. Sam Charrington: [00:27:20] When you say searching the best architecture for OCR, are you speaking using the word searching generally, like you have researchers are looking at different models and trying to find the best one for OCR, or are you suggesting a domain specific neural architecture search kind of…? Cha Zhang: [00:27:38] I mean neural architecture search. So that certainly can be applied to OCR and we were still exploring it, but I think that’s a very promising direction. Sam Charrington: [00:27:49] Okay. Interesting. Interesting. Earlier in the conversation you talked about one of the big use cases is some of these semi-structured data that we want to extract information out of – invoice is one example. There was a recent demonstration, or I guess that’s actually a product now of the mobile version of Excel or something. You can take a picture of a grid, grid like data, and that will, you know, both extract the text and organize it into a spreadsheet. Talk a little bit about the product that you’re working on the form recognizer, which is doing something similar. Cha Zhang: [00:28:35] Yeah, of course. So OCR certainly is pretty low level. Other than some of the application I mentioned earlier, like digital SMN and then photo managing, you know, translation, you can directly use OCR, but for many customers, what they want is not just OCR. They want to extract information from documents. Think about, you know,I need to process millions of invoices. I want to extract vendor name and the date, total amount, or if it’s an MS expense system where you want to process all the receipts, and either it can be a verification purpose, for example, like, okay, how do I make sure employees are not putting random numbers and they don’t match with the receipts that’s actually filed. It’s actually, it sounds kind of silly but you know, today, a lot of the company do this verification manually. Because of the huge manual amount of effort needed, they often can only do sampling. 
So you sample like 5% of these receipts to validate, but you kind of miss a huge chunk that you never even look at. So we are looking at this space and we’re trying to build essentially two categories of product. One is a prebuilt set of products, and these are solutions that work out of the box. For example, it can be a prebuilt receipt, prebuilt business card, prebuilt invoice. So basically you send an image or PDF file, and it will extract all the fields that you’ll be interested in. Another big category that we think is super important is customization, because, you know, the prebuilt may never fit every need. So we have a solution called custom form, where we allow the customer to basically send us a few sample images. You can either label them or even, you know, not do any labelling, and we will be able to extract key value pairs out of these documents. Again, we see this as much closer to what the customers need, and that’s what Form Recognizer is positioned as. Sam Charrington: [00:30:54] So we’ve talked about a bunch of the interesting technical challenges at the lower level, at OCR. Does the form level, you know, is that a kind of packaging of OCR? Does it have its own technical challenges to overcome…? Cha Zhang: [00:31:13] Actually it has a lot of very interesting challenges. So, one of the works recently coming out from Microsoft Research is, you know, targeting exactly this problem. And so, just think about it. The language, I mean, parsing these invoices and receipts is essentially sort of a language problem, because you have these texts there. The challenge here is that these are images, so you run OCR on them, but unlike a typical language data set that you’ve scraped from the internet, you know, Wikipedia, which basically has the ordering of these words already, if the data is coming from an image, essentially you can detect these text lines, but it’s actually very difficult to define the read order of these text lines, and ordering of these text lines by itself is a very challenging problem. When you have images in the wild, paper can be curved, you know, can be crumpled, can be rotated, there’s the perspective, you know, all kinds of issues. They can have background text, you know, all these. So the particular approach that MSRA came out with is called LayoutLM. It’s actually a modified BERT model. It’s also a language model, but in addition to the language, we also embed 2D information, like what is the X, Y position of the bounding box of the text. With that information, actually, this can all be trained without supervision. It’s unsupervised pre-training. We are able to learn this kind of spatial relationship in these invoices without coming up with an explicit read order. With that, we actually can do a lot of this key value extraction really well. There’s also quite a lot of advanced research looking into, say, relation networks, where you see two text lines nearby each other and you can predict the relationship. Again, this is similar to the OCR case where you have this bottom-up pixel-level classification and you want to group them; here, you want to group key and value pairs. There’s also a lot of advanced research in these graph convolutional networks, where you do convolutional networks over a graph, where the graph is defined by connecting nearby text lines. Again, this is an approach without requiring reading order, just looking at the spatial relationship.
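A simplified sketch of the LayoutLM idea just described, in which token embeddings are summed with embeddings of each token's 2D bounding-box coordinates so the model sees layout as well as text; the sizes, bucketing, and normalization here are illustrative, not the published model's exact configuration.

```python
# Simplified LayoutLM-style input embedding: text embedding plus 2D box embeddings.
import torch
import torch.nn as nn

class LayoutEmbedding(nn.Module):
    def __init__(self, vocab_size=30522, hidden=768, max_coord=1024):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, hidden)
        self.x0 = nn.Embedding(max_coord, hidden)
        self.y0 = nn.Embedding(max_coord, hidden)
        self.x1 = nn.Embedding(max_coord, hidden)
        self.y1 = nn.Embedding(max_coord, hidden)

    def forward(self, token_ids, boxes):
        """token_ids: (B, T) word-piece ids; boxes: (B, T, 4) integer coords scaled to [0, max_coord)."""
        return (self.tok(token_ids)
                + self.x0(boxes[..., 0]) + self.y0(boxes[..., 1])
                + self.x1(boxes[..., 2]) + self.y1(boxes[..., 3]))
```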
So these are all actually very exciting kinds of extensions of language models, but also using visual information to help parse these vertical data more accurately. Sam Charrington: [00:34:09] Interesting. Yeah, I think it’s… At a quick thought, I would’ve imagined that, you know, maybe the top part of the stack is more rule-based while the bottom part of the stack is, you know, more machine learning based, but it sounds like they’re even, I don’t know, relatively, but there are a bunch of really interesting… Cha Zhang: [00:34:33] We are doing a lot of machine learning stuff on the top as well. Sam Charrington: [00:34:37] I’m imagining, you know, when you talk about relation nets, for example, on an invoice you could have date, and then the date, you know, horizontally next to it, or you can have date and then the date beneath it. Cha Zhang: [00:34:50] Yes. Sam Charrington: [00:34:50] You may have an address box and then a bunch of text that comes beneath it. It would be nice to know that, you know, we’re talking about the address here. That’s part of the idea of the structured text extraction. So in that, you mentioned relation nets and graph CNNs. Are those two approaches to solving the same problem, or are they solving different aspects of the problem? Cha Zhang: [00:35:13] They solve different aspects of the problem, and they can also be used to solve the same one. I mean, right now, the main focus for us is for them to extract key value pairs. This is for both kind of the prebuilt and the customization. Think about, if it’s an invoice and you want a vendor name, so it’s a name. Certainly, you know, the text information, because you see it looks like a vendor name, this probably is a vendor name, and some invoices don’t even have the key in the invoice. Sam Charrington: [00:35:48] Right. Cha Zhang: [00:35:49] You don’t even have the words ‘vendor name’ there, so how do you figure out this thing is still the vendor name? So, there, you rely on information that’s language, and also kind of how the document is laid out. Like, okay, the font size may matter. You know, the position of the name may matter. So we are looking into combining all this information to come out with a better decision on those fields. Sam Charrington: [00:36:21] So, how does a graphical representation or way of thinking about the document get you to a solution to these kinds of problems? You know, for example, the unlabeled vendor name? Cha Zhang: [00:36:33] The graph kind of approach is basically… so you’ve got a bunch of text lines detected by the OCR and you connect these text lines with their neighbors. You define basically how strong these connections are. Actually, it’s not defined; you actually learn these relationships by looking at the texts, looking at their relative positions, looking at their font similarity. Like one issue that you actually just mentioned was the address, because you have multiple lines of an address. How do you know they actually belong to the same address? Right? So all this side information could be very helpful in determining that they should be grouped together. In the convolutional kind of graph model, you learn a convolutional network by computing from all the neighboring nodes, where each node is a text line, to aggregate basically at the center node.
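The aggregation step being described, and elaborated just below, can be sketched as a generic graph-convolution-style layer over text-line nodes; this is illustrative only, not the production model, and the field names are hypothetical.

```python
# Generic neighbor-aggregation sketch: classify each text line (e.g., vendor_name,
# total, ...) from its own features plus an average of its neighbors' features.
import torch
import torch.nn as nn

class NeighborAggregation(nn.Module):
    def __init__(self, dim=256, num_fields=10):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)
        self.classify = nn.Linear(dim, num_fields)

    def forward(self, feats, adjacency):
        """feats: (N, D) per-text-line features; adjacency: (N, N) 0/1 neighbor matrix."""
        deg = adjacency.sum(dim=1, keepdim=True).clamp(min=1)
        neighbor_mean = adjacency @ feats / deg              # average over neighboring lines
        fused = torch.relu(self.proj(torch.cat([feats, neighbor_mean], dim=-1)))
        return self.classify(fused)                          # per-line field logits
```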
So basically, the model learns by not only looking at the current text line that’s in focus, but also looking at all the nearby text lines and deciding, well, given all this contextual information, it does look like this is a vendor name. I guess that’s a very high level conceptual description of why it would work, but it’s data-driven machine learning, so the model [inaudible]. Sam Charrington: [00:38:06] As you’re solving problems like this, are you often needing to re-label your dataset? For example, imagining early on in developing an algorithm like this, you have a bunch of invoices, and you draw a bounding box around the addresses and you say, this is the address. Then you say, ‘Oh, well the font information is a whole new dataset,’ you have to label, well, this is… Are you going in and having people label Helvetica versus Arial? That seems a bit fine grained and hard to actually get experts to label, or is it more abstract than that? Cha Zhang: [00:38:48] We usually only label the end goal, which is the field that you’re going to extract. So, for example, you want to extract a vendor name, vendor address, total text, you basically draw a bounding box in those regions and use that as the ground truth data. Sam Charrington: [00:39:06] Got it. I think we’re going to the same place. When you say font… Cha Zhang: [00:39:11] When I say font, actually it’s in some way implicit, in the sense that we’re taking these bounding boxes and we’re extracting image information. Right? So think of it as, let’s say, running a convolutional network to extract a feature of that part of the text region, the text line. So, this feature is essentially all the visual information that can be helpful in deciding or determining the relationship between text lines. So if features are similar, it probably means they are a similar font, they are a similar size, you know, so those kinds of… So, yeah, I think that seems to be sufficient. Sam Charrington: [00:39:55] So you’re not trying to kind of featurize your underlying images into these distinct things, because that’s what I inferred when you said font. Do you look at the, you know, is there an analogy to kind of looking at the layers of the network, and when we do this with CNNs we see, like, textures and things like that, is there some analogy that you’ve seen in looking at the layers of the network that says, ‘Oh, this layer is like identifying fonts.’ Cha Zhang: [00:40:32] No, we haven’t been going there yet. Well, I guess it’s certainly interesting to look at it. My take is most likely, font is just one attribute. I believe there are many other things. Yeah, I think it’ll be interesting to look at these features visually. Yeah. Sam Charrington: [00:40:54] We’ve talked throughout the discussion about kind of the ways that OCR and this form recognition problem kind of blend the vision domain and NLP domain, and language models have come up quite a bit. Is there a little bit more kind of depth we can go into there? Some of the ways that you see NLP, and particularly the advances in NLP over the past few years, kind of influencing the problem and the way you solve it? Cha Zhang: [00:41:32] Yeah. I see NLP playing a very important role in these verticals. After all, these invoices, receipts, business cards, these are all human artifacts. They’re kind of language artifacts in some way. Right?
So, all of the kind of latest state of the art in language modeling, we definitely want to leverage. The thing I mentioned earlier, like LayoutLM, is one way to leverage them, by using the language model but also embedding additional visual information, and hopefully solving these problems effectively, because the input is really different, right? You know, previously, like, you take text as input; here, we’re taking a bunch of text lines with their locations and bounding boxes as inputs, and the algorithm can naturally kind of solve these problems. Sam Charrington: [00:42:30] And is it also trying to do the traditional language model thing of predicting the next character or word or set of text? Cha Zhang: [00:42:38] Yeah, the way we train them is very similar: basically, masked text, where you mask some words and try to predict them. Certainly you can use a lot of others. I think, you know, I know recently people use translation targets. You can use [autoencoder] kinds of targets. This is a really active research area at this point. I think we’re still just scratching the surface, although we’re already seeing very, very promising results. So we definitely want to look deeper into this and see how well this really can push the state of the art. Sam Charrington: [00:43:21] Kind of continuing on that thread of the active research areas and what the future holds in this area, what are you most excited about in this domain of OCR and, in general, extracting text from documents, vertical applications and the like? Cha Zhang: [00:43:42] Yeah, I think we have been working on this problem for quite a while, but I think there’s still a lot of interesting problems. Only when we started to work with customers did we realize, you know, there are problems we haven’t been able to solve. I can just name one. For example, table extraction sounds trivial, but when you actually look at all the existing tables in the world, the simplest ones are those with explicit cell borders where you have straight lines, but in reality, these tables can have no cell boundaries at all. They can be mixed on top with [stamps], you know, all these things that are kind of making the problem extremely hard. So that’s just another one that is extremely challenging, but we want to solve it. Another thing that I sort of briefly mentioned earlier was the customization part of these verticals. How do you customize to the customer’s own data instead of having these prebuilt models? Because inevitably, you will have data that doesn’t work with these prebuilt models. How do you allow the customer to have a way to build their own models that still work? That by itself is a very challenging problem, because asking customers to label a lot of data is painful. They don’t want to go there. So either we go unsupervised or we go with very, very limited supervision data. In such a case, how do we adapt our model so that it can work on the documents where the customer has realized that the prebuilt model has failed? That’s also a very interesting kind of research problem that we are looking into. In language, this is envisioned as low-shot learning. It’s also now definitely applicable to the problem here as well. Sam Charrington: [00:45:50] In the case of some of the productized vision offerings, Azure does this as well. The user is able to upload their own set of labeled data, and kind of the results for object detection are kind of fine tuned against the user’s data set. Cha Zhang: [00:46:13] Yeah.
Sam Charrington: [00:46:14] Do the OCR and form recognition offerings provide something similar? Like, can I upload my own invoices? Are you doing some kind of transfer learning? If you are, what are you doing to take advantage of what the user's providing? Cha Zhang: [00:46:33] So we do have a product called custom form, which allows customers to upload a few samples. We usually say a minimum of five samples. So, say you have an invoice that doesn't work with the existing models, and you want to solve the problem: you upload five invoices that are similar. These are from the same vendor, or look similar in structure, and we can figure out the key value pairs and extract them, either unsupervised or supervised. Right? Unsupervised means the customer doesn't need to label anything. So you upload the five documents. The information we're gaining by looking at these five documents is, well, these documents are supposed to be similar, and therefore there are going to be a bunch of words that are actually common across these documents. This commonality helps us tell, well, this is probably part of the empty form, the template of the form, while the things that vary across forms must be information the customer has filled in, since they're different from sample to sample. So with that information, we can actually extract key value pairs without any supervision. All you need to do is upload five similar documents. Of course that works to a certain degree, but if you're still not happy with the accuracy, we provide a way for you to label your key value pairs. So here we have a UX where you can go and label the fields you care about by essentially highlighting the OCR text lines where you think, this is the value I want to extract. Then we actually learn a model out of the five samples and produce a model that can be used by the customer to extract these values. The accuracy is actually normally pretty high, in the 90 to 95 percent range, actually. Sam Charrington: [00:48:38] So when the customer does this, is this process entirely learned, or is there a human-in-the-loop kind of exception handling element to it? Cha Zhang: [00:48:50] I guess to answer this I should probably take a step back. Across all the products, OCR has made significant advances, but if you actually care about the numbers, think about the invoice. Right? If your total is wrong, that's really bad. So what we definitely recommend is that people have an agent as backup. For all of the products we offer, we give people confidence scores, right? So how confident we are about the extraction of a particular value, and different customers can choose their own threshold and have an agent look at those cases. But with today's accuracy, we don't recommend going straight through unless you are handling certain specific applications. I can give you an example. For example, if you're verifying a receipt image against employee-entered data, there you can go automatic, right? 'Cause if the OCR produces a different number than the employee entered, well, you will need somebody to look at them anyway, but if they actually match, well, that probably means it's okay. Sam Charrington: [00:50:08] Right. Cha Zhang: [00:50:08] So for that application, you can automate more. Sam Charrington: [00:50:13] Got it.
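[Editor's note: As a rough illustration of the "commonality across similar documents" idea Cha describes, here is a minimal sketch: words that appear in most of the uploaded samples are treated as template text, while the rest are candidate filled-in values. This is purely illustrative and is not the product's actual logic, which also uses positions, visual features, and more.]

```python
from collections import Counter

def split_template_and_values(documents, threshold=0.8):
    """Guess which words are fixed template text vs. filled-in values.

    documents: list of lists of words (one list per OCR'd sample).
    Words present in at least `threshold` fraction of the samples are
    assumed to be part of the template; everything else is a candidate value.
    """
    doc_count = len(documents)
    presence = Counter()
    for doc in documents:
        for word in set(doc):
            presence[word] += 1

    template = {w for w, c in presence.items() if c / doc_count >= threshold}
    values = [[w for w in doc if w not in template] for doc in documents]
    return template, values


samples = [
    ["Invoice", "Total", "$120.50", "Date", "01/02/2020"],
    ["Invoice", "Total", "$89.00", "Date", "03/15/2020"],
    ["Invoice", "Total", "$240.10", "Date", "04/01/2020"],
    ["Invoice", "Total", "$15.75", "Date", "05/20/2020"],
    ["Invoice", "Total", "$310.00", "Date", "06/30/2020"],
]
template, values = split_template_and_values(samples)
print(sorted(template))   # ['Date', 'Invoice', 'Total']
print(values[0])          # ['$120.50', '01/02/2020']
```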
So, the question that I was asking is slightly different though. Say you've got someone using automated form recognition and they have their five examples that they haven't been happy with, and they submit those in through some website or API. Is someone at Microsoft taking those and manually walking them through some process to try to figure out why they're not working, or are they thrown into some training job and then the customer's results get better? Cha Zhang: [00:50:48] No, we don't look at the customer's data. This is a fully automated product, meaning, you know, the customer basically labels these files. They call an API to train a model. The whole process is automated. Sam Charrington: [00:51:04] So under the covers, are they kind of forking off their own model? The last few layers are getting cut off and it's fine-tuning, or is it more elaborate than that, or…? Cha Zhang: [00:51:17] It's more elaborate than that. Underneath the hood, there are multiple steps. We leverage a lot of information in these sample documents. For example, as I mentioned earlier, there will be words common across these samples. Those are very strong indicators that this might be part of the empty part of the form, which is probably not so interesting to the customer. Transfer learning is certainly one way of doing this. Right now we actually train these models without transfer learning. So the model is trained from scratch from very few samples. We're able to do this because of some very interesting work that we have done to basically augment this data, to make sure that you have sufficient data to still be able to train a model out of only five samples. This can be a feedback loop as well. So, if the customer's not happy with a model trained on five samples, they can upload more and we just train a new model. So every time you train, you just get a new model. That way, it's a feedback loop where the customer can keep improving their model until it gets to a stage where it's really performing for them. Sam Charrington: [00:52:53] So when you say augmenting the five that they're providing, are we talking about data augmentation in the sense of a transformation pipeline that changes things, adds noise, rotates, that kind of thing? Or are we talking about some other dataset that you're adding to their five, training on that aggregate dataset, and that's how you're producing a better model? Cha Zhang: [00:53:21] Both. Although I think the latter one matters more, because when customers label this data, we ask them to provide some additional information. For example, they label, this is a date. We know it's a date. So in this way we can artificially create more data to fill the form, so that we can produce more data to train the model. Also, we use machine learning algorithms that are robust to very few examples. So, that way we can learn within this limitation. Yeah. Normally, if you look at many of the other offerings that people provide, you have to train with hundreds of examples. Here, we're pushing it down to five, and we hope to push it even lower in the future. Sam Charrington: [00:54:11] So I'm assuming that this is a stacked problem and you've got some low-level OCR models, for example, that are trained with many, many documents. What you're doing with this Form Recognizer custom data is more at the top node of that stack.
Is the off-the-shelf model that I'm using without the five-example customization also trained on relatively few examples? Cha Zhang: [00:54:44] What do you mean? Sam Charrington: [00:54:45] I guess maybe I'll jump ahead to the conclusion that I'm drawing. What's confusing me is, how are you getting better results with few examples if you're not using any kind of transfer? I thought I heard in your explanation that you're not doing any kind of transfer. Cha Zhang: [00:55:03] So right now custom forms supports training models where each model is geared towards one particular form type. So in some way you can think of this problem as actually restricted. It's actually an easier problem. It's not like the pre-built invoice model, where essentially you want to handle all invoices. Here we're handling one particular invoice type coming from, I would say, one particular vendor, and they usually use this template. Sam Charrington: [00:55:37] Got it. So does the customer then call a unique API to resolve invoices of this type? Or is that then ensembled, and there's something that decides whether it's of the type that you've built the new model for? Cha Zhang: [00:55:55] Yeah. So here's kind of the recommendation that we give to customers, right? You maybe start with the pre-built model, and the pre-built model may work and then your job is done. If you're happy, go. Then say you have a lot of invoices and out of a thousand, 10 of them don't work. So we offer the customer the option to take these invoices and train specific models for these 10 different invoices. You might need to train more than one specialized model, because these invoices may look very different from each other. So imagine you train like 10 different custom models for this. We actually also offer kind of automatic invoice classification: an API called model compose, where we can compose these 10 small models into one. So all you need is to call into that one. By calling into that one, we also provide you a confidence, because at inference time, when the customer sends an invoice in, we don't really know whether it's one that works with the pre-built model or one of these custom types. So you send this invoice first to the customized, composed model, and it will tell you, 'Hey, it doesn't look like any of the 10 you have trained.' In that case, you revert back and say, okay, now I'm calling the pre-built invoice model, because you sort of know that the pre-built model actually works well for that. So that's what we recommend customers do. Sam Charrington: [00:57:34] Okay. I dug into a little bit of the detail there, but it's interesting to see how the end-to-end problem is put together. In a case like this, the ends of that problem are on the customer side, not just the service that you're offering, so seeing how the pieces are put together is kind of interesting. Awesome! Well, Cha, thanks so much for taking the time and walking us through some of the interesting things that are happening in these domains. Cha Zhang: [00:58:12] Thank you for having me. Sam Charrington: [00:58:14] Great! Thank you.
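[Editor's note: For readers who want to try the train-then-compose workflow Cha describes, here is a minimal sketch assuming the azure-ai-formrecognizer Python SDK (v3.x generation). Class and method names reflect that SDK and may differ in newer Document Intelligence releases; the endpoint, key, and container URLs are placeholders.]

```python
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import FormTrainingClient

# Placeholders -- substitute your resource endpoint, key, and SAS URLs
# for blob containers that each hold ~5 labeled samples of one template.
endpoint = "https://<your-resource>.cognitiveservices.azure.com/"
key = "<your-key>"
vendor_containers = ["<sas-url-vendor-a>", "<sas-url-vendor-b>"]

client = FormTrainingClient(endpoint, AzureKeyCredential(key))

# Train one custom model per invoice template (labeled training).
model_ids = []
for container_url in vendor_containers:
    poller = client.begin_training(container_url, use_training_labels=True)
    model_ids.append(poller.result().model_id)

# Compose the per-template models into one, so a single call can route
# an incoming invoice to whichever trained template it matches.
composed = client.begin_create_composed_model(model_ids).result()
print("Composed model id:", composed.model_id)
```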
Today we're joined by Sasha Luccioni, a Postdoctoral Researcher at the MILA Institute, and moderator of our upcoming TWIMLfest Panel, ‘Machine Learning in the Fight Against Climate Change.' We were first introduced to Sasha's work through her paper on ‘Visualizing The Consequences Of Climate Change Using Cycle-consistent Adversarial Networks,' and we're excited to pick her brain about the ways ML is currently being leveraged to help the environment. In our conversation, we explore the use of GANs to visualize the consequences of climate change, the evolution of different approaches she used, and the challenges of training GANs using an end-to-end pipeline. Finally, we talk through Sasha's goals for the aforementioned panel, which is scheduled for Friday, October 23rd at 1 pm PT. Register for all of the great TWIMLfest sessions here!
In a message last week, I addressed the recent death of George Floyd, the protests, and the future we are working towards. While we all have a responsibility to engage in the fight against racism, the ML/AI community has a unique responsibility to ensure that the technologies we produce are fair and responsible and don't reinforce racial and socioeconomic biases. We discuss bias, ethics, and fairness in ML and AI frequently on the podcast. We've highlighted some of the episodes focused on these topics below. I hope these episodes help you engage in conversations about these issues with your colleagues and friends. We will also be hosting an interactive viewing session of my interview with Rumman Chowdhury, Global Lead of AI Responsibility at Accenture, on Monday at 2 PM Pacific. Rumman and I will be live in the chat taking audience questions. Please join us by registering here. We're looking forward to your questions in the chat. In the meantime, take a look at the shows:
AI for Social Good: Why "Good" isn't Enough with Ben Green - Does political orientation have a place in building technology? Ben comments on the controversial topic and shares how the notion of "good" is often elusive, and lacking in rigorous political or social depth despite enthusiasm from computer scientists to integrate these concepts into their work.
The Measure and Mismeasure of Fairness with Sharad Goel - Sharad shares how machine learning can be used to expose unregulated police behavior, and why mathematical definitions are not sufficient for determining bias in algorithms. We also discuss The Stanford Open Policing Project, a data-gathering and analysis initiative started by Sharad.
Algorithmic Injustices and Relational Ethics with Abeba Birhane - Abeba wants to shift our focus away from one that's fundamentally technology-first (explainability, transparency) to one that reframes ethical questions from the perspective of the vulnerable communities our technologies put at risk. AI is just the latest in a series of technological disruptions, and as Abeba notes, one with the potential to negatively impact disadvantaged groups in significant ways.
Trends in Fairness and AI Ethics with Timnit Gebru - Timnit provides an overview of the ethics and fairness landscape currently surrounding AI. She shares a ton of insights on diversification, and how groups like Black in AI and WiML are helping make huge strides in the fairness community.
Operationalizing Responsible AI - This panel from TWIMLcon: AI Platforms features experts discussing the tools, approaches, and methods that they have found useful for teams implementing responsible AI practices.
Responsible AI in Practice with Sarah Bird - Sarah focuses on bringing machine learning research responsibly into production, as well as work on differential privacy. She walks through Microsoft's interpretability platform in Azure, and discusses the idea of going from "Black-Box models" to "Glass-Box models."
The Ethics of AI-Enabled Surveillance with Karen Levy - Karen discusses how rules and technologies interact to regulate behavior, especially the legal, organizational, and social aspects of surveillance and monitoring, and how these data tracking and surveillance methods are often exploited in ways that impact marginalized groups.
Fairness in Machine Learning with Hanna Wallach - Hanna shares how lack of interpretability and transparency show up across machine learning. We discuss how inadvertent human biases can impact machine learning.
Along the way, Hanna points us to a TON of papers and resources to further explore the topic of fairness in ML. We have more coming your way! Subscribe to our newsletter to keep up to date.
What a week. Those of you who follow me on Twitter may have seen that, a week ago today, I was expressing my anger over the behavior of Amy Cooper, a white woman who, after being asked to leash her dog in an area of Central Park where this is required, proceeded to call the police on the Black man, Christian Cooper, who simply asked her to obey the law. All of this was caught on video, and as a Black man it really made my blood boil because it was obvious to me, and in fact to many others, white and Black, that her 911 call was a threat of violence, a call to authorities who would likely hear her tone and his description and respond brashly and harshly. Little did I know at the time that while I was tweeting about the Central Park video, another video had surfaced, demonstrating plainly the depraved brutality of four Minneapolis police officers who murdered George Floyd, an unarmed Black man. Those officers were called by a shop owner who accused him of using a counterfeit $20 bill. George Floyd's death, as you know, has since set off a powder keg of unrest across the United States and around the world. I think while the catalyst of what we're seeing may have been, or was, the act perpetrated by the officers, the cause is much deeper: it's a centuries-old tradition of racism and race-based violence that this country, the United States, has not yet been able to come to terms with, and seems unwilling to come to terms with. Not to mention the backdrop of a global pandemic and record unemployment, both of which, at least here in America, have disproportionately impacted Blacks, and people of color, and the poor. Like many of the folks I've talked to over the past few days, I've found it really difficult to articulate all of the emotions I'm experiencing right now. As I've watched the week's events unfold, everything that's been going on this weekend, I've personally been in touch with everything from disappointment and anger, and rage, to gratitude, appreciation, and hope, and many more points in between. Mostly though I'm just dealing with frustration. Of having been here before and knowing that I'll be here again. Of wanting to do more, of thinking that I should be doing more, but not knowing exactly what that should be. And knowing that it won't be enough. And knowing how little has really changed when it comes to race and class equity in America, and how much everything just seems to be getting worse. As much as I want to resolve these emotions, to have the answers, and to be able to give you answers, I know that there are no easy ones, and the solution to what we're seeing now is not something that is going to happen overnight. If there is any answer though, I have got to believe that it starts with having empathy for those who are suffering and protesting for change. It's easy to look at the destruction and violence of the last few days and look for a scapegoat, or to point out the 'right' way to protest. I've certainly seen plenty of that in my social media feeds. But rather, I think it's an opportunity for us to be introspective and explore ways to be empathetic to the anger and frustrations of those in the thick of protests who feel that they've got no other way to air their grievances and effect change. It's an opportunity for all of us to examine our own biases and prejudices and get to the root of them. Because it's only then that we'll be able to start the work of eradicating them.
Empathy is the right place to start, but beyond empathy, I think we have a responsibility to speak up on behalf of those people whose voices are repressed and for whom justice has been elusive for a very long time. We can use our voices to call attention to the injustice of police brutality, and to fight other examples of racism and bigotry in our communities. And we can support those who are out there working to make a difference and get involved in efforts in our local, national and professional communities to dismantle racism and bias. This includes supporting organizations pushing for social equity like Black Lives Matter, and groups offering relief for those jailed for exercising their rights to peaceful protest. As well as, for those in AI, supporting the work of organizations looking to ensure that Artificial Intelligence is used fairly and responsibly, and ensuring that a broad range of people will have the opportunity to participate in its development. There are far too many groups worthy of our support to list here, but we'll list a few below, and I strongly urge you to look up local organizations in your home city or state and start there. The TWIML community, and the broader AI community, are intelligent, curious, and generous, and we have the opportunity to effect real change in this moment. While the mountain is steep, it's the one we're on, and there's nothing to do but keep climbing. I'm glad to be climbing with you. That's all for today. Be safe, and catch you next time.
Resources:
Black Lives Matter - Ways To Help: organizations and CTAs list put together by Black Lives Matter
Know Your Rights Camp - education and self-empowerment of black & brown people, including how to interact w/ police
Campaign Zero - policy solutions to end police brutality
Rolling Stone - 'Resources for Those Seeking to Help Anti-Police Brutality Protesters' includes efforts supporting bail funds, legal aid and more across the country
Twitter thread of bail funds across the country
Resources Roundup by Katrina Michie - introducing race to children
Color of Change
LGBTQ Freedom Fund
AI Now
AI4ALL
Algorithmic Justice League
Data for Black Lives
Center for Applied Artificial Intelligence
NSBE
While there are a lot of organizations at the national level doing amazing work, we also strongly urge you to look up local organizations in your home state or city. Local efforts tend to be the most effective at creating change in their specific area.
How does LinkedIn allow its data scientists to access aggregate user data for exploratory analytics while maintaining its users' privacy? That was the question at the heart of our recent conversation with Ryan Rogers, a senior software engineer in data science at the company. The answer, it turns out, is through differential privacy, a topic we've covered here on the show quite extensively over the years. Differential privacy is a system for publicly sharing information about a dataset by describing patterns of groups within the dataset; the catch is that you have to do this without revealing information about individuals in the dataset (hence, privacy). Ryan currently applies differential privacy at LinkedIn, but he has worked in the field, and on the related topic of federated learning, for quite some time. He was introduced to the subject as a PhD student at the University of Pennsylvania, where he worked closely with Aaron Roth, who we had the pleasure of interviewing back in 2018. Ryan later worked at Apple, where he focused on the local model of differential privacy, meaning differential privacy is performed on individual users' local devices before being collected for analysis. (Apple uses this, for example, to better understand our favorite emojis 🤯 👍👏). Not surprisingly, they do things a bit differently at LinkedIn. They utilize a central model, where the user's actual data is stored in a central database, with differential privacy applied before the data is made available for analysis. (Another interesting use case that Ryan mentioned in the interview: the U.S. Census Bureau has announced plans to publish 2020 census data using differential privacy.) Ryan recently put together a research paper with his LinkedIn colleague, David Durfee, that they presented as a spotlight talk at NeurIPS in Vancouver. The title of the paper is a bit daunting, but we break it down in the interview. You can check out the paper here: Practical Differentially Private Top-k Selection with Pay-what-you-get Composition. There are two major components to the paper. First, they wanted to offer practical algorithms that you can layer on top of existing systems to achieve differential privacy for a very common type of query: the "Top-k" query, which means helping answer questions like "what are the top 10 articles that members are engaging with across LinkedIn?" Secondly, because privacy is reduced when users are allowed to make multiple queries of a differentially private system, Ryan's team developed an innovative way to ensure that their systems accurately account for the information the system returns to users over the course of a session. It's called Pay-what-you-get Composition. One of the big innovations of the paper is discovering the connection between a common algorithm for implementing differential privacy, the exponential mechanism, and Gumbel noise, which is commonly used in machine learning. One of the really nice connections that we made in our paper was that actually the exponential mechanism can be implemented by adding something called Gumbel noise, rather than Laplace noise. Gumbel noise actually pops up in machine learning. It's something that you would do to report the category that has the highest weight, [using what is] called the Gumbel Max Noise Trick. It turned out that we could use that with the exponential mechanism to get a differentially private algorithm. [...]
Typically, to solve top-k, you would use the exponential mechanism k different times. You can now do this in one shot by just adding Gumbel noise to [existing algorithms] and reporting the k values that are in the top […], which made it a lot more efficient and practical. When asked what he was most excited about for the future of differential privacy, Ryan cited the progress in open source projects. This is the future of private data analytics. It's really important to be transparent with how you're doing things, otherwise if you're just touting that you're private and you're not revealing what it is, then is it really private? He pointed out the open-source collaboration between Microsoft and Harvard's Institute for Quantitative Social Sciences. The project aims to create an open-source platform that allows researchers to share datasets containing personal information while preserving the privacy of individuals. Ryan expects such efforts to bring more people to the field, encouraging applications of differential privacy that work in practice and at scale. Listen to the interview with Ryan to get the full scope! And if you want to go deeper into differential privacy, check out our series of interviews on the topic from 2018. Thanks to LinkedIn for sponsoring today's show! LinkedIn Engineering solves complex problems at scale to create economic opportunity for every member of the global workforce. AI and ML are integral aspects of almost every product the company builds for its members and customers. LinkedIn's highly structured dataset gives their data scientists and researchers the ability to conduct applied research to improve member experiences. To learn more about the work of LinkedIn Engineering, please visit engineering.linkedin.com/blog.
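To illustrate the connection Ryan describes, here is a small sketch of one-shot noisy top-k via Gumbel noise: add Gumbel noise to each item's count and report the k largest. This is a simplified illustration of the idea, not the paper's full pay-what-you-get algorithm (it ignores, for example, the limited-domain and composition details), and the budget split across the k picks is a conservative assumption.

```python
import numpy as np

def dp_top_k_gumbel(counts, k, epsilon, sensitivity=1.0, rng=None):
    """One-shot noisy top-k selection via Gumbel noise.

    Adding Gumbel noise to each score and taking the argmax corresponds to
    sampling from the exponential mechanism; taking the top-k noisy scores
    gives k selections in a single pass.  Simplified sketch only.
    """
    rng = rng or np.random.default_rng()
    counts = np.asarray(counts, dtype=float)
    scale = 2.0 * k * sensitivity / epsilon        # split budget across k picks
    noisy = counts + rng.gumbel(loc=0.0, scale=scale, size=counts.shape)
    return np.argsort(noisy)[::-1][:k]             # indices of top-k noisy counts


article_counts = [5400, 5100, 4800, 300, 250, 90]
print(dp_top_k_gumbel(article_counts, k=3, epsilon=1.0))
```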
In the final episode of our Azure ML series, we're joined by Erez Barak, Partner Group Manager of Azure ML at Microsoft. Erez's group is currently focused on AutoML, and if AutoML is something you're interested in, this is the talk for you. In our conversation, Erez gives us a full breakdown of his AutoML philosophy, including how he defines "true AutoML" and his take on the AutoML space, its role and its importance. We also discuss in great detail the application of AutoML as a contributor to the end-to-end data science process, which Erez breaks down into three key areas: featurization, learner/model selection, and tuning/optimizing hyperparameters. Finally, we discuss post-deployment AutoML use cases and other areas under the AutoML umbrella that are generating excitement. Get the transcript
Sam Charrington: Hey, what's up everyone? This is Sam. A quick reminder that we've got a bunch of newly formed or forming study groups, including groups focused on Kaggle competitions and the fast.ai NLP and Deep Learning for Coders part one courses. It's not too late to join us, which you can do by visiting twimlai.com/community. Also, this week I'm at re:Invent and next week I'll be at NeurIPS. If you're at either event, please reach out. I'd love to connect. All right. This week on the podcast, I'm excited to share a series of shows recorded in Orlando during the Microsoft Ignite conference. Before we jump in, I'd like to thank Microsoft for their support of the show and their sponsorship of this series. Thanks to decades of breakthrough research and technology, Microsoft is making AI real for businesses with Azure AI, a set of services that span vision, speech, language processing, custom machine learning, and more. Millions of developers and data scientists around the world are using Azure AI to build innovative applications and machine learning models for their organizations, including 85% of the Fortune 100. Microsoft customers like Spotify, Lexmark, and Airbus choose Azure AI because of its proven enterprise-grade capabilities and innovations, wide range of developer tools and services, and trusted approach. Stay tuned to learn how Microsoft is enabling developers, data scientists, and MLOps and DevOps professionals across all skill levels to increase productivity, operationalize models at scale, and innovate faster and more responsibly with Azure Machine Learning. Learn more at aka.ms/azureml. All right, onto the show. Sam Charrington: [00:01:52] All right everyone, I am here in Sunny Orlando. Actually it's not all that sunny today, it's kind of gray and rainy, but it is still Sunny Orlando, right? How could it not be? I'm at Microsoft Ignite, and I've got the wonderful pleasure of being seated with Sarah Bird. Sarah is a principal program manager for the Azure Machine Learning platform. Sarah, welcome to the TWIML AI Podcast. Sarah Bird: [00:02:15] Thank you, I'm excited to be here. Sam Charrington: [00:02:17] Absolutely. I am really excited about this conversation we're about to have on responsible AI. But before we do that, I'd love to hear a little bit more about your background. You've got a very enviable position kind of at the nexus of research and product and tech strategy. How did you create that? Sarah Bird: [00:02:37] Well I started my career in research. I did my PhD in machine learning systems at Berkeley and I loved creating the basic technology, but then I wanted to take it to the next step and I wanted to have people who really used it. And I found that when you take research into production, there's a lot more innovation that happens. So since graduating I have styled my career around living at that intersection of research and product, and taking some of the great cutting edge ideas and figuring out how we can get them in the hands of people as soon as possible. And so my role now is specifically focused on trying to do this for Azure Machine Learning, and responsible AI is one of the great new areas where there's a ton of innovation and research, and people need it right now. And so we're working to try to make that possible. Sam Charrington: [00:03:33] Oh, that's fantastic. And so between your grad work at Berkeley and Microsoft, what was the path?
Sarah Bird: [00:03:42] So I was in John Langford's group in Microsoft Research and was working on a system for contextual bandits, trying to make it easier for people to use those in practice, because a lot of the times when people were trying to deploy that type of algorithm, the system infrastructure would get in the way. You wouldn't be able to get the features to the point of decision, or the logging would not work and it would break the algorithm. And so we designed a system that made it correct by construction, so it was easy for people to go and plug it in, and this has actually turned into the Personalizer cognitive service now. But through that experience, I learned a lot about actually working with customers and doing this in production, and so I decided that I wanted to have more of that in my career. And so I spent a year as a technical advisor, which is a great role in Microsoft where you work for an executive and advise them and help work on special projects. And it enables you to see both the business and the strategy side of things as well as all the operational things, how you run orgs, and then of course the technical things. And I realized that I think that mix is very interesting. And so after that I joined Facebook, and my role was at the intersection of FAIR, Facebook AI Research, and AML, which was the applied machine learning group, with this role of specifically trying to take research into production and accelerate the rate of innovation. So I started the ONNX project as a part of that, enabling us to solve a tooling gap where it was difficult to get models from one framework to another. And then I also worked on PyTorch, helping to make that more production ready. And since then I've been working in AI ethics. Sam Charrington: [00:05:34] Yeah. If we weren't going to be focused on AI ethics and responsible AI today, we would be going deep into Personalizer, what was the Microsoft Decision Service, and this whole contextual bandits thing. Really interesting topic, not least because we talk a lot about reinforcement learning and whether it's useful, and while it's not this deep reinforcement learning game playing thing, it's reinforcement learning and people are getting a lot of use out of it in a lot of different contexts. Sarah Bird: [00:06:05] Yeah. When it works, right? It doesn't work in all cases, but when it works, it works really well. It's the kind of thing where you get the numbers back and you're like, can this be true? And so I think it's a really exciting technology going forward and there's a lot of cases where people are using it successfully now, but I think there'll be a lot more in the future. Sam Charrington: [00:06:25] Awesome. I'll have to take a rain check on that aspect of the conversation and kind of segue over to the responsible AI piece. I've been thinking a lot about a tweet that I saw by Rachel Thomas, who is a former guest of the podcast, a long-time friend of the show, and currently head of the USF Center for Applied Data Ethics. And she was kind of lamenting that there are a lot of people out there talking about AI ethics like it's a solved problem. Do you think it's a solved problem? Sarah Bird: [00:06:58] No, absolutely not. I think there are fundamentally hard and difficult problems when we have a new technology, and so I think we're always going to be having the AI ethics conversation; this is not something that we're going to solve and have go away.
But what I do think we have now is a lot more tools and techniques and best practices to help people start the journey of doing things responsibly. And so I think the reality is there are many things people could be doing right now that they're not. And so I feel like there's an urgency today to get some of these tools into people's hands so that we can do that. So I think we can quickly go a lot farther than we have right now. Sam Charrington: [00:07:41] In my conversations with folks that are working on this and thinking about the role that responsible AI plays in the way they "do AI" and do machine learning, a lot of people get stopped at the very beginning, like: Who should own this? Where does it live? Is it a research kind of function or is it a product function, or is it more of a compliance thing for a chief data officer or a chief security officer? [Is it] one of those executive functions, oversight, or compliance is the better word? What do you see folks doing, and do you have any thoughts on successful patterns of where it should live? Sarah Bird: [00:08:33] Yeah, I think the model that we've been using and thinking a lot about is the transition that happened with security, for example. And I think the reality is it's not one person's job or one function. Everybody now has to think about security, even your basic software developers have to know and think about it when they're designing. However, there are people who are experts in it and handle the really challenging problems. There are of course legal and compliance pieces in there as well. And so I think we're seeing the same thing where we really need every role to come together and do this. And so one of the patterns we are seeing is that part of the challenge with responsible AI and technology is that we've designed technology to abstract away things and enable you to just focus on your little problem, and this has led to a ton of innovation. However, the whole idea of responsible AI is actually, you need to pick your head up, you need to have this larger context, you need to think about the application in the real world, you need to think about the implications. And so we have to break a little bit of our patterns of 'my problem is just this little box,' and so we're finding that user research and design, for example, is already trained and equipped to think about the people element in that. And so it's really great to bring them into more conversations as we're developing the technology. So that's one pattern that we're finding adds a lot of value. Sam Charrington: [00:10:07] In my conversation with Jordan Edwards, your colleague, many of his answers were all of the above. And it sounds like this one is an "all of the above" response as well. Sarah Bird: [00:10:19] Yeah. I think doing machine learning in practice takes a lot of different roles, as Jordan was talking about, in operationalizing things, and then responsible AI just adds an extra layer of more roles on top of that. Sam Charrington: [00:10:32] Yeah. I guess one of the challenges that kind of naturally evolves when everyone has to be thinking about something is that it's a lot harder, right? The developer is trained as a developer and now they have to start thinking about this security thing, and it's changing so quickly and the best practices are evolving all the time, and it's hard to stay on top of that. If we're to replicate that same kind of model in responsible AI, what's the right thing to do?
How do we support the people that are on the ground trying to do this? Sarah Bird: [00:11:07] Yeah. And I think it's definitely a challenge, because the end result can't be that every individual person has to know the state of the art in every area in responsible AI. And so one of the ways that we're trying to do this is, as much as possible, build it into our processes and our tooling. So that you can say, okay, well you should have a fairness metric for your model, and you can talk to experts about what that fairness metric should be, but you should know the requirement that you should have a fairness metric, for example. And so we first are starting with that process layer, and then in Azure Machine Learning, we've built tools that enable you to easily enact that process. And so the foundational piece is the MLOps story that Jordan was talking about, where we actually enable you to have a process that's reproducible, that's repeatable. So you can say, before this model goes into production, I know that it's passed these validation tests and I know that a human looked at it and said, it looks good. And if it's out in production and there's an error or there's some sort of issue that arises, you can go back, you can recreate that model, you can debug the error. And so that's the real foundational piece for all of it. And then on top of that, we're trying to give data scientists more tools to analyze the models themselves. And there's no magic button here. It's not just, oh, we can run a test and we can tell you everything you want to know. But there's lots of great algorithms out there and research that help you better understand your model. SHAP or LIME, for example, are common interpretability ones. And so we've created a toolkit called Interpret ML. This is an open source toolkit; you can use it anywhere. It enables you to easily use a variety of these algorithms to explain your model behavior and explore it and see if there are any issues. And so we've also built that into our machine learning process so that if I build a model, I can easily generate explanations for that model. And when I've deployed it in production, I can also deploy an explainer with it so individual predictions can be explained while it's running, so I can understand if I think it's doing the right thing and if I want to trust it, for example. Sam Charrington: [00:13:35] It strikes me that there's a bit of a catch-22 here, in the sense that the only way we could possibly do this is by putting tools in the hands of the working data scientists and machine learning engineers that are working on these problems. But the tools by their very nature abstract them away from the problem and allow them, if not encourage them, to think less deeply about what's going on underneath. Right? How do we address that? Do you agree with that, first of all? Sarah Bird: [00:14:09] No, I completely agree with that, and it's a challenge that we have in all of these cases where we want to give the tool to help them and to have more insight, but it's easy for people to just use it as a shortcut. And so in a lot of cases, we're being very thoughtful about the design of the tool and making sure that it is helping you surface insights. But it's not saying this is the answer, because I think when you start doing that, where you have something that flags and says this is a problem, then people really start relying on that. And maybe someday we will have the techniques where we have that level of confidence and we can do it.
But right now we really don't, and so I think a lot of it is making sure that we design the tools in a way that encourages this mindset of exploration and deeper understanding of your models and what's going on. And not just, oh, this is another compliance test I have to pass; I just run this test, it says green, and I go. Sam Charrington: [00:15:12] You alluded to this earlier in the conversation, but it seems appropriate here as well, and it's maybe a bit of a tangent, but so much of pulling all these pieces together is kind of a user experience and design problem. Any thoughts on that? Is that something that you've kind of dug into and studied a lot? Or do other folks worry about that here? Sarah Bird: [00:15:36] It's not in my background, but to me it's an essential part of the function of actually making these technologies usable. And particularly when you take something that's as complex as an algorithm and you're trying to make that abstracted and usable for people, the design is a huge part of the story. And so what we're finding in responsible AI is that we need to think about this even more. And a lot of the guidelines are saying be more thoughtful and include sort of more careful design. For example, people are very tempted to say, well, this is the data I have so this is the model I can build, and so I'm going to put it in my application that way. And then if it has too much inaccuracy, then you spend a lot of resources to try and make the model more accurate, where you could have just had a more elegant UI design, for example, where you actually get better feedback based on the UI design, or the design can tolerate more errors and you don't need that higher model accuracy. So we're really encouraging people to co-design the application and the model, and not just take it for granted that this is what the model does and that's the thing we're gonna focus on. Sam Charrington: [00:16:53] With the Interpret ML tool, what's the user experience like? Sarah Bird: [00:17:01] It depends on what you're trying to do; there are two types of interpretability that people think about. One is what we call Glass-Box models. And the idea there is I want my model to be inherently interpretable. So I'm gonna pick something like a linear model or decision trees, where I can actually inspect the model, and we enable you to build a model like that, that you can actually understand. And so we support a bunch of different Glass-Box explainers, or models. So you can actually use them to train your own model. And the other part is Black-Box explainers, where I have a model that is a black box and I can't actually inspect it, but I can use these different algorithms to explain the behavior of the model. And so in that case what we've done is made it easy for you to just call explain and ask for global explanations, and ask for local explanations, and ask for feature importance. And then all of those are brought together in an interactive dashboard where you can actually explore the explanations and try to understand the model behavior. So a lot of the experience is an SDK, and so it's all easy calls to ask for explanations, but then we expect a lot of people to spend their time in that dashboard exploring and understanding.
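[Editor's note: A minimal sketch of the glass-box mode Sarah describes, assuming the open-source interpret package and its public API at the time (ExplainableBoostingClassifier plus the show dashboard); a toy dataset stands in for real data. The black-box explainers under interpret.blackbox follow a similar explain-then-show pattern.]

```python
import numpy as np
from sklearn.model_selection import train_test_split
from interpret.glassbox import ExplainableBoostingClassifier
from interpret import show

# Toy tabular data standing in for a real dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Glass-box route: train an inherently interpretable model.
ebm = ExplainableBoostingClassifier()
ebm.fit(X_train, y_train)

# Explore it in the interactive dashboard: global (per-feature) view
# plus local explanations for a few individual predictions.
show(ebm.explain_global())
show(ebm.explain_local(X_test[:5], y_test[:5]))
```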
Sam Charrington: [00:18:32] I did a really interesting interview with Cynthia Rudin, who you may know. She's a Duke professor, and the interview was focused on her research that essentially says that we should not be using black box models in, I forget the terminology that she used, but something like mission critical scenarios, or something along those lines where we're talking about someone's life or liberty, that kind of thing. Does providing interpretability tools that work with black box models encourage their use in scenarios that they shouldn't really be used in? And are there ways that you advise folks on when they should and should not be using those types of models? Sarah Bird: [00:19:19] So we have people who do publish best practices for interpretability, and it's a very active area of work for the company. And we work with the Partnership on AI to try to make industry-wide recommendations for that. I don't think it's completely decided, this idea that models should be interpretable in these settings versus, well, we want other mechanisms to make sure that they're doing the right thing. Interpretability is one way that we could be sure that they're doing the right thing, but we also could have more robust testing regimes. Right? There are a lot of technologies where we don't understand every detail, but we've been able to build safety-critical systems on top of them, for example. And so, yeah, as a company we do try to provide guidance, but I don't think the industry has really decided the final word on this. And so the mindset of the toolkit is enabling you to use these techniques if it's right for you. But that doesn't specifically say that you should go use a neural net in a particular setting. Sam Charrington: [00:20:27] So in addition to the Interpret ML toolkit, you also announced this week here at Ignite a Fairlearn toolkit. What's that all about? Sarah Bird: [00:20:39] So it's the same spirit as Interpret ML, where we want to bring together a collection of fairness techniques that have been published in research and make it easy for people to use them all in one toolkit, with the idea that you want to be able to analyze your model and understand how it's working so that you can make decisions around fairness. And there are, famously, many different fairness metrics published; I think there was a paper cataloging 21 different fairness metrics. And so we've built many of these common ones into the toolkit, and then it makes it easy for you to compare how well your model works for different groups of people in your dataset. So for example, I could say, does this model have the same accuracy for men and women? Does this model have the same outcomes for men and women? And so we have an interactive dashboard that allows you to explore these differences between groups and your model performance through a variety of these metrics that have been published in research. Then we've also built in several mitigation techniques, so that if you want to do mitigation via post-processing on your model, then you can do that. For example, setting thresholds per group. And in a lot of cases it might be that you actually want to go and fix the underlying data, or you want to make some different decisions. So the mitigation techniques aren't always what you would want to do, but they're available if you want to do that.
And so the name of the toolkit actually comes from one of these mitigation techniques from Microsoft Research, where the algorithm was originally called Fair Learn. And the idea is that you say, I wanna reduce the difference between two groups on a particular dimension. So you pick the metric and you pick the groups, and the algorithm actually retrains your model by re-weighting data and iteratively retraining to try to reduce that disparity. So we've built that into the toolkit. So now you can actually look at a variety of versions of your model and see if one of them has properties that work better for what you're looking for, to deploy. Sam Charrington: [00:22:59] Again, I'm curious about the user experience in doing this. How much knob turning and tuning does the user need to do when applying that technique you were describing? Or is it more, I'm envisioning something like contextual bandits reinforcement learning, where it's kind of tuning the knobs for you. Sarah Bird: [00:23:18] Yeah, it is doing the knob-turning and the retraining, but what you have to pick is which metric you're trying to minimize. Do I want to reduce the disparity between the outcomes, or do I want to reduce the disparity in accuracy, or some other metric? There are many different metrics you could pick, but you have to know the metric that's right for your problem. And then you also need to select the groups that you want to do this for. So it can work in a single dimension, like, as we were saying, making men and women more equal, but then it would be a totally separate thing to do it for age, for example. So you have to pick both the sensitive attribute for which you are trying to reduce disparity, and you have to pick the metric for disparity. Sam Charrington: [00:24:10] Were you saying that you're able to do multiple metrics in parallel, or are you doing them serially? Sarah Bird: [00:24:17] Right now the techniques work for just one metric. So it will produce a series of models, and if you look at the graph, you can actually plot disparity by accuracy, and you'll have models that are on that Pareto optimal curve to look at. But then if you said, okay, well now I want to look at that same chart for age, the models might be all over the place in the space of disparity and accuracy. So it's not a perfect technique, but there are some settings where it's quite useful. Sam Charrington: [00:24:48] So going back to this idea of abstraction and tools versus deeply understanding the problem domain and how to think about it in the context of your problem domain. I guess the challenge domain or your problem domain, I don't know what the right terms are. But you mentioned that paper with all of the different disparity metrics and the like. Is that the best way for folks to get up to speed on this, or are there other resources that you've come across that are useful? Sarah Bird: [00:25:23] Yeah, I think for fairness in particular it's better to start with your application domain and understand, for example, if you're working in an employment setting, how do we think about fairness and what are the cases, and so in that case we actually recommend that you talk to domain experts, even your legal department, to understand what fairness means in that setting. And then you can go to the academic literature and start saying, okay, well, which metrics line up with that higher level concept of fairness for my setting.
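[Editor's note: A minimal sketch of the workflow Sarah outlines, assuming the open-source fairlearn package with scikit-learn and a toy dataset; the particular metrics and the demographic-parity constraint used here are illustrative choices, and the right ones should come from your own domain analysis, as discussed above.]

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame, selection_rate
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

# Toy data: features X, labels y, and a sensitive attribute (two groups).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
sensitive = rng.choice(["A", "B"], size=1000)
y = ((X[:, 0] + (sensitive == "A") * 0.8 + rng.normal(size=1000)) > 0).astype(int)

# Baseline model, then slice metrics by group to surface disparities.
clf = LogisticRegression().fit(X, y)
frame = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y,
    y_pred=clf.predict(X),
    sensitive_features=sensitive,
)
print(frame.by_group)       # per-group accuracy and selection rate
print(frame.difference())   # gap between groups for each metric

# Reductions-based mitigation: retrain under a demographic-parity constraint.
mitigator = ExponentiatedGradient(LogisticRegression(), DemographicParity())
mitigator.fit(X, y, sensitive_features=sensitive)
y_mitigated = mitigator.predict(X)
```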
But if you start with the metrics, I think it can be very overwhelming; there are just many different metrics, and some of them are quite different while in other ways they're very similar to each other. And so I find it much easier to start with the domain expertise and know what you're trying to achieve in fairness, and then start finding the metrics that line up with that. Sam Charrington: [00:26:22] You're also starting to do some work in the differential privacy domain. Tell me a little bit about that. Sarah Bird: [00:26:27] Yeah, we announced a couple of weeks ago that we are building an open source privacy platform with Harvard, and differential privacy is a really fascinating technology. It was first published in Microsoft Research in 2006 and it was a very interesting idea, but it has taken a while for it, as an idea, to mature and develop and actually be able to be used in practice. However, now we're seeing several different companies who are using it in production. But in every case the deployment was a very bespoke deployment with experts involved. And so we're trying to make a platform that makes it much easier for people to use these techniques without having to understand them as much. And so the idea is the open source platform can go on top of a data store and enable you to do queries in a differentially private way, which means that it actually adds noise to the results so that you can't reconstruct the underlying data, and also then potentially use the same techniques to build simple machine learning models. And so we think this is particularly important for some of our really societally valuable datasets. For example, there are datasets where people would like to do medical research, but because we're worried about the privacy of individuals, there are limits to what they can actually do. And if we use a differentially private interface on that, we have a lot more privacy guarantees, and so we can unlock a new type of innovation and research in understanding our data. So I think we're really excited and think this could be the future of privacy in certain applications, but the tooling just isn't there, and so we're working on trying to make it easier for people to do that. We're building it in the open source because it's important that people can actually ... It's very easy to get the implementation of these algorithms wrong, and so we want the community and the privacy experts to be able to inspect and test the implementations and have confidence that it's there. And also we think this is such an important problem for the community. We would like anybody who wants to, to be joining in and working on this. This is not something that we can solve on our own. Sam Charrington: [00:28:58] Yeah, differential privacy in general and differentially private machine learning are fascinating topics and ones that we've covered fairly extensively on the podcast. We did a series on differential privacy a couple of years ago maybe, and it's continuing to be an interesting topic. The Census Bureau, I think, is using differential privacy for the first time next year, and it's both providing the anticipated benefits but also raising some interesting concerns about increased opacity, on the part of researchers, into the data that they want to get access to. Are you familiar with that challenge? Sarah Bird: [00:29:41] Yeah, absolutely. So the reality is people always want the most accurate data, right? It doesn't sound great to say, well, we're adding noise and the data is less accurate.
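[Editor's note: As a concrete illustration of the noisy queries being discussed, here is a minimal sketch of answering a single count query with the classic Laplace mechanism. The numbers and epsilon values are arbitrary, and a real platform of the kind described also has to track the privacy budget spent across many queries.]

```python
import numpy as np

def private_count(true_count, epsilon, rng=None):
    """Answer a count query under epsilon-differential privacy.

    A count has sensitivity 1 (one person changes it by at most 1), so
    adding Laplace noise with scale 1/epsilon satisfies epsilon-DP.
    """
    rng = rng or np.random.default_rng()
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)


# e.g. "how many records in this dataset match some condition?"
print(private_count(true_count=1284, epsilon=0.5))  # noisier, more private
print(private_count(true_count=1284, epsilon=5.0))  # more accurate, less private
```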
But in a lot of cases it is accurate enough for the tasks that you want to accomplish. And I think we have to recognize that privacy is one of the sort of fundamental values that we want to uphold, and so in some cases it's worth the cost. For the census in particular, to motivate the decision to start using this for the 2020 census, they did a study where they took the reports from the 1940 census, and they were able to recreate something like 40% of Americans' data from just the published outputs of the census. Sam Charrington: [00:30:33] Meaning personally identify 40% of Americans? Sarah Bird: [00:30:37] Yeah, he talks about this in his ICML keynote from last year. So if you want to learn more you can watch the keynote. But yeah, basically they took all the reports and they used some of these privacy attacks and they could basically recreate a bunch of the underlying data. And this is a real risk, and so we have to recognize that yes, the census results are incredibly important and they help us make many different decisions, but also protecting people's data is important. And so some of it is education and changing our thinking, and some of it is making sure that we use the techniques in the right way in that domain, where you're not losing what you were trying to achieve in the first place, but you are adding these privacy benefits. Sam Charrington: [00:31:21] There are a couple of different ways that people have been applying differential privacy. One is a more centralized way, where you're applying it to a data store. It sounds a little bit like that's where your focus is. Apple is a noted use case of the other approach, where they're applying differential privacy in a distributed manner at the handset to keep user data on the iPhone, but still provide information centrally for analysis. Am I correct that your focus is on the centralized use case? Or does the toolkit also support the distributed use case? Sarah Bird: [00:32:02] We are focusing on the global model. The local model works really well, particularly in some of these user telemetry settings, but it limits what you can do. You need much larger volume to actually get the accuracy for a lot of the queries that you need, and there aren't as many queries that you can do. With the global model, on the other hand, there's a lot more that you can do and still have reasonable privacy guarantees. And so, as I was saying, we were motivated by these cases where we have the datasets, like somebody is trusted to have the datasets but we can't really use them. And so that looks like a global setting. And so to start, we're focused on the global piece, but there are many cases where the local model is promising and there are cases where we are doing that in our products. And so it's certainly a direction that things could go. Sam Charrington: [00:32:58] And differential privacy from a data perspective doesn't necessarily get you to differentially private machine learning. Are you doing anything in particular on the differentially private ML side of things? Sarah Bird: [00:33:11] The plan is to do that, but the project is pretty new so we haven't built it yet. Sam Charrington: [00:33:19] And before we wrap up, you're involved in a bunch of industry and research initiatives in the space that you've mentioned, MLSys, a bunch of other things. Can you talk a little bit about some of the broader things that you're doing? Sarah Bird: [00:33:38] Yeah, so I helped found the conference that's now named MLSys, the systems and machine learning research conference.
And that was specifically because I've been working at this intersection for a while and there were some dark days where it was very hard to publish work, because the machine learning community was like, this is a systems result, and the systems community was like, this doesn't seem like a systems result. And so we started the conference about two years ago, and apparently many other people were feeling the same pain, because even from the first conference, we got excellent work, people's top work, which is always a challenge with research conferences because people don't want to submit their best work to a brand-new conference. Right? But there was such a gap for the community. So it's been really exciting to see that community form and now have a home where they can put their work and connect. I've also been running the machine learning systems workshops at NeurIPS for several years now. And that's been a really fun place because it really has helped us form the community, particularly before we started the conference. But it's also a place where you can explore new ideas. This last year we started to see a lot more innovation at the intersection of programming languages and machine learning. And so in the workshop format we can have several of those talks highlighted, have a dialogue, and show some of the emerging trends. So that's been a really fun thing to be involved in. Sam Charrington: [00:35:13] Awesome. Yeah, was it last year that there was both the SysML workshop and the ML for Systems workshop and it got really confusing? Sarah Bird: [00:35:24] Yeah. This year too. We have both. And I think that's a sign that the field is growing. It used to be that it felt like we didn't even have enough people for one room at the intersection of machine learning and systems, and I think this last year there were maybe 400 or 500 people in our workshop alone. And so that's great. Now, there's definitely room to have workshops on more focused topics. Right? And so I think machine learning for systems is an area that people are really excited about now that we have more depth in understanding the intersection. For me, it's very funny because that is really kind of the flavor of my thesis, which was a while ago. And so it's fun to see it now starting to become an area that people are excited about. Sam Charrington: [00:36:16] The other conference that we didn't talk about, ML for Systems, is all about using machine learning within computational systems and networking systems as a way to optimize them. So for example, ML to do database query optimization. Also a super interesting topic. Sarah Bird: [00:36:36] Yeah, I know, it absolutely is. And I really believe in that, and I think for several years people were just trying to replace all of the system's intelligence with one machine learning algorithm and it was not working very well. And I think what we're seeing now is recognizing that a lot of the algorithms that we use to control systems were designed for that purpose and they actually work pretty well. But on the other hand, there's something that's dynamic about the world or the workload, and so you do want this prediction capability built in. And so a lot of the work now has a more intelligent way of plugging the algorithms into the system, and now we're starting to see promising results at this intersection. So my thesis work was a resource allocator that built models in real time in the operating system and allocated resources.
And it was exactly this piece where there was a modeling and a prediction piece, but the final resource allocation algorithm was not purely machine learning. Sam Charrington: [00:37:43] Awesome. Wonderful conversation, looking forward to catching up with you at NeurIPS, hopefully. Thanks so much for taking the time to chat with us. Sarah Bird: [00:37:52] Yes, thanks for having me. And I look forward to seeing you at NeurIPS. Sam Charrington: [00:37:56] Thank you.
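To ground the global-model discussion above, here is a minimal sketch of the Laplace mechanism, the simplest building block behind that kind of centralized differential privacy. It illustrates the general technique only, not the toolkit Sarah describes; the census-style query, the count, and the epsilon values are made up for the example.

```python
import numpy as np

def laplace_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Release a count under epsilon-differential privacy by adding
    Laplace(0, sensitivity/epsilon) noise. One person can change a
    count by at most 1, so sensitivity is 1 for counting queries."""
    rng = rng or np.random.default_rng()
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Hypothetical census-style query: "how many residents of this block are over 65?"
true_count = 132
for eps in (0.1, 1.0, 10.0):
    print(f"epsilon={eps}: noisy count = {laplace_count(true_count, eps):.1f}")
```

Smaller epsilon means more noise and a stronger privacy guarantee, which is exactly the accuracy-versus-privacy trade-off discussed in the conversation above.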
Sam Charrington: Hey, what's up everyone? This is Sam. A quick reminder that we've got a bunch of newly formed or forming study groups, including groups focused on Kaggle competitions and the fast.ai NLP and Deep Learning for Coders part one courses. It's not too late to join us, which you can do by visiting twimlai.com/community. Also, this week I'm at re:Invent and next week I'll be at NeurIPS. If you're at either event, please reach out. I'd love to connect. All right. This week on the podcast, I'm excited to share a series of shows recorded in Orlando during the Microsoft Ignite conference. Before we jump in, I'd like to thank Microsoft for their support of the show and their sponsorship of this series. Thanks to decades of breakthrough research and technology, Microsoft is making AI real for businesses with Azure AI, a set of services that span vision, speech, language processing, custom machine learning, and more. Millions of developers and data scientists around the world are using Azure AI to build innovative applications and machine learning models for their organizations, including 85% of the Fortune 100. Microsoft customers like Spotify, Lexmark, and Airbus choose Azure AI because of its proven enterprise-grade capabilities and innovations, wide range of developer tools and services, and trusted approach. Stay tuned to learn how Microsoft is enabling developers, data scientists, and MLOps and DevOps professionals across all skill levels to increase productivity, operationalize models at scale, and innovate faster and more responsibly with Azure Machine Learning. Learn more at aka.ms/azureml. All right, onto the show. Sam Charrington: [00:01:52] All right everyone, I am here in sunny Orlando, Florida at Microsoft Ignite and I've got the pleasure of being seated across from Jordan Edwards. Jordan is a Principal Program Manager for the Azure Machine Learning Platform. Jordan, welcome to the TWIML AI Podcast. Jordan Edwards: [00:02:08] Oh thanks. Sam Charrington: [00:02:10] I'm really looking forward to talking with you about our subject for the day, MLOps and related topics. But before we do that, I'd love to hear a little bit about your background. It sounds like you got started off at Microsoft where a bunch of folks that are now working on ML and AI got started: in the Bing group. Jordan Edwards: [00:02:29] Right. I started at Microsoft a little over seven years ago. I started off working on the big data platforms and related machine learning platforms. Then I ended up working on engineering systems for those platforms, and then we decided to take those engineering systems and apply them to machine learning as well. Hence, the internal machine learning platform was born. And as you mentioned, like a bunch of other folks who used to work on Bing, we all got moved into, "Hey, let's take the cool stuff we built for Bing's internal engineering platform and bring it to external customers on Azure." And so I've been on the Azure Machine Learning team a little bit over a year now. Sam Charrington: [00:03:08] Nice, nice. And your role here on the team? Jordan Edwards: [00:03:11] Yes. I'm the product area lead for what we call MLOps, which is really all about how you bring your machine learning workflows to production. Sam Charrington: [00:03:19] A topic that we spend a lot of time talking about here on the podcast, as well as at our recent TWIMLcon: AI Platforms event.
Maybe starting, kind of directly connecting to your background, I'm curious about the transition from a team that largely came out of this internal product or project, Bing, and is now trying to generalize those systems and broader knowledge and learnings to the market. What are the commonalities and differences that you encounter in trying to do that? Jordan Edwards: [00:03:57] So there's actually a lot of commonalities when you double click on it. But the biggest thing is that Bing and Office 365, internal Microsoft teams, have been doing AI and ML for a long time. And so they built up a lot of habits and tools and technologies, but also a lot of things that don't necessarily map to how we see enterprises getting started, right? So most of our external customers today are coming in wanting to do Python-based development, and we have some of that internally. But we also have languages that predate the popularity of Python as a data science platform. We have engineers doing machine learning work in .NET and C++. And so those workflows are a bit different. Also, a lot of the machine learning platforms at Microsoft, as you would imagine, were previously Windows-based, whereas the new customers coming in want to do things using Linux and containers, and there are newer techniques that are being applied as well. There are similarities in there, in the ways they wanna solve the problem, but just different tools that they're using. And also just different amounts of context that have been built up. There's also the matter of scale. So when you look at teams like Bing, they've got a thousand data scientists that are collaborating together to train these huge models. Most of the enterprise customers that we're talking to, they have small teams scattered all over the place, or they're trying to staff a team, or they have a team and they're not sure how to make best use of their time. And also the most common problem that we're seeing that they come to us with is, "Hey, we have all these data scientists who are doing work in Jupyter notebooks or whatever, the work is happening on their local machines. We have no idea where the code is, if the code's even checked in." And they're doing all this work, but we can't leverage any of it on the business side. Sam Charrington: [00:05:49] There's so many, so many problems in that problem statement, right? Jordan Edwards: [00:05:53] Correct. Sam Charrington: [00:05:54] There is kind of a reproducibility problem. There's a business value, path to production problem. There is kind of an accountability problem. When you unpack that, do you prioritize those? Jordan Edwards: [00:06:11] So we try to put it in terms of like a process maturity model. It's exactly how you framed it. There's the reproducibility of the work, so another data scientist in the team could reproduce the same work that one person did, and then an automated system could also reproduce that work. Which means you need clean modeling around the code and data and configuration that you're using in your model development process. Then there's the, how do you transition this model, this thing you've created, to production? So how do you package it? How do you certify it and how do you roll it out in a controlled fashion? And then at the end, how do you determine the business value of your model? Is it making your business more effective? From a cost point of view, is it worth the amount of compute hours you're spending and the amount of man hours you're spending training these models?
And then on the absolute end of the process maturity model is, "Okay, I've got this model, it's reproducible. I've got it deployed out. I'm using it for a production scenario. How do I know when I might need to retrain it?" So completing the circle. And that's always the question that customers will come and start with: "How do we do automated retraining?" It's like, "Let's walk back and begin with how you reproduce these models in the first place." Sam Charrington: [00:07:26] That strikes me as a mature customer that's asking about automated retraining, right? Jordan Edwards: [00:07:31] Correct. Sam Charrington: [00:07:31] Most people are trying to get the model into production in the first place, or many. Jordan Edwards: [00:07:35] Right. They see the marketing hype, they read all the things like, "Oh, look at this company doing cool automated retraining stuff." And realistically, it takes a long time to get to that degree of maturity, where you can trust that you have the high-quality data coming into your production systems to be able to analyze and compare and figure out, I do need to retrain. And even in the case of, like, Bing and Office ML development teams, there's never a fully automated retraining loop. It's always that there's a scorecard that gets generated and humans go and do some sort of review process prior to new, larger models going out. Especially when they deal with things like how you monetize ads, for instance. Sam Charrington: [00:08:14] So there's a lot there to dig into, but before we do that, one of the questions that I had for you is, you've got MLOps in your title, what does that mean to you? Jordan Edwards: [00:08:24] So, that means to me that it's all about how you take the work that data scientists are doing and make their lives easier, but also make it easier for others, other personas, to come into the fold and take advantage of data science. So the three personas I like to talk about are: you have your data engineer, who has got this giant lake of data. They want to figure out what value they can derive from it. You've got your data scientist who's tasked with finding interesting features in that data and training models on top. And then you've got this new emerging persona called the ML engineer, whose responsibility it is to take the work that the data scientist is doing and bring it to production. And so my job is to help the ML engineer be able to be successful and help the ML engineer be able to interact well with the data engineering and data science personas that are required to sort of complete that circle. And of course you also have, at the hub and the center of it, your IT ops persona, who's giving them all of the raw compute and storage resources to get started, making sure everybody plays nicely together and actually connects things end-to-end. Sam Charrington: [00:09:36] And so there's kind of an obvious echo to DevOps. To what extent is that, is it inspirational? Is it kind of directly applicable, or is it counter-applicable, meaning just don't try to do exactly what you're doing in DevOps? Jordan Edwards: [00:09:54] I think it's sort of all three of the things that you mentioned. Shocking, right? You teed me up. So as far as how it's inspirational, definitely the practices that have been developed in the DevOps field over the past 20 years or so are useful. However, data scientists are not software engineers. And they're not even engineers. A lot of them are scientists.
So telling them they need to care about things related to the infrastructure and package version management and dealing with all of the intricacies of how to run a production infrastructure, that's just not something that they're interested in at all. So trying to force these habits onto them, we've seen this, even trying to get them to write tests for their code. It takes a lot of education on the net value add they're going to get from it before they're willing to onboard. So definitely inspirational from a process point of view. A lot of the same tools are applicable, but then you also need new tools that are domain-specific too: How do you do data versioning? How do you do model versioning? How do you validate and run integration testing on models? How do you release and do A/B comparison on a model, as opposed to a normal software application, and know if it's better or not? So yeah. Inspirational, applicable, and you'll get hit in the face by a data scientist if you tell them to go and implement all these things themselves. Sam Charrington: [00:11:23] One of the things you mentioned earlier was testing. What's the role of testing in an MLOps process, and what kind of experiences have you had working with real customers to implement testing procedures that make sense for ML? Jordan Edwards: [00:11:41] Right. So, the place we try to start is by integrating some sort of tests on the data itself. So ensuring that your data is of the same schema, that you have high-quality data, like a column or a feature hasn't just been dropped, or the distribution of values in that feature hasn't changed dramatically. And so a lot of the stuff that we've built into the machine learning platform, especially on the dataset profiling side, is designed to help you with that, to help you with skew testing and analysis. Is your data too different, to the point where you shouldn't be training on it? Or is your data too similar? Or is it in that sweet spot where the same training pipeline is actually applicable to go and solve the problem? That's on the profiling side. And then we also have some advanced capabilities on the drift side. So analyzing over time, how are the inputs or features into your model changing? Whether that's training versus scoring, or day over day, week over week, month over month of the data coming into your model when it's making predictions. Has the shape of that data changed over time? Do you still trust the model based on the input values? And then, of course, you have the other end of it too, which is looking at the predictions the model is making, whether it's from a business application. So say I'm using the Outlook app on my phone and I've got the smart reply model running there. Now either they didn't click on any of my suggestions, they clicked on a different suggestion from the one I offered, they clicked on the top suggestion that I had, or they said, "I didn't like any of these suggestions." All those types of feedback go into telling you whether the quality of the data that you've trained your model on is giving you a useful model on the prediction side. So skew testing, validating your data's quality and correctness, consistency between training and inference, all those things. Sam Charrington: [00:13:46] Okay. So I'm kind of pulling at threads here. Maybe taking a step back, you talked a little bit about a maturity model, that when you look at customers, they kind of fall into these different buckets. Is there a prerequisite for starting to think about MLOps?
Jordan Edwards: [00:14:05] So I think the prerequisite is you have to have a desire to apply a model to a business need. If your only goal is to write a model to, say, publish a paper, like, "Hey, I have this model to solve this cool problem," then you don't really need any of the MLOps stuff. And if you're just mucking around in a Jupyter notebook trying some different things by yourself, it's also a stretch to say like, "Oh, you need these MLOps practices now." But the second you go beyond "I'm keeping all my notes in Jupyter" or "I'm dumping them into OneNote somewhere and just keeping track of all my experiments on my own," the second you want collaboration or reproducibility or the ability to scale up and scale out to run your jobs in the cloud, that's where MLOps starts coming into play. Sam Charrington: [00:14:52] I agree that collaboration is a big driver, but even an individual researcher that's tracking hyperparameters in file names or on a Post-it note or something even worse can benefit from some elements of the tooling that we kind of refer to as MLOps. Would you agree with that? Jordan Edwards: [00:15:11] I would, yeah. But just trying to sell them on using everything from the very beginning is a tougher sell. So we start by saying, just start by tracking your work. So the whole process maturity flow is: you start with work tracking, then making sure everything's in a reproducible pipeline, and then making sure that others can go and take advantage of that pipeline. And then you actually have the model that you can go and use in other places. Sam Charrington: [00:15:35] Okay. Yeah. I liked the way you pulled that together because in a lot of ways one of the questions that I've been kind of noodling around for a while now is where does MLOps start and end relative to platforms and tooling and the things that enable and support MLOps. And it's very much like the conversation we were having around DevOps, like DevOps isn't, you know, containers and Kubernetes and things like that. DevOps is a set of practices, and it's very much, to your point, that end-to-end process. So you might use any one of a number of the tools that someone might use to enable MLOps, but that doesn't necessarily mean that you need MLOps. Jordan Edwards: [00:16:22] Right. And sure, I work on Azure Machine Learning. When I'm talking to customers about, well, how does MLOps actually work? You're going to have at least three different tools and technologies being used, right? 'Cause you have three different personas. You have data engineering, data science, and DevOps ML engineering, which means you're going to have some sort of a data pipelining tool, something like Data Factory or Airflow in the open-source world. Something to help with managing your training pipelines, whether it's Azure ML as a managed service or something like Kubeflow if you're in the open-source community. And then same thing on the release management side, whether you're using Azure DevOps or GitHub Actions or you're running your own Jenkins server. Either way, there's gonna be at least those three different types of tools with different personas, and they all need to work together and interoperate. So that's another key part of our pitch: make sure that you're being flexible in how you're producing and consuming events, because MLOps is more than just model ops and you need to make sure it fits into the data and dev sides of the house. Sam Charrington: [00:17:28] Yeah. Yeah. Awesome.
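Jordan's point that the maturity flow starts with simply tracking your work can be made concrete with a few lines of plain Python. This is a generic sketch, not the Azure ML tracking API; the file layout and field names are invented for illustration.

```python
import json, subprocess, time, uuid
from pathlib import Path

def current_commit():
    """Best-effort lookup of the git commit the code was run from."""
    try:
        out = subprocess.run(["git", "rev-parse", "HEAD"],
                             capture_output=True, text=True)
        return out.stdout.strip() or "unknown"
    except OSError:
        return "unknown"

def log_run(params, metrics, run_dir="runs"):
    """Record the minimum needed to reproduce a training run:
    parameters, metrics, a timestamp, and the code version."""
    record = {
        "run_id": uuid.uuid4().hex,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "git_commit": current_commit(),
        "params": params,
        "metrics": metrics,
    }
    out = Path(run_dir)
    out.mkdir(exist_ok=True)
    (out / f"{record['run_id']}.json").write_text(json.dumps(record, indent=2))
    return record["run_id"]

# Usage: log_run({"lr": 0.01, "epochs": 10}, {"val_auc": 0.91})
```

Even this much gives another person (or an automated pipeline) a fighting chance of reproducing the run, which is the first rung of the maturity model described above.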
You mentioned Azure DevOps playing a role in here, and Jenkins on the open-source side. These are tools that, from the DevOps perspective, you associate with CI/CD, continuous integration and continuous delivery. The idea being that there's a parallel on the model deployment side. Can you elaborate a little bit on how those tools are used? Jordan Edwards: [00:17:51] Yeah. So the way we like to look at it from a DevOps point of view is we wanna treat a model as a packaged artifact that can be deployed and used in a variety of places. So you have your pickle file or whatever, but you also have the execution context for it... I can instantiate this model as a class in Python, or I can embed it into my Spark processing pipeline, or I can deploy it as an API in a container onto a Kubernetes cluster, something like that. So it's all about how you bring the model artifact in as another thing that can be used in your release management process flow. Sam Charrington: [00:18:27] It does not have to be a pickle file, it could be... Jordan Edwards: [00:18:30] It could be anything, exactly. Yeah. This is my serialized graph representation, here's my code file, my config that I'm feeding in. So a model is just like any other type of application. It just happens to come from, or have some sort of association to, a machine learning framework, which has come from some data. Which is actually another important part of the MLOps story: what does the end-to-end lineage look like, right? So ideally you should be able to go from, I have this application that's using this model, here's the code and config that was used to train it, and here is the data set that this model came from. Especially when we're talking to customers in more of the highly regulated industries, so healthcare, financial services. Say you have a model deployed that's determining if it's gonna approve or reject somebody for a loan. You need to be very careful that you've maintained your full audit trail of exactly where that model came from, in case somebody decides to come in and ask about it further. This also becomes more complicated, the more of a black box that your model is. But in general, the goal of having all of these different technologies work together and interoperate is so that you can track sort of your correlation ID or correlation vector across your entire data and software and modeling landscape. Sam Charrington: [00:19:56] When we talk about that end-to-end lineage, is that a feature? You use tool X, use Azure ML, and click a button and you have that? Or is it more than that, a set of disciplines that you have to follow as you're developing the model? Jordan Edwards: [00:20:15] So yeah. The latter leads to enablement of the former. So assuming that you use the- Sam Charrington: [00:20:23] I think you're an "all of the above" guy. Jordan Edwards: [00:20:26] Yeah, yeah, yeah. You're teeing it up right. So when it comes to using the tools the right way, sure, you could just have a random CSV file that you're running locally to train a model on. But if you wanna assert that you have proper lineage of your end-to-end ML workflow, that CSV file should be uploaded into blob storage and locked down and accessed from there, to guarantee that you can come back a year later and reproduce where this model came from. Same thing on the code and packaging and the base container images that you're using when you're training the model. All that collateral needs to be kept around. And what does that allow you to do?
So, we have, inside of the machine learning service, an internal meta store that keeps track of all the different entities and the edges that connect them together. And right now we have sort of a one-hop exposure of that. But one of the things we're working on is a more comprehensive way to peruse the graph. So it's like, "Hey, across my enterprise, show me every single model that's been trained using this dataset." Not scoped to a single project that my team is doing, but across the entire canvas. Show me everybody using this data set. What types of features are they extracting from it? Is somebody doing work that's similar to mine? Can I just fork their training pipeline and build on top of it? And going back to how the work we've done for internal teams has inspired the work we're doing on Azure: that's probably the most powerful part of our platform for internal Microsoft teams, the discovery, the collaboration, the sharing. That's what allows you to do ML at high scale, at high velocity. And so we want to make sure, as much as we can, that the tools and technologies that we have on Azure provide that same capability, with all of the enterprise-ready features that you would come to expect from Microsoft and Azure. Sam Charrington: [00:22:27] Yeah. So in that scenario you outlined the starting place as a dataset that's uploaded to blob storage. Even with that starting place you've kind of disconnected your ability to do lineage from the source dataset, which may be in a data warehouse or something like that. Is there also the ability to point back to those original sources? Jordan Edwards: [00:22:55] Oh yeah. Sometimes you'll have a CSV there, but you can also connect to a SQL database or to your raw data lake and have a tracking of, okay, this is the raw data. Here's, say, the Data Factory job that did all these transformations. Here's my curated dataset. Here's all the derivations of that data set. Here's the one I ended up using for training this model. I took this model and transfer learned on top of it to produce this new model. And then I deployed this model as this API, and you can trace things all the way back to there. And then going the other way, when this model is now running, I can be collecting the inputs coming into my model and the predictions my model is making. I log those into Azure Monitor, and then my data engineer can set up a simple job to take that data coming in and put it back into the lake, or put it back into a curated data set that my data scientist can now go and experiment on and say, "Well, how's the data coming into my model that's deployed compared to when I trained it?" That's completing the circle back to the beginning. Sam Charrington: [00:23:57] Nice. Nice. Which, conceivably, means that as opposed to asking what models this data set has produced, you could point to a particular row in a data warehouse or something like that, or a value, and ask what's been impacted by this particular data point. Jordan Edwards: [00:24:17] Exactly. And that's the value that we're trying to get out of the new generation of Azure Data Lake Store, and some of the work we're doing on the Azure Data Catalog side, is to give you exposure into what's all the cool stuff that's being done or not being done with this data. It goes back to letting your decision-makers know, am I accruing business value from these ETL pipelines that I'm spending all these compute dollars on to go and cook these curated data sets?
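The "entities and edges" meta store Jordan describes can be pictured as a small directed graph. The sketch below uses networkx to show the kind of query he mentions ("show me every model trained using this dataset"); the node names are invented, and this is not the Azure ML metadata API.

```python
import networkx as nx

lineage = nx.DiGraph()
# Nodes carry a 'kind'; edges point from an artifact to what was derived from it.
lineage.add_node("raw_sales.csv", kind="dataset")
lineage.add_node("curated_sales_v3", kind="dataset")
lineage.add_node("churn_model:12", kind="model")
lineage.add_node("ltv_model:4", kind="model")
lineage.add_node("churn-api:prod", kind="deployment")
lineage.add_edge("raw_sales.csv", "curated_sales_v3")   # ETL job
lineage.add_edge("curated_sales_v3", "churn_model:12")  # training pipeline
lineage.add_edge("curated_sales_v3", "ltv_model:4")
lineage.add_edge("churn_model:12", "churn-api:prod")    # release pipeline

def models_trained_on(dataset):
    """Walk downstream from a dataset node and return every model depending on it."""
    return [n for n in nx.descendants(lineage, dataset)
            if lineage.nodes[n]["kind"] == "model"]

print(models_trained_on("curated_sales_v3"))
# ['churn_model:12', 'ltv_model:4'] (order may vary)
```

Walking the graph in the other direction, from a deployment back to its source data, is the audit-trail use case discussed earlier in the conversation.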
And that's a large part of what our larger ML platform team did before as well: we helped with creating curated data sets for Bing and Office to go and build models on top of. So we had the data engineering pipelines and the machine learning pipelines and the release management pipelines all under the same umbrella, which helped to inform the way we're designing the system now: to meet enterprises where they are and help them scale up and out as they go. Sam Charrington: [00:25:16] I'm curious, what are some of the key things that you're learning from customers kind of on the ground who are working to implement this type of stuff? How would you characterize where folks are, if you can generalize, and what are the key stumbling blocks? Jordan Edwards: [00:25:35] So if we were to think about it in terms of four phases, where phase one is kicking the tires, phase two is the model is reproducible, phase three is the model is deployed and being used, and phase four is I have all the magical automated retraining wizardry: they're mostly between phase one and phase two right now. Very few of them have actually gotten a model deployed into the wild. If they have it deployed, it's only deployed as a dev/test API. They don't trust it yet. So that's one learning: customers are a lot earlier in the journey than we were expecting, coming from doing this for internal Microsoft teams. Another one is that for the customers we're talking to, their internal organizations are not always structured to let them innovate most effectively. So part of their org, their data team and their IT department and their research teams, are totally disconnected, disjointed, don't communicate with each other, don't understand each other. And so IT just sees what the researchers are doing and says, "There's no way you're doing any of this in production." But data engineers are unsure what data the data scientists are using. A data scientist might be off running SQL queries on the side, but the data engineers have no idea from which tables, and the tables will disappear out from under the data scientist. So instead of doing a pure, "Okay, here's how to use the platform," it's more, "Hey, let's get all the right people in the room together from IT and research and your data platform and your software development platforms, and start a conversation and build up the domain expertise and the relationships on the people side before you get started with the process or the platform." That's been, yeah, one big learning: step back and focus on getting the right people involved first, and then they can figure out the process that's going to work well for their business. And then they can adopt the platform tools that we've been building to help them be more efficient at doing end-to-end ML. Sam Charrington: [00:27:38] Are you finding that there's a pattern of organization that allows organizations to move more quickly? Like centralized versus decentralized, or a quote-unquote 'center of excellence', or embedded into business units? Are there any of those that work best? Jordan Edwards: [00:27:58] I think what we've seen work best is to have one business unit sort of act as the incubator to vet the end-to-end flow and actually get a model working in production, but then have the overall center of excellence, a centralized team, observe what they're doing and take notes, and let them flesh out what the canonical reference MLOps architecture and pipeline should look like.
So out of all the patterns, and we've seen a lot of patterns being applied, that one seems to be the best so far: let a small team, give them some flexibility to go and build a model and take it to production with some light guardrails, and they can build out the reference architecture, the Git repository and CI/CD pipeline templates that the rest of the teams in the company can use. Sam Charrington: [00:28:51] And is the salient point there that the end business unit that has the problem owns the deployment of the model, as opposed to the centralized but somewhat disconnected data science or AI COE? Jordan Edwards: [00:29:06] Yes. So your DevOps team for your business unit needs to know and understand the fact that a model is gonna be entering their ecosystem, and needs to be able to manage it with the same tools they manage their other application releases with. Hence the integration with Azure DevOps to make sure that all your pipelines are tracked and managed in one place, and there's not this one rogue release pipeline that's coming in and causing issues and havoc with the rest of your production system. Sam Charrington: [00:29:34] And generally, when you look at these production pipelines, do the pipelines and the tooling resonate with the DevOps teams, or are they like this strange beast that takes a long time for them to wrap their heads around? Jordan Edwards: [00:29:48] So they freak out until they see the Azure DevOps integration. Then they're like, "Oh, okay, I understand that." Hence why I say you need to have tools that your audience can understand. You show them a Jupyter notebook, they'll jump out of their seats and run away scared. Whereas you show them, "Oh, here's a managed multi-phase release pipeline with clearly defined declarative YAML for the different steps," and that resonates well with them. Whereas data scientists, you show them a big, complex approval flow and they're going to be like, "I'm never using any of this." You show them a Jupyter notebook, they're happy, or an IDE with low-friction Python. And then your data engineers, again, you show them a confusing notebook process flow, they're not going to like that as much. But you show them a clean ETL where they can drag and drop and run their SQL queries and understand, are their pipelines running in a stable fashion? That resonates with them. So yeah, different personas, different tools; they need to work together and figure out what process is going to work for their business needs. Sam Charrington: [00:30:48] As I've kind of looked at primarily this machine learning engineer role that has been emerging over the past few years, and now we're talking about the DevOps engineer as a separate thing, but the line is kind of a gray, moving, and blurred line, right? Jordan Edwards: [00:31:02] Yeah. What we've seen in terms of... We've had customers ask us, "Well, how do we hire these ML engineers?" And it's like, basically, you need a person who understands DevOps but also can talk to your data scientists, or can [laughs] figure out the work they're doing, help them get their work into a reproducible pipeline on the training side, and help with deploying the model and integrating it into the rest of your application life cycle management tools. So yeah, your ML engineer needs to be a DevOps person with some understanding of ML. Sam Charrington: [00:31:33] And is a DevOps person necessarily a software engineer that is coding a model? Jordan Edwards: [00:31:40] Not necessarily.
They just need to be really good at operational excellence. So do they understand how to write things declaratively? How to set up process control flows so that things work nicely end-to-end? Like, you don't need to understand the ML the data scientist is doing. You need to understand the process they're going through to produce that model. So they have a bunch of code in a Jupyter notebook; help them factor it into modules that you can stitch together. But you don't need to understand the machine learning framework that they're using specifically in that context. Sam Charrington: [00:32:15] You've mentioned Jupyter notebooks a few times. One of the things that I see folks trying to figure out is, should we do ML in notebooks or should we do ML in IDEs? Microsoft has a huge investment in IDEs, but you've also been investing in Visual Studio Code, making it more interactive and integrated, kind of real time, to incorporate some of the notebook-esque style of interaction. Jordan Edwards: [00:32:43] Right. So we want it to be fluid to go from one to the other. We've seen the value in the interactive canvases for doing rapid-fire experimentation. We've also talked to large companies like Netflix to learn how they use notebooks and automation at scale. Sam Charrington: [00:33:00] Their Papermill project, for example? Jordan Edwards: [00:33:02] Exactly. So we've actually integrated Papermill into our platform as well. So if you're designing your training pipeline, you can stitch together a mix of scripts and notebooks and data processing steps, and we try to be as fluid as we can. And we're working with the developer division as well to figure out how to more cleanly integrate notebooks into our IDE experiences. And you saw some of that on the VS Code side, and there's more stuff coming to help with that. Sam Charrington: [00:33:31] We've talked a little bit about this automated retraining aspect of managing model life cycles. Are there other aspects of managing model life cycles that you find important for folks to think about? Jordan Edwards: [00:33:44] Yeah, knowing when to retrain the model is one thing. Knowing when to deprecate the model is another thing too. So say that the data the model is trained with is stale, or can't be used anymore, or got removed for GDPR reasons. This is why having the whole lineage graph is so important, to be able to figure out exactly what data was used to train the model. Other things around model life cycle management: know who is using it, know where the model is running. Know if the model is adding business value. Know if the data coming into the model has changed a lot since you trained it. Know if the model is dealing with some type of seasonal data and needs to be retrained on a seasonal basis. And then also, know the resource requirements for your model. So another big thing we see trip a lot of our customers up is they train the model on these big, beefy VMs with massive GPUs, and then they go to deploy and it's like, "Hey, my model's crashing. What do I do?" And so we've tried to build tooling in to help with that as well. So profiling your model, running sample queries into it, different sizes of sample queries too, not always the same thing, and making sure, you know, does your model have enough CPU and memory and the right-size GPU to perform effectively.
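The profiling step Jordan describes, running sample queries of different sizes against a model before choosing deployment resources, can be sketched generically. The predict callable, feature dimension, and batch sizes below are placeholders, and this is not Azure ML's profiling API; it only illustrates the idea.

```python
import time
import tracemalloc
import numpy as np

def profile_model(predict, feature_dim=32, batch_sizes=(1, 16, 256), repeats=20):
    """Time inference and track peak Python-side memory for a few request sizes."""
    results = {}
    for bs in batch_sizes:
        batch = np.random.rand(bs, feature_dim).astype(np.float32)
        tracemalloc.start()
        start = time.perf_counter()
        for _ in range(repeats):
            predict(batch)
        elapsed_ms = (time.perf_counter() - start) / repeats * 1000
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        results[bs] = {"latency_ms": round(elapsed_ms, 2),
                       "peak_mem_kb": peak // 1024}
    return results

# Usage with any callable model, e.g. profile_model(my_sklearn_model.predict)
```

A report like this, generated before release, is what catches the "trained on a beefy GPU VM, crashes on the serving node" problem Jordan mentions.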
We're also doing some work on the ONNX framework to help with taking those models and quantizing them or optimizing them for a specific business use case on the hardware side. Which is really slowly coming in, especially as we have customers in the manufacturing sector who want to run models quickly on the edge, on small hardware. So how do you manage that transition from this model I trained on this beefy machine to this model running on this tiny device? Sam Charrington: [00:35:33] Are you finding that most customers are deploying models, or even thinking about them, as individuals: "I've got this model that I've created and I'm going to think about the way I deploy this model," versus "I've got a model, I built it to a standard, it's just like any other model, and I'm going to just kind of throw it into my model deployment thing"? Are they there yet? Jordan Edwards: [00:35:58] Some of them are there. The ones that have been doing this for a while longer have developed their template for their model deployment flow. We try to provide as much tooling as we can in the platform and in the registry for you to track all the relevant things about it. But really just getting the model deployed into your existing app ecosystem, making sure that you have the ability to do controlled rollout and A/B testing, 'cause you don't want to just always pave over the previous model. So the most advanced customers are just getting to that point now where they're ready to start doing A/B testing and looking for our help to go and do that. Sam Charrington: Yeah. So along the lines of testing, we've talked about this a little bit. There's both the online testing of your model's freshness, but then also all kinds of deployment scenarios that have been developed in the context of DevOps. Like canary, then red, green, blue kind of stuff. All the colors, right? So do you see all of that stuff out in the wild? Jordan Edwards: Yes. The main difference we've seen with models compared to normal software being rolled out is oftentimes they'll develop a model and test it offline in batch for a while before using it. So they wouldn't need to necessarily deploy it to receive real traffic right away. They'll get the new model, they'll wait a week, run the model in batch against the past week's worth of data, and then compare how different it is. So it's just the fact that you can test the model offline, as opposed to having to do everything in an online fashion. That's probably the biggest delta. But otherwise we see all the same patterns as with normal software. Sam Charrington: [00:37:46] Because you're testing two things, right? You're testing the model's statistical ability to predict something, but then it's also software. And you don't necessarily want to put a broken piece of software out there. Jordan Edwards: [00:37:57] Right. Especially because it's software with uncertain behavior, or more uncertain behavior than any normal software application you'd throw out there. Sam Charrington: [00:38:06] What can we look forward to in this space from your perspective? Jordan Edwards: [00:38:10] So as far as things to look forward to, there's lots of investments coming in: improving our story around enterprise readiness, making it easier for customers to do secure data science and ML workloads. Work to help improve collaboration and sharing across the enterprise: how do I figure out which other teams have been doing modeling work similar to mine? How do I take advantage of that?
So, accelerating collaboration and velocity, more work on the enterprise readiness front, and then a tighter-knit integration with the rest of the big data platform stuff. So integration with Data Lake, Data Catalog, Data Factory, DevOps, GitHub, and it's all about helping customers get to production ML faster. Sam Charrington: [00:38:55] Well, Jordan, thanks so much for chatting with me. Jordan Edwards: [00:38:57] Thanks for having me. Yeah, appreciate it.
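A concrete version of the skew and drift checks Jordan described earlier, comparing the distribution of a feature at training time versus scoring time, can be as simple as a two-sample Kolmogorov-Smirnov test per column. The threshold and column names below are placeholders, and this is a generic sketch rather than the dataset-monitoring capability Jordan refers to.

```python
from scipy.stats import ks_2samp

def drift_report(train_df, serving_df, columns, p_threshold=0.01):
    """Flag numeric features whose serving distribution differs from training."""
    flagged = {}
    for col in columns:
        stat, p_value = ks_2samp(train_df[col].dropna(), serving_df[col].dropna())
        if p_value < p_threshold:
            flagged[col] = {"ks_stat": round(float(stat), 3),
                            "p_value": float(p_value)}
    return flagged

# Usage with pandas DataFrames collected at training time and from production logs:
# drift_report(train_df, last_week_df, ["age", "balance", "num_visits"])
```

In practice the flagged columns feed exactly the kind of retrain-or-not decision discussed in the interview: a human reviews the report before any new model goes out.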
Continuing the live interviews from #TWIMLcon! Last night we sat down with a few of the awesome #TWIMLcon speakers, sponsors, and attendees to chat about what they are working on, their favorite TWIML podcast episode, the best #TWIMLcon session so far and more!
Weiping Peng, Software Architect at Salesforce and longtime TWIML podcast listener!
Drew Bollinger & Mark Wronkiewicz, working on infrastructure and ML modeling at Development Seed, using machine learning to analyze satellite images in the humanitarian and climate sphere (they are hiring!)
Ameen Kazerouni, Lead Data Scientist at Zappos - hear about his case study presentation yesterday at the conference!
Vince Jeffs, Senior Director, Product Strategy, Marketing AI & Decisioning at Pegasystems and former TWIML podcast guest (twimlai.com/talk/154)
Over the past couple weeks I got to sit on the other side of the (proverbial) interview table and take part in a few fantastic podcasts and video conversations about the state of machine learning in the enterprise. We also cover current trends in AI, and some of the exciting plans we have in store for TWIMLcon: AI Platforms. Each of these chats has its own unique flavor and I'm excited to share them with you. The New Stack Makers Podcast. I had a great chat with my friend, Alex Williams, founder of The New Stack, a popular tech blog focused on DevOps and modern software development. We focused on MLOps and the increasingly significant convergence of software engineering and data science. Minter Dialogue. I spoke with Minter Dial, host of the popular podcast, Minter Dialogue, and author of the book Heartificial Empathy: Putting Heart into Business and Artificial Intelligence. We had a wide-ranging conversation in which we talked about the future of AI, AI ethics, and the state of AI in businesses. Datamation. In this video chat with James Maguire for Datamation, we discuss some of the key trends surrounding AI in the enterprise, and the steps businesses are taking to operationalize and productionalize machine learning. Hope you enjoy the talks! If you're not already registered for TWIMLcon, we'd love to have you join us! Register now!
Another #TWIMLcon short with the wonderful Rosie Pongracz and Trisha Mahoney, from a Founding sponsor who you all know, IBM. Rosie is the Worldwide Director of Technical Go-to-Market and Evangelism for Data Science and Trisha is a Senior Tech Evangelist. We chat about the latest IBM research, projects, and products, including AI Fairness 360, which will be the focus of Trisha's session at TWIMLcon. The IBM booth also promises to bring the heat, with a variety of open source projects and resources for the data science community. See you there! Sam Charrington: [00:00:00] All right everyone. I've got Rosie Pongracz and Trisha Mahoney from IBM on. Rosie is the Worldwide Director of Technical Go-to-Market and Evangelism for Data Science and AI, and Trisha is a Senior Tech Evangelist in Machine Learning & AI, and they are both instrumental in IBM's support for the TWIMLcon: AI Platforms conference. Rosie and Trisha, it's so exciting to be able to talk to you. Rosie Pongracz: [00:00:27] We are excited to be here, Sam! So happy to be a supporter of TWIMLcon and all the great work you do. Trisha Mahoney: [00:00:33] Thanks for having us, Sam. Sam Charrington: [00:00:34] Absolutely. Thank you. So, I don't know if it makes sense to say who is IBM? [laughs] You know, in this context, I think most people who hear this know what IBM is, but, you know, maybe you can talk a little bit about the company's involvement in the AI platform space and, you know, what really kind of created the interest in supporting this conference. Rosie Pongracz: [00:01:00] Absolutely. So, yes, I would imagine most of the listeners already know IBM. We are long-standing, I'd say, evangelists, product producers, supporters of open source, anything for AI. And I'd say most of the current recognition goes back to Watson, of course, and the Jeopardy challenge. But from that, IBM has evolved...what was that, almost ten years ago, to create some significant products. Not only have we made our way to the cloud, should I say, and support hybrid clouds for our clients, bringing them through the digital transformation, but we also have a good range of tools that help people not only do data science and machine learning but also scale those, operationalize those, and bring them to production. I think if anything, IBM is known for its expertise in enterprise scale and a wide range of industry solutions. And that's really what we're doing. We're involved in open source, so quite a few open-source projects that are AI and data science and ML related, as well as products that can help our clients bring that AI to their business. Sam Charrington: [00:02:16] Awesome. And I know that I've covered some of those products in our recent e-books in the platform space. Both the Fabric for Deep Learning open source project, which I talked about in our Kubernetes for ML and DL e-book, as well as the Watson Studio products, which I believe came up in the ML Platforms e-book. Are there other products that IBM is kind of focused on in this space? Rosie Pongracz: [00:02:43] I think you captured the main ones. Especially the ones my team has been involved in. There's Watson Studio, Watson Machine Learning, and Watson OpenScale. And if you look at Studio, it's more or less an IDE of sorts for data scientists, built on Jupyter notebooks. Watson ML is for running those machine learning algorithms. And then Watson OpenScale is for running at scale.
And actually one of the big pieces of that pipeline, if you look at all those pieces along the pipeline or the platform, if you will, is one of the areas that Trisha's going to be talking about, which is AI fairness and bias, a really important piece of the pipeline that we're proud to be incorporating. I think you caught all the products. There's a significant amount of open source that we're also involved in and, like I said, bringing those into our products and also supporting those communities, like the Jupyter community and the Linux Foundation AI. Those are also very important projects and places where IBM has been involved as well. Sam Charrington: [00:03:53] That's right. We recently did a podcast with Luciano Resende, who is at IBM and works on the Jupyter Enterprise Hub project, I believe is the name of it? Rosie Pongracz: [00:04:03] Yup. Jupyter Enterprise Gateway is correct. Yes. Sam Charrington: [00:04:05] Got it. Jupyter Enterprise Gateway. Rosie Pongracz: [00:04:07] Yeah. Sam Charrington: [00:04:08] So in addition to all of the products and open source that you're working on in this space, you're also out there evangelizing the whole idea of MLOps. You ran a workshop on this topic at the OSCON conference recently. Maybe talk a little bit about your perspective on MLOps and why that's so interesting to you. Rosie Pongracz: [00:04:29] Yeah. I think it goes back to where IBM can really make a difference, which is that we have literally decades of experience in helping our enterprise clients do things at scale. And that is across industry. So if you look at all of the products that we have, and you also look at something like Cloud Pak for Data, which is bringing those containerized applications to any cloud, really, it is about giving our clients flexibility, helping them modernize. It's helping do things at scale. Now a lot of our clients also have businesses that they're trying to transform, so when you talk about MLOps, certainly, if you look at data science, I kind of look at that as akin to the desktop a developer works on. It's great to be able to develop those algorithms on your desktop and test that out on data sets, but when you really want to implement it, there's a whole kind of DevOps cycle, if you will, applying that to AI and machine learning. And IBM has been there with its clients in the early days of Java. It's been there in the early days of cloud. And we're also taking that now into kind of the next realm, if you will, the next era of bringing AI to businesses at scale. So how do you take your current applications and embed AI in those? Or how are you creating new ways to use your data and to modernize your business? And IBM, you know, it's just near and dear to our clients' hearts. It's near and dear to who we are as a company in being able to do things at scale. And you have to have a platform. You have to have a way to operationalize that. It's great to run little science experiments to try things out and test things and fail fast, but when you start to operationalize, that's where ML at scale, MLOps, is really going to start to be important. Sam Charrington: [00:06:25] Mm-hmm [affirmative].
I was at the last IBM Think Conference, which is its big user conference, and had an opportunity to hear Rob Thomas talk about, you know, one of the key things that he sees as being a determinant of enterprises finding success in machine learning and AI: the number of experiments that they're able to run, and being able to scale that so that they can run those experiments en masse. Rosie Pongracz: [00:06:51] Yeah, absolutely. That's an important piece of what IBM is helping enable our clients to do. And with our products, that is definitely what we're striving for. You've got to be able to experiment. And then when you do want to operationalize, you've got to be able to do that at scale. Some of the clients we work with have some of the biggest applications running for their enterprise and for their customers. And they depend on IBM to do that. So how do we bring that into, you know, this experimentation mode? Because you're absolutely right. Now it's not about, you know, building one app and then releasing that. As you know, the world is very much agile: you've got to fail fast. You've got to experiment. You've got to understand. And with data science, that is absolutely sort of the MO. That's sort of the way you operate: how do you know what works? And then, you know, you also have to retrain. So there are a lot of differences to building AI and data science at [inaudible] scale that are slightly different from just building applications, if you will. Sam Charrington: [00:07:55] Mm-hmm [affirmative]. Mm-hmm [affirmative]. So, Trisha, you're going to be speaking at the conference. Tell us a little bit about your topic and what attendees can expect when they come to your session. Trisha Mahoney: [00:08:06] Right. So, I'm going to be speaking on AI Fairness 360. And this is a comprehensive toolkit created by IBM researchers. And what we focus on is detecting and understanding and mitigating unwanted machine learning bias. So the toolkit is open source. It's in Python and it contains over 75 fairness metrics, ten bias mitigation algorithms, and fairness metrics with explanations. So one of the key components to this is that it has some of the most cutting-edge metrics and algorithms across academia and industry today. So it's not just an IBM thing, it includes algorithms from researchers from Google, Stanford, Cornell. That's just to name a few. But what it really focuses on is teaching people how to learn to measure bias in their data sets and models, and how to apply fairness algorithms throughout the pipeline. So, you know, the big focus is on data science leaders, practitioners, and also legal and ethics stakeholders who would be a part of this. So, just a few things that I'll go through in the talk are when you would apply pre-processing algorithms to manipulate your training data, in-processing algorithms for incorporating fairness into the training algorithm itself, as well as post-processing de-biasing algorithms. And, you know, one of the key things we wanted to get across is, I'm working on an O'Reilly book on AI fairness and bias with our researchers. So, you know, the key thing is that this is a problem we think may prevent AI from reaching its full potential if we can't remove bias. So, the thing we want to get across is that this is a long-term data science initiative.
If you want to remove bias throughout your pipeline, it involves a lot of stakeholders in your company, and it can be very complex. So the way you define fairness and bias leads down into the types of metrics and algorithms you use. So, you know, there are a lot of complexities. And the hope is that data science teams will work with people throughout their org; they can't really make these decisions on their own, as they may actually break the law in some cases with their algorithms. So, you know, I'll go into, in the short period of time, some of the trade-offs that data science teams have to make between model accuracy and removing bias, and talk about what they do for acceptable thresholds for each. And the last thing, on the MLOps piece, is I'll also do a demo in Watson OpenScale. And this is where you, you know, have models in production and you need to detect and remove bias from models that aren't in an experimentation environment, right? So within Watson OpenScale, you can automatically detect fairness issues at run time. And we essentially just do this by comparing the difference between rates at which different groups receive the same outcomes. So are different minority groups, or men and women, being approved for loans at the same rate? So that's just an example. So those are kind of the top things that I'll go through on the toolkit. And I've heard many people say that others give bias talks on the problem that we have, but AI Fairness 360 is one of the few that's bringing a solution to the table on how to fix this within the machine learning pipeline. Sam Charrington: [00:11:29] Yeah, I think that's one of the most exciting things about the talk from our perspective is that it's not just talking about the challenges that exist, but also how to integrate a concrete toolkit into your pipeline. And whether it's AI Fairness 360 or something else, how to integrate tools into your pipeline so that you can detect and mitigate bias, very concretely as opposed to talking about it abstractly. Trisha Mahoney: [00:11:58] Correct. And I think the bridge that this creates is, you know, there are a lot of new fairness research techniques out there, but this toolkit sort of gets them into production and makes them accessible in a way that data scientists can use. So, I think this is considered the most comprehensive toolkit to do that on the market today. Sam Charrington: [00:12:18] Mm-hmm [affirmative]. So Rosie, in addition to Trisha's session, you'll also be exhibiting at the conference in our community hall. What can attendees expect to see at the IBM booth there? Rosie Pongracz: [00:12:30] Yeah, we're excited to be there too. So you'll see several things. We are going to be talking about the relevant open source projects like AI Fairness 360 that Trisha mentioned, and also AI Explainability 360, which is another new toolkit. And we have, actually, a whole host of projects that I won't go into here, but we can talk through those and see where IBM has contributed and is working on open-source projects like the Jupyter Enterprise Gateway that you mentioned as well. They'll also see our products, and how those work together in helping operationalize and bring AI platforms to reality.
And we'll also be talking about our data science community, which is a place where product users can go to share and collaborate, but where we also have some great technical, solution-type content. The goal there is that IBM has a lot of deep, rich solutions that we're building, as I mentioned earlier, industry-specific or transformation types of projects, and those are the types of materials that we're building there.  We've heard many people, both academic and industry, say it's great to talk about all this theoretical AI, but what we'd really like to see is how people are putting it to work in solutions. So that's something that we're trying to bring to life on the community with many of our IBM experts, all the way from our implementation folks to our research folks. Sam Charrington: [00:14:01] Fantastic. Fantastic. Well, I'm really looking forward to seeing both of you at the event. And I am very grateful for your and IBM's support of the conference. Rosie Pongracz: [00:14:14] We are really excited to support what you're doing, Sam. I know you and I have worked together for many years through some technology transitions, so this is really appropriate and fun and fitting that we get to work together on something as exciting as what you're doing at TWIMLcon. Sam Charrington: [00:14:29] Absolutely. Thank you both. Rosie Pongracz: [00:14:31] Thank you. TWIMLcon: AI Platforms will be held on October 1st and 2nd at the Mission Bay Conference Center in San Francisco. Click here to learn more
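Looping back to the Watson OpenScale demo Trisha mentions above: the run-time check she describes, comparing the rates at which different groups receive the same favorable outcome from a production model, can be illustrated in a few lines of generic Python. This sketch is deliberately not the OpenScale API; the column names and the sample loan decisions are hypothetical.

```python
# Generic illustration of a run-time fairness check: compare favorable-outcome
# rates across groups in a batch of logged model decisions.
import pandas as pd

def favorable_rate(df: pd.DataFrame, group_col: str, group_value, outcome_col: str = 'approved') -> float:
    """Fraction of records in a group that received the favorable outcome."""
    group = df[df[group_col] == group_value]
    return group[outcome_col].mean()

def disparate_impact(df, group_col, unprivileged, privileged, outcome_col='approved'):
    """Ratio of favorable-outcome rates; values far below 1.0 flag potential bias."""
    return (favorable_rate(df, group_col, unprivileged, outcome_col) /
            favorable_rate(df, group_col, privileged, outcome_col))

# Example with a small batch of logged loan decisions (hypothetical data).
decisions = pd.DataFrame({
    'sex':      ['F', 'M', 'F', 'M', 'F', 'M', 'M', 'F'],
    'approved': [0,    1,   1,   1,   0,   1,   0,   1],
})
di = disparate_impact(decisions, 'sex', unprivileged='F', privileged='M')
print(f'Disparate impact (F vs. M approval rates): {di:.2f}')
```

A ratio well below 1.0 (the commonly cited four-fifths rule uses 0.8 as a threshold) is the kind of signal that would trigger a fairness alert on a deployed model.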
I had the chance to sit down with Scott Clark, Founder & CEO of SigOpt, a Founding sponsor of the upcoming TWIMLcon: AI Platforms! Scott discusses what SigOpt has been up to, the unique value he sees #TWIMLcon bringing to the ML/AI industry, and what you can expect from the expert-driven SigOpt session and booth! Sam Charrington: [00:00:00] All right everyone, I am excited to have Scott Clark, founder and CEO of SigOpt. If you know Scott's name, it's because he's one of the few who has been on the TWIML AI podcast multiple times. Scott, welcome once again. Scott Clark: [00:00:13] Thanks, Sam. Always a pleasure to chat. Sam Charrington: [00:00:16] For those who haven't heard one of the previous episodes, why don't we get started by having you give us a really brief overview of your background. Scott Clark: [00:00:23] Yep. So I did my PhD in optimization in applied mathematics at Cornell. Spent a couple years at Yelp on their advertising team, helping them tune the various aspects of it, and working on releasing things like the Yelp academic dataset challenge. And then about five years ago, started SigOpt. Sam Charrington: [00:00:42] And so what is SigOpt? Scott Clark: [00:00:45] We're a software company. We build tools to help people build better models. We do this via an experimentation and optimization platform that bolts on top of any model or AI platform, allowing people to tune and tweak all the various configuration parameters of their models better, faster, and cheaper than alternative methods. We do that today with asset managers with over $300 billion of combined assets under management, Fortune 500 firms with $500 billion of total market cap, several dozen universities and research institutes around the world, as well as the US intelligence community, and many, many more. Basically, anyone who has a model, we help them configure it and experiment with it to get it to the best performance. Sam Charrington: [00:01:29] So, Scott, SigOpt, and you personally as well, have been huge supporters of everything that we've done here at the podcast in terms of AI platforms, from the e-book that we recently published, The Definitive Guide to Machine Learning Platforms, to the upcoming conference, TWIMLcon: AI Platforms. Tell us a little bit about why you're so excited about the conference, TWIMLcon: AI Platforms, and the space in general around machine learning platforms. Scott Clark: [00:02:03] Definitely. We're super excited about this because we have the privilege of working with some of the most advanced firms in the world when it comes to AI and ML, and we've noticed that a lot of them have started to build these platforms over the last few years. As you start to productionalize AI, as you start to solve some of the low-hanging-fruit problems, you start to notice areas of overlap. Areas where engineers can build tools to make the entire modeling process a little bit more efficient, a little bit more scalable, a little bit more robust, et cetera. So a lot of our customers have been building these over the last few years, and SigOpt is kind of a seamless component, via our REST API, to bolt into them and help supercharge that experimentation and configuration tuning aspect of modeling.  So we're very excited to have someone be shining a light on the problem, and helping those firms that may not have been doing modeling in production for the last decade kind of get a leg up and skip over some of the trials and tribulations that those that went before them have already solved.
Sam Charrington: [00:03:04] That's awesome. And so you're personally going to be delivering a session at the conference. What can attendees expect to get out of your session? Scott Clark: [00:03:15] Yeah. I'll be building upon some of the stuff in your e-book, talking about how people can make different trade-offs as they look to standardize various components of their machine learning platforms. How they think about what things need to be bespoke and built for their very specific use cases, and other things that can be standardized. Whether that's using open source tools or partnering with firms like SigOpt. I'll talk about the different trade-offs there, and how you can have standardization without necessarily constraining what your researchers and modelers are doing as well. Sam Charrington: [00:03:51] Yeah, I'm really glad that you're going to be talking about that, because I think one of the things that I try to convey in the e-book is that there's really no one size fits all. Everyone's coming from a different technology legacy, solving a different set of problems, and has a different set of skill sets, and it is important for everyone that starts off on this journey, or, you know, is just taking the next step on their journey, to figure out, you know, what are the things that make the most sense for them, and how to evaluate the increasing number of options that are available in this space, from open source to commercial products, to as-a-service offerings.   It sounds like you get a pretty broad view of that across the different customers that you get a chance to work with.  Scott Clark: [00:04:38] Definitely, and we see that every customer has a unique domain they're working in, a unique set of problems, a unique context that they need to be aware of, and what might work well for a market-making high-frequency trading firm is fundamentally different than for an oil and gas company, which is fundamentally different than for a credit card company. And making sure that you leverage your expertise where it can make a difference, and then use the best tools in the world where the approach is orthogonal, can really allow you to accelerate and amplify what you can do individually with constrained resources. Sam Charrington: [00:05:15] And so your session is called Exploring Trade-Offs and Experiment Management as Part of AI Platforms, and it'll be on Tuesday at 10:50 on our technology track. SigOpt is also going to have a presence in our community hall. Can you tell us a little bit about what folks can expect to find when they show up to your booth? Scott Clark: [00:05:37] Definitely. We'll have a handful of experts from our team on site walking through all the different lessons we've learned from working with these leading firms for many years now. We'll talk about different trade-offs they found, different pitfalls they found, and how they leverage experimentation to really empower their experts to get the most out of their modeling. Sam Charrington: [00:05:57] Awesome. Awesome. Well, I'm really looking forward to seeing you and the team at TWIMLcon, and I really can't express enough my thanks to you and the company for supporting the conference. Really, really excited about this. Scott Clark: [00:06:11] Likewise, Sam. It's always a pleasure to chat, and we're really looking forward to seeing you and this amazing conference that you put together. Sam Charrington: [00:06:17] Awesome. Thanks, Scott. Scott Clark: [00:06:19] Cheers.
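To ground the experimentation and configuration tuning Scott describes, here is a minimal sketch of the suggest/evaluate/observe loop that hosted optimization services of this kind expose. It follows the general shape of SigOpt's Python client from around this period, but the exact method names, fields, parameter ranges, and the toy objective function should all be treated as assumptions rather than a definitive reference.

```python
# Hypothetical hyperparameter tuning loop against a hosted optimization API.
# Parameter names, bounds, and the objective are illustrative placeholders.
from sigopt import Connection

def train_and_score(learning_rate, num_trees):
    """Stand-in for real model training; returns a validation metric to maximize."""
    return -((learning_rate - 0.01) ** 2) - 1e-6 * ((num_trees - 200) ** 2)

conn = Connection(client_token="YOUR_API_TOKEN")  # assumed credential
experiment = conn.experiments().create(
    name="Illustrative model tuning",
    parameters=[
        dict(name="learning_rate", type="double", bounds=dict(min=1e-4, max=1e-1)),
        dict(name="num_trees", type="int", bounds=dict(min=50, max=500)),
    ],
    observation_budget=30,
)

for _ in range(experiment.observation_budget):
    # Ask the service for the next configuration to try...
    suggestion = conn.experiments(experiment.id).suggestions().create()
    assignments = suggestion.assignments
    value = train_and_score(assignments["learning_rate"], assignments["num_trees"])
    # ...and report back what was observed so the optimizer can improve.
    conn.experiments(experiment.id).observations().create(
        suggestion=suggestion.id, value=value)
# The best-performing assignments can then be retrieved from the service.
```

The key design point Scott alludes to is that the loop is agnostic to what sits inside train_and_score, which is what lets a service like this bolt onto any existing model or AI platform.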
TWIMLcon: AI Platforms will be held on October 1st and 2nd at the Mission Bay Conference Center in San Francisco. Click here to learn more  
Welcome to #TWIMLcon Shorts - a series where I sit down with some of our awesome Founding Sponsors and talk about their ML/AI journey, current work in the field and what we can expect from them at TWIMLcon: AI Platforms! First up is Luke Marsden, Founder & CEO of Dotscience. Based in Bristol, UK, Luke joins me to share the Dotscience story and why he is most excited for #TWIMLcon next month! From a stellar breakout session featuring the Dotscience manifesto to live demos at their booth, we can’t wait! Sam Charrington: [00:00:00] All right everyone, I am on the line with Luke Marsden. Luke is the founder and CEO of Dotscience, a founding sponsor for TWIMLcon: AI Platforms. So Luke, we go back a little bit from your involvement in the Docker space. I remember introducing you at a session at DockerCon quite a few years back, but for those who aren't familiar with your background, who are you? Luke Marsden: [00:00:51] So hey Sam, and thanks for having me on. My name is Luke Marsden, I'm the founder and CEO of Dotscience, and I come from a devops background. My last startup was called ClusterHQ, and we were solving the problem of running stateful containers in Docker. And so I'm a sort of serial entrepreneur based out of the UK. I live in the beautiful city of Bristol in the southwest and am very excited to be involved with TWIML. Sam Charrington: [00:01:28] Awesome. So tell us a little bit about Dotscience and what the company is up to in the AI platform space.  Luke Marsden: [00:01:36] Yeah, sure. So we started Dotscience a couple of years ago. Initially, we were targeting the area of data versioning and devops, but we quickly realized that the tool that we built, which is an open source project called dotmesh, was actually much more relevant and important to the world of AI and machine learning, which has big data versioning and reproducibility problems. So we pivoted to that about a year in, and we've been building an AI platform around that core concept of data versioning.  Sam Charrington: [00:02:13] So tell me a little bit more about that. How are you taking on data versioning? And why is that an important element of the puzzle for folks that are doing AI? Luke Marsden: [00:02:25] Absolutely. So there are really sort of four main pieces of the puzzle that I believe need to be solved to achieve devops for AI, devops for machine learning, and number one is reproducibility - and that's where the data versioning piece comes in. So what we've seen is that there's a lot of chaos and pain that happens when AI or ML teams start trying to operationalize the models that they're developing. And one of the big pain points is if you can't actually get back to the exact version of the data that you used to train your model, then you can't go back and solve problems with it. You can't fix bugs in the model or really reliably understand exactly where that model came from. So that's kind of that fundamental problem of which version of the data did I train this model on, and that's what we solve with Dotscience. Every time you train a model in Dotscience, you are automatically versioning all of the dependent data sets that that model training happens on. And by using copy-on-write technology, which is a file system technology in dotmesh, which is part of the Dotscience platform, it does that very efficiently, using no more disk space than is required to achieve reproducibility. Sam Charrington: [00:03:52] Awesome. So tell me why are you excited about TWIMLcon: AI Platforms?
Luke Marsden: [00:03:59] TWIMLcon looks to be an awesome event. We were actually planning on hosting our own event around the same time in San Francisco to promote Dotscience, but TWIML was such a good fit for what we're trying to do, and the themes and the topics that are being discussed in the space, that we decided to join forces with you guys and become a Founding sponsor rather than running our own thing. So yeah, really, really excited and looking forward to it.  Sam Charrington: [00:04:34] That's fantastic, and we are super appreciative to have you on board as a Founding sponsor, it is great to have your support in that way. When folks come to your breakout session at TWIMLcon, tell us a little bit about what you'll be covering there, who will be presenting, and what attendees can expect to learn from the breakout session. Luke Marsden: [00:04:57] Yes, so the session will be run by my colleague Nick, who's our principal data scientist, and the basic premise of the talk really touches on some of the things I mentioned earlier. There's a lot of chaos and pain in trying to operationalize AI, and we have this manifesto of things that we believe are needed to move beyond, sort of, the "no-process" process that is the default. So when you start an AI or machine learning project and you have maybe a small number of data scientists or machine learning engineers doing that work, they'll invent a process, right? Any technical group that's doing technical work will make up a process as they go based on the tools that they're familiar with and they'll do their best. But the point of the talk is that the "no-process process" gets your first model into production when your team is small, but that's really where the problems begin and (Laughter) you end up with this kind of mess of models and data sets and deployments and hyperparameters and metrics and all these different things flying around, because machine learning is fundamentally more complicated than software development and software engineering. And so, by just sort of doing things in an ad-hoc way, you get yourself into this sort of mess quite quickly, and this is something we've seen across hundreds of companies that we've spoken to in the industry. And so basically what we're proposing is a manifesto: that you should make your machine learning process, the whole process of building, training, deploying, and monitoring machine learning models, reproducible, accountable, collaborative, and continuous.  And what I mean by reproducible is that somebody else should be able to come and reproduce the model that I trained now, like 9 or 12 months later, without me still needing to be there, without me needing to have kept meticulous manual documentation. Somebody else should be able to go and rerun that model training against the same version of the data, with the same version of TensorFlow, with the same code, with the same hyperparameters, and get the same accuracy score to within a few percent. If your development environment isn't reproducible, then you won't be able to do that, but we believe that that is key to achieving devops for ML.  So anyway, that's kind of a snapshot of some of the things we'll be talking about in the session. So yeah, please, please come along.   Sam Charrington: [00:08:00] Awesome. You'll also be present in TWIMLcon's Community Hall, what can attendees expect to see at the company's booth? Will they be able to get hands on?
Luke Marsden: [00:08:15] Absolutely, so we'll have live demos at the booth. You can see the full end-to-end platform, and our engineers, as I speak in the early part of September, are busily working on the latest features that we're going to have ready in time for the conference, in true startup conference-driven development mode. (Laughter) So, we will have the deploy-to-production and statistical monitoring pieces ready in time for the conference. TWIMLcon is probably going to be the first place you can come and see those pieces of the product and get hands-on with them, so please come and check it out. Sam Charrington: [00:09:00] Fantastic. Luke, thanks so much for chatting with me about what you're up to and what you'll be showing at the event. We are super excited to have you on board with us for TWIMLcon: AI Platforms.   Luke Marsden: [00:09:10] Awesome. Thank you, Sam. TWIMLcon: AI Platforms will be held on October 1st and 2nd at the Mission Bay Conference Center in San Francisco. Click here to learn more
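As a concrete illustration of the reproducibility problem Luke describes, here is a generic sketch (not the Dotscience or dotmesh API) of the minimum bookkeeping a training run needs so that someone else can rerun it months later: a content hash of the data, the code revision, the environment, and the hyperparameters. The file paths, hyperparameters, and metrics shown are hypothetical.

```python
# Generic run-manifest sketch: pin data, code, environment, and hyperparameters
# for every training run so it can be reproduced later. All values illustrative.
import hashlib
import json
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Content hash of a dataset file, used as its version identifier."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def run_manifest(data_path: str, hyperparams: dict, metrics: dict) -> dict:
    """Everything needed to rerun this training job and compare results."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "data_version": sha256_of_file(data_path),
        "code_version": subprocess.check_output(
            ["git", "rev-parse", "HEAD"]).decode().strip(),
        "python_version": sys.version,
        "hyperparameters": hyperparams,
        "metrics": metrics,
    }

# After training completes (placeholder values):
manifest = run_manifest(
    "data/train.csv",
    hyperparams={"learning_rate": 0.01, "epochs": 10},
    metrics={"accuracy": 0.93},
)
Path("runs").mkdir(exist_ok=True)
Path("runs/manifest.json").write_text(json.dumps(manifest, indent=2))
```

A platform like the one Luke describes automates this kind of capture (plus efficient snapshots of the data itself via copy-on-write), but even a manual manifest like this goes a long way toward the "reproducible" and "accountable" properties in the manifesto.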
Today we're joined by Jeff Gehlhaar, VP of Technology and Head of AI Software Platforms at Qualcomm. As we've explored in our conversations with both Gary Brotman and Max Welling, Qualcomm has a hand in tons of machine learning research and hardware, and our conversation with Jeff is no different. We discuss how the various training frameworks fit into the developer experience when working with their chipsets, examples of federated learning in the wild, the role inference will play in data center devices and more.
Sam Charrington: Today we're excited to present the final episode in our AI for the Benefit of Society series, in which we're joined by Mira Lane, Partner Director for Ethics and Society at Microsoft. Mira and I focus our conversation on the role of culture and human-centered design in AI. We discuss how Mira defines human-centered design, its connections to culture and responsible innovation, and how these ideas can be scalably implemented across large engineering organizations. Before diving in I'd like to thank Microsoft once again for their sponsorship of this series. Microsoft is committed to ensuring the responsible development and use of AI and is empowering people around the world with this intelligent technology to help solve previously intractable societal challenges spanning sustainability, accessibility, and humanitarian action. Learn more about their plan at Microsoft.ai. Enjoy. Mira Lane: [00:00:09] Thank you, Sam. Nice to meet you. Sam Charrington: [00:00:11] Great to meet you and I'm excited to dive into this conversation with you. I saw that you are a video artist and technologist by background. How did you come to, you're looking away, is that correct? Mira Lane: [00:00:28] No, that's absolutely true. Sam Charrington: [00:00:30] Okay. So I noted that you're a video artist. How did you come to work at the intersection of ethics and society and AI? Mira Lane: [00:00:42] For sure. So let me, Sam, let me give you a little bit of a background on how I got to this point. I actually have a mathematics and computer science background from the University of Waterloo in Canada. So I've had an interesting journey, but I've been a developer, program manager, and designer, and when I think about video art and artificial intelligence, I'll touch artificial intelligence first and then the video art, but a few years ago I had the opportunity to take a sabbatical and I do this every few years. I take a little break, reflect on what I'm doing, retool myself as well. So I decided to spend three months just doing art. A lot of people take a sabbatical and they travel but I thought I'm just gonna do art for three months and it was luxurious and very special. But then I also thought I'm going to reflect on my career at the same time, and I was looking at what was happening in the technology space and feeling really unsettled about where technology was going, how people were talking about it, the way I was seeing it affect our societies, and I thought I want to get deeper into the AI space. So when I came back to Microsoft, I started poking around the company and said is there a role in artificial intelligence somewhere in the company? And something opened up for me in our AI and Research group where they were looking for a design manager. So I said absolutely. I'll run one of these groups for you, but before I take the role, I'm demanding that we have an ethics component to this work, because what they were doing is they were taking research that was in the AI space and figuring out how do we productize this? Because at that point, research was getting so close to engineering that we were developing new techniques and you were actually able to take those to market fairly quickly, and I thought this is a point where we can start thinking about responsible innovation and let's make that a formalized practice.
So me taking the role for the design manager was contingent on us creating a spot for ethics at the same time, and so backing up a little bit, the video part comes in because I have traditionally been a really analog artist. I've been a printmaker, a painter, and during my sabbatical, I looked at digitizing some of the techniques that I was playing with on the analog side. I thought let me go play in the video space for a while. So for three months, like I said, I retooled and I started playing around with different ways of recording, editing, and teaching myself some of these techniques, and one of the goals I set out at the time was, well, can I get into a festival? Can I get into a music or video festival? So that was one of my goals at the end of the three months. Can I produce something interesting enough to get admitted into a festival? And I won a few, actually. Sam Charrington: [00:03:46] That's fantastic. Mira Lane: [00:03:46] So I was super pleased. I'm like okay, well that means I've got something there I need to continue practicing. But that for me opened up a whole new door, and one of the things that I did a few years ago also was to explore art with AI, and could we create a little AI system that could mimic my artwork and become a little co-collaborator with myself? So we can dig into that if you want, but it was a really interesting journey around can AI actually complement an artist or even replace an artist? So there are interesting learnings that came out of that experience. Sam Charrington: [00:04:25] Okay. Interesting, interesting. We're accumulating a nice list of things to touch on here. Mira Lane: [00:04:30] Yeah, absolutely. Sam Charrington: [00:04:31] Ethics and your views on that were at the top of my list, but before we got started, you mentioned work that you've been doing exploring culture and the intersection between culture and AI, and I'm curious what that means for you. It's certainly a topic that I hear brought up quite a bit. Particularly when I'm talking to folks in enterprises that are trying to adopt AI technologies, you hear all the time, well, one of the biggest things we struggle with is culture. So maybe, I don't know if that's the right place to start, but maybe we'll start there. What does that mean for you when you think about culture in AI? Mira Lane: [00:05:12] Yeah, no, that's a really good question, and I agree that one of the biggest things is culture, and the reason why I say that is if you look at every computer scientist that's graduating, none of us have taken an ethics class, and you look at the impact of our work, it is touching the fabric of our society. Like it's touching our democracies and our freedoms, our civil liberties, and those are powerful tools that we're building, yet none of us have gone through a formal ethics course, and so the discipline is not used to talking about this. A few years ago you're like, I'm just building a tool. I'm building an app. I'm building a platform that people are using, and we weren't super introspective about that. It wasn't part of the culture, and so when I think about culture in the AI space, because we're building technologies that have scale and power, and are building on top of large amounts of data that empower people to do pretty impressive things, there's this whole question of culture and asking ourselves, well what could go wrong? How could this be used? Who is going to use it directly or indirectly?
And those are parts of the culture of technology that I don't think has been formalized. You usually hear designers talking about that kind of thing. It's part of human-centered design. But even in the human-centered design space, it's really about what is my ideal user or my ideal customer and not thinking about how could we exploit this technology in a way that we hadn't really intended? We've talked about that from an engineering context the way we do threat modeling. How could a system be attacked? How do you think about denial of service attacks? Things like that. But we don't talk about it from a how could you use this to harm communities? How could you use this to harm individuals or how could this be inadvertently harmful? So those parts of cultures are things that we're grappling with right now and we're introducing into our engineering context. So my group sits at an engineering level and we're trying to introduce this new framework around responsible innovation and there's five big components to that. One is being able to anticipate, look ahead, anticipate different futures, look around corners and try to see where the technology might go. How someone could take it, insert it into larger systems, how you could do things at scale that are powerful that you may not intend to do. There's a whole component around this responsible innovation that is around reflection and looking at yourselves and saying where do we have biases? Where are we assuming things? What are our motivations? Can we have an honest conversation about our motivations? Why are we doing this and can we ask those questions? How do we create the space for that? We've been talking about diversity and inclusion like how do you bring diverse voices into the space, especially people that would really object to what you're doing and how do you celebrate that versus tolerate that? There's a big component around our principles and values and how do you create with intention and how do you ensure that they align with the principles and they align with their values and they're still trustworthy? So there's a whole framework around how we're thinking about innovation in the space and at the end of the day it comes down to the culture of the organization that you're building because if you can't operate at scale, then you end up only having small pockets of us that are talking about this versus how do we get every engineer to ask what's this going to be used for? And who's going to use it? Or what if this could happen? And we need people to start asking those types of questions and then start talking about how do we architect things in a way that's responsible. But I'd say most engineers probably don't ask those types of questions right now. So we're trying to build that into the culture of how we design and develop new technologies. Sam Charrington: [00:09:14] Mm-hmm (affirmative). One of the things that I often find frustrating about this conversation particularly when talking to technology vendors is this kind of default answer while we just make the guns, we don't shoot them. We just make the technologies. They can be used for good. They can also be used for bad, but we're focused on the good aspects. It sounds like, well, I'm curious, how do you articulate your responsibility with the tools that you're creating? Or Microsoft's responsibility with the tools it's creating. Do you have a- Mira Lane: [00:09:55] Well I have a very similar reaction to you when I hear oh, we're just making tools. I think, well, fine. 
That's one perspective, but the responsible perspective is we're making tools and we understand that they can be used in these ways and we've architected them so that they cannot be misused and we know that there will be people that misuse them. So I think you're hearing a lot of this in the technology space and every year there's more and more of it where people are saying look, we have to be responsible. We have to be accountable. So I think we'll hear fewer and fewer people saying what you're hearing, what I'm hearing as well. But one of the things we have to do is we have to avoid the ideal path and just talking only about the ideal path. Because it's really easy to just say here's the great ways that this technology is going to be used and not even talk about the other side because then, again, we fall into that pattern of well, we only thought about it from this one perspective, and so one of the things that my group is trying to do is to make it okay to talk about here's how it could go wrong so that it becomes part of our daily habit and we do it at various levels. We do it at our all hands, so when people are showing our technology, we have them show the dark side of it at the same time so that we can talk about that in an open space and it becomes okay to talk about it. No one wants to share the bad side of technology. No one wants to do that. But if we make it okay to talk about it, then we can start talking about well, how do we prevent that? So we do that at larger forums and I know this is a podcast, but I wanted to show you something. So I'll talk about it, but we created, it's almost like a game, but it's a way for us to look at different stakeholders and perspectives of what could happen. So how do we create a safe environment where you can look at one of our ethical principles. You can look at a stakeholder that is interacting with the system and then you say well if the stakeholder for example is a woman in a car and your system is a voice recognition system, what would she say if she gave it a one star review? She would probably say I had to yell a lot and it didn't recognize me because we know that most of our systems are not tuned to be diverse, right? So we start creating this environment for us to talk about these types of things so that it becomes okay again. How do we create safe spaces? Then as we develop our scenarios, how do we bring those up and track them and say, well how do we fix it now that we've excavated these issues? Well, let's fix it and let's talk about it. So that's, again, part of culture. How do we make it okay to bring up the bad parts of things, right? So it's not just the ideal path. Sam Charrington: [00:12:46] Mm-hmm (affirmative). Do you run into, or run up against engineers or executives that say, introspection, safe spaces, granola? What about the bottom line? What does this mean for us as a business? How do we think about this from a shareholder perspective? Mira Lane: [00:13:09] It's interesting, I don't actually hear a lot of that pushback because I think internally at Microsoft, there is this recognition of hey, we want to be really thoughtful and intentional and I think the bigger issue that we hear is how do we do it? It's not that we don't want to. It's well, how do we do it and how do we do it at scale? So what are the different things you can put in place to help people bring this into their practice? 
And so there isn't a pushback around well, this is going to affect my bottom line, but there's more of an understanding that yeah, if we build things that are thoughtfully designed and intentional and ethical that it's better for our customers. Our customers want that too, but then again the question is how do we do it and where is it manifest? So there's things that we're doing in that space. When you look at AI, a big part of it is data. So how do you look at the data that's being used to power some of these systems and say is this a diverse data set? Is this well rounded? Do we have gaps here? What's the bias in here? So we start looking at certain components of our systems and helping to architect it in a way that's better. I think all of our customers would want a system that recognized all voices, right? Because again, to them, they wouldn't want a system that just worked for men, it didn't work for women. So again, it's a better product as a result. So if we can couch it in terms of better product, then I think it makes sense versus if it's all about us philosophizing and only doing that, I don't know if that's the best. Only doing that is not productive, right? Sam Charrington: [00:14:59] Do you find that the uncertainty around ethical issues related to AI has been an impediment to customers adopting it? Does that get in the way? Do they need these issues to be figured out before they dive in? Mira Lane: [00:15:22] I don't think it's getting in the way, but I think what I'm hearing from customers is help us think about these issues and a lot of people, a lot of customers don't understand AI deeply, right? It's a complex space and a lot of people are ramping up in it. So the question is more about what should I be aware of? What are the questions that I should be asking and how can we do this together? We know you guys are thinking about this deeply. We're getting just involved in it, a customer might say, and so it's more about how do we educate each other? And for us if we want to understand, how do you want to use this? Because sometimes we don't always know the use case for the customer so we want to deeply understand that to make sure that what we're building actually works for what they are trying to do, and from their perspective they want to understand well how does this technology work and where will it fail and where will it not work for my customers? So the question of ethics is more about we don't understand the space well enough, help us understand it and we are concerned about what it could do and can we work together on that? So it's not preventing them from adopting it, but there's definitely a lot of dialog. It comes up quite a bit around we've heard this. We've heard bias is an issue. Well, what does that mean? Sam Charrington: [00:16:47] Right. Mira Lane: [00:16:47] So I think that's an education opportunity. Sam Charrington: [00:16:49] When you think about ethics from a technology innovation perspective, are there examples of things that you've seen either that Microsoft is doing or out in the broader world that strike you as innovative approaches to this problem? Mira Lane: [00:17:12] Yeah, I'll go back to the data side of things just briefly, but there's this concept called data sheets, which I think is super interesting. You're probably really familiar with that and- Sam Charrington: [00:17:25] I've written about some of the work that Timnit Gebru and some others with Microsoft have done around data sheets for data sets. 
Mira Lane: [00:17:31] Exactly, and the interesting part for us is how do you put it into the platform? How do you bake that in? So one of the pieces of work that we're doing is we're taking this notion of data sheets and we are applying it to how we are collecting data and how we're building out our platform. So I think that that's, I don't know if it's super novel, because to me it's like a nutrition label for your data. You want to understand: how was it collected? What's in it? How can you use it? But I think that's one where, as people leave the group, you want to make sure that there's some history and understanding of the composition of it. There's some regulation around how we manage it internally and how we manage data in a thoughtful way. I think that's just a really interesting concept that we should be talking about more as an industry, and then can we share data between each other in a way that's responsible as well? Sam Charrington: [00:18:24] Right. I don't know that the data sheet, I think inherent to the idea was that hey, this isn't novel. In fact, look at electrical components and all these other industries that do this. It's just "common sense". But what is a little novel, I think, is actually doing it. So since that paper was published, several companies have published similar takes, model cards, and there have been a handful, and every time I hear about them I ask okay, when is this? When are you going to be publishing these for your services and the data sets that you're publishing? And no one's done it yet. So it's intriguing to hear you say that you're at least starting to think in this way internally. Do you have a sense for what the path looks like to publishing these kinds of things, whether it's a data sheet or a card or some kind of set of parameters around bias, either for a data set or a model for a commercial public service? Mira Lane: [00:19:41] Yeah, absolutely. We're actually looking at doing this for facial recognition and we've publicly commented about that, we've said, hey, we're going to be sharing for our services what it's great for and what it's not, and so that stuff is actually actively being worked on right now. You'll probably see more of this in the next few weeks, but there is public comment that's going to come out with more details about it, and I'll say that on the data sheet side, I think a large portion of it is it needs to get implemented in the engineering systems first and you need to find the right place to put it. So that's the stuff that we're working on actively right now. Sam Charrington: [00:20:25] Can you comment more on that? It does, as you say that, it does strike me a little bit as one of these iceberg kinds of problems. It looks very manageable, kind of, above the waterline, but if you think about what goes into the creation of a data set or a model, there's a lot of complexity, and certainly at the scale that Microsoft is working, it needs to be automated. What are some of the challenges that have come into play in trying to implement an idea like that? Mira Lane: [00:21:01] Well, let me think about this for a second so I can frame it the right way. The biggest challenge for us on something like that is really thinking through the data collection effort first and spending a little bit of time there. That's where we're actually spending quite a bit of time as we look at, so let me back up for a second.
I work in an engineering group that touches all the speech, language, and vision technologies, and we do an enormous amount of data collection to power those technologies. One of the things that we're first spending time on is looking at exactly how we're collecting data, going through those methodologies and saying is this the right way that we should be doing this? Do we want to change it in any way? Do we want to optimize it? Then we want to go and apply that back in. So you're right, this is a big iceberg because there are so many pieces connected to it, and the specs for data sheets, the ones we've seen, are large, and so what we've done is ask how do we grab the core pieces of this and implement them and create the starting point for it? And then scale over time: add versioning, being able to add your own custom schema to it, and so on. But what is the minimum piece that we can put into this system and then make sure that it's working the way we want it to? So it's about decomposing the problem and saying which ones do we want to prioritize first? For us, we're spending a lot of time just looking at the data collection methodologies first, because there's so much of that going on, and at the same time, what is the minimum part of the data sheet spec that we want to go and put in, and then let's start iterating together on that. Sam Charrington: [00:22:41] It strikes me that these will be most useful when there's kind of broad industry adoption or at least coalescence around some standard, whether it's a standard minimum that everyone's doing and potentially growing over time. Are you involved in or aware of any efforts to create something like that? Mira Lane: [00:23:02] Well I think that that's one piece where it's important. I would say also in a large corporation, it's important internally as well, because we work with so many different teams and we're interfacing with, we're a platform but we interface with large parts of our organization, and being able to share that information internally, that is a really important piece to the puzzle as well. I think the external part as well, but the internal one is not any less important in my eyes because that's where we are. We want to make sure that if we have a set of data, that this group A is using it in one way. If group B wants to use it, we want to make sure they have the rights to use it. They understand what it's composed of, where its orientation is, so that if they pick it up, they do it with full knowledge of what's in it. So for us internally it's a really big deal. Externally, I've heard pockets of this but I don't think I can really comment on that yet with full authority. Sam Charrington: [00:24:03] I'm really curious about the intersection between ethics and design, and you mentioned human-centered design earlier. My sense is that that phrase kind of captures a lot of that intersection. Can you elaborate on what that means for you? Mira Lane: [00:24:20] Yeah, yeah. So when you look at traditional design functions, when we talk about human-centered design, there are lots of different human-centered design frameworks. The one I typically pick up is Don Norman's emotional design framework, where he talks about behavioral design, reflective design, and visceral design. And so behavioral is how is something functioning? What is the functionality of it? Reflective is how does it make you feel about yourself? How does it play to your ego and your personality? And visceral is the look and feel of it.
That's a very individual-oriented approach to design, and when I think about these large systems, you actually need to bring the ecosystem into that. So how does this object you're creating or this system you're creating, how does it fit into the ecosystem? So one of the things we've been playing around with is we've actually reached into adjacent areas like agriculture and explored how you do sustainable agriculture. What are some of those principles and methodologies and how do you apply that to our space? So a lot of the conversations we're having are around ecosystems and how do you insert something into the ecosystem and what happens to it? What is the ripple effect of that? And then how do you do that in a way that keeps that whole thing sustainable? It's a good solution versus one that's bad and causes other downstream effects. So I think that those are changes that we have to have in our design methodology. We're moving away from looking at the one artifact and thinking about it only in terms of how the one user is going to work with it, toward how society is going to interact with it. How are different communities going to interact with it and what does it do to that community? It's a larger problem, and so there's this shift in design thinking that we're trying to do with our designers. So they're not just doing UI. They're not just thinking about this one system. They're thinking about it holistically. And there isn't a framework that we can easily pick up, so we have to kind of construct one as we're going along. Sam Charrington: [00:26:28] Yeah, for a while a couple of years ago maybe I was in search of that framework, and I think the motivation was just really early experiences of seeing AI shoved into products in ways that were frustrating or annoying. For example, a Nest thermostat. It's intended to be very simple, but it's making these decisions for you in a way that you can't really control, and it started me down this path of what does it mean to really build out a discipline of design that is aware of AI and intelligence? I've joked on the podcast before, I call it intelligent design, but that's an overloaded term. Mira Lane: [00:27:23] Totally is. Sam Charrington: [00:27:24] But is there a term for that now, or are people thinking about that? How far have we come in building out a discipline or a way of thinking of what it means to build intelligence into products? Mira Lane: [00:27:37] Yeah, we have done a lot of work around education for our designers because we found a big gap between what our engineers were doing and talking about and what our designers had awareness over. So we actually created a deep learning for designers workshop. It was a two-day workshop and it was really intensive. So we took neural nets, convolutions, all these concepts and introduced them to designers in a way that designers would understand. We brought it to here's how you think about it in terms of Photoshop. Here's how you think about it in terms of the tools you're using and the words you use there, here's how it applies. Here's an exercise where people had to get out of their seats and create this really simple neural net with human beings, and then we had coding as well. So they were coding in Python and in notebooks, so they were exposed to it, and we exposed them to a lot of the techniques and terminology in a way that was concrete, and they were able to then say oh, this is what style transfer looks like. Oh, this is how we constructed a bot.
So first on the design side, I think having the vocabulary to be able to say oh, I know what this word means. Not just I know what it means, but I've experienced it, so that I can have a meaningful discussion with my engineer, I think that was an important piece, and then understanding how AI systems are just different from regular systems. They are more probabilistic in nature. The defaults matter. They can be self-learning, and so how do we think about these, and starting to showcase case studies with our designers to understand that these types of systems are quite different from the deterministic types of systems that they may have designed for in the past. Again, I think it comes back to culture, because, and we keep doing these workshops. Every quarter we'll do another one because we have so much demand for it, and we found even engineers and PMs will come to our design workshops. But kind of democratizing the terminology a little bit and making it concrete to people is an important part of this. Sam Charrington: [00:29:48] It's interesting to think about what it does to a designer's design process to have more intimate knowledge of these concepts. At the same time a lot of the questions that come to mind for me are much higher level concepts in the domain of design. For example, we talk about user experience. To what degree should a user experience AI, if that makes any sense? Should we be trying to make AI or this notion of intelligence invisible to users or very visible to users? This has come up recently in, for example, I'm thinking of Google Duplex, when they announced that that system was gonna be making phone calls to people and there was a big kerfuffle about whether that should be disclosed. Mira Lane: [00:30:43] Yeah. Sam Charrington: [00:30:43] I don't know that there's a right answer. In some ways you want some of this stuff to be invisible. In other ways, tying back to the whole ethics conversation, it does make sense that there's some degree of disclosure. Mira Lane: [00:30:57] Yeah, absolutely. Sam Charrington: [00:30:58] I imagine as a designer, this notion of disclosure can be a very nuanced thing. What does that even mean? Mira Lane: [00:31:03] Yeah, yeah. And it's all context dependent and it's all norm dependent as well, because if you were to look into the future and say are people more comfortable, I mean look at airports for example. People are walking through just using face ID, using the CLEAR system, and a few years ago, I think if you asked people would you feel comfortable doing that? Most people would say no, I don't feel comfortable doing that. I don't want that. So I think in this space, because it's really fluid and new norms are being established and things are being tested out, we have to be on top of how people are feeling and thinking about these technologies so that we understand where some disclosure needs to happen and where it doesn't. In a lot of cases you almost want to assume disclosure for things that are very consequential and high stakes. Where there is opportunity for deception. In the Duplex case you have to be thoughtful about that. So this isn't one where you can say okay, you should always disclose. It just depends on the context. So we have this notion of consequential scenarios: if there's automated decision making, or if there are high-stakes scenarios, those are ones we put a little bit more due diligence into and are more thoughtful about.
Then we have other types of scenarios which are more systems-oriented, and some things that are operationally oriented, and they end up having different types of scenarios, but we haven't been able to create a "here's the exact way you approach every single scenario" playbook. So it is super context dependent and expectation dependent. Maybe after a while you get used to your Nest thermostat and you're fine with the way it's operating, right? So I don't know. These social norms are interesting because someone will go and establish something or they'll test the waters. Google Glass tested the waters and there was a cultural response, right? People responded and said I don't want to be surveilled. I want to be able to go to a bar and get a drink and not have someone recording me. Sam Charrington: [00:33:21] Right. Mira Lane: [00:33:22] So I think we have to understand where society is relative to the technologies that we're inserting into it. So again, it comes back to: are we listening to users? Are we just putting tech out there? I think we really have to start listening to users. My group has a fairly large research component to it and we spend a lot of time talking to people. Especially in the places where we're going to be putting some tech, and understanding what it's going to do to the dynamic and how they're reacting to it. Sam Charrington: [00:33:52] Mm-hmm (affirmative). Mm-hmm (affirmative). Yeah, it strikes me that maybe it's kind of the engineer background in me that's looking for a framework, a flowchart for how we can approach this problem, and I need to embrace more of the design view: every product, every situation is different, and it's more about a principled approach as opposed to a process. Mira Lane: [00:34:18] Absolutely. It's more about a principled and intentional approach. So what we're just talking about is: everything that you're choosing, are you intentional about that choice, and are you very thoughtful about things like defaults? Because we know that people don't change them, and so how do you think about every single design choice and being principled and then very intentional and evidence-driven. So we pushed this onto our teams, and I think some of our teams maybe don't enjoy being with us sometimes as a result, but we say look, we're going to give you some recommendations that are going to be principled, intentional, and evidence-driven, and we want to hear back from you, if you don't agree, on your evidence and why you're saying this is a good or bad idea. Sam Charrington: [00:34:59] Mm-hmm (affirmative). Mira Lane: [00:35:00] That's the way you have to operate right now because it is so context driven. Sam Charrington: [00:35:04] I wonder if you can talk through some examples of how human-centered design, AI, all these things come together in the context of kind of concrete problems that you've looked at. Mira Lane: [00:35:13] Yeah, I was thinking about this because a lot of the work that we do is fairly confidential, but there's one that I can touch on, which was shared at Build earlier this year, and that was a meeting room device. I don't know if you remember this, but there's a meeting room device that we're working on that recognizes who's in the room and does transcription of that meeting, and to me, as someone who is a manager, I love the idea of having a device in the room that captures action items and who was here and what was said.
So we started looking at this and we said okay, well let's look at different types of meetings and people, and let's look at categories of people that this might affect differently. And so how do you think about introverts in a meeting? How do you think about women and minorities because there are subtle dynamics that are happening in meetings that make some of these relationships, they can reinforce certain types of stereotypes or relationships and so we started interviewing people in the context of this sort of meeting room device and this is research that is pretty well recognized. It's not novel research, but it reinforced the fact that when you start putting in things that will monitor anyone that's in a room, certain categories of people behave differently and you see larger discrepancies and impact with women, minorities, more junior people. So we said wow, this is really interesting because as soon as you put a recording device in a room, it's gonna subtly shift the dynamic where some people might talk less or some people might feel like they're observed or depending on if there's a manager in the room and there's a device in the room, they're going to behave differently and does that result in a good meeting or a bad one? We're not sure. But that will affect the dynamic. And so then we took a lot of this research and we went back to the product team and said well how do we now design this in such a way that we design with privacy first in mind? And make users feel like they're empowered to opt into it and so we've had discussions like that, especially around these types of devices where we've seen big impact to how people behave. But it's not like a hard guideline. There's not really a hard set of rules around what you have to do, but because all meetings are different. You have brainstorming ones that are more about fluid ideas. You don't really care who said what, it's about getting the ideas out. You have ones where you're shipping something important and you wanna know who said what because there are clear action items that go with them and so trying to create a system that works with so many different nuanced conversations and different scenarios is not an easy one. So what we do is we'll run alongside with a product team and while they're engineering, they're developing their work, we will take the research that we've gathered and we'll create alternatives for them at the same time so that we can run alongside of them. We can say hey, here's option A, B, C, D, and E. Let's play with these and maybe we come up with a version that mixes them all together. But it gives them options to think about. Because again, it comes back to oh, I might not have time to think about all of this. So how do we empower people with ideas and concrete things to look at? Sam Charrington: [00:38:35] Yeah, I think that example's a great example of the complexity or maybe complexity's not the right word, but the idea that your initial reaction might be like the exact opposite of what you need to do. Mira Lane: [00:38:51] Yep. Sam Charrington: [00:38:51] As you were saying this, I was just like oh, just hide the thing so no one knows it's there. It doesn't change the dynamic. It's like that's exactly wrong. Mira Lane: [00:38:58] You don't want to do that. Don't hide it. Sam Charrington: [00:38:59] Right, right. Mira Lane: [00:39:01] Yeah. And maybe that's another piece. 
I'm sorry to interrupt that, but one of the things I've noticed is our initial reaction is often wrong, and so how do we hold that at the same time that we give ourselves space to explore other things, and then keep an open mind and say okay, I have to adjust and change? Because hiding it would absolutely be an interesting option, but then you have so many issues with that, right? But again, it is about being able to have an open mindset and being able to challenge yourself in this space. Sam Charrington: [00:39:33] If we kind of buy into the idea that folks that are working with AI need to be more thoughtful and more intentional, and maybe incorporate more of this design thinking element into their work, do you have a sense for where this does, or should, or needs to live within a customer organization? Mira Lane: [00:40:01] Yeah, I think it actually, and this is a terrible answer, but I think it needs to live everywhere in some ways, because one thing that we're noticing is we have corporate-level things that happen. We have an Aether board. It's an advisory board that looks at AI technologies and advises, and that's at a corporate level and that's a really interesting way of approaching it, but it can't live alone, and so the thing that we have learned is that if we pair it with groups like mine that sit in the engineering context, that are able to translate principles, concepts, and guidelines into practice, that sort of partnership has been really powerful, because we can take those principles and say well here's where it really worked and here's where it kind of didn't work, and we can also find issues and say well we're grappling with this issue that you guys hadn't thought about. How do you think about this and can we create a broader principle around it? So I think there's this strong cycle of feedback that happens. If you have something at the corporate level where you establish just what your values are, what your guidelines are, and what your approaches are, but then in the engineering context you have a team that can problem solve and apply, then you can create a really tight feedback loop between that engineering team and your corporate team so that you're continually reinforcing each other, because the worst thing would be just to have a corporate-level thing that's just PR speak. You don't want that. Sam Charrington: [00:41:23] Right. Right. Mira Lane: [00:41:24] The worst thing would also be just to have it at the engineering level, because then you would have a very distributed mechanism of doing something that may not cohesively ladder up to your principles. So I think you kind of need both, and to have them work off each other, to really have something effective, and maybe there are other things as well, but so far this has been a really productive and iterative experiment that we're doing. Sam Charrington: [00:41:50] Do any pointers come to mind for folks that want to explore this space more deeply? Do you have a top three favorite resources or initial directions? Mira Lane: [00:42:02] Well it depends on what you want to explore. So I was reading the AI Now report the other day. It's a fairly large, 65-page report around the impact of AI in different systems, different industries, and so if you're looking at getting up to speed on well, what areas is AI going to impact?
I would start with some of these types of groups because I found that they are super thoughtful in how they're going into each space and understanding each space and then bubbling up some of the scenarios. So if you're thinking about AI from a how is it impacting? Those types of things are really interesting. On the engineering side, I actually spend a lot of time on a few Facebook groups where they have, there's some big AI groups in Facebook and they're always sharing here's the latest, here's what's going on, try this technique. So that keeps me up to speed on some of those that are happening, and also arXiv, just to see what research is being published. The design side I'm sort of mixed. I haven't really found a strong spot yet. I wish I had something in my back pocket that I can just refer to, but the thing that maybe has been on the theory side that has been super interesting is to go back to a few people that have made commentaries just around sustainable design. So I refer back to Wendell Berry quite a bit, the agriculturalist and poet, actually, who has really introspected how agriculture could be reframed. Ursula Franklin is also a commentator from Canada. She did a lot of radio broadcasts a long time ago and she has a whole series around technology and its societal impact and if you replace a few of those words and put in some of our new age words, it would still hold true, and so I think there's a lot of theory out there but not a lot of here's really great examples of what you can do because we're all still feeling out the space and we haven't found perfect patterns yet that you can democratize and share out broadly. Sam Charrington: [00:44:18] Well, Mira, thanks so much for taking the time to chat with us about this stuff. It's a really interesting space and one that I enjoy coming back to periodically and I personally believe that there's this intersection of AI and design as one that's just wide open and should and will be further developed and I'm kind of looking forward to keeping an eye on it and I appreciate you taking the time to chat with me about it. Mira Lane: [00:44:49] Thank you so much, Sam. It was wonderful talking to you. Sam Charrington: [00:44:52] Thank you.
Today we present the final episode in our AI for the Benefit of Society series, in which we're joined by Mira Lane, Partner Director for Ethics and Society at Microsoft. Mira and I focus our conversation on the role of culture and human-centered design in AI. We discuss how Mira defines human-centered design, its connections to culture and responsible innovation, and how these ideas can be scalably implemented across large engineering organizations. Get the transcript
Sam Charrington: Today we're excited to continue the AI for the benefit of society series that we've partnered with Microsoft to bring to you. Today we're joined by Hanna Wallach, principal researcher at Microsoft Research. Hanna and I really dig into how bias and a lack of interpretability and transparency show up across machine learning. We discuss the role that human biases, even those that are inadvertent, play in tainting data, whether deployment of fair ML algorithms can actually be achieved in practice and much more. Along the way, Hanna points us to a ton of papers and resources to further explore the topic of fairness in ML. You'll definitely want to check out the show notes page for this episode, which you'll find at twimlai.com/talk/232. Before diving in I'd like to thank Microsoft for their support of the show and their sponsorship of this series. Microsoft is committed to ensuring the responsible development and use of AI and is empowering people around the world with this intelligent technology to help solve previously intractable societal challenges, spanning sustainability, accessibility and humanitarian action. Learn more about their plan at Microsoft.ai. Enjoy. Sam Charrington: [00:02:18] All right everyone, I am on the line with Hanna Wallach, Hanna is a principal researcher at Microsoft Research in New York City. Hanna, welcome to this week in Machine Learning and AI. Hanna Wallach:[00:00:11] Thanks, Sam. It's really awesome to be here. Sam Charrington: [00:00:14] It is a pleasure to have you on the show, and I'm really looking forward to this conversation. You are clearly very well known in the machine learning and AI space. Last year, you were the program chair at one of the largest conferences in the field, NeurIPS. In 2019, you'll be its general chair. But for those who don't know about your background, tell us a little bit about how you got involved and started in ML and AI. Hanna Wallach:[00:00:48] Sure. Absolutely. So I am a machine learning researcher by training, as you might expect. I've been doing machine learning for about 17 years now. So since way before this stuff was even remotely fashionable, or popular, or cool, or whatever it is nowadays. In that time, we've really seen machine learning change a lot. It's sort of gone from this weirdo academic discipline only of interest to nerds like me, to something that's so mainstream that it's on billboards, it's in TV shows, and so on and so forth. It's been pretty incredible to see that shift over that time. I got into machine learning sort of by accident, I think that's often what happens. I had taken some undergrad classes on information theory and stuff like that, found that to be really interesting, but thought that I was probably going to go into human computer interaction research. But through a research assistantship during the summer between my undergrad degree and my Master's degree, I ended up discovering machine learning, and was completely blown away by it. I realized that this is what I wanted to do. I've been focusing on machine learning in various different forms since then. My PhD was specifically on Bayesian latent variable methods, typically for analyzing text and documents. So topic models, that kind of thing. But during my PhD, I really began to realize that I'm not particularly interested in analyzing documents for the sake of analyzing documents, I'm interested in analyzing documents because humans write documents to communicate with one another. 
It's really that underlying social process that I'm most interested in. So then during my postdoc, I started to shift direction from primarily looking at text and documents to thinking really about those social processes. So not just what are people saying, but also who’s interacting with whom, and thinking about machine learning methods for analyzing the structure and content of social processes in combination. I then dove into this much more when I got a faculty job, because I was hired as part of UMass Amherst’s Computational Social Science Initiative. So at that point I started focusing really in depth on this idea of using machine learning to study society. I established collaborations with a number of different social scientists, focusing on a number of different topics. Over the years, I've mostly ended up working with political scientists, and often study questions relating to government transparency, and still looking at sort of this whole idea of a social process consists of individuals, or groups of individuals interacting with one another, information that might be used in or arising from these interactions, and then the fact that these things might change over time. I often use one of these or two of these modalities, so structure, content, or dynamics, to learn about one or more of the other ones as well. As I continued to work in this space, I started to think more, not just about how we can use machine learning to study society, but the fact that machine learning is becoming much more prevalent within society. About four years ago, I started really thinking more about these issues of fairness, accountability, transparency, and ethics. It was a pretty natural fit for me to start moving in this direction. Not only was I already thinking about questions to do with people, but I've done a lot of diversity and inclusion work in my non research life. So I'm one of the co-founders of the Women in Machine Learning workshop, I also co-founded two organizations to get more women involved in free and open source software development. So issues related to fairness and stuff like that are really something that I tend to think about a lot in general. So I ended up making sort of this shift a little bit in my research focus. That's not to say that I don't still work on things to do with core computational social science, but increasingly my research is focusing on the ways that machine learning impacts society. So fairness, accountability, transparency, and ethics. Sam Charrington: [00:05:53] We will certainly dive deep into those topics. But before we do, you've mentioned a couple of times the term computational social science. That's not a term that I've heard before, I don't believe. Can you ... Is that ... I guess I'm curious how established that is as a field, or is it something that is specific to that institution that you were working at? Hanna Wallach:[00:06:19] Sure. So this is really a discipline that started emerging in maybe sort of 2009, 2008, that kind of time. By 2010, which is when I was hired at UMass, it really was sort of its own little emerging field with a bunch of different computer scientists and social scientists really committed to pushing this forward as a discipline. The basic idea, of course, is you know social scientists study society and social processes, and they've been doing this for decades. But often using qualitative methods. 
But of course, as more of society moves towards digitized interaction methods, and online platforms, and other kinds of things like that, we're beginning to see much more of this sort of digital data. At the same time, we've seen this massive increase, as I've said, in the popularity of machine learning and machine learning methods that are really suitable for analyzing data about social processes in society. So computational social science is really the sort of emerging discipline at the intersection of computer science, the social sciences, and statistics as well. The real goal is to develop and use computational and statistical methods, so machine learning methods, for example, to understand society, social processes, and answer questions that are substantively interesting to social scientists. At this point, there are people at a number of different institutions focusing on computational social science. So yes, of course, UMass, as I've mentioned before. But also Northwestern, Northeastern, University of Washington, in fact have been doing this for years, and of course, Microsoft Research is no exception in this regard. Part of the reason why I joined Microsoft Research was that we have a truly exceptional group of researchers in computational social science here. That was really very appealing to me. Sam Charrington: [00:08:31] Oh, awesome, awesome. So you talked about your transition to focusing on fairness, accountability, transparency, and ethics in machine learning and AI. Can you talk a little bit about what those terms mean to you, and your broader research? Hanna Wallach:[00:08:54] Yeah, absolutely. So I think the bulk of my own research in that sort of broad umbrella falls within two categories. So the first is fairness, and the second is what I would sort of describe as interpretability of machine learning. So in that fairness bucket, really, much of my research is focused on studying the ways in which machine learning can inadvertently harm or disadvantage groups of people or individual people in various different, usually unintended, ways. I'm interested in understanding not only why this occurs, but what we can do to mitigate it, and what we can do to really develop fairer machine learning systems. So systems that don't inadvertently harm individuals or groups of people. In the intelligibility bucket, so there, I'm really interested in how we can make machine learning methods that are interpretable to humans in different roles for particular purposes. There has been a lot of research in this area over the past few years, focusing on oftentimes developing simple machine learning models that can be easily understood by humans simply by exposing their internals, and also on developing methods that can generate explanations for either entire models or the predictions of models. Those models might be potentially very complex. My own work typically focuses really more on the human side of intelligibility, so what is it that might make a system intelligible or interpretable to a human trying to carry out some particular task? I do a lot of human subjects experiments to really try and understand some of those questions with a variety of different folks here at Microsoft Research. Sam Charrington: [00:11:01] On the topic of fairness and avoiding inadvertent harm, there are a lot of examples that I think many of our audience would be familiar with, the ProPublica work into the use of machine learning systems in the justice process, and others. 
Are there examples that come to mind for you that are maybe less well known, but that illustrate for you the importance of that type of work? Hanna Wallach:[00:11:36] Yes. So when I typically think about this space, I tend to think about this in terms of the types of different harms that can occur. I have some work with Aaron Shapiro, Solon Barocas, and Kate Crawford on the different types of harms that can occur. Kate Crawford actually did a fantastic job of talking about this work in her invited talk at the NeurIPS conference in 2017. But to give you some concrete examples, so many of the examples that people are most familiar with are these scenarios as you mentioned where machine learning systems are being used to allocate or withhold resources, opportunities, or information. So one example would be of the COMPAS recidivism prediction system being used to make decisions about whether people should be released on bail. Another example would be from a story, a news story that happened in November where Amazon revealed that it had abandoned an automated hiring tool because of fears that the tool would reinforce existing gender imbalances in the workplace. So there you're looking at these existing gender imbalances, and seeing that this tool is perhaps withholding opportunities from women in the tech industry in an undesirable way. There was a lot of coverage about this very sensible decision that Amazon made to abandon that tool. Some other examples would be more related to quality of service issues even when no resources or opportunities are being allocated or withheld. So a great example there would be the work that Joy Buolamwini and Timnit Gebru did focusing on the ways that commercial gender classification systems might perform less well, so less accurate, for certain groups of people. Another example you might think of, let's say, speech recognition systems. You can imagine systems that work really well for people with certain types of accents, or for people with voices at certain pitches. But less well for other people, certainly for me. I'm British, and I have a lisp. I know that oftentimes speech recognition systems don't do a great job of understanding what I'm saying. This is much less of an issue nowadays, but you know, five or so years ago, this was really frustrating for me. Some other examples are things like stereotyping. So here the most famous example of stereotyping in machine learning is Latanya Sweeney's work from 2013, where she showed that advertisements that were being shown on web searches for different people's names would more typically be advertisements that reinforced stereotypes about black criminality when people searched for sort of black sounding names, than when people searched for stereotypically white sounding names. So there the issue is this sort of reinforcement of these negative stereotypes within society by the placement of particular ads for particular different types of searches. So another example of stereotyping in machine learning would be the work done by Joanna Bryson and others at Princeton University on stereotypes in word embeddings. There has also been some similar work done by my colleague, Adam Kalai, here at Microsoft Research. 
Both of these groups of researchers showed that if you train word embedding methods, so things like Word2Vec, that try and identify a low dimensional embedding for word types based on the surrounding words that are typically used in conjunction with them in sentences, you end up seeing that these word embeddings reinforce existing gender stereotypes. For example, so the word man ends up being embedded much closer to programmer and similarly woman ends up being embedded much closer to homemaker than vice versa. So that would be another kind of example. Then we see other kinds of examples of unfairness and harms within machine learning as well. So for example, over and under representation. So Matthew Kay and some others at the University of Washington have this really nice paper where they show that for professions with an equal or higher percentage of men than women, the image search results are much more heavily skewed towards images of men than reality. So that would be another kind of example. What you'll see from all of these examples that I've mentioned is that they affect a really wide range of systems and types of machine learning applications. The types of harms or unfairness that might occur are also pretty wide ranging as well, going from, yes, sure, allocation and withholding of resources, opportunities, or information, but moving beyond that to stereotyping and representation and so on. Sam Charrington: [00:17:02] So often when thinking about fairness and bias in machine learning and the types of harm that can come about when unfair systems are developed, all roads kind of lead back to the data itself, and the biases that are inherent in that data. Given that machine learning and AI is so dependent on data, and often much of the data that we have is biased, what can we do about that, and what are the kinds of things that your research is exploring to help us address these issues? Hanna Wallach:[00:17:41] Absolutely. Yeah, so you've hit on a really important point there, which is that in a lot of the sort of public discourse about fairness in machine learning, you have people making comments about algorithms being unfair, or algorithms being biased. Really, I think this misses some of the most fundamental points about why this is such a challenging landscape. So I want to just emphasize a couple of those here in response to your question. So the first thing is that machine learning is all about taking data, finding patterns in that data, and then often training systems to mimic the decisions that are represented within that data. Of course, we know that the society we live in is not fair. It is biased. There are structural disadvantages and discrimination all over the place. So it's pretty inevitable that if you take data from a society like that, and then train machine learning systems to find patterns expressed in that data, and to mimic the decisions made within that society, you will necessarily reproduce those structural disadvantages, that bias, that discrimination, and so on. So you're absolutely right that a lot of this does indeed come from data. But the other point that I want to make is that it's not just from data and it's not from algorithms per se. The issue is really, as I see it, and as my colleagues here at Microsoft Research see it, the issue is really about people and people's decisions at every point in that machine learning life cycle. 
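[Editor's note: the word-embedding effect Hanna describes just above is easy to probe directly. The sketch below uses the open-source gensim library and its downloadable pretrained GoogleNews Word2Vec vectors to run the classic analogy query; the model name, the word pairs, and the size of the download are illustrative assumptions rather than the exact setup of the papers she cites.]

```python
# A minimal sketch of the analogy probe discussed above, assuming the gensim
# package is installed and the pretrained GoogleNews vectors can be downloaded
# (a large file, on the order of a couple of gigabytes).
import gensim.downloader as api

# Load 300-dimensional Word2Vec embeddings trained on Google News text.
model = api.load("word2vec-google-news-300")

# "man" is to "programmer" as "woman" is to ...?
# Computed as vec("programmer") - vec("man") + vec("woman"), then finding
# the nearest words to the resulting vector.
print(model.most_similar(positive=["woman", "programmer"], negative=["man"], topn=5))

# The reverse direction, for comparison.
print(model.most_similar(positive=["man", "homemaker"], negative=["woman"], topn=5))
```

[Whatever comes back depends entirely on the corpus the embeddings were trained on, which is the point Hanna is making: the vectors encode the associations present in the text they were trained on, stereotypes included.]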
So I've done some work on this with a number of people here at Microsoft, most recently I put together a tutorial on machine learning and fairness in collaboration with my colleague Jenn Wortman Vaughan. The way we really think about this is that you have to prioritize fairness at every stage of that machine learning lifecycle. You can't think about it as an afterthought. The reason why is that decisions that we make at every stage can fundamentally impact whether or not a system treats people fairly. So I think it's really important when we're thinking about fairness in machine learning to not just sort of make general statements about algorithms being unfair, or systems being unfair, but really to go back to those particular points and think about how unfairness can kind of creep in at any one of those stages. That might be as early as the task definition stage, so when you're sitting down to develop some machine learning system, it's really important to ask the question of who does this take power from, and who does this give power to? The answers to that question often reveal a lot about whether or not that technology should even be built in the first place. Sometimes the answer to addressing fairness in machine learning is simply, no, we should not be building that technology. But there are all kinds of other decisions and assumptions at other points in that machine learning life cycle as well. So the way we typically like to think about it is that a machine learning model, or method, is effectively an abstraction of the world. In making that abstraction, you necessarily have to make a bunch of assumptions about the world. Some of these assumptions will be more or less justified, some of these assumptions will be a better fit for reality than others. But if you're not thinking really carefully about what those assumptions are, when you are developing your machine learning system, this is one of the most obvious places that you can inadvertently end up introducing bias or unfairness. Sam Charrington: [00:21:42] Can you give us some concrete examples there? Hanna Wallach:[00:21:45] Yeah. Absolutely. One common example of this form would be stuff to do with teacher evaluation. So there have been a couple of high profile lawsuits about this kind of thing. But I think it illustrates the point nicely. So it's common for teachers to be evaluated based on a number of different factors, including their students' test scores. Indeed, many of the methods that have been developed to analyze teacher quality using machine learning systems have really focused predominantly on students' test scores. But this assumes that students' test scores are in fact an accurate predictor of teacher quality. This isn't actually always the case. A good teacher should obviously do more than test prep. So any system that really looks just at test scores when trying to predict teacher quality is going to do a bad job of capturing these other properties. So that would be one example. Another example involves predictive policing. So a predictive policing system might make predictions about where crimes will be committed based on historic arrest data. But an implicit assumption here is that the number of arrests in an area is an accurate proxy for the amount of crime. It doesn't take into account the fact that policing practices can be racially biased, or there might be historic over policing in less affluent neighborhoods. I'll give you another example as well. 
So many machine learning methods work by defining some objective function, and then learning the parameters of the model so as to optimize that objective function. So for example, if you define an objective function in the context of, let's say, a search engine, that prioritizes user clicks, you may end up with search results that don't necessarily reflect what you want them to. This is because users may click on certain types of search results over other search results, and that might not be reflective of what you want to be showing when you show users a page of search results. So as a concrete example, many search engines, if you search for the word boy, you see a bunch of pictures of male children. But if you search for the word girl, you see a bunch of pictures of grown up women. These are pretty different to each other. This probably comes from the fact that search engines typically optimize for clicks among other metrics. This really shows how hard it can be to even address these kinds of fairness issues, because in different circumstances the word girl may be referring to a child or a woman, and users search for this term with different intentions. In this particular example, as you can probably imagine, one of these intentions might be more prevalent than the other. Sam Charrington: [00:24:57] You've identified lots of opportunities for pitfalls in the process of fielding systems going all the way back to the way you define your system, and state your intentions, and formulate the problem that you're going after. Beyond simply being mindful of the potential for bias and unfairness and just saying simply, I realize that that's not simple, that it's work to be mindful of this. But beyond that, what does your research offer in terms of how to overcome these kinds of issues? Hanna Wallach:[00:25:43] Yeah, this is a really good question. It's a question that I get a lot from people is what can we actually do in practice? There are a number of things that can be done in practice. Not all of them are easy things to do, as you say. So one of the most important things is that issues relating to fairness in machine learning are fundamentally socio-technical. They're not going to be addressed by computer scientists or developers alone. It's really important to involve a range of diverse stakeholders in these conversations when we're developing machine learning systems so that we have a bunch of different perspectives represented. So moving beyond just involving computer scientists and developers on teams, it's really important that we involve social scientists, lawyers, policy makers, end users, people who are going to be affected or impacted by these systems down the line, and so on and so forth. That's one really concrete thing you can do. There is a project that came out of the University of Washington called the Diverse Voices project. It provides a way of getting feedback from stakeholders on tech policy documents. It's really good, they have a great how-to guide that I definitely recommend checking out. But many of the things that they recommend doing there, you can also think about when you're trying to get feedback from stakeholders on, let's say, the definition of a machine learning system. So that task definition stage. Some of these could even potentially be expanded to consider other stages of that machine learning pipeline as well. So there are a number of things that you can do at every single stage of the machine learning pipeline. 
In fact, this tutorial that I mentioned earlier, that I worked on with my colleague Jenn Wortman Vaughan actually has guidelines for every single step of the pipeline. But to give you examples, here are some things, for instance, that you can do when you're selecting a data source. So for example, it's really important to think critically before even collecting any data. It's often very tempting to say, oh, there is already some dataset that I can probably repurpose for this. But it's really important to take that step back and before immediately acting based on availability to actually think about whether that data source is appropriate for the task you want to use it for. There is a number of reasons why it might not be, it could be to do with biases and the data source selection process. There might be societal biases present in the data source itself. It might be that the data source doesn't match the deployment context, that's a really important one that people really should be taking into account. Where are you thinking about deploying your machine learning system and does the data you have availability for training and development match that context? As another example, still related to data, it's really important to think about biases in the technology used to collect data. So as an example here, there was an app released in the city of Boston back in 2011, I think it was called Street Bump. The way it worked is it used iPhone data and specifically the sort of positional movement of iPhones as people were driving around, to gather data on where there were potholes that should be repaired by the city. But pretty quickly, the city of Boston figured out that this actually wasn't a great way to get that kind of data, because back in 2011, the people who had iPhones were typically quite affluent and only lived in certain neighborhoods. So that would be an example about thinking carefully about the technology even used to collect data. It's also really important to make sure that there is sufficient representation of different subpopulations who might be ultimately using or affected by your machine learning system to make sure that you really do have good representation overall. Moving onto things like the model, there is a number of different things that you can do there, for instance, as well. So in the case of a model, I mentioned a bit about assumptions being really important. It's great to really clearly define all of your assumptions about the model, and then to question whether there might be any explicit or implicit biases present in those assumptions. That's a really important thing to do when you're thinking about choosing any particular model or model structure. You could even, in some scenarios, include some quantitative notion of parity, for instance, in your model objective function as well. There have been a number of academic papers that take that approach in the literature over the past few years. Sam Charrington: [00:30:43] Can you give an example of that last point? Hanna Wallach:[00:30:46] Yeah, sure. So imagine you have some kind of a machine learning classifier that's going to make decisions of the form, let's say loan, no loan, hire, no hire, bail, no bail, and so on. The way we normally develop these classifiers is to take a bunch of labeled data, so data points labeled with, let's say, loan, no loan, and then we train a model, a machine learning model, a classifier, to optimize accuracy on that training data. 
So you end up setting the parameters of that model such that it does a good job of accurately predicting those labels from the training data. So the objective function that's typically used is one that considers, usually, only accuracy. But something else you can do is define some quantitative definition of fairness, some quantitative fairness metric, and then try to simultaneously optimize both of these objectives. So classifier accuracy and whatever your chosen fairness metric is. There is a number of these different quantitative metrics that have been proposed out there that all typically are looking at parity across groups of some sort. So I think it's really important to remember that even though these are often referred to as fairness metrics, they're really parity metrics. They neglect many of the really important other aspects of fairness, like justice, and due process, and so on and so forth. But, it is absolutely possible to take these parity metrics and to incorporate them into the objective function of, say, a classifier, and then to try and prioritize satisfying and optimizing that fairness metric at the same time as optimizing classifier accuracy. There have been a number of papers that focus on this kind of approach, many of them will focus on one particular type of classifier, so like SVMs, or neural networks, or something like that, and one particular fairness metric. There are a bunch of standard fairness metrics that people like to look at. I actually have some work with some colleagues here at Microsoft where we have a slightly more general way of doing this that will work with many different types of classifiers, and many different types of fairness metrics. So there is no reason to start again from scratch if you want to switch to a different classifier or a different fairness metric. We actually have some open source Python code available on GitHub that implements our approach. Sam Charrington: [00:33:27] So you've talked about the idea that kind of people are fundamentally the root of the issue, that these are societal issues, that they're not going to be solved by technological advancements or processes alone. At the same time, there has been a ton of new research happening in this area by folks in your group and elsewhere. Does that lead to a mismatch between what's happening in academia and on the technical side with the way this stuff actually gets put into practice? Hanna Wallach:[00:34:11] That's an awesome question. The simple answer is yes. This actually relates to one of my most recent research projects, which I'm really, really excited about. So last summer, some of my colleagues and I, specifically Jenn Wortman Vaughan, Miro Dudík, and Hal Daumé, along with our incredible intern, Ken Holstein from CMU, conducted the first systematic investigation of industry practitioners' challenges and needs for support relating to developing fairer machine learning systems. This work actually came about because we were thinking about ways of developing interfaces for that fair classification work that I mentioned a minute ago. Through a number of conversations with people in different product groups here at Microsoft and people at other companies, we realized that these kinds of classification tasks, while they're incredibly well studied within the fairness and machine learning literature, are maybe less common than we had thought in practice within industry. 
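[Editor's note: Hanna doesn't name the open-source package here, so the sketch below should be read as an illustrative stand-in rather than her group's exact code. It uses the open-source Fairlearn library's reductions approach to train a classifier subject to a demographic-parity constraint and compares it with an unconstrained baseline; the synthetic data, the sensitive-attribute column, and the choice of logistic regression are all assumptions made for the example.]

```python
# A hedged sketch of folding a parity metric into classifier training,
# assuming the scikit-learn and fairlearn packages are installed.
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import ExponentiatedGradient, DemographicParity
from fairlearn.metrics import demographic_parity_difference

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))                 # synthetic feature matrix
group = rng.integers(0, 2, size=n)          # hypothetical sensitive attribute
# Labels correlated with both the features and the group, so an
# accuracy-only classifier picks up the group signal.
y = (X[:, 0] + 0.8 * group + rng.normal(scale=0.5, size=n) > 0.5).astype(int)

# Baseline: optimize accuracy only.
baseline = LogisticRegression().fit(X, y)

# Reductions approach: optimize accuracy subject to a demographic-parity
# constraint, i.e. roughly equal selection rates across groups.
mitigator = ExponentiatedGradient(
    estimator=LogisticRegression(),
    constraints=DemographicParity(),
)
mitigator.fit(X, y, sensitive_features=group)

for name, pred in [("baseline", baseline.predict(X)),
                   ("constrained", mitigator.predict(X))]:
    gap = demographic_parity_difference(y, pred, sensitive_features=group)
    print(f"{name}: accuracy={(pred == y).mean():.3f}, parity gap={gap:.3f}")
```

[As Hanna stresses, a metric like this captures parity across groups and nothing more; folding it into the objective is a mechanical step, not a resolution of the justice and due-process questions she raises.]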
So that got us thinking about whether there might be, actually, a mismatch between the academic literature on fairness and machine learning, and practitioners' actual needs. What we ended up doing was this super interesting research project that was a pretty different style of research for me and for my colleagues. So I am a machine learning researcher, so is Jenn, so is Hal, and so is Miro. Ken, our intern, is an HCI researcher. What we ended up doing was this qualitative HCI work to really understand what it is that practitioners are facing in reality when they try and develop fairer machine learning systems. To do this, we conducted semi-structured interviews with 35 people, spanning 25 different teams, in 10 different companies. These people were in a number of different roles, ranging from social scientist, data labeler, product manager, program manager, to data scientists and researcher. Where possible, we tried to interview multiple people from the same team in order to get a variety of perspectives on that team's challenges and needs for support. We then took our findings from these interviews, and developed a survey which was then completed by another 267 industry practitioners, again, in a variety of different companies and a variety of different roles. What we found, at a high level, was that yes, there is a mismatch between the academic literature on fairness in machine learning and industry practitioners' actual challenges and needs for support on the ground. So firstly, much of the machine learning literature on fairness focuses on classification, and on supervised machine learning methods. In fact, what we found is that industry practitioners are grappling with fairness issues in a much wider range of applications beyond classification or prediction scenarios. In fact, many times the systems they're dealing with involve these really rich, complex interactions between users and the system. So for example, chat bots, or adaptive tutoring, or personalized retail, and so on and so forth. So as a result, they often struggle to use existing fairness research from the literature, because the things that they're facing are much less amenable to these quantitative fairness metrics. Indeed, very few teams have fairness KPIs or automated tests that they can use within their domain. One of the other things that we found is that the machine learning literature typically assumes access to sensitive attributes like race or gender, for the purpose of auditing systems for fairness. But in practice, many teams have no access to these kinds of attributes, and certainly not at the level of individuals. So they express needs for support in detecting biases and unfairness with access only to coarse-grained, partial, or indirect information. This is something that we've seen much less focus on in the academic literature. Sam Charrington: [00:38:41] That last point is an interesting one, and one that I've brought up on the podcast previously. In many of the places you might want to use an approach like that, it's forbidden, from a regulatory perspective, to use the information that you want to use in your classifier to achieve fairness in any part of the decisioning process. Hanna Wallach:[00:39:04] Exactly. This sets up this really difficult tension between doing the right thing in practice from a machine learning perspective, and what is legally allowed. 
I'm actually working on a paper at the moment with a lawyer, Zack Conard, actually, a law student, Zack Conard, at Stanford University, on exactly this issue. This challenge between what you want to do from a machine learning perspective, and what you are required to do from a legal perspective, based on humans and how humans behave, and hundreds of years of law in that realm. It's really challenging, and there is this complicated trade off there that we really need to be thinking about. Sam Charrington: [00:39:48] It does make me wonder if techniques like or analogous to a differential privacy or something like that could be used to provide a regulatorily acceptable way to access protected attributes, so that they can be incorporated into algorithms like this. Hanna Wallach:[00:40:07] Yeah, so there was some work on exactly this kind of topic at the FAT ML Workshop colocated with ICML last year. This work was proposing the use of encryption and such like in order to collect and make available such information, but in a way that users would feel as if their privacy was being respected, and so that people who wanted to use that information would be able to use it for purposes such as auditing. I think that's a really promising approach, although there is obviously a bunch of non trivial challenges involved in thinking about how you might make that a reality. It's a really complicated landscape. But definitely one that's worth thinking about. Sam Charrington: [00:40:54] Was there a third area that you were about to mention? Hanna Wallach:[00:40:58] Yeah, so one of the main themes that we found in our work studying industry practitioners is a real mismatch between the focus on different points in the machine learning life cycle. So the machine learning literature typically assumes no agency over data collection. This makes sense, right? If you're a machine learning academic, you typically work with standard data sets that have been collected and made available for years. You don't typically think about having agency over that data collection process. But of course, in industry, that's exactly where practitioners often do have the most control. They are in charge of that data collection or data curation process, and in contrast, they often have much less control over the methods or models themselves, which often are embedded within much bigger systems. So it's much harder to intervene from a perspective of fairness with the models than it is with the data. We found that really interesting, this sort of difference in emphasis between models versus data in these different groups of people. Of course, many practitioners voiced needs for support in figuring out how to leverage that sort of agency over data collection to create fairer data sets for use in developing their systems. Sam Charrington: [00:42:20] So you mentioned the FAT ML workshop. I'm wondering as we come to a close, if there are any resources, events, pointers, I'm sure there are tons of things that you'd love to point people at. But what are your top three or four things that you would suggest people take a look at as they're trying to wrap their heads around this area, and how to either have an impact as a researcher, or how to make good use of it as a practitioner? Hanna Wallach:[00:42:55] Yeah. Absolutely. So there are a number of different places with resources to learn more about this kind of stuff. 
So first, I've mentioned a couple of times, this tutorial that I put together with Jenn Wortman Vaughan, that will be available publicly online very soon. It is in fact being broadcast next week, so it should be up by the time this podcast goes live. So I would definitely recommend that people check that out to really get a sense of how we, at Microsoft, are thinking about fairness in machine learning. Then moving beyond that, and thinking specifically on more of the academic literature, the FAT ML workshop maintains a list of resources on the workshop website. That's again, another really, really great place to look for things to read about this topic. The FAT* conference is a relatively newly created conference on fairness, accountability, and transparency, not just in machine learning, but across all of computer science and computational systems. Again, there, I recommend checking out the website to see the publications that were there last year, and also the publications that will be there this year. There is a number of really interesting papers that I haven't read yet, but I'm super excited to read, being presented at this year's conference. That conference also has tutorials on a range of different subjects. So it's also worth looking at the various different tutorials there. So at last year's conference, Arvind Narayanan presented this amazing tutorial on quantitative fairness metrics, and why they're not a one size fits all solution, why there are trade offs between them, why you can't just sort of take one of these definitions, optimize for it, and call it quits. So I definitely recommend checking that out. Some other places that are worth looking for resources on this, the AI Now Institute, which was co-founded by Kate Crawford, who is also here at Microsoft Research, and Meredith Whittaker, who is also at Google, also has some incredibly awesome resources. They've put out a number of white papers and reports over the past couple of years that really get at the crux of why these are complicated socio-technical issues. So I strongly recommend reading pretty much everything that they put out. I would also recommend checking out some of the material put out by Data and Society, which is also an organization here in New York, led by Danah Boyd, and they too have a number of really interesting things that you can read about these different topics. Then the final thing I want to emphasize is the Partnership on AI, which was formed a couple of years ago by Microsoft and a bunch of other companies working in this space of AI to really foster cross company collaboration and moving forward in this space when thinking about these complicated societal issues that relate to AI and machine learning. So the partnership has been really ramping up over the past couple of years, and they also have some good resources that are worth checking out. Sam Charrington: [00:46:22] Oh, that's great. That is a great list that will keep us busy for a while. Hanna, thank you so much for taking the time to chat with us. It was really a great conversation, and I appreciate it. Hanna Wallach:[00:46:34] No problem. Thank you for having me. This has been really great. Sam Charrington: [00:46:38] Awesome, thank you.
Today we're joined by Hanna Wallach, a Principal Researcher at Microsoft Research. Hanna and I really dig into how bias and a lack of interpretability and transparency show up across machine learning. We discuss the role that human biases, even those that are inadvertent, play in tainting data, and whether deployment of "fair" ML models can actually be achieved in practice, and much more. Along the way, Hanna points us to a TON of papers and resources to further explore the topic of fairness in ML. You'll definitely want to check out the notes page for this episode, which you'll find at twimlai.com/talk/232. Get the transcript
In this episode, we're joined by Peter Lee, Corporate Vice President at Microsoft Research responsible for the company's healthcare initiatives. Peter and I met a few months ago at the Microsoft Ignite conference, where he gave me some really interesting takes on AI development in China. You can find more on that topic in the show notes. This conversation centers on the three impact areas Peter sees for AI in healthcare, namely diagnostics and therapeutics, tools, and the future of precision medicine. We dig into some examples in each area, and Peter details the realities of applying machine learning and some of the impediments to rapid scale. Get the transcript
Talk 228 – AI for Earth Interview Transcript Sam Charrington: Today's episode is part of a series of shows on the topic of AI for the benefit of society, that we're excited to have partnered with Microsoft to produce. In this show, we're joined by Lucas Joppa and Zack Parisa. Lucas is the chief environmental officer at Microsoft, spearheading the company's five-year, $50 million AI for Earth commitment, which seeks to apply machine learning and artificial intelligence across four key environmental areas: agriculture, water, biodiversity and climate change. Zack is co-founder and president of SilviaTerra, a Microsoft AI for Earth grantee, whose mission is to help people use modern data sources to better manage forest habitats and ecosystems. In our conversation we discussed the ways that machine learning and AI can be used to advance our understanding of forests and other ecosystems and support conservation efforts. We discuss how SilviaTerra uses computer vision and data from a wide array of sensors like LiDAR, combined with AI, to yield more detailed small area estimates of the various species in our forests. And we also discuss another AI for Earth project, WildMe, a computer vision-based wildlife conservation project that we discussed with Jason Holmberg back in episode 166. Before diving in I'd like to thank Microsoft for their support of the show and their sponsorship of this series. Microsoft is committed to ensuring the responsible development and use of AI and is empowering people around the world with this intelligent technology to help solve previously intractable societal challenges spanning sustainability, accessibility and humanitarian action. Learn more about their plan at Microsoft.ai. Enjoy the show. Sam Charrington: [00:02:17] All right, everyone. I am here with Lucas Joppa and Zack Parisa. Lucas is the CEO of Microsoft, no, not that CEO, but the Chief Environmental Officer. Zack is the Co-Founder and President of Silvia Terra. Lucas and Zack, welcome to this week in Machine Learning and AI. Lucas Joppa: [00:00:22] Thanks for having us here. It's a huge pleasure. Zack Parisa: [00:00:24] Great to be here. Sam Charrington: [00:00:25] Awesome. Let's dive right in. We'll be talking about Microsoft's AI For Earth Initiative, but before we jump into that, Lucas, as the CEO of Microsoft. I think, I'm going to run this one all day. Tell me a little bit about your background and how you came to be the CEO of Microsoft. Lucas Joppa: [00:00:48] Yeah, sure. I would say I never dreamed of being the CEO of anything, that's for sure. Particularly, in the standard context of it, much less what it means in my specific title, which is the Chief Environmental Officer. I mean, I grew up in far northern rural Wisconsin, I was obsessed with being outside. My approach to school and life in general was, how can I get done with anything that I need to get done with so I can go play out in the woods? I think, I thought I was going to grow up to be a game warden or something similar to that. Technology was not a big factor in my life as well. I mean, I never had a computer growing up or a TV or anything else. I eventually found my way into university, started discovering that I was really interested in thinking about a career in environmental science, studied Wildlife Ecology. Again, not the traditional career path for somebody at Microsoft. 
Went off and spent a little time in the United States Peace Corps in Malawi, working for the Department of National Parks and Wildlife, and then came back and did my PhD in Ecology. It was really then that I started to put together this, the two kind of incredible ages that I think we're alive in today and the way I see our world. Which is that we're doing business here at the intersection of the information age, and then this also incredible age of negative human impacts on earth's natural systems. It was during my PhD, I just was really struggling with what's the right way to do science in a way that scales with the scale of the problem. That's when computing, programming, Machine Learning all kind of came flooding into my life at the same time. Ended up at Microsoft and Microsoft Research leading programs in environmental and computer science, and then things just progressed from there. Sam Charrington: [00:02:41] You're actively involved in academic research and a number of organizations. Can you share a little bit about that? We talked about it a bit earlier. Lucas Joppa: [00:02:51] Sure. I mean, once you live long enough in the academic world, you develop a Pavlovian response to some of the rewards that that environment instills. I mean, I'm not proud to say it, but since I'm not proud, I should just say it. I am still that academic that checks their citations every day when I wake up over breakfast. While I definitely have a much larger and more expanded purview of roles and responsibilities here at Microsoft. I still think science is important. Science is what drives all of the environmental sustainability decisions that we make here at this company. It's what ultimately led to why we invested in this program AI For Earth. I firmly believe that you have to understand the details, if you're going to try to lead an organization somewhere with a big picture vision, if you don't understand the details, if you don't understand the science, then it's difficult to do that. Just the way my brain works, the easiest way to understand the details is to get your hands dirty and be in there with the rest of the world trying to build the solutions of the future. That's where the academic research for me comes in. It's just that opportunity to actually like go really deep and work on both sides of the equation. I still publish in the environmental science literature. I still publish in the computer science literature, and the most depressing thing about that is how few of us there are that do both of those things. It's one of the things that I spend a lot of my time every day doing is just trying to bring those two worlds together, and publishing is a fantastic way to do that. Sam Charrington: [00:04:35] Zack, you're a forester. Zack Parisa: [00:04:37] Yeah, yeah. Sam Charrington: [00:04:38] I didn't know that was a thing beyond the Subaru. Zack Parisa: [00:04:40] Right, right, sure enough. It's absolutely a thing and an exciting one, I think, there's a rebirth in forestry now. I'm hoping that it'll become a more broadly known thing here, before too long. Sam Charrington: [00:04:56] Tell us about your background and about Silvia Terra. Zack Parisa: [00:04:59] Yeah, sure. The start of my story actually isn't terribly dissimilar than Lucas's. I grew up in North Alabama though, not Wisconsin, but in this funny place. North Alabama is covered in woods, but it also has a NASA installation, in Huntsville, Alabama. My youth was basically just spent in the woods. 
When I was in first grade, I wanted to be an Entomologist. When I was in third grade, I wanted to be a Zoologist. I went through geology and so on and so forth until I finally met somebody who was a forester. Until you meet somebody and you have somebody walk you through what that is, it's an obscure field. What that is to me is the confluence of economics and ecology. It was this brilliant opportunity at the time, and that's the way that I saw it because it brought together everything that I cared about. From the ecology side, insects and soils, geology, the interconnected nature of all of those systems, but also the economic side. Not only what the forest is, but also what we want it to be and how we value that as a society, and how we mean to take it from one place now, which is where we find it today, to where we want it to be, and what we believe we need. That was my entrance into it. I believed I would carry that out. I would live and work as a forester by managing some tract of land for some owner, whether that's public or private, but that I would be focused on that landscape. Going through Undergrad, what I became really interested in, oddly and to my surprise, were the quantitative aspects of certain problems, like insects in a forest. When I first got into forestry, my freshman year, there was a massive outbreak of southern pine beetle in the U.S. South, and it was killing lots of pine trees. That was a really compelling problem to me because it relates so much not only to the trees themselves and the beetle, but also how we've managed them historically and sort of how that impacts local economies and that type of thing. I started into pheromone plume modeling, of all things, in a forest system, trying to take measurements of concentrations of pheromones in locations and backtrack to where that originated from in the winter, to try and deal with these beetles more effectively. What I learned from that or what I gathered was that there's this incredible ability to scale up my interests. To still focus on the things that I loved the most, but to look at them with a different lens and to potentially affect change in a different way than I had conceived of before. I wound up doing work in Brazil, I was really interested in Tropical Forestry. I took some time off from Undergrad to do that, and worked in other areas, Bolivia in South America. There I got to see situations where people were dependent on different aspects of land, in different ways and more direct ways than I think I was familiar with from my youth in the U.S. South. Where they were hunting animals, they were collecting nuts, fruits, things like that. They're collecting fuel wood to stay warm, to cook. They were also wanting to sell wood into a market, and to develop as communities. Forestry is about trade offs. There are a lot of things that we can do, and there are a lot of potential futures that we have before us, but we have to address the complexity of those systems in more comprehensive ways than we have in the past. There's far more than just a timber market now, there's far more than just a concern for delivery of wood to build houses. As we spoke about just a little bit before, that was experienced very acutely here in the Pacific Northwest. When people were confronting the issue of whether we had enough spotted owl habitat or spotted owls themselves or not. 
Whether we had managed appropriately in the past to accommodate those and everything that’s related to that species, or the habitats and other species that are related, or whether we haven’t, whether we’d failed. If we needed to go back and reconsider the ways that we make decisions. That was a really freighted conversation, it brought people to boiling points, and that was before my time really, before I really entered into the profession in any meaningful way. That type of conversation goes on now and it’s even more complicated, and there are more issues and more dimensions that we have to consider than there were then. To have constructive conversations, we have to have information to inform those discussions to facilitate the communication that yields solutions, that people can live with. Sam Charrington: [00:10:40] I’m presuming that, that need is what led you to found Silvia Terra? Zack Parisa: [00:10:45] It is. Yeah. Absolutely. Sam Charrington: [00:10:47] What is Silvia Terra, what is the company? Zack Parisa: [00:10:48] Right, what we do here? Failing to answer your questions here. Silvia Terra we provide information, just like what I was speaking about there. Our objective is to help people use modern data sources, like remotely sensed information from satellites, from aerial basis, from UAVs and modern modeling techniques to help get more resolution on information and get more accuracy and precision on information. Not only just about trees, but about habitats and beyond. That’s the focus of our company. We’ve been at this for about nine years, a lot of the folks that we work with are timber companies, we also work with non environmental NGOs, we work with government agencies. All of them, they have effectively the same questions, they’re very similar needs. Initially, up until now we’ve been providing data project-to-project to help them answer those critical questions that they confront on a regular basis. I guess, the reason I’m in this room with you all here today is that, we were able to start working with Microsoft AI For Earth. To begin to scale and expand that work, to build a foundational data set that we can start to use to answer these questions and to build on, to improve our ability to manage for the future. Sam Charrington: [00:12:21] This may be a good segue to taking a step back and Lucas, what is AI For Earth? Lucas Joppa: [00:12:29] Sure. Well, I think in the context of this conversation, you can think about it. What is AI For Earth? That’s why a reformed forester, who’s now the co founder of a startup and a reformed wildlife ecologists, who’s now the Chief Environmental Officer at Microsoft are at a table talking with you on TWIML. Sam Charrington: [00:12:44] I feel like we’re in this recursive. Lucas Joppa: [00:12:46] That’s right. I know exactly, I can’t even see you guys anymore. I’m just staring at myself and an Infinity Mirror here. What AI For Earth is, is as of Tuesday of this week, a one-year-old program. Sam Charrington: [00:13:00] Happy birthday. Lucas Joppa: [00:13:01] Thank you. Thank you. It was fantastic. We spent it celebrating with our colleagues at National Geographic in Washington, D.C. Sam Charrington: [00:13:08] In the woods? Lucas Joppa: [00:13:10] Unfortunately no, but at the founders table of one of the most iconic and exploration driven organizations in the world. It was an incredible time. What AI For Earth is, is a five year, $50 million commitment on behalf of Microsoft to deploy our 35 years. 
Actually, a little bit more than 35 years of fundamental research in the core fields of AI and Machine Learning. To deploy those to effect change in these four key areas of the environment that we care deeply about: agriculture, water, biodiversity, and climate change. The reason that we're doing that is because we recognize at Microsoft, and I already spoke about this tale of two ages, that this is the time of the information age and also a time of incredible, negative impacts of human activities on earth's natural systems. You look and you realize that as a society we're facing an almost unprecedented challenge. We somehow have to figure out how to mitigate and adapt to changing climates, ensure a resilient water supply, and sustainably feed a human population rapidly growing to 10 billion people, all while stemming this ongoing and catastrophic loss of biodiversity that we see around the world. We've got to do that while ensuring that the human experience continues to improve all around the world for everybody, and that economic growth and prosperity continue. That's why I say it's an unprecedented challenge. I mean, the scope and the scale are just incredible. If you look at the scope and scale of the problem and you step back and ask yourself, as a company, the same question that I asked during my PhD, which is, "Well, what are the things that are growing in the same exponential fashion as the scale and complexity of our environmental challenge?", then pretty much the only trends that are happening in an analogous fashion are in the tech sector, and particularly in the broader field of AI and the more narrow Machine Learning approaches that are getting a lot of attention today. That's when we decided to put together this program, to actually say, "Hey, we've been investing as a company for over a decade at the intersection of environmental science and computer science." I led research programs in our blue-sky research division, called Microsoft Research, for a fair number of years on that. But then the technology reached a point, and the criticality of the societal challenge, I think, reached a point, where it was time for a company like Microsoft to step in and actually start to deploy some of those resources. Deploy them in ways that ensure that we ultimately change the way that we monitor, model, and then ultimately manage earth's natural systems in a way that we've never been able to before. We started out, as I said, a year ago with basically nothing but aspiration. We looked back this past Tuesday, at this event that we had at National Geographic where we inducted a new set of grantees into our portfolio, and realized that in that short year we'd set up relationships with organizations all over the world. Over 200 organizations all over the world, each dedicated to taking a Machine Learning first approach to solving challenges in these four domain areas that we focus on. They're working on all seven continents now, in over 50 countries around the world, and in 34 states here in the United States. Today I get the opportunity to sit down with one of the grantees, right? To hear a little bit more about their particular experience, and talk about the ways that Machine Learning in particular can fundamentally change our ability to understand what's going on on planet earth.
Because I think most people don't take the time to step back and realize, when they hear terms like "information age," just how narcissistic that really is: almost every bit of information that we've been collecting is about ourselves, right? It's about where the nearest Starbucks is, it's about what people who searched for this also searched for, right? And it's at the peril of ignoring the rest of life on earth and the ways that it supports us and our economies. That's what Silvia Terra, I think, is so focused on: using vast amounts of data and new approaches in Machine Learning to actually just ask simple questions like, where are all the trees in the United States? We don't know answers to things like that. I mean, that just blows my mind, and so that's where a lot of this came from. It's just a fundamental desire to change our ability to monitor and model life on earth. I guess that isn't all that simple, but I also think it's completely and totally doable, right? I mean, look at where we've come from, from an information processing capacity, over the past 25 years to where we are today. If you would've tried to predict every little bit of it, it would have been impossible, but it seems preordained now that you look back at it.
Sam Charrington: [00:18:38] When I think about the types of systems that we've been talking about thus far, both the economic and political systems as well as the biological systems, it jumps out at me that there's a tremendous amount of complexity in those systems, and Machine Learning, deep learning in particular, has this great ability to pick out patterns and abstract away from complexity. Which kind of says to me, "Oh, it's a no-brainer to apply Machine Learning to this." But we're still very early on in our ability to put these Machine Learning techniques to work. I guess I'm curious, maybe for you Zack, where you think the opportunity is in applying Machine Learning and AI to the types of problems that concern you, in particular with regard to forests?
Zack Parisa: [00:19:43] Yeah, yeah, absolutely. I guess, listening to Lucas there, one thing that jumps out at me from when you first spoke, and your response to the second question, is that there are lots of people that are very interested in natural resources and there are lots of people that are very interested in Machine Learning and AI, but the overlap is a very small community of people. I think it's rare, it's uncommon, to start out believing you're going to spend all your time outside and then find yourself curled up in front of some code. So the first thing, I think there's a lot of opportunity for people to make that leap and to begin to see that as a more natural thing, because the questions are very complex. Again, just like Lucas said, most of our focus has been on how to market to somebody to buy a cup of coffee here versus there. How to think about social networks, and marketing networks, and transportation networks. I think it's exciting to see that begin to percolate down and transition to the story behind how all of those materials come into our world and life. The fact is, and I think the surprising fact is, that everything around us, every little bit of technology and everything that built this room that we're in, or that your listeners are in, was either grown or mined. Every piece of that, every little bit, has some geographic story, some physical story, some environmental story.
If we were to be confronted with all of those stories, just from one day of our consumption, one day of us interacting as we normally do, it would take us years to even sift through them. There's no way. But those stories all amass to have a very large impact on how we all live. To me, that is the huge opportunity here. With Microsoft AI for Earth, we have worked on this data set for the continental U.S. at high resolution to inform, down to species and diameters, where trees are, what those structures and compositions are, and, moving forward, what they could be. That's not going to stop: the fact is that we are all consumers, and while we have a conservation need, we also have a consumptive need. I think there's so much opportunity to begin to investigate how we balance that, how we feel about that, and to engage in a meaningful conversation at multiple levels of society about how that can best be done. You asked about opportunities. I mean, I was never excited about AI or stats or Machine Learning for their own sake. It is awesome, I now understand that, and I do get jammed up about exciting advances there, but it's about what it can answer. That's what drew me out of the woods and put me in front of a computer: the ability to start to even think about those big questions, and distill it all to something simple and right in front of us. That's the opportunity. It allows us to know more about our world and ourselves, and to create a better world and a better image of ourselves.
Sam Charrington: [00:23:34] Can we maybe dig into a little bit more detail on either the data set that you just mentioned or another project, and talk through the process by which Silvia Terra uses Machine Learning and the challenges that you run into? Maybe walk us through a scenario.
Zack Parisa: [00:23:54] Sure. Absolutely. I'll just briefly tell you where we're coming from. People have been managing forests for a couple hundred years, and in the U.S. for about a hundred-plus. They needed information then, as they do now, but to get it they would do a statistical survey: they would go and put measurements in, work up an average, and make a plan based on that average. That has been effective, and it's what a lot of people still use today, but what we're focused on doing is bringing imagery to bear, along with model-assisted and model-based methods, to yield small-area estimates. For us that's at a 15 meter resolution, and for a 15 meter pixel, what we're predicting is the number of stems, their sizes, and their species. When I say size, I mean the diameter of the trunk of the tree at four and a half feet off the ground. From there, in a hierarchical context, we predict maybe the height of the tree, or the ratio of crown to clear bole at the bottom. From there, since we can infer or predict the light conditions under that forest, maybe how much herbaceous plant matter there may be. Carrying that forward: how many herbivores could that support? Scaling that up: how many large carnivores could that support? For now, the primary piece, this foundational data set that we've worked on with Microsoft, is that tree list information for each one of those pixels, which hasn't existed before, but that opens up so many doors for what we can begin to build onto and model further down the line.
Sam Charrington: [00:25:46] At a resolution of 15 meters, a single pixel might contain how many trees?
Zack Parisa: [00:25:54] It could contain an awful lot, easily, and this is the tricky thing, because a tree could be as small as a seedling or as large as a sequoia. You could have less than one, right? Or you could have 300 small, tiny little trees packed in tight. This, to me, is the fundamental difference between what we're working on here and where we're coming from. We need to transition away from the binary, or basically qualitative, classifications of forest and non-forest. That's not actually that informative about what that forest can ... what habitat it can provide, or what we may need to do or not do to ensure that it's the type of forest that's going to continue providing the things we care about: clean water, carbon out of the atmosphere, wood to build this table. Beginning to quantify those aspects is very important. When I began working on this, everything was on the table. I mean, there was a potential to use LiDAR and neural nets to try and delineate discrete trees. We do not do that, for various reasons, largely bias in the results. For us, parsing out species became a massive problem. If you have, let's say, 40 trees of multiple species in one pixel, how do you begin to differentiate those when you're looking at one pixel of data from lots of imagery sources? That was a technical challenge.
Lucas Joppa: [00:27:40] One of the things that I think is interesting about this is, you're talking about forestry, right? Whether or not people know it's a profession, it's an extremely old one. You don't think that you're going to be talking about Machine Learning. You also don't think that you're necessarily going to be talking about philosophy or existential questions, but you asked a question about 15 meter resolution, right? When you work with organizations like Silvia Terra that are looking down at the world and asking what is there, you end up having these existential conversations about what is a thing, right? At what level should we be taking data points to feed into these Machine Learning algorithms? Because when you incorporate the zed dimension, or the Z dimension, or whatever you want to call it depending on what part of planet earth we're from, you can be looking down at a multitude of different objects, right? Depending on what sensor you're using, you may only see one of them, or you may see many of them if you're using something like LiDAR and you're able to get enough out of your laser sensors to see enough of those things. You start struggling with all of these questions that are actually fairly unarticulated in the modern Machine Learning literature, quite frankly. All the standard libraries take in a 300 by 300 pixel image, and they all have these harsh expectations, and sure, maybe we think we all left the world of frequentist statistics behind, but we still carry over the ghosts of a lot of those harsh binary classification results. It's just fascinating, I think, to think about not just what's hard in the forestry space and how modern Machine Learning techniques can help transform that, but also what the problems and applications of an organization like Silvia Terra, and the rest of our AI for Earth grantees, bring to the Machine Learning community. Which is: what's hard here? Why can't we just take all the deep neural network advances that we've made and, voila, we've solved all the world's problems, right?
It's because, as you said, we're still at the infancy of a lot of what we hope to achieve in Machine Learning. We just also recognize the severely short amount of time that we have to answer some of these bigger environmental questions. We have got to take everything that we have at our disposal and start to deploy it.
Sam Charrington: [00:30:18] You mentioned sensors and LiDAR, so a very specific curiosity question. I've always associated LiDAR with local, very short-range sensing. Is that not the case? Can you do LiDAR from satellites?
Lucas Joppa: [00:30:34] Yes, yes.
Sam Charrington: [00:30:35] Talking about satellites, or planes-
Lucas Joppa: [00:30:36] Planes.
Sam Charrington: [00:30:37] What are all the sensors that come into play here?
Zack Parisa: [00:30:38] A new sensor was just launched a couple weeks ago.
Lucas Joppa: [00:30:42] Something like that.
Zack Parisa: [00:30:43] There's the GEDI sensor, it's called GEDI, pronounced "Jedi." I'm used to it now.
Lucas Joppa: [00:30:48] I was going to say it.
Sam Charrington: [00:30:49] Use the LiDAR?
Zack Parisa: [00:30:50] Use the LiDAR, Lucas.
Lucas Joppa: [00:30:52] GEDI, here's a …
Zack Parisa: [00:30:55] Well, it's worth [crosstalk 00:30:56]. They're strapping this thing onto the space station. It's going to be pulsing down, not at the poles, but over basically everything in between. I think it's full-waveform LiDAR. So absolutely, and even historically there was ICESat, which was a satellite-based LiDAR sensor. Moreover, and more commonly in forestry, and a lot even in urban areas, they're collecting LiDAR information from airplanes at different altitudes and different point densities. A common one might be 12 or 24 points per square meter. When you fly that over a forest canopy, some of those pulses reach the ground. The best elevation models that you see in the U.S. right now are LiDAR-derived elevation models. That's the source of a lot of the information that we're getting. You see it in a lot of floodplain areas, like the Mississippi Delta, so that we can better understand how flooding may or may not occur in certain areas.
Lucas Joppa: [00:32:02] One more thing that I'm always struck by, when you start thinking about remote sensing, and sensing in general as applied to environmental systems, is that as we start to take a more digital or computational approach to sensing, we almost by definition have got to start taking a more Machine Learning approach to deriving insights. Because, and maybe I'm just missing the conversation or maybe the conversation isn't as fully articulated as it could be, computers are able to sense the world in so many more dimensions than people are. Why do we model? Well, we model because we need a simplifying function to help us understand an already complex world. What was already complex according to our five senses has now become exponentially more complicated with things like hyperspectral monitoring, where you're getting thousands of bands of imagery back, plus things like LiDAR that are getting 24 points per square meter. Humans can't even ... It's interesting, people always complain that they don't understand what the layers in a deep neural network do. We also have no idea how to even interpret most of the signals coming back from the most advanced sensors in the world, because they don't correspond to the dimensionality that we live in.
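To make the per-pixel tree list idea concrete, here is a minimal, purely illustrative Python sketch of the kind of structure Zack describes: a list of stems with species and diameters for each 15 meter pixel, plus a roll-up to per-hectare totals. The class names, species, and numbers are invented for illustration; this is not Silvia Terra's actual data model.

# Illustrative only: a toy per-pixel "tree list" and a simple aggregation.
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class TreeRecord:
    species: str            # e.g. "loblolly pine" (hypothetical)
    dbh_cm: float           # diameter at breast height (~4.5 ft above ground)
    stems_per_pixel: float  # expected stem count this record represents

# One 15 m x 15 m pixel is a list of tree records rather than a single
# forest / non-forest label.
pixel = [
    TreeRecord("loblolly pine", 28.0, 6.5),
    TreeRecord("sweetgum", 12.0, 14.0),
]

def summarize(pixels, pixel_area_ha=0.0225):  # 15 m x 15 m = 0.0225 ha
    """Aggregate per-pixel tree lists into stems per hectare, by species."""
    totals = defaultdict(float)
    for px in pixels:
        for rec in px:
            totals[rec.species] += rec.stems_per_pixel
    area_ha = len(pixels) * pixel_area_ha
    return {sp: stems / area_ha for sp, stems in totals.items()}

print(summarize([pixel]))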
Sam Charrington: [00:33:22] I was just going to ask about that. When I've talked to folks that are using LiDAR in the context of self-driving vehicles, this whole idea of sensor fusion comes into play, and making sense of all these disparate data sources. Those examples are very local, and now we're talking about global data sources, or at least much larger scale, with overlapping tiles and capabilities. There's a ton of complexity. Is that type of complexity some of the complexity that your company is working on managing, or do you count on upstream providers to sort a lot of that out for you?
Zack Parisa: [00:34:09] That's exactly the type of complexity that we deal with. I mean, there is an enormous pool of potential data sources, and they all have potentially very useful attributes, some of them less so, and they have different timestamps associated with them. There's one very nice thing about measuring forests: as long as you don't mess with them, they tend not to move too much. Trees are pretty willing subjects to be measured, but they are always changing. There's growth, there's naturally occurring disturbance, and there's human-caused disturbance, and we want to keep track of both of those. What I see our role as being right now is taking that massive pool of potential sources of remotely sensed data, and the very small and often underappreciated pool of field measurements, the things that we actually might care about, and translating between those things, creating something that is more highly resolved, more accurate, more precise, and more useful than what could otherwise be achieved. So, yeah, draw the signal out of the noise, the classic tale.
Lucas Joppa: [00:35:24] If I look at the full portfolio of AI for Earth grantees, well over 200, you see that, at least in my mind, Silvia Terra as an organization is one of the most mature, right? They're actually out of the lab, they have a startup business model, et cetera, et cetera. When I think about why that is in the context of Machine Learning, why they're able to take advantage of it, it's because of one thing that we just heard, which is that they're taking advantage of these ground-based data points that they can use to train their models, right? That's because forestry is so inherently tied to our broader economy, here in the United States and all around the world, that there's a history of going out, boots on the ground, putting a tape measure around a tree and a GPS signal next to it and saying, "This tree is here, it's this height, and it's of this species." That's so rare in the broader environmental space. One of the reasons that I think organizations like Silvia Terra are unfortunately standing alone in many respects is that there are so few data sets. It's called Machine Learning because we're teaching computers, right? To teach, you have to be taught, or to be taught, you need to be shown examples. It's why we've seen such significant advances in some fields of Machine Learning but not in others. There are just so few annotations in our space. When you come into a forestry space, where the U.S. government has paid money for the past hundred years to go out and figure all this out, companies like Silvia Terra can stand on top of that and really just zoom off ahead. But they are in many ways the exception to the rule, which is unfortunate, I think.
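The general pattern Lucas describes, ground-measured plots serving as training labels for a model driven by remotely sensed features, might look something like the following hedged sketch. It uses a generic random forest regressor and made-up feature names and data; it is not Silvia Terra's actual algorithm.

# Illustrative only: plots with measured stem counts as labels,
# co-located remote-sensing features as predictors, then wall-to-wall
# prediction for every pixel in a landscape.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Pretend training data: one row per field plot.
# Feature columns: red reflectance, near-infrared reflectance, LiDAR canopy height (m)
X_plots = rng.uniform([0.02, 0.2, 2.0], [0.08, 0.5, 35.0], size=(500, 3))
# Pretend label: stems per hectare measured on the ground at each plot.
y_stems = 900 - 15 * X_plots[:, 2] + rng.normal(0, 40, 500)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_plots, y_stems)

# The same features, extracted for every 15 m pixel, turn a sparse plot
# network into a continuous map instead of a single survey average.
X_pixels = rng.uniform([0.02, 0.2, 2.0], [0.08, 0.5, 35.0], size=(10, 3))
print(model.predict(X_pixels))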
Sam Charrington: [00:37:18] Do you find that the kind of work you're doing, the sensing and pulling all that information together that we talked about, puts you at the research frontier of using Machine Learning techniques, or are you able to use off-the-shelf types of models? Where does your work fall on the spectrum of complexity?
Zack Parisa: [00:37:45] Boy.
Sam Charrington: [00:37:46] Or maybe complexity is not the right word. Just in terms of the innovation cycle, are you able to apply things that people are doing in other fields pretty readily? Or are you having to push the limits and pull right out of academic research, things like that?
Zack Parisa: [00:38:05] It's a little bit of both. I mean, our core algorithm has matured over the last nine years of doing this work, and we're a small team, we're 10 people effectively. I guess when I got into this, when I realized this quant path was something that really resonated with me, that I connected with and saw value in, I originally thought I was going to be a professor, that I would be a researcher somewhere. I would be putting papers out, because that must be how change happens. My path changed when I went around to people that I'd worked with in industry and asked them what papers they were reading that changed the way that they worked. What were the most influential journals that they were reading? The answer was that they weren't reading the journals, they were busy managing land, and that they wanted a tool, not a publication. That was a little eye-opening, and that's what Max Nova, my co-founder, and I set about to do: build tools. I don't really accept a full dichotomy between "is it research or is it just off-the-shelf type stuff?" I mean, we pride ourselves on our ability not only to understand the systems that we're working in, but also to be abreast of what's happening in modern computational techniques, modeling efforts, and modeling tools. Which I imagine everybody would probably say, right? Everybody would tell you, "Oh no, we're right on the edge." The funny thing that I learned when I got into this is that I'm on the applied side. I talk with people that are trying to figure out wildfire modeling and how to pick which communities to allocate funds and efforts to, to help manage a forest to prevent catastrophic fires. I work with people that are trying to figure out how to manage for forest carbon. I work with people that try to figure out how to manage forests to deliver wood to a mill to make paper. What's striking to me, I guess, from where I started to now, is that I thought what people needed to see was the math. I thought I would show up at their offices and be like, "Good news. We figured it out. Check this new method out. We pipe in this data, we put in these measurements from the ground, and we're able to model this more effectively now." What I learned is that if I can't communicate effectively about what we've done, if it really truly seems like magic, then it is by definition incredible in the truest sense of the word. It is not credible, and credibility counts. In some cases, when we're working with people, we may not use the most fantastic new thing. We may use something that is slightly more costly in terms of the input data it requires, or costly in terms of model fit, but that is more easily understood and explained, and more robust to, like, the boot test. You go out and it just makes sense.
Sam Charrington: [00:41:36] Lucas, does that experience ring true for the other grantees that you work with, or is there a spectrum of experiences in terms of where they are in applying this?
Lucas Joppa: [00:41:47] Some of our grantees are using almost commodity services at this moment. Microsoft, for instance, has a service called Custom Vision AI, sorry, Custom Vision API. Some of our grantees want to do simple image recognition tasks, and the service works for them. They literally just drag in a whole bunch of photos of one type and a whole bunch of photos of another type, and the system learns it and produces a result for them, and that's fine, right? That's pretty far on the one side, the commoditized services side. Then there are other grantees that are out there creating exceptionally custom algorithms for their work. We've got a grantee called Wild Me that does basically facial recognition for species, so that they can provide better wildlife population estimates for species like giraffe and zebra, things like that. Everybody knows a giraffe, or everybody has heard that every giraffe's pattern is unique, but look at a couple of photos of giraffes and you realize just how hard it is for the human eye to spot those differences, right? They're building algorithms to differentiate any particular zebra or giraffe and then plug those into statistical models for estimating populations. There's nothing off the shelf that does that. In fact, for most of the main libraries, they have to go back and modify the core code. So it's a full, full spectrum, and we're willing to support all of it, right? Because what we're trying to get people to understand is, well, first and foremost, we're just trying to break down the access barrier, right? We want to ensure that budget isn't a barrier to getting this stuff done. Because, as I think you and many of your listeners are aware, sometimes the latest Machine Learning approaches can be fairly expensive. If not, it might be an open-source library, but somebody needs 1,000 GPUs to run the thing, right? We make sure that the infrastructure gets into the hands of folks, et cetera, but it's also just awareness that you could be thinking about this even if you're not an expert. We want the world's leading Machine Learning scientists to be thinking about what they could be doing, but we don't want the rest of the world to think that they have to be one of the world's Machine Learning experts to have a crack at this, right? There's software and there are services that can help them as well. We see the full spectrum, and I think it's super healthy. We also see the full spectrum of interest in what, if I were to encapsulate what Zack was saying there in just two words, we would call Explainable AI, right? Do people really care why an algorithm said that this was a giraffe and that was a zebra? Not really. You don't have to explain that to them, right? Do they want to understand why some decision-support algorithm, like a spatial optimization algorithm that assigns this part of the county to protected land, this part to industrial use, and this part to urban growth and expansion, works the way it does, and why people thought that this was the better policy than that?
Sam Charrington: [00:45:14] Probably so.
Lucas Joppa: [00:45:15] Yes, they do. I think there's a lot of hand-wringing and angst right now around conversations like Explainable AI and whatever.
I think it's no different than the conversation we've always had about modeling, which is: it's a model of a complex system, so why are you building it? If it's being built to just do a simple classification task, and it's easy for a human to go and check whether the accuracy is right, then great, you can use some really advanced statistical techniques. If that model instead is a model of, for instance, a human decision process, then I think the onus on explainability is much higher.
Sam Charrington: [00:46:03] Along those lines, we've used computation to understand the environment and climate for a very long time. Weather, for example, has been a great focus of high performance computing. Taking a step back from the fact that we're all really excited about AI, where do you think AI offers unique opportunities relative to the things that we've done for a long time?
Lucas Joppa: [00:46:31] Sure. Well, I think the answer to that will be super complex, but I'll try to make it simple. You mentioned weather. Sure, there's no question that statistics and math, and then the computational platforms that started to support them over recent decades, have been used for environmental monitoring. I mean, Fisher, it goes all the way back; some of these guys were biologists, right? The bigger question is why we are excited about this today. For me it really is the full, broad definition of what we mean by AI. It's the recognition that we're finally deploying computing systems that can collect unprecedented amounts of data, and not just amounts, but, as we were talking about, the full crazy dimensionality of the data that we're starting to take on. We've got this breakthrough in data, we've got this breakthrough in infrastructure, where, and I made a joke about needing 1,000 GPUs, well, if you need one, 1,000, 10,000, you just have to turn a knob these days and get access to it.
Sam Charrington: [00:47:43] Wherever you are on that scale, it's still a lot cheaper than a supercomputer.
Lucas Joppa: [00:47:47] Extremely. We have made crazy advances in a whole plethora of algorithms, and for a lot of the most important ones we've directly accelerated the compute from the perspective of those algorithms, for the first time. Then of course we've made it so easy to deploy these algorithms as web-based services, as APIs, right? And of course the software infrastructure stack and all of that is incredible. We've made it commodity-level infrastructure; anybody can get access to this stuff. You hear this term "democratizing AI," and what we mean by that is bringing it all into a stack that anybody can use. You don't need access to a government-run supercomputer anymore. That's all one side of it. The other thing is, weather is a great example here, where traditional weather forecasting was strong numerical simulation. That's one type of math, right? But there wasn't a lot of learning in real time about what was going on. We took a physical process, we built a model that we thought strongly corresponded with it, and then we ran numerical simulations of it. And from the simulation perspective, you need a lot of compute. But all sorts of crazy things happen when we do that, that we don't quite understand, right? Little eddy fluxes happen in some atmospheric layer or whatever, and we don't really know why.
Then the weather community started using Machine Learning, not necessarily to learn why, but to be able to predict, for one reason or another, when those things were going to come, and weather forecasting got a lot better. The same thing is happening now in climate modeling as well. We know there are things that we just can't do with our traditional approach to climate modeling. There's a whole new group that just spun out that's taking a purely Machine Learning first approach to building a new climate model for the world, not positioning themselves as better, but positioning themselves as complementary. I think there's a lot of work that's just happened in commoditizing all of this stuff, as well as recognizing that while we've taken a hugely mathematical, statistical, and computational approach to doing some of this stuff in the past, Machine Learning is a different approach, right? It's a data-driven approach, and that can be very complementary, and we've seen it accelerate extremely economically important things like weather forecasting, forestry, agriculture, and on and on.
Sam Charrington: [00:50:31] As we wind up, Zack, can you share something that you're particularly excited about, looking forward, in terms of the application of AI to forestry?
Zack Parisa: [00:50:42] Yeah, absolutely. I mean, obviously we're excited to be releasing this data set, but it's really about what it enables. We're excited to see more nuanced and reactive markets around environmental services like species, habitat, carbon, and water be informed by these types of data, and to play a part in that process of integrating these concerns into ongoing management decisions. That's the biggest piece. It's what you can do with this information as you move it from data to information to decisions.
Sam Charrington: [00:51:29] Lucas, how about from your perspective, as you look at this from both a very technical and research standpoint, but also as someone managing and interacting with this portfolio of innovators working in this space. What are you excited about?
Lucas Joppa: [00:51:48] Well, ultimately the future I see, and the way that we've structured the whole program, is around what we think the world, what society, fundamentally needs: the ability to query the planet by X, Y, and T. We need to be able to ask questions just like we ask some potentially-
Sam Charrington: [00:52:10] No zed?
Lucas Joppa: [00:52:10] What's that?
Sam Charrington: [00:52:11] No zed?
Lucas Joppa: [00:52:12] No zed. Well, I was actually speaking with my team the other day, and I had sent a slide that said X, Y, T, apostrophe Z, and I said, "Stretch goal." So yeah, once we get the zed dimension, then I can retire. But no, I think ultimately that's where we need to go. We need to be able to allow people to ask, for any particular piece of land or water: What was there? What's there now? What could be there? And empower policy makers to figure out what should be there. We're far from that. Now, Microsoft has always taken an approach of empowering an ecosystem of customers and partners. Even if you buy into my X, Y, T vision, we don't see that as some fantastical crystal ball that the world spins around and taps on. We see it as a constellation of services and products and solutions brought by all sectors. What we're looking to do is engage with the Silvia Terras of the world; unfortunately, there are far too few at the moment.
Engage with those that are there, bring up the next generation and the next and the next, until eventually there's a self-supporting community. We talk about "born digital"; I think about "born Machine Learning": organizations where it's just baked into their DNA, but the organization doesn't exist because of Machine Learning. It exists because of the challenges that we face in the environmental space. They're just capable of ingesting Machine Learning approaches natively and efficiently, and of treating space and time as first-class data citizens in this world of Machine Learning.
Sam Charrington: [00:54:07] Fantastic. Well, Lucas and Zack, thanks so much for taking the time to chat with me.
Lucas Joppa: [00:54:13] Thank you. It was a pleasure.
Zack Parisa: [00:54:14] Yeah. Thanks Sam. Appreciate it.
Talk 227 – AI for Accessibility Interview
Sam Charrington: [00:00:00] Today we're joined by Wendy Chisholm, Lois Brady and Matthew Guggemos. Wendy is a principal accessibility architect at Microsoft and one of the chief proponents of the AI for Accessibility program, which extends grants to high-powered accessibility projects within the areas of employment, daily life, and communication and connection. Lois and Matthew are co-founders, and CEO and CTO respectively, of iTherapy, an AI for Accessibility grantee and creator of the Inner Voice app, which utilizes visual language to strengthen communication in children on the autism spectrum. In our conversation, we discuss the intersection of AI and accessibility, the lasting impact that innovation in AI can have for people with disabilities and society as a whole, and the importance of programs like AI for Accessibility in bringing projects in this area to fruition. This episode is part of a series of shows on the topic of AI for the benefit of society that we're excited to have partnered with Microsoft to produce. Before we proceed, I'd like to thank Microsoft for their support and their sponsorship of this series. Microsoft is committed to ensuring the responsible development and use of AI and is empowering people around the world with intelligent technology to solve previously intractable societal challenges spanning sustainability, accessibility and humanitarian action. Learn more at microsoft.ai. Enjoy the show.
Sam Charrington: [00:02:06] All right, everyone. I am here with Wendy Chisholm, Lois Brady, and Matthew Guggemos. Wendy is a principal accessibility architect at Microsoft, Lois is co-founder and CEO of iTherapy, and Matthew is a co-founder and CTO at iTherapy. Welcome, all of you, to This Week in Machine Learning and AI.
Wendy Chisholm: [00:02:33] Thank you.
Lois Brady: [00:02:33] Thank you.
Matthew Guggemos: [00:02:34] Thanks for having us.
Sam Charrington: [00:02:35] Fantastic. Fantastic. I think five people is the largest interview that I've done, but then we had the advantage of all being in the same room. In this case, I'm seated with Wendy, actually in a studio in Redmond, and Lois and Matthew are joining us remotely. And today we'll be talking about some of the work that Microsoft is doing around AI for accessibility. I'm really looking forward to digging into that. But before we do, I'd like for our audience to get to know each of you a little bit better and what you're working on. So, let's start with you, Wendy. How did you get involved in working on this intersection of accessibility and artificial intelligence?
Wendy Chisholm: [00:03:20] Yeah. It's a fun story. It starts 25 years ago when I was working on my computer science degree. I've just always been very curious about not only technology, but the humans that use it. If we're building technology and no one's using it, why are we doing it? So, I was studying computer science and psychology, and one of my professors asked me to tutor a student in statistics. I said yes, and I met him, and he was blind. I had not ever met anyone who was blind before. I wasn't really sure what I was doing, but I was very curious to learn about his experience and what I could do. So, we got very creative and used Legos to teach bar graphs, and I used a pin to poke holes in a piece of paper to create a scatter plot. And with the backdrop of my computer science degree, I was like, "There's got to be something that computers can do." And that really has been my trajectory ever since.
It's taken me around the world. I got to work at MIT with Tim Berners-Lee. I got to write an O'Reilly book and go to Foo Camp. I got to do a lot of really cool things. What that eventually did was lead me to understand the diversity of human experience worldwide and how culture and language can play into that experience, and to try to shift how people think about and understand the experience of the billion people on the planet who have disabilities, myself included. It's just been a very interesting thing. So, what I ended up doing was consulting for a while, helping companies try to make their websites more accessible. And I realized that people get very excited about it. They'd think, "Yeah, I want to do this. It's a great idea." But then not a lot would change. So, I decided to join a large corporation to understand how those decisions are made, and I quickly learned that it's the tooling that can really help people make better decisions. And a lot of times, it's not that people are making decisions that make the world less accessible out of malice. Just a lot of folks don't know. So, the more that we can infuse our engineering systems with the knowledge and information that's needed to help people make good decisions, the more likely it is they'll end up with more accessible outcomes. So, that's kind of what I've been doing the last while, and that's what led me to AI and machine learning. 'Cause to me the real juice in this, again, is how we bring technology together with humans, and it's about helping people make good decisions. So, that's kind of how I ended up there.
Sam Charrington: [00:06:10] Okay. Maybe to contextualize what iTherapy is up to, I'll have you talk a little bit about the AI for Accessibility project at Microsoft, and your role in particular with that project.
Wendy Chisholm: [00:06:26] Yeah. What's super cool about the program is that there's been this long history of innovation for people with disabilities that ends up impacting all of us. And my favorite example of that is that smartphones wouldn't exist if it weren't for people with disabilities. And the reason is that when you use an onscreen keyboard to text or to type, that keyboard was actually created a long time ago for people with physical disabilities who couldn't actually type, whether from weakness or loss of a limb or anything. And now it's being used by all of us, because we're not carrying keyboards around with us so we can type on our phones. The phones create such a limiting experience for us, and that's what technology for people with disabilities is about: really, what abilities do you have, and let's amplify those. So, because of that long history of innovation, what we're doing in the AI for Accessibility program is funding projects that are working on that next wave of innovation. So, I get to spend my time talking to people who are working on that next wave and then kind of figuring out who to fund and how to piece that all together into what this future is gonna look like for all of us. Because like I said, when we innovate for people with disabilities, we end up impacting all of us. So, I get to kind of look into the future and place some bets, basically.
Sam Charrington: [00:08:02] That sounds like a ton of fun.
Wendy Chisholm: [00:08:03] It is, yeah. I get to talk to really smart people like Lois and Matthew, and give them money and support to do cool stuff.
Sam Charrington: [00:08:12] Lois, can you share a little bit about your background and iTherapy?
Lois Brady: [00:08:18] Absolutely, Sam. Like Wendy, our journey started about 25 years ago, when I became a speech-language pathologist working with people who had communication challenges in one way or another. At that time, technology was all about getting the old technology from the general population and then adapting it to our students who needed communication technology. When the iPad came out, that kind of flipped the model for us, and it was very impressive. I noticed how a lot of my students who had significant challenges, complex communication needs, were gravitating towards the iPad, and it encouraged me to write a book called Apps for Autism. As I was writing that book, I noticed that there are certain features that kids would attend to and use, and they'll use them without us even prompting them, and there are certain features that they don't really care about. So, taking all of those things that my students really loved, we put them into Inner Voice to capture their attention and teach them communication, and also just to give kids with complex communication needs or complex sensory needs the latest, greatest technology and let them communicate more or less like everyone else. So, we created the app Inner Voice, and then when we saw the artificial intelligence grant, we thought it was absolutely perfect to keep the students, keep our ideas rolling along, make it easier, faster, more fluent, and really put our kids with communication challenges at the forefront of technology instead of 10 years behind everyone else, using all of the equipment that everyone else no longer uses. So, this has been wonderful for us, because as our kids use this, they really draw in everybody around them thinking, "Oh, this is so cool. I want to see what you're doing. I want to come and use it also." So, it's been a big boost for us.
Sam Charrington: [00:10:24] And I understand that you're also trained in animal-assisted therapy and you have a therapy pig named Buttercup. Is that correct?
Lois Brady: [00:10:36] Absolutely. Buttercup is more famous than anything else I've ever done. He's absolutely wonderful. And I chose a potbelly pig because I specifically work with, and specialize in, autism, and a lot of kids with autism definitely have trouble with dogs, because they've heard them bark before or they've been jumped on, or something's happened, so they already have a preconceived notion about a dog. Or a cat, they may have gotten scratched. But a pig, they had absolutely no idea what to do with a pig, and it was breaking new ground. And it's just a matter of getting something that captures their attention and using it to create communication opportunities, much like the iPad. When the iPad came along, it was the exact same thing. We're using this high-interest item, whether it's a pig or an iPad, to capture their interest and then teach them communication with it. He's great. He's great.
Sam Charrington: [00:11:34] I bring that up mostly because I have a daughter who is studying psychology and loves animals, and her goal is to do animal-assisted therapy. And I can now tell her that I interviewed an animal-assisted therapist and finally get her to listen to one of my podcast episodes.
Wendy Chisholm: [00:11:56] Perfect. We're here to help.
Sam Charrington: [00:11:58] So Matthew, how about you? Can you speak a little bit about your background, and maybe take us into a little bit more detail on the iTherapy app and how it uses AI?
Matthew Guggemos: [00:12:12] Absolutely. My background, first, was actually as a musician. I am actually a drummer. I've really been interested in just how you learn skills. Drumming takes a long time to learn, takes a lot of study, and that's very similar to how speech and language develop. Speech and language are probably two of the most difficult skills people can learn. So, I was really fascinated by how you actually learn to communicate using words. And just like Lois, I specialized in working with people with autism. I'm a certified autism specialist in addition to being a speech pathologist, and early on when I started working with kids with autism, one of the most difficult things about teaching them skills was to show them things that captured their interest, and interest leads to learning. Because if you're not interested in something, it's hard to pay attention, and it's hard to learn things you don't pay attention to. When the iPad came out, I noticed, "Wow," you could really capture people's interest with the iPad, just like Lois was saying. So, the first version of Inner Voice was just using facial recognition technology to model speech. This is kind of similar to how a musician learns: you go to a drum teacher and he shows you something, you copy it, and then you practice. And it was really interesting how I could get tons of kids to imitate what was on the screen, but not necessarily what I would model for them. So, we've kind of woven in the AI component using the Azure vision services, and this, to me, really, it's great. We've been testing it with users now, and it kind of mimics the way people learn language. For example, you look at something. Let's say you're a child. You look at something, you point to it, your parents call it ... you point to an animal, your parents say, "That's a pig," and then the child says, "Pig." And then they know what a pig is. So, what we've done with the vision services is that now a user, let's say a kid who's using Inner Voice, can use this feature we've called Visual Language, and he or she can take a picture of a pig, for example, and the photo will get sent to the cloud and then get paired with text. The text appears back on the screen, and the avatar, which is from our original version, will read the text. So, they can see themselves saying the word, and then pair that word with the image and the written text. And this is something that we've developed called multisensory semiotics. Semiotics is how you assign meaning to a symbol. By using this AI technology, we've been able to pair speech and language with photos, and it can be interest-driven. So, let's say a kid wants to know what something is in the room. They can take a picture of it and it can be labeled and spoken for him or her, and then they can imitate it and learn, kind of through self-motivation, what things are called and how to label or describe things in their environment.
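As a rough illustration of the Visual Language flow Matthew describes, a photo sent to a cloud vision service, a caption returned as text, and the text then spoken alongside the image, here is a hypothetical Python sketch. The Inner Voice app itself is an iOS app; the API version, placeholder credentials, and the speak() stand-in below are assumptions for illustration, not iTherapy's actual implementation.

# Hypothetical sketch: photo -> cloud caption -> spoken text.
import requests

AZURE_ENDPOINT = "https://<your-region>.api.cognitive.microsoft.com"  # placeholder
AZURE_KEY = "<your-subscription-key>"                                  # placeholder

def describe_photo(image_path):
    """Send an image to the Computer Vision 'describe' operation and return the top caption."""
    with open(image_path, "rb") as f:
        image_bytes = f.read()
    resp = requests.post(
        f"{AZURE_ENDPOINT}/vision/v3.2/describe",  # endpoint version is an assumption
        headers={
            "Ocp-Apim-Subscription-Key": AZURE_KEY,
            "Content-Type": "application/octet-stream",
        },
        data=image_bytes,
    )
    resp.raise_for_status()
    captions = resp.json()["description"]["captions"]
    return captions[0]["text"] if captions else None

def speak(text):
    # Stand-in for the app's avatar / text-to-speech step.
    print(f"Avatar says: {text}")

caption = describe_photo("pig.jpg")  # hypothetical photo taken by the child
if caption:
    speak(caption)                   # pair the spoken word with the image and text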
Sam Charrington: [00:15:53] Wendy, I'm curious, when you think about this intersection of AI and accessibility, on the one hand AI presents an opportunity to allow us to connect more directly, or to provide better support for people that need it. On the other hand, there's also a risk that the people that need support get left behind or left out of the innovation that's happening with AI, and I'm curious how you think about those two. Are there other factors that you think about? What's the perspective that you bring to supporting organizations that are trying to do work at the confluence of these fields?
Wendy Chisholm: [00:16:41] That's a great question. There are several different ways that we can go with that. On the one hand, the people being left behind and not included is near and dear to my heart, because, as Lois talked about, technology and how people use it, it's really ... I guess what I really want to focus on is that it's so important for us, as we're building new technology, to really bring everyone who's going to be using it around the table and involve them in the process. And I think that's why, when we see these amazing innovations like the onscreen keyboard, it was because someone, a developer, was working with someone with a disability. So I think one of the things I'm excited about is that we're specifically looking for projects that are firmly grounded in the community they're intending to benefit. And ideally, I want to be a shark tank of entrepreneurs with disabilities. So, part of that is just making sure that folks have the skills to really contribute. It also means that we're looking at data sets and we're looking at bias in data sets. And I think one of the things that we're excited about is the photographs that are taken by people who are blind or have low vision. They're not gonna be perfectly framed. So how does that affect the models and the training that's been done before? We want to make sure that there's a good diversity there so that folks are included. And then we get all the good, innovative, juicy stuff, too. And I just have so many great stories about that, where when you really include people and bring them around the table, you get to do some really good stuff.
Sam Charrington: [00:18:34] I'd love to hear some of those stories.
Wendy Chisholm: [00:18:36] Yeah. And I think the other thing about it, too, is just ... so, Saquib, who's one of our engineers that worked on the Seeing AI project, is still working on that. I'm not sure if you're familiar with Seeing AI.
Sam Charrington: [00:18:49] Tell us about that, please.
Wendy Chisholm: [00:18:50] So, it's an iPhone app right now, and there's a really great demo of Sacha and one of my colleagues, Ann Taylor. And the reason I love this demo is that she's blind, she's using Seeing AI, and he writes on a piece of paper, "Accessibility is awesome." She's able to feel where the paper is and take a picture of it. The handwriting is recognized and read out loud to her. And for her, that means now she can read cards or business cards, or letters from family and friends. I think it's really empowering, and I think that's really cool. And when you start looking at the dream that Saquib has, of walking around with a friend who knows him and his preferences, where his friend is able to recognize what he's interested in and maybe tell him, "This is what's changed since the last time we were here," or give some color about what's happening in the space. When you look at that, that's just another eyes-free interface. And it's something that I think we'd all use. When I'm touristing in a new place and I don't know the language, I don't want to be looking at my phone for directions, 'cause I don't want to appear to be a tourist.
And I want some of that feedback in my ear, and getting it tailored to me, that's something we can all use. So again, that's kind of where I feel the power is for us, and that goes back to the decision-making as well. In those instances, we're helping people make better decisions, 'cause we're giving them information that they didn't have before. So, it doesn't really matter if you're disabled or not, right? You want information so you can make better decisions. And that's really what this is about, I think.
Sam Charrington: [00:20:37] Yeah, I love this recurring theme of the innovations that we are creating to support people with disabilities coming back full circle and impacting the way we use technology, the way everyone uses technology.
Wendy Chisholm: [00:20:57] Exactly. And I mean, where I really hope the world goes is that by making sure that we're all at the table and contributing, we're not creating barriers, and disability kind of disappears. Because one of my favorite quotes is, "It's the stairs that make the building inaccessible, not the wheelchair." And to me, that's the beauty of it, right? If we can really design the world such that everyone can participate, it's just not really a thing anymore. We're all benefiting. We're all benefiting from these connections with other people. Again, that's the juicy part with AI, because our devices now have so many sensors in them and can give us information that we may miss, whether we're eyes-busy or we're blind. There's just so much opportunity there.
Sam Charrington: [00:21:47] Absolutely. Lois, can you elaborate on the kind of experiences you've seen with the users of the Inner Voice app and the kind of impact it's had for them?
Lois Brady: [00:21:59] Absolutely. We've been using this in our clinic and in school districts, and we're even branching out now into hospitals with folks who may have had strokes or head injuries. Initially it was made for video self-modeling, where you take a picture and you see yourself producing the target language or the target word, and then we added in the vision, where they can take a picture, just like Wendy said. It's amazing. They can take a picture of words or a thing, and then the avatar says what that is, and across the board, from the youngest student we have, who is probably around two, to some of the oldest, who are in their 80s, there's amazement. Their mouths just drop, and it becomes, I call it, like an electric communication environment. Now everyone's coming over, asking about it, wanting to use it. And it just produces this place where now we want to talk, and the students want to talk about what they're doing, and then our students start adding in characters that they like and they make their characters talk. So, it's all about just providing these wonderful opportunities, not only for the student to talk, but for people to come in and say, "Oh my god, show me what you're doing. What's going on?" Our kids quite literally never had those opportunities before, so they're leading the pack in that matter, because they do have all this wonderful AI embedded, because of the Azure services. So, they're leading the way and nobody's ever seen this. So, they get to be the cool kid.
And I think Wendy hit on it a little bit before when she was talking about universal design, because then I went and used it with a student who was bilingual and didn't have any English at all, and using a translator app, we were able to put in one language and speak a different language. So, the technology's just absolutely amazing, and we can take on almost any kind of challenge and overcome it right now. And it's never happened that way for us before. Technology was something that was cumbersome and hard to use, but no more. Everyone has it in the palm of their hands. And again, our kids are leading the way at this point in time.
Sam Charrington: [00:24:23] Matthew, can you share a little bit about your experience using AI as a technologist, building it into the application? What was your background with regards to AI, and how did that inform the way you incorporated it into the app?
Matthew Guggemos: [00:24:44] Well, I've always loved technology. I've always been someone who just loves to read about technology or learn how it works. My first and foremost background, in terms of, I guess, scientific background, is communication sciences. So, I looked into how AI could be used for that field. Probably the thing that first caught my eye, and how Lois and I developed our Visual Language feature, was the character recognition and describing features I was talking about before, pairing images to text. I thought, "Wow, that is a fantastic way to help with literacy and possibly help people who maybe just want to learn a different language." It could be applied to that even if you have no disability at all. The other aspect was that I got really fascinated by the smart bot technology that Microsoft in particular has now. They have a number of these services: text to speech, and then the language understanding and Q&A frameworks. You can see a lot of this in their Cortana app that they've released. And, tying back to my background as a musician, you have to practice things to be good at them. I thought, "What a great way to make practice motivational for kids or anyone," because you can make a bot that will interact with you and you can ask it questions. You could find out information. One of the trickiest things to teach any person with communication challenges is to initiate a communication exchange, but a bot is kind of a friendly place to start. You can say, "What's the weather? What time is it?" Or "What's a platypus? What's a potbelly pig?" So I had to tie that in. The bot can come back with an answer, and it's motivating, because being motivated really is a huge factor in communication. And AI can answer infinite questions about a subject that maybe an individual's only interested in. So, it's sort of a long answer to your question, but there's a host of reasons why I got into AI, particularly for communication sciences.
Wendy Chisholm: [00:27:12] We're gonna see how many times we can say pig in this podcast.
Sam Charrington: [00:27:19] Did you specifically work on the integration of the Azure services into the app? Or was that work done by other folks or by Microsoft?
Matthew Guggemos: [00:27:35] Oh, so our role specifically, Lois and I do this, is that we work on the UX/UI design first and foremost, and then we work closely with a developer who has helped us design Inner Voice from the beginning. His name is Junichi Fujita. He's an amazing guy.
So, we went through some of these tutorials about how to integrate the Azure services into our existing code, because basically you're integrating the cloud-based services into our iOS code, because currently Inner Voice is on iOS. So, we contracted with Junichi who did the coding for us, because I'm actually not a coder. I'm a speech pathologist. So, we'd come up with the designs, find the technology, make sure it's feasible for what we want to do, and then he is brilliant at being able to translate that pretty much exactly to our specifications. So, we've worked with him for a long time. Sam Charrington: [00:28:36] Are you aware of any challenges or impediments that you and he ran into in incorporating AI and these cognitive services into the application? Matthew Guggemos: [00:28:50] They were actually stunningly easy to integrate. The biggest problem was the UI/UX stuff. So, that one we kept going back and forth. "Well, how should it look? What's the easiest way for it to be used by users?" 'Cause we do a lot of user testing, so we don't just think, "Hey, I think people will like this." We actually designed something based on observation and interviews with people, and then we try it with them, and then they either like it or think, "Oh, this is no good." And then we go back and redesign it. So, most of the challenges were just in that aspect, but in terms of integrating the Azure services into our code, it was really easy. Wendy Chisholm: [00:29:30] Yeah, they've had an easy time, which is great. I think when I looked at some of our other grantees, I think they're gonna be pushing the limits of the technology. In particular, when we look at some of the work that Zyrobotics is doing, they're looking at speech recognition for students with nontypical speech patterns. So, they're really having to train the data and expand what it can recognize. We've got another grantee from the University of Iowa, and she's using a camera to help athletes who are blind and running around jogging tracks. So, the cool part about that one is with a jogging track, you have clear lines, in most tracks, right? Kind of indicating where the lanes are. So, once they get those recognizers working in real time, they'll be able to tell someone they're starting to veer out of their lane. Now, the problem is getting that working fast enough on the device in real time so that you're not getting it like, "Oops," a few seconds later and you've run into somebody. And these are athletes who are actively competing. And if we can get … solve some of those issues, the independence is great. Sam Charrington: [00:30:45] That's incredible. I happen to live very close to a school for the blind and I see their track all the time and they've got these … I don't know if listeners have ever seen these, but they've basically got these guide wires along the lanes- Wendy Chisholm: [00:31:01] Yeah, exactly. Sam Charrington: [00:31:01] So, they can still participate in the activity, but I can envision that bringing AI to a device the athlete can wear would totally eliminate the need for a specialized setup. They could compete with other athletes. Wendy Chisholm: [00:31:25] Exactly. Sam Charrington: [00:31:26] And be on a level playing field with the help of a model that's running in a phone. Wendy Chisholm: [00:31:31] Exactly. Yeah. So, there's gonna be some really hard challenges with that one.
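The lane-veering idea Wendy describes can be sketched with classical computer vision. Below is a simplified, illustrative approach using Canny edge detection and a probabilistic Hough transform to estimate how far the runner has drifted relative to detected lane lines; the parameters and drift heuristic are assumptions, and a production system running in real time on a phone would need considerably more than this.

```python
# Simplified sketch: estimate lateral drift from track lane lines in a single camera frame.
import cv2
import numpy as np

def lane_offset(frame: np.ndarray) -> float | None:
    """Return the offset (in pixels) of the detected lane midpoint from image center,
    or None if no lane lines are found. Negative values suggest drifting left."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(cv2.GaussianBlur(gray, (5, 5), 0), 50, 150)

    # Only consider the lower half of the frame, where the track surface appears.
    h, w = edges.shape
    edges[: h // 2, :] = 0

    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=60,
                            minLineLength=h // 4, maxLineGap=20)
    if lines is None:
        return None

    # Crude heuristic: the average x position of detected segments approximates the lane midpoint.
    xs = [(x1 + x2) / 2 for x1, y1, x2, y2 in lines[:, 0]]
    return float(np.mean(xs) - w / 2)

# Usage idea: if abs(lane_offset(frame)) keeps growing over several frames, cue the runner
# (audio or haptic) that they are veering out of their lane.
```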
And I can't talk yet about this next round, but we're in the midst of interviewing our … basically what we do, we kind of are operating like a shark tank. So, after we review a bunch of the applications, we'll invite them in for a pitch meeting, just to kind of meet folks and get a sense of what they're really doing. And so I'm very excited about some of the ones we're hearing this week. This is pitch week for us. So, we'll be announcing in January, then, this kind of next round. But people, and again, now that there's some maturity to the program, and it helps that Sacha has been out there talking about it, too. We're getting some really cool applications. I think we were very lucky with iTherapy in our first round and Zyrobotics, and now we're just, it's really growing. So, we've got some fun things in the works. But yeah, and I think, too, there's just so many opportunities to use Internet of Things and just all these sensors. For my own, so I have PTSD, and so I have so many sensors kind of monitoring different aspects of heart rate and trying to predict just how I'm feeling. And I am still, have not been able to pull together a dashboard that pulls together all my calendar and health data, and mood data and all of this stuff. I think … I'm looking forward to some of what we can do. There's just so much more that we know, and there's so many patterns that we can recognize. And then if we can kind of pull that together, again, it's about the decisions that it allows us to make. So, I'm really excited about that. Sam Charrington: [00:33:17] It's funny that you mention patterns, because that's the word that was floating in my mind around this next question, and that's really about the patterns that you see as you work with organizations that are using AI in an accessibility context. And I think the broader part of the question is around … as we've started to work with accessibility in a broader computing context, we've developed specifications and guidelines that, at this point, are fairly well understood and codified. And I'm curious do you see AI for Accessibility evolving in a similar way or is there not a need for that? How do you see that evolving? Wendy Chisholm: [00:34:14] That's a really great question. My work at MIT was on some of those standards. Sam Charrington: [00:34:20] Okay. Wendy Chisholm: [00:34:21] That's right in my wheelhouse. I'm not sure about that, honestly. I do think, though, as I'm reviewing. So, part of my time, too, is giving feedback to folks who … let me put it this way, I do a lot of educating about possibilities. So, I do spend time, when people are creating plans for how to use AI in their organizations, making sure that they're considering all scenarios and not accidentally creating more barriers. So, I know there's work going on with other types of guidelines, like there's some work right now at W3C in terms of cognitive disabilities and some new suggestions on how to make websites more accessible for folks who have learning disabilities or even emotional disabilities. I am very curious to see where things go in terms of augmented reality and VR. I was looking at a manufacturing, someone talking about manufacturing the other day and how to bring AI into the floor, the manufacturing floor. And really looking at, there's opportunities here, I think, for people with disabilities to be employed in some of these new jobs, as long as people really consider how it's being designed. 
So for example, in a manufacturing scenario, getting feedback from a bot about how something is working, or being notified that, "Hey, we've noticed, we've detected that this run has some errors in it. We're seeing some patterns." And just making sure that the information is being presented, if it's audible. But it's also gonna be in text. What's interesting to me is that I think a lot of the potential issues I'm seeing have already been documented, and it's kind of just the same things over and over. And it doesn't surprise me, because that's kind of what I've seen in my career, right? I analyzed Java years ago and said, "Here are the things that we need to do to make sure that if you're using Java, your applications are gonna be accessible." And the concepts haven't really shifted that much. I think if you have visual information, you need to make sure you also have auditory and tactile information. And just because you never know the scenario someone's gonna be in. And again, I just, I'm gonna tie it back 'cause I really want to drive the point home that anytime you do that- Sam Charrington: [00:36:45] The pig? Wendy Chisholm: [00:36:45] Huh? Sam Charrington: [00:36:45] The pig? Wendy Chisholm: [00:36:46] Oh, the pig. Yeah. Oh my god. Matthew Guggemos: [00:36:50] That's the thematic continuity there. Wendy Chisholm: [00:36:52] How can I integrate pig into that? That's a good challenge. Because no matter what you're doing, it could be used by a pig. No, I'm kidding. You never know the scenario that someone's gonna be in. And you never know the kind of scenario, the environment that someone's in, right? So, for captions, especially on that manufacturing floor, it's gotta be really loud. So, I was actually really surprised that they were designing something that was audible without being visual, because I'm like, "Isn't it gonna be noisy?" I think everyone is gonna be experiencing hearing loss on this floor. It was just surprising to me. So yeah, I don't know that we'll have new standards, but you never know. It kind of depends on what evolves. Actually, now that I think about it, I think with data sets, I think we are gonna have to have some standards that clearly ensure that we have good diversity in all the conversations going on about bias. That's a big part of it right there, is just making sure that we aren't accidentally continuing to discriminate against people with disabilities, 'cause unfortunately, that is quite a reality, especially when you look at … yeah. Well, I'll just say that. And that's part of the culture change I think AI can really help with, is ensuring that we're supporting a culture of more diversity. Sam Charrington: [00:38:10] I think one of the things that's most exciting about all the work that's happening in so many places about AI for social good and various aspects of it is just how intersectional it is, the issues, the folks that are working on AI ethics and AI bias, and now the work that we're discussing here around accessibility, it ties in in so many ways and I guess what's kind of bringing that thought to mind is just the … I have lots of conversations about bias and bias in data sets, bias in AI systems, and to think about how important that is in this context, and then how as we overcome those issues and create new technologies here, how that feeds into the technologies that we all benefit from. I can't help but think it creates exciting opportunities for folks that want to kind of jump into this field. 
Wendy Chisholm: [00:39:11] Well, that's the thing and I think that's, when I've been talking … so, some of our applicants aren't as familiar with disabilities. And that's great. They're familiar with AI and machine learning and they can see, "Oh, this is how I can get the data you're looking for." And that's really a very cool thing, is that if we can bring together people who are looking for ways that their work in AI and machine learning can impact humans all over the planet, I think that's a very exciting opportunity, where we can really start making those partnerships and bringing people together, kind of matchmaking. Like, "Hey, we see over here that this community has this need. Is there anyone out there doing something similar and we can pair you together, and then you can test this out and continue to evolve your work in a way that's really gonna be impactful for somebody." And one of the things we keep saying is we're funding projects that are developed with or by people with disabilities. And again, that's because when it's grounded in reality, then you know you get something good. Lois and Matthew talk about how much of their time is around the UX, and I think that's so important. That's where the really good stuff comes from, is when you're really looking at how people use this and how they are gonna integrate it into their daily lives, what it allows people to do. Sam Charrington: [00:40:35] Lois, when you look forward, what do you see in terms of incorporating AI more deeply in what iTherapy's doing? Lois Brady: [00:40:48] Well, we have plans probably up and through the next 10 years to continue using this. We have great plans. It's made a big difference with a lot of our students; however, one of the main pain points that they have is that by the time they come up with something to say, or try to join a conversation, the conversation's done. It's fluency. It's the rate that they speak. And that's across the board and across abilities and ages, is that if you're using a device to speak and not using your own voice, it takes a lot longer, so that is probably the gap I would love to bridge: that someone who cannot use their natural voice can jump into a conversation and speak like anyone else. That's gonna be difficult and that's absolutely gonna be AI. Currently, it takes so long for people to either generate a sentence or search for the word they want to say that most folks check out of the conversation and our folks don't really get to have one-on-one conversations, unless they're pre-made and scripted. So that's, I think, my ultimate, ultimate goal: folks who have challenges speaking can speak like just about anyone else. Wendy Chisholm: [00:42:12] Yeah, that's a really beautiful moonshot. There's a lot of these things in AI where we look for that real time, with the jogging track, that real-time feedback. And here, real-time speech generation. And we see that in a lot of scenarios and I think that's … yeah, I agree. That's the vision. That's a good vision. Sam Charrington: [00:42:32] And Matthew, what are you most excited about from an AI perspective? Matthew Guggemos: [00:42:39] Well, we are looking into coming up with some diagnostic tools, which I think are going to be really cool. I think they … we're hoping to come up with, we're just in the preliminary pieces of this now. But using AI to analyze data, possibly through a Q&A framework, and to help people create solutions for communication delays or disorders.
So basically, one of the ways that we, as speech pathologists, evaluate people now, it's through kind of paper and pencil test. Then you have to score it and then it just becomes kind of cumbersome. So, what we're hoping is we're taking a look at some kind of far out ideas. In fact, I read this article in Wired magazine about a guy named Karl Friston who's into this concept called Free Energy. So, we're thinking of some ways to take that concept and apply to diagnosing communication disorder, so I think that's pretty exciting. So, that should be in the next few years. That's my, I guess, goal for the future, is to develop that, among other things. Sam Charrington: [00:44:00] Nice. Nice. So Wendy, I gather that by the time folks get an opportunity to listen to this podcast, you'll be just in the midst of finalizing your next round of grantees. Wendy Chisholm: [00:44:15] Yeah. Sam Charrington: [00:44:16] For folks that hear this and are interested in working with this program, what does the process tend to look like and when should they start looking for the next round to open up? Wendy Chisholm: [00:44:28] We accept applications any time. Sam Charrington: [00:44:30] Oh, really? Wendy Chisholm: [00:44:30] Yeah. Sam Charrington: [00:44:30] Okay. Wendy Chisholm: [00:44:31] We're constantly kind of reviewing them. And the link to apply- Sam Charrington: [00:44:37] Feel free. Wendy Chisholm: [00:44:38] … is really easy. It's AKA.ms/grant. Super easy. So, AKA.ms/grant. Yeah. You pull together an application and we'll review that. We're really looking for projects that are gonna elevate the industry. We're not as interested in funding just someone to develop an application. We want someone who is going to contribute something back, because for us, that's really how we're gonna raise the water and bring up the boat, you know? So, that's part of what we're looking at is either someone willing to contribute a data set or a model, or some other learning, whether research paper or something like that. And obviously someone who's, whether by themselves or through partnership or grounded in community, and that it's something that'll be feasible to accomplish in a year. So, while we really are looking for projects that will go beyond a year and they're something that can be built on, the grants are year long grants. So, there needs to be some deliverable in that year. But yeah, and we encourage anyone to apply. I mean, literally anyone from anywhere. We've done a push in Latin America. We're gonna be reaching out to Asia soon. We do have grantees in different regions. We have a few folks in Europe, one in India, and we're really looking to expand that, because … especially I really encourage folks in Asia and Latin America, because I recently learned, and just to spend a moment on Latin America, that the unemployment rate in general for people with disabilities is usually twice that of people without, and the average age is 30 years old in Latin America. And to me, that's prime employment age. So, we really want to shift those employment numbers. To that point, we really are looking for applications and projects that are gonna have an impact in our focus areas, which are employment, daily life and communication, and connection. For example, Lois and Matthew, this is communication and connection. The jogging track, that's daily life, because that's out and about, being independent. 
And then employment is another one we've got from Vanderbilt is a good example, where they're building a bot that can help someone who has autism and is interested in practicing for job interviews and stuff like that. It does some really cool modification of the interview to help someone practice. So, yeah. I encourage anybody. And NGO, non profit, individuals, companies. As long as you meet some of those criteria, go for it. Sam Charrington: [00:47:33] And are they fixed sized grants? Or what's the range? Wendy Chisholm: [00:47:37] They're not. Yeah, so right now we have a couple categories. We've got one set of grants where it's just called an Azure Credits. And so we're just giving folks credits to just play and learn and see what they come up with, make some progress on their idea. The other one is kind of an Azure Credits plus, and in that one, that's the category that Matthew and Lois are in, is we have a community, we have a lot of support in terms of education and we've got on staff folks who can help with technical questions, integration with Azure. We've got great connections with cognitive services, Microsoft research. So, we can really support folks in their development and give them cash for engineering costs or data acquisition, data cleaning, data labeling. Kind of some of those pieces just needed to build. And right now, we're not really saying much about how much it is. We're new. We're kind of experimenting with how much we give people and what results we get. So, yeah. We'll see how it goes. Sam Charrington: [00:48:46] Do you by any chance have a wishlist of ideas that you'd love to fund? Wendy Chisholm: [00:48:53] I do. Yes. We have several moonshots that I'd love to see. I want to see a self driving wheelchair. I want to see a dashboard for people with PTSD. Like I said, I want that dashboard for myself, that pulls together all the data that I have and really learns and can start recommending and predicting. I want Saquib to get his digital assistant so that he can be out and about as he's traveling and get information to make decisions. I'd love for my friend who is deaf, who recently went in for a surgery and her sign language interpreter was late, so she had to lip read and it was very scary. She didn't have all the information. But if she had had kind of a back up interpreter that could help her in that situation. And you know, the doctor couldn't wait, you know. So, I think those are some of the things that we're looking for. Yeah. And then I'm just curious to learn what other ideas people have out there. Like I said, there were some things I heard this week I hadn't even thought were needed. And now I really want to fund them. Sam Charrington: [00:50:00] Fantastic, fantastic. Well, Wendy, Lois, and Matthew, thank you so much for taking the time to chat with us about AI for Accessibility. Wendy Chisholm: [00:50:11] Thank you. Sam Charrington: [00:50:13] Lois and Matthew in particular, congratulations for what you're doing. It sounds like a great application with a lot of impact and Wendy, great program. Thank you. Wendy Chisholm: [00:50:23] Thank you. Yeah, thanks. Matthew Guggemos: [00:50:24] Thanks so much. Lois Brady: [00:50:25] Thank you, Sam.
Talk 226 – MSFT AI for Humanitarian Action Interview Transcript Sam Charrington: [00:00:00] Today’s episode kicks off a series of shows on the topic of AI for the benefit of society that we’re excited to have partnered with Microsoft to produce. In this show, we’re joined by Justin Spelhaug, General Manager of Technology for Social Impact at Microsoft. In our conversation, Justin and I discussed the company’s efforts in AI for Humanitarian Action, a program which extends grants to fund AI-powered projects focused on disaster response, the needs of children, protecting refugees and promoting respect for human rights. Justin and I cover Microsoft’s overall approach to technology for social impact, how his group helps mission-driven organizations best leverage technologies like AI, and how AI is being used at places like the World Bank, Operation Smile and Mission Measurement to create greater impact. Before we dive into the show, I’d like to thank Microsoft for their support and their sponsorship of this series. Microsoft is committed to ensuring the responsible development and use of AI and is empowering people around the world with intelligent technology to help solve previously intractable social challenges, spanning sustainability, accessibility and humanitarian action. Learn more about their plan at Microsoft.ai. Enjoy. Sam Charrington: [00:01:51] All right, everyone. I am here with Justin Spelhaug. Justin is the General Manager of Technology for Social Impact at Microsoft. Justin, welcome to This Week in Machine Learning & AI. Justin Spelhaug: [00:02:01] It's great to be here. Thanks for having me. Sam Charrington: [00:02:03] Absolutely, absolutely. I am really looking forward to jumping into this particular topic with you, discussing what Microsoft is up to as far as AI for Humanitarian Action. But before we do that, share a little bit about your background and how you got to working in this field. Justin Spelhaug: [00:02:21] Well I have 21 years at a single company, that may be a record these days, at Microsoft. Wonderful company, and I've done many different things. I've worked in Microsoft Research, I've worked in Asia in our field, I've worked in operations. But all throughout my career, I've been the chief social business advocate, or agitator, depending on who you are, pushing, always, to find a way to create that nexus between technology, scalable business models, and social impact in the private sector context. And so I'm very pleased to be doing what I'm doing today. Born and raised in Washington. A United States Marine, proudly served. University of Washington for the Huskies out there listening to this. And just real glad to be here. Sam Charrington: [00:03:10] Fantastic. Technology for Social Impact, you're a General Manager, that means that's a business unit of sorts or a business unit period. What is that organization all about at Microsoft? Justin Spelhaug: [00:03:22] So this business unit uniquely sits within Microsoft Philanthropies. And the focus is to help mission-driven organizations like nonprofits and the United Nations, The World Bank, use technology to advance their missions and to advance their impact. What's a little unique about our group in the industry are three key things. First is that we bring together philanthropies, technology and sales, commercial models, to support these mission organizations, nonprofits, on an end-to-end basis for the full spectrum of their needs.
Second, we've got a pretty deep innovation strategy, building new technologies in the area of AI, which we'll talk about, but also in the area of Dynamics, collaboration, Azure Core Services for nonprofits. And third, we're a social business, so all of the profit we generate, all the incremental profit this team generates is reinvested back into more impact, more services and support for nonprofits, more cash grants and donations to our philanthropies team. Sam Charrington: [00:04:27] And so, artificial intelligence, how long has the organization been working with its constituent customer organizations on AI-related work? Justin Spelhaug: [00:04:38] As you know, at Microsoft, we've been working on AI since 1991 or longer. It's always been in Bill's mind as we set up Microsoft Research. Our AI for Good initiatives, the social-facing side of this, were launched about 18 months ago. And the first initiative there was AI for Earth, followed by AI for Accessibility. And then most recently launched last September at the United Nations General Assembly was AI for Humanitarian Action. Combined, that's a 115 million dollar investment that Microsoft is making across all three of those pillars over the next five years to try to drive impact. Sam Charrington: [00:05:21] And so, why do you think AI is important for these types of organizations? Justin Spelhaug: [00:05:26] I think it's important, first, if you'll indulge me to step back on what's the broader context of the social issues that the world is facing today. We'll detour there for one moment and then I'll come back to that question more precisely. Today, in the last 20 years, we've seen nearly a billion people move out of poverty. The international poverty line is defined as $1.90 per day, and those billion people came largely in China, largely in India on the back of some really great economic success, and that's pretty amazing. But still, there's 800 million people that live under $1.90 per day. There's 124 million of those people that have severe food insecurity, life-threatening food insecurity each and every day. And those issues combined with some really challenging geopolitical issues have created the biggest crisis in refugees and internally displaced people that we've seen since World War 2, with 68 million. I know you're close to those issues. 68 million people today without a place to call home, displaced. And when you step back and say, "What's the world's answer to these problems?" we look to the sustainable development goals. These are the things that were defined by the world and the United Nations to really paint the picture that we want to see by 2030. Whether it's equality or eliminating poverty, justice and rights, all of these issues are a part of the sustainable development goals. The problem is this: it costs between five and seven trillion dollars a year to fund these goals, and we've got about a 2.5 trillion dollar shortfall in funding these goals. And so we're not making progress in addressing these humanitarian disasters that are unfolding right in front of us today, the environmental disasters that are unfolding right in front of us today, and government is part of the solution, but they can't stretch to cover that full 2.5 trillion dollars. Business has to step up. And while we don't have all of the answers at Microsoft, we're trying to shift from a world of corporate social responsibility that may be more narrowly defined to a world of total social impact.
We're building, at the core of our business model, the ability to innovate, serve nonprofits, and create new innovations like AI for Humanitarian Action that get right at these issues and help organizations unlock challenges. So let me provide one example and then we can dive more deeply into how AI is applied in this specific context that I'm talking about. Going back to that food crisis and hunger crisis, The World Bank is chartered with providing funding to communities that have food insecurity and famine issues. We're working with The World Bank to use our AI models and AI engines, tuning their models and bringing together environmental data, socioeconomic data, population data, and geopolitical data to help predict food insecurity at a community level and determine which communities are most at risk of reaching a food-insecure state or even reaching full-blown famine before it happens. If we can do that, if The World Bank can do that more precisely, they're able to release funding ahead of a disaster and save some lives. And that's how the power of technology, leveraging our resources within Azure Cognitive Services, our resources from Microsoft Research and combining that with World Bank teams to solve this problem, can have a powerful and transformative impact on these kinds of issues. Sam Charrington: [00:09:31] Do you have a sense for what an engagement like that looks like? Who are you engaging with at The World Bank? What types of information sources are you using to help create the AI that helps drive these predictions? Justin Spelhaug: [00:09:47] The World Bank is getting ever more sophisticated in how they're engaging the private sector in these kinds of ways and there are some roles in The World Bank precisely chartered to work with Microsoft and other private sector companies and figure out how we get the most out of their resources. So that's the entry point where they frame up a problem and they ask, "How can we help solve that problem? And are we willing to not only provide some free computing capacity, but more importantly, provide the data scientists necessary to help improve the quality and the predictive accuracy of the models?" We put a proposal together, get that proposal on the table and then form a strategic partnership. That process, just to be clear, takes some time to really map out, what is the problem? What resources do they have? How can we augment those resources in a way that's most effective? And in this particular case, The World Bank is asking for help with model tuning, effectively, of their ML models, to improve the predictive accuracy of the data and the models that they already have; they just don't have the accuracy where they want it to be yet. And so that's how we're engaging, and so we're bringing to bear resources that are experts in that area to supplement The World Bank team and help them get that over the goal line. One other thing that's important to note about this particular project, and is unique in the industry, is that The World Bank is not only pulling from Microsoft, but they're also asking Google to step up, AWS to step up, and it's an interesting collaboration, in fact, between all three companies as we work to help them solve this problem. These problems are bigger than any one brand or any one company and we have to learn how to collaborate effectively together with organizations like The World Bank to solve problems that, frankly, impact human lives.
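The interview does not describe The World Bank's actual models, but the general shape of the problem, a community-level risk classifier trained on tabular environmental, socioeconomic, population, and geopolitical indicators, can be sketched as follows. All column names and the input file are hypothetical.

```python
# Illustrative sketch only: a community-level food-insecurity risk classifier on tabular features.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_csv("community_indicators.csv")  # hypothetical: one row per community per period
features = ["rainfall_anomaly", "ndvi", "food_price_index",
            "conflict_events", "population_density", "poverty_rate"]
X, y = df[features], df["food_crisis_next_quarter"]  # 1 if the community reached crisis level

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)

# Rank communities by predicted risk so funding can be released before a crisis unfolds.
risk = model.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, risk))
```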
Sam Charrington: [00:11:49] Maybe taking a step back from this particular application, is there a way that you categorize the various types of applications that you get involved in under the banner of humanitarian action? Justin Spelhaug: [00:12:01] Yeah. In the area of humanitarian action, we've identified four core scenarios that we want to focus on. The first is disaster response. Essentially helping forecast disaster before it happens. That was the example I provided, but also helping first responders respond more precisely and effectively when disasters occur. The second is needs of children and there's a range of issues we're focused on there from healthcare to protection and support. Third is protecting refugees and helping organizations that are serving refugees, like the Norwegian Refugee Council or the Danish Refugee Council or the UN, scale their services more effectively. And then the fourth is promoting respect for human rights and we're very involved with topics like disrupting human trafficking, which your listeners may know is one of the largest criminal industries in the world today, and is a real challenge. All four of those areas we have examples we can talk about. Sam Charrington: [00:13:09] Let's do that. The refugee one is of particular interest to me; since 2010 I've sat on the board of a local organization in St. Louis, the International Institute, which is one of 100 or so affiliates of the U.S. Committee for Refugees and Immigrants. And is one of the organizations that assists in the resettling of refugees into the St. Louis area. And as a child of an immigrant, it's always been a passion point for me. What are some of the applications of AI to addressing the issues that are faced by refugees around the world, really? Justin Spelhaug: [00:13:51] I'll give you two examples. One that's kind of live and one that we're working on. The Norwegian Refugee Council provides many things for refugees, but one of the things they provide is legal services. If we pick a geography like Iraq where we're dealing with refugees and displaced people, there are four million people in that country and the adjacent areas that need access to legal services. Now, these legal services, for your listeners, are pretty basic things. It's about getting identity for their newborn child, filing a death certificate, getting access to benefits that they're not getting access to through legal channels. But the Norwegian Refugee Council only has so many lawyers, and so the technology that we're building with them is a chat bot based platform that allows for a more intelligent interaction with the refugees so that we can direct them more precisely to the specific legal services that they need within NRC, allowing their lawyers to scale more effectively. Prior to the chat bot technology, the lawyers were having to triage each individual case and then route it to the correct specialist. With our technology, we're able and working towards being able to do that in a much more precise way, reducing the lead time that it takes to render those legal services and allowing these lawyers on the NRC side to scale up. With another organization, we're working on actually child identification for reunification. And so, we talk a lot about AI and ethics and some of the challenges that we face around facial recognition, for sure. And those are real challenges.
But in this case, we're using facial recognition to really drive a positive outcome where we're able, through machine vision and image matching, to measure the symmetry of a child's face and match that to a database of potential parents. That child may have been dislocated as there was a movement within the camp for one reason or another; they can't find their parent in these camps and resettlements, and they're using technology to match-make that child back to that parent and drive reunification more efficiently. Sam Charrington: [00:16:18] The legal example that you gave sounds very similar to the kinds of things we see on the commercial side with organizations using chat bots to provide support. Justin Spelhaug: [00:16:27] That's right. Sam Charrington: [00:16:28] Are there particular challenges with this particular application of chat bot technology, of these kinds of technologies, that are unique to the specific use case? Justin Spelhaug: [00:16:43] I don't know that it's so much challenges as that there is a lot of demand. The one thing a nonprofit is constrained on is resource. They're typically not constrained on the demand for their service given the magnitude of the challenges they're working. We're working with another organization, as an example, on an emergency services chat bot, so call it a 2-1-1 chat bot. That allows this organization to better direct their beneficiaries to the right emergency services in time of disaster, and prior to the implementation of this technology … And this one is actually in flight right now so it's not fully implemented. Their 2-1-1 lines were just getting overwhelmed. And so I think we're seeing many different applications of chat bot technology that allows nonprofits to scale their services and more precisely, it's all about matching, more precisely matching the demand in the moment to the expert that can solve it without that intermediary triage process that nonprofits can't afford to do. And so there's a real efficiency gain there. Sam Charrington: [00:17:53] Does the typical nonprofit have the technical sophistication to be able to absorb the solutions that you're proposing for them? Or does that create an ongoing challenge for them, because these AI systems, they need maintenance- Justin Spelhaug: [00:18:12] They do. Sam Charrington: [00:18:13] … you can't just throw them out there and they'll run forever without fine tuning and ongoing maintenance. What kinds of challenges do you see there in delivering this kind of technology to nonprofits? Justin Spelhaug: [00:18:29] Great question. Nonprofit is a tax code and so, you have this absolutely enormous spectrum of organizations. There are about four million nonprofits, we believe. 99 percent of those organizations are less than 50 people. Half of those organizations don't have anybody in IT or any formal IT funding. And so, there is very limited capacity in the ecosystem overall, kind of number one. So then how do we think about AI for Humanitarian Action? Well, one of the qualifying criteria for applying for a grant and leveraging the services that we're providing is that you do have resources on the other end, that not only can maintain the model, but can work with us to build and organize the model. You have the data, and the data is also available, because the model can't do much without the data. And so an organization needs to be ready to use these kinds of tools.
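The reunification matching described above is, at its core, a nearest-neighbor search over face representations. Here is a minimal sketch that treats it as an embedding-distance problem using the open-source face_recognition library; the library choice, the data layout, and the identity-style matching are all assumptions, and a simplification of the kinship-style matching the actual program needs.

```python
# Minimal sketch: match a child's photo against guardian-supplied photos via face embeddings.
# face_recognition is a stand-in; the interview does not say what tooling the partner uses.
import face_recognition
import numpy as np

def best_match(child_photo_path: str, guardian_photos: dict[str, str]) -> tuple[str, float]:
    """Return (guardian_id, distance) for the closest enrolled face; lower distance = closer."""
    child_img = face_recognition.load_image_file(child_photo_path)
    child_enc = face_recognition.face_encodings(child_img)[0]  # assumes one face is detected

    ids, encodings = [], []
    for guardian_id, path in guardian_photos.items():
        encs = face_recognition.face_encodings(face_recognition.load_image_file(path))
        if encs:
            ids.append(guardian_id)
            encodings.append(encs[0])

    distances = face_recognition.face_distance(np.array(encodings), child_enc)
    i = int(np.argmin(distances))
    return ids[i], float(distances[i])

# A caseworker would review the top-ranked candidates rather than trusting a single score.
```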
We do envision a world though … I should mention every one of these tools, like the 2-1-1 chat bot capability or the NRC chat bot capability that I talked about, we're abstracting and putting those patterns and practices and code on GitHub. So any other organization, nonprofit organization, that may be of the same or similar scenario can access that, can download it, and can start using it. We're making investments with nonprofits to build their technology capacity. We imagine a world where in the future, nonprofits are able to go to GitHub, they're able to get training, and they're able to start using these technologies, but we've got a long way to go. There's only a fraction of the organizations today that have the capacity to really put this into practice. Sam Charrington: [00:20:19] The talent aspect of this is a huge challenge. It's often difficult for large enterprises to compete with the Microsoft's of the world, the Googles of the world for talent or the Silicon Valley companies for talent. And it's even harder for the typical nonprofit. Justin Spelhaug: [00:20:39] Absolutely. Sam Charrington: [00:20:40] And so, I'm curious, you mentioned education, is that a big part of your charter, to provide educational resources to these types of organizations? Justin Spelhaug: [00:20:52] It is. In fact, for anybody in the nonprofit community listening, I think it'd be fair to say most people work in a nonprofit environment because they are mission driven and they believe in the mission. And that's why you get paid less often and still work 60, 70, 80 hours a week, because you're passionate about refugees or the environment or health or child protection. And part of my charter is building capacity in these nonprofit organizations. And actually building capacity with their beneficiaries. We're working on a number of different programs using Imagine Academy and other content we have and creating some new offerings that will help do that for nonprofits in a very affordable way. And we've made investments with organizations like UNICEF, as an example that, as you know, has a mission around child protection to create platforms that will deliver digital skills as well as a broad range of educational experiences for 75 million children around the world who are on the move. Children that are migrants, internally displaced refugees. So we're focused actually at both ends, to build capacity at both ends through our philanthropic tool … philanthropic programs, rather. Sam Charrington: [00:22:09] So we talked through some examples on the refugee side. On the disaster recovery side, what have you seen there? Justin Spelhaug: [00:22:18] I highlighted two. One was that World Bank famine prediction tool. Another is helping organizations respond to demand. That's the 2-1-1 platform that we talked about. But we're also helping organizations organize their volunteers more effectively. So there's one organization here in the United States that mobilizes a whole lot of volunteers during disaster time, particularly when earthquakes strike or hurricanes hit. And you need to know what your volunteers are certified on. What equipment can they use? Can they use chainsaws? Can they use forklifts? Can they drive bulldozers to help clear the debris, to help rebuild? We're using OCR technology to automate the assessment and identification of certifications for their volunteer base that extends well beyond 80,000 volunteers all around the nation. 
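The volunteer-certification workflow Justin describes is essentially OCR plus keyword matching. A minimal, illustrative sketch follows; pytesseract stands in for whatever OCR service the program actually uses (the interview doesn't say), and the certification keywords and file name are invented.

```python
# Illustrative sketch: extract text from a scanned certificate and map it to known
# equipment certifications so dispatchers can filter volunteers by skill.
from PIL import Image
import pytesseract

KNOWN_CERTIFICATIONS = {
    "chainsaw": "Chainsaw operation",
    "forklift": "Forklift operation",
    "bulldozer": "Heavy equipment / bulldozer",
    "first aid": "First aid",
}

def extract_certifications(image_path: str) -> list[str]:
    """OCR a scanned certificate and return the certifications it appears to cover."""
    text = pytesseract.image_to_string(Image.open(image_path)).lower()
    return [label for keyword, label in KNOWN_CERTIFICATIONS.items() if keyword in text]

# Each volunteer's extracted certifications would then be written to a roster, e.g.
# "who near this disaster site is certified on a chainsaw?"
print(extract_certifications("volunteer_cert_scan.png"))
```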
So they know precisely who has which skill and can be deployed to which location. We're also working with an organization that's working on an open mapping platform. Now this open mapping platform is used during times of disasters to help first responders pinpoint where to focus. But as you know, the map before a disaster is very different than a map after a disaster. And we're using AI and some ML models to better ingest images and assess buildings and building damage to figure out what's changed pre-disaster to post-disaster most significantly, relative to building structure. And how should first responders plan their engagements and their interventions based on where we see the most damage. So that's another example of technology that we're building there. Sam Charrington: [00:24:15] Can you elaborate on that one? Is that based on vision or- Justin Spelhaug: [00:24:18] It is, yeah. Sam Charrington: [00:24:19] … satellite imagery or what are the data sources there? Justin Spelhaug: [00:24:23] Let me explain the process here. This organization uses volunteers, and I'm saying "this organization" because we're not public yet, so that's why whenever I say that, it's because it's not a public case yet. Uses volunteers to take new satellite imagery and trace that onto the map. They have local volunteers that add additional details to the map. Neighborhoods, street names, buildings, evacuation centers, and then humanitarian organizations use that in response. Now we're adding machine vision into that equation to automate and expedite the damage assessment on buildings, as an example, in that process. And so that first responders have this continually updating feed of mapping information that's … Some of it's being generated by AI using machine vision to assess buildings, some of it's being generated by local volunteers that are drawing a street, or noting that a street no longer exists and saying, "This is where the school was." And that allows these first responders to pinpoint damage much more effectively and in real time to deploy those resources in the right way. Sam Charrington: [00:25:32] It makes me think of just the use of maps. We all in navigating our daily lives now use maps constantly and in a situation where a disaster has occurred, those maps aren't really useful. So in addition to assessing building damage, just the ability to effectively route in an environment like that has to be a challenge. I don't know if that's part of the initiative that you're working on with this organization- Justin Spelhaug: [00:25:59] Well- Sam Charrington: [00:25:59] … but- Justin Spelhaug: [00:26:03] Resource optimization as a topic is … That is where AI in many ways … 20 years ago, my first AI project was supply chain optimization using a platform, and that is a core and critical issue for organizations. To give you an example of how we're applying that technology, we're actually applying that technology in supporting refugees. And this is a public case, this is with the Danish Refugee Council who's deploying Dynamics 365 Finance and Operations. So they've deployed that module and we've deployed that module so that we're able to optimize how they're delivering aid, food, water, WASH, basic shelter. Because if they can get that aid delivered on time and accurately around the camps that they're managing, and believe me it's a tough problem because demands are always changing, supply positions are always changing, and if you don't get it right, lives are at stake.
They've got to get that right, and when they do, they can focus on providing higher level services, counseling, job creation, business creation, once they get those basic needs taken care of. So building AI, as you know, AI's not just a thing that's Azure Cognitive Services, it's woven into the fabric of all of our platforms at Microsoft, including Dynamics. And back to your question of, "Well, how can nonprofits really take advantage of AI?" I'm highlighting some really specialized use cases here, but if you think about AI built into our Microsoft 365 platforms, or Dynamics platforms in this particular example, it becomes a much more palatable engine to leverage for nonprofits in the way that they're running their missions and optimizing their distribution of supply, in this case. Sam Charrington: [00:27:55] So you've already mentioned the Danish organization working with refugees, a Norwegian organization working with refugees and I think in both cases, they were managing camps. Do you happen to have any stats on the number of camps, the number of people in camps? Just the scope of the challenge on a global basis? Justin Spelhaug: [00:28:16] I don't have the number of camps. I was just in a camp in Kakuma, in the northeast corner of Kenya. We flew in to really get on the ground and understand the issues of the camp. You get on the ground and there's 20 … In Kakuma, there are between 28 and 32 different NGOs all serving this camp. Now, Kakuma has 150,000 people in this camp, and it was built for a capacity of about 100,000. So it was bursting at the seams. And the opportunity that we saw there was how do we start to help these organizations think about a common data model and a common set of platforms, Microsoft or non-Microsoft, that can interoperate so that when they're serving refugees, they're able to coordinate better? So Amid, who may be a refugee there in Kakuma, we're able to understand all of the services and all of the organizations that are supporting him and which interventions are helping him get access to economic opportunity and a better livelihood. Back to your question though, I mentioned that there are 68 million displaced people and refugees. We can fact check this, but it's about 28 million refugees and 40 million internally displaced people. Most of them are not in camps. Most of them are in urban environments and in cities. And that makes the challenge that much harder for these refugee aid organizations like the Danish Refugee Council or the Norwegian Refugee Council to support them, but the services that we're building, these chat bot services, the predictive services that we're building will operate within the confines of a camp or operate also in an urban environment. Sam Charrington: [00:30:00] So we've talked about two of the four already. Why don't we talk a little bit about the needs of children work you're doing? Justin Spelhaug: [00:30:10] Perfect. Yeah, this is a really interesting category of work that we're doing, and it does intersect with the work we're doing on human rights as well. So you'll see that bleed over. One area of work we're doing is with an organization called Operation Smile. Operation Smile works with children in low-income markets and communities around the world and provides cleft palate and cleft lip surgery. For the listeners that may not know what that is, that is a facial deformity that children are born with that impacts their ability, in fact, to latch onto their mother and get nutrients. It impacts medical conditions like hearing.
Creates, obviously, dental issues, but it also creates a massive social stigma for these children. The challenge that Operation Smile has is they're mobilizing plastic surgeons from all over the world, from Johannesburg to LA, to provide surgeries, life-changing surgeries for these children. But these surgeons are operating in hospitals that they're not accustomed to. They're operating with equipment that they may not be using every day, and they're doing surgeries that are unique for them, often. And so they need feedback and they need feedback in real time. The old process that Operation Smile had was to take a picture of the child before surgery, take a picture after surgery, send it to an evaluator, and a month later the doctor got feedback. The new process that we've implemented is we're able to take a picture before the surgery, take a picture after the surgery, and right in that moment, we're using machine vision to actually score the severity prior to surgery of that cleft palate or cleft lip, how severe is it? We score the quality of the post-surgical photo and we're analyzing a whole bunch of dimensions, looking at facial symmetry, how cleanly the cleft palate is now connected, and there's a score that's provided post-surgery. That's brought into a database in a Power BI view where we're able to see the scores of all other surgeons performing a similar procedure. It does two things: provides that surgeon feedback in real time on how they're doing, but also allows them to matchmake with a mentor who may be doing a better job on that particular surgical procedure, get on the phone before their next surgery, which may be scheduled in 45 minutes, get some tips on the technique, and then go back into that theater and improve that next child's outcome right then and right there. So that's a pretty cool case. Sam Charrington: [00:33:01] It is, and taking a step back, we've kind of already honed in on a couple of, I don't know, use case areas for AI. We talked about this resource optimization in the sense of mapping or making these organizations more internally efficient. And this use case as well as the volunteer onboarding one kind of speak to the ability to use AI to better connect people within an organization or better allow these organizations to better make use of their human resources. I guess in this case, the volunteers may or may … the surgeons may or may not be working on a volunteer basis. Justin Spelhaug: [00:33:48] They're all volunteers. Sam Charrington: [00:33:49] They're all volunteers? Justin Spelhaug: [00:33:49] Yeah. Sam Charrington: [00:33:50] So it's another volunteer management type of an application, and it also has echoes onto the enterprise. Justin Spelhaug: [00:33:59] It does. Sam Charrington: [00:34:01] I'm really just making an observation on what I'm hearing. Justin Spelhaug: [00:34:04] No, you're helping me with our strategy, in fact. I love it. I love it. And maybe there's a third use scenario here, which is the next one in terms of the needs of children. And this is about deep learning. This is about using Bayesian networks and deep learning to try to identify patterns that we couldn't previously see. Infant mortality is still a major, major, major issue in both developed markets and developing markets, more in developing. And SIDS, sudden infant death syndrome, is one of the number one drivers of infant mortality. It's been around forever and it's incredible how little we know about what's driving SIDS.
With a partnership- Sam Charrington: [00:34:51] What's driving its prevalence or what's the underlying cause? Justin Spelhaug: [00:34:54] What's the underlying cause. Well, both, actually. What are the risk factors that create a propensity for SIDS? And then, when we identify those risk factors, what can we do differently with that child in order to minimize the potential outcomes? So in partnership with the Seattle Children's Hospital, our data science team, led by John Kahan, who's very passionate about this topic (in fact, he lost a son to SIDS), pulled a massive database that existed, and it turns out that we have a record for every child born in the United States that dates back to … I'll get the date wrong, but it's somewhere in the 80s. And that record has a whole bunch of attributes in terms of child's weight, height, has a wide range of attributes, and those attributes were never really fully analyzed to figure out what was the causation and correlation between all these different attributes and sudden infant death outcomes. The team ingested all of this data into Azure and used our machine learning tools and in particular, one particular Bayesian network technique to analyze the data and to start to understand the correlative factors. Through this analysis, they're able to get a whole bunch of insights and I think it'll be worth bringing John on the show to talk about the depth of the insights, but able to pinpoint, down to the individual cigarette, how much it increases the risk factor for a child relative to SIDS. This body of work has also led to new innovations that they're pursuing. We know that a child is at much higher risk of SIDS when it lies on its tummy. But when you're a parent, you're tired, you haven't slept for days, you're out of the room, how do you know if your child's on their stomach? Well, John's looking into machine vision technology that's able to recognize whether a child's on their back or their stomach and provide an alert to the parents so they can flip the child back over. And so, a couple of really neat use cases and ongoing analysis. This analysis has just started and the fruit has just started to be borne from the work. That will hopefully help really provide effective remedies to SIDS long range. So those are two examples of the work that we're doing in needs of children and maybe another scenario here that we're talking about. Sam Charrington: [00:37:43] I'm curious about that last one. I've talked to other folks working in clinical environments or with data captured at hospitals and it's historically very difficult to get access to for a variety of concerns, the most obvious being privacy concerns. I'm wondering if you have any additional insights or context as to what the process was like in the context of this project? Justin Spelhaug: [00:38:12] I think John would be probably the best to answer that, but the research was conducted with a partner, Seattle Children's Hospital. I think a lot of it was executed through the partner and we provided the AI capabilities on the backend, but John would be able to provide- Sam Charrington: [00:38:25] Got it. Justin Spelhaug: [00:38:25] … every detail on that topic. Sam Charrington: [00:38:30] I'm sure. Okay, so human rights. Justin Spelhaug: [00:38:33] Yeah. So human rights, this is a wide topic. To give you a sense of what we're doing there, I'll give you three quick examples. Social sentiment analysis is something that we use in marketing each and every day.
Understanding feeds from the news, understanding feeds from social networks and understanding what they're telling us about our company. But what are they telling us about our cause? If our cause is to understand capital punishment and where capital punishment is occurring around the world and whether there's been due process around capital punishment, sometimes you need to fine tune those social sensing tools. So we're working with one organization, that's their core business model, is advocacy for human rights and giving them a platform to analyze all of these data feeds and to leverage a keyword model, like capital punishment, death, execution and many other words that we use, to start to understand what is the frequency that we're seeing around the world. In countries that have historically under reported it, well, guess what? It's typically in the news or it's in the social blogosphere. And what is the sentiment in that country and how can they better shape their advocacy? So that's one example of what we're doing in human rights. Another example is we're working with an organization to analyze Syrian war crimes videos and there we're using machine vision to identify and match known war criminals to crimes that are captured through this footage. And then the third, and maybe more intricate example here is the work we're doing on human trafficking. And human trafficking is a massive, massive criminal industry and problem. And particularly, sex trafficking is. It's something that doesn't seem to get talked about enough given how pervasive the challenge is. Here we're using technology to both disrupt the demand for sex work as well as the supply. And the way we're doing that, in partnership with organizations like Seattle Against Slavery, and other organizations, we're using chat bot technology on the demand side of this challenge to engage would-be buyers. And our chat bot poses as a sex worker, engages that would-be buyer in a dialogue, and ultimately based on that conversation, will direct that person to resources, but also let that person know that they've been engaged by an anti sex trafficking organization. So that's on the demand side. On the supply side, typically these girls and boys who are in this industry are posting their services in a wide range of places on the web and on the dark web. Well, we're building tools that allow organizations like Seattle Against Slavery to scrape and find these individuals. Typically these individuals will have multiple phone numbers and they'll encode those phone numbers with different hashtags and symbols so that they're not easily machine readable. Well, we're able to crack that and decode that, and that provides social workers the … We created a platform that provides social workers the ability to blast messages out to this community of sex workers to let them know there's resources. To let them know that there's somebody on the other end of the line who will help provide them support. And from those engagements, which are typically done over text messaging, set up an intervention meeting where they can help these folks get the services they need to get out of the sex trafficking industry, get the protective services they need, and change the course of their life. So an interesting blend of technologies there used by organizations. Sam Charrington: [00:43:07] Yeah, based on my experience with working with nonprofits, one of the things they are very good at is assessment. 
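The supply-side step Justin describes above, recovering contact numbers that have been deliberately obfuscated so they are not machine readable, comes down to text normalization. Here is a small, self-contained sketch; the substitution table and example string are invented for illustration, and the real tooling is not described in the interview.

```python
# Illustrative sketch: recover digit sequences from text where numbers are spelled out
# or broken up with symbols so that outreach messages can be sent to the right people.
import re

DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

def decode_phone_numbers(text: str) -> list[str]:
    """Return 10-digit sequences recovered from obfuscated text."""
    normalized = text.lower()
    for word, digit in DIGIT_WORDS.items():
        normalized = re.sub(rf"\b{word}\b", digit, normalized)
    digits_only = re.sub(r"[^0-9]", "", normalized)  # drop separators, emoji, symbols
    return re.findall(r"\d{10}", digits_only)

print(decode_phone_numbers("call me at five 5 5 - TWO one two . one 2 three four"))  # ['5552121234']
```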
Sam Charrington: [00:43:07] Yeah, based on my experience working with nonprofits, one of the things they are very good at is assessment: assessment of their own programs and projects and of their ability to actually make an impact on the communities that they serve. Assessment and evaluation are also critical for AI-based systems. I'm curious whether, in your experience, there are unique challenges associated with the ongoing assessment of the effectiveness of AI in these contexts? Or are the evaluation challenges the same as you might find in an enterprise context or other places, or … Do you need to work with these organizations to connect the way they measure the success of their programs with the AI tooling?

Justin Spelhaug: [00:44:00] Yeah. Just to back up for one second, I think many organizations still struggle with connecting their activity to the end outcome. For those refugees we've been talking about during this conversation, we want to improve their livelihood, we want them to have jobs, we want them to have a productive life and a productive family. But oftentimes, organizations are stuck measuring, how many aid packages did we deliver to this location? Not, did we transform somebody's life? We just kind of make that assumptive step. I think there are two things that we're actually doing in that area to close that gap. One is, in our Dynamics 365 platform, we've been making some really significant investments for nonprofits. And it's important, when you think about Dynamics 365, to think about just how critical it is for nonprofits. At the front end of a nonprofit, you're fundraising and you're managing your volunteers, your core resources. You're managing beneficiaries and beneficiary cases. You're delivering programs and measuring their impact. All of that is based and rooted in CRM logic and capabilities. On the back end, you're managing finance, operations, and HR, and all of that is rooted in ERP logic and capability. And what you ultimately want to understand is the cost, or the efficacy, per outcome that you're trying to generate. And that is a really hard bridge to cross. So to cross it, we've invested in a common data model on top of Dynamics 365 for the nonprofit industry, designed specifically for the industry, with the industry. So we have institutional donors, private donors like Gates, DFID (Department for International Development?) [inaudible 00:44:05], large nonprofits and small nonprofits that have helped us co-create this common data model. The common data model sits on GitHub so that the world can use it, and so that we're not the only platform running it, 'cause the more platforms that consume that data model, the better and more interoperable we can make those platforms.

Sam Charrington: [00:46:15] I mean, can you use it anywhere, not just Dynamics?

Justin Spelhaug: [00:46:18] Yeah, it's on GitHub. It's a schema. But Dynamics consumes it as a first-class citizen, of course. We built it, we want to consume it and use it. On top of that data model, we're building connectors, templates, and sample apps that are really igniting partners to build really cool finished solutions. But one thing that's core to the data model is that, unlike other models that have been built before, which started as CRM sales management tools and were built with fundraising in mind, we built ours with program management, program delivery, and outcome measurement in mind. We're also doing a lot of work on benchmarking best practices for outcome management, outcome metrics, and programs, and that's built into the data model.
On top of that, as I was saying, partners like Avanade, Blackbaud, Classy, Fluxx, m-hance, and others are building finished solutions that they can deliver to nonprofits. We think there are some very interesting AI scenarios, to get back to the topic, that leverage the data in those models and our intelligent cloud computing infrastructure to help nonprofits draw a straighter line from that package of aid delivered to that refugee camp to refugee outcomes, and the cost per outcome. If you were to ask, "What's the one thing that you're working on, what's the single most important thing that you're trying to do in your team right now?" There are many, but it is helping organizations much more clearly understand the causation and correlation between their activities and the outcomes they're driving, so they can tune and modify their programs to have an even greater outcome. And AI is key to that.

Sam Charrington: [00:48:11] I agree, and you did a great job at picking up on the kernel of an idea that I was trying to get out there. You tend to see a lot of activity focused on evaluation, mostly because it's part of … Many of these organizations are grant funded, the grantors all have program evaluation components of their grants, so they are kind of constantly working on evaluation, but-

Justin Spelhaug: [00:48:39] On a treadmill, in fact.

Sam Charrington: [00:48:40] On a treadmill, right. But there is this gap between the metrics that they have access to and can use and the actual impact, to your point, that they hope to make in whatever community they're serving. And it does strike me that AI could play a huge role there in its ability to look in a broad-spectrum way at these constituents and identify patterns, right?

Justin Spelhaug: [00:49:14] That's right.

Sam Charrington: [00:49:14] A big part of what deep learning, for example, is great at is … or situations where we can't really figure out the rules-

Justin Spelhaug: [00:49:20] It's fuzzy.

Sam Charrington: [00:49:21] It's fuzzy. But we can, based on data, train a model to identify the success case. That would be huge for many organizations.

Justin Spelhaug: [00:49:32] It is, and just two things to add to that. This morning I was on a call with the CEO of Mission Measurement, and what Mission Measurement is focused on is what he calls the outcome genome, essentially a framework of outcomes that really denote social impact. And if you go to Mission Measurement's website, you'll see they have 132 outcomes that predict and correlate to 80 percent of the social impact that we drive. So it's a standard dictionary for how we can think about these outcomes by industry. You map that to a common data model and a deep data repository with these nonprofits, and in between, you're able to leverage AI. Now you're cooking with gas, in terms of helping organizations really understand the impact they're having, and you have a common language to start to describe globally what is the dollar per outcome that organization A is able to provide and organization B is able to provide, and how do we help them optimize those things? We're actually applying that same genome to our work. So the question I ask myself is, "Hey, how is Dynamics? How are these AI solutions actually moving the dial?
Great stories, Justin, but how many kids were actually … How many life-changing surgeries actually improved because of those optical recognition technologies that you're talking about?" We're working on building out that platform capability to tell our story too.

Sam Charrington: [00:51:06] Okay. A couple of questions for you, or maybe one question from a couple of perspectives. And the question is, to your point, lots of great stories here, lots of great opportunity to apply AI for humanitarian action and for social good more broadly. I'm wondering, if I'm listening to this podcast and I'm at a nonprofit or an organization that's focused on providing services to these communities, or other communities, how do I get started, from your perspective, in taking advantage of AI? And then the flip side of that is, if I'm an individual who's not working on your team at Microsoft or in one of these organizations, but sees the need, maybe has some skill in this area, and wants to jump in, I'm curious whether you've seen anything, any organizations that you work with, or any suggestions you would have for folks who want to help?

Justin Spelhaug: [00:52:12] Just to peel back that question, the starting point for AI, I think, is having a well-defined use case. We kind of get fixated on the shiny object of AI and all of its glory, but we can often forget that what we're really trying to do is solve problems here. And so, as an organization thinks about getting started, what are the problems they're trying to solve? What analysis are they trying to predict or understand? The World Bank is a really good example. There was a pretty well-defined use case: food insecurity prediction with a core dataset and a common model. The second thing an organization needs as they start to head down the AI path is data, and data, and data. Now, there are ways to solve AI problems with some advanced modeling techniques, but in general, to solve these challenges we need a pretty significant dataset, or at least a pretty decent one. And so, do you have a well-defined use case, and do you have the dataset behind that use case? And then the third is starting to build some skill. As a company, we're putting a number of different programs together to build skills and training for folks, but I do think we have an opportunity to work with partners in the sector: Revel, KPMG, and Avanade are all building competencies, and they all have social business practices. Accenture has a social business practice to help nonprofits supplement their skills with great AI talent. And then everything that we're building here, everything that we can, we're going to be putting on GitHub. The end game, in my view, is that we've got a toolkit of hundreds of tools, not all built by us but by the community too, compiled together into this AI for Humanitarian Action toolkit across four core scenarios. Depending on what you're working on, you can pull from that toolkit to either be inspired or leverage that AI pattern directly for your particular use case. There are no magic bullets, though. I mean, there's no magic bullet other than, I think, those four or five things.

Sam Charrington: [00:54:51] It does strike me, in thinking about these organizations and their constituents, that we're working with potentially under-resourced or less sophisticated organizations, not always, but often, and with vulnerable communities.
That brings to mind the whole concept of ethical use of AI, and I would imagine you would … As an organization that's bringing this technology to them, you have a responsibility to ensure that they are using it correctly. To what extent does that come up, and how have you addressed it in your engagements?

Justin Spelhaug: [00:55:39] Yeah. No, I mean, ethics in AI is a very hot issue at Microsoft in general, as you know. Part of our work is to demonstrate to the world the good that AI can do, which is why we have initiatives like AI for Humanitarian Action. But at Microsoft, we've also invested in an AI ethics framework, and that framework was published by Brad Smith and Harry Shum in The Future Computed body of work. It's a book; you can find it online. It outlines six core principles. I won't go into them in detail, but they are fairness, reliability, privacy and security, inclusiveness, transparency, and accountability. Each of those is kind of a discussion in itself, I think, and they're all core to designing AI in a way that respects the security and the privacy of that SIDS patient record we talked about earlier, or that refugee record we talked about earlier. That framework has been put together into an ethical design guide that all of our engineering teams adhere to as we think about building products or services, or engaging in these projects, as well. We also have an internal advisory committee that looks at major new product releases and new technology releases and ensures that they follow those ethical standards, that we're designing things in the right way. In short, we have to apply those to the products that we're building, but we also have to apply them to the engagements that we're leading, and we do. That serves as our north star, our compass point, to make sure that we're doing things that are advancing society, that are inclusive for everyone, that are safe and secure.

Sam Charrington: [00:57:27] Well, Justin, I'm really excited about the work you're doing, and I appreciate you taking the time to chat with us about it. Any final thoughts you'd like to share?

Justin Spelhaug: [00:57:36] Well, just that we're really excited to support the nonprofit community, these mission-driven organizations. We're going to leverage that social business model that we have to continue to dial our investments up and into this community, building more AI patterns and practices, investing in Dynamics to make it increasingly more useful and potent, and helping benchmark best practices and processes for the sector and contributing that back to the sector. We're really excited about what that can do. And ultimately, we're excited about working with these organizations to move the dial on that first thing we talked about, the sustainable development goals, and really leveraging everything that we can do at Microsoft, our products, our technology, our people, to lean in with these mission-driven organizations and close the gap we see. And if we do our job right, we'll be sitting here in 2030 talking about a little bit of a different, and hopefully a much better, world.

Sam Charrington: [00:58:42] Fantastic. Thanks so much, Justin.

Justin Spelhaug: [00:58:43] All right. Thank you.
Bits & Bytes

Google announced a bunch of interesting ML/AI-related news at last week's Next conference. Here are the highlights, along with a few other tidbits.

Google launches new AI-powered contact center solution. The global market for cloud-based contact center solutions is expected to exceed $30B by 2023. It's no surprise that Google wants a piece of this, and to that end it launched the Contact Center AI alpha. The new offering combines Google's Dialogflow chat platform with other AI technologies (e.g. agent assist and a conversational topic modeler) to help customers reduce wait times, improve customer satisfaction, and gain greater insights. A host of technology and services partners were announced as well.

Furthering its edge initiatives, Google releases new Cloud IoT Edge. Cloud IoT Edge includes Edge IoT Core, which facilitates the connection of edge devices to the Google Cloud and simplifies their management, and Edge ML, which supports running pre-trained TensorFlow Lite models on edge hardware (a minimal inference sketch follows at the end of this roundup). Cloud IoT Edge is designed to take advantage of the newly announced Edge TPU as well (see below).

Google unveils new AI chips for edge machine learning. Google is bringing its TPU accelerator chips from the cloud to the edge with the launch of Edge TPU, currently in early access. Aiming to compete with offerings like the Nvidia Jetson and Intel Movidius product families, Edge TPU brings high-performance ML inference to small, power-constrained devices.

Google adds Natural Language and Translation services to the Cloud AutoML family. I covered the launch of Google Cloud AutoML Vision in the newsletter earlier this year. Last week Google pulled back the covers on new AutoML services for natural language classification and translation. Skip the press releases, though, and check out Rachel Thomas' great series of posts on these new tools.

For more from Google and Next, check out these roundups of all announcements and analytics/ML announcements.

Dollars & Sense

Snap40, which uses ML/AI for remote patient monitoring, has secured US $8 million in seed financing
Zorroa, which provides a platform for managing visual assets, has closed a $7M funding round
Shanghai-based Wayz.ai, a smart location and mapping start-up (not to be confused with Waze), announced that it has raised a US$80 million series A
Unisound, a Chinese AI solutions provider specializing in voice recognition and language processing, has received RMB600 million ($89 million) in Series C-plus funding

Sign up for our Newsletter to receive the Bits & Bytes weekly to your inbox.
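As promised above, here's a minimal sketch of what running a pre-trained TensorFlow Lite model looks like, roughly the kind of workload Edge ML is meant to host on edge hardware. It's illustrative only: the model path is a placeholder and the input is a dummy array rather than a real sensor frame.

```python
import numpy as np
import tensorflow as tf

# Load an already-converted TFLite model (path is a placeholder).
interpreter = tf.lite.Interpreter(model_path="mobilenet_v1.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Dummy input with the shape/dtype the model expects; a real device
# would feed camera or sensor frames here.
frame = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], frame)
interpreter.invoke()

print(interpreter.get_tensor(output_details[0]["index"]))
```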
My travel comes in waves centered around the spring and fall conference seasons. A couple of weeks ago, in spite of there being no signs of a true springtime here in St. Louis, things shifted into high gear with me attending the Scaled ML conference at Stanford and Nvidia GTC over the course of a few days. Following me on Twitter is the best way to stay on top of the action as it happens, but for those who missed my live-tweeting, I thought I'd reflect a bit on Nvidia and GTC. (You'll need to check out my #scaledmlconf tweets for my fleeting thoughts on that one.)

In many ways, Nvidia is the beneficiary of having been in the right place at the right time with regards to AI. It just so happened that (a) a confluence of advances in computing, data, and algorithms led to explosive progress and interest in deep neural networks, and (b) our current approach to training these networks depends pretty heavily on mathematical operations that Nvidia's graphics cards happened to be really efficient at. That's not to say that Nvidia hasn't executed extremely well once the opportunity presented itself. To their credit, they recognized the trend early and invested heavily in it, before it really made sense for them to do so, besting the "innovator's dilemma" that's caused many a great (or formerly great) company to miss out. Nvidia has really excelled in developing software and ecosystems that take advantage of their hardware and are deeply tailored to the different domains in which it's being used. This was evidenced in full at GTC 2018, with the company rolling out a number of interesting new hardware, software, application, and ecosystem announcements for its deep learning customers. A few of the announcements I found most interesting were:

New DGX-2 deep learning supercomputer. After announcing the doubling of V100 GPU memory to 32GB, Nvidia unveiled the DGX-2, a deep-learning-optimized server containing 16 V100s and a new high-performance interconnect called NVSwitch. The DGX-2 delivers 2 petaFLOPS of compute power and offers significant cost and energy savings relative to traditional server architectures. For a challenging representative task like training a FAIRSeq neural machine translation (NMT) model, the DGX-2 completed the task in a day and a half, versus the previous-generation DGX-1's 15 days.

Deep learning inference and TensorRT 4. Inference (using DL models, versus training them) was a big focus area for Nvidia CEO Jensen Huang. During his keynote, Jensen spoke to the rapid increase in complexity of AI models and offered a mnemonic for thinking about the needs of inference systems both in the datacenter and at the edge: PLASTER, for Programmability, Latency, Accuracy, Size, Throughput, Energy Efficiency, and Rate of Learning. To meet these needs, he announced the release of TensorRT 4, the latest version of Nvidia's software for optimizing inference performance on Nvidia GPUs. The new version of TensorRT has been integrated with TensorFlow and also includes support for the ONNX deep learning interoperability framework, allowing it to be used with models developed with the PyTorch, Caffe2, MxNet, CNTK, and Chainer frameworks. The new version's performance was highlighted as well, including an 8x increase in TensorFlow performance when used with TensorRT 4 vs TensorFlow alone, and 45x higher throughput vs. CPUs for certain network architectures.
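To make the TensorFlow integration a bit more concrete, here's a rough sketch of the period's TF-TensorRT workflow using the TensorFlow 1.x contrib API (the integration has since been reworked in later TensorFlow releases). The model path and output node name are placeholders.

```python
import tensorflow as tf
from tensorflow.contrib import tensorrt as trt  # TF 1.x-era integration

# Load a frozen TensorFlow graph (placeholder path).
with tf.gfile.GFile("frozen_model.pb", "rb") as f:
    frozen_graph = tf.GraphDef()
    frozen_graph.ParseFromString(f.read())

# Replace supported subgraphs with TensorRT-optimized ops.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=["logits"],                 # placeholder output node name
    max_batch_size=1,
    max_workspace_size_bytes=1 << 30,
    precision_mode="FP16",
)

with tf.Graph().as_default():
    tf.import_graph_def(trt_graph, name="")
    # ...then run inference with a tf.Session() as usual.
```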
New Kubernetes support. Kubernetes (K8s) is an open source platform for orchestrating workloads on public and private clouds. It came out of Google and is growing very rapidly. While the majority of Kubernetes deployments are focused on web application workloads, the software has been gaining popularity among deep learning users. (Check out my interviews with Matroid's Reza Zadeh and OpenAI's Jonas Schneider for more.) To date, working with GPUs in Kubernetes has been pretty frustrating. According to the official K8s docs, "support for NVIDIA GPUs was added in v1.6 and has gone through multiple backwards incompatible iterations." Yikes! Nvidia hopes its new GPU Device Plugin (confusingly referred to as "Kubernetes on GPUs" in Jensen's keynote) will allow workloads to more easily target GPUs in a Kubernetes cluster. (A minimal sketch of requesting a GPU this way appears at the end of this post.)

New applications: Project Clara and DRIVE Sim. Combining its strengths in both graphics and deep learning, Nvidia shared a couple of interesting new applications it has developed. Project Clara is able to create rich cinematic renderings of medical imagery, allowing doctors to more easily diagnose medical conditions. Amazingly, it does this in the cloud, using deep neural networks to enhance traditional images without requiring updates to the three million imaging instruments currently installed at medical facilities. DRIVE Sim is a simulation platform for self-driving cars. There have been many efforts to train deep learning models for self-driving cars using simulation, including using commercial games like Grand Theft Auto. (In fact, the GTA publisher has shut several of these efforts down for copyright reasons.) Training a learning algorithm on synthetic roads and cityscapes hasn't been the big problem, though. Rather, the challenge has been that models trained on synthetic roads haven't generalized well to the real world. I spoke to Nvidia chief scientist Bill Dally about this, and he says they've seen good generalization by incorporating a couple of techniques proven out in their research, namely combining real and simulated data in the training set and using domain adaptation techniques, including this one from NIPS 2017 based on coupled GANs. (See also the discussion around a related Apple paper presented at the very first TWIML Online meetup.)

Impressively, for as much as Nvidia announced for the deep learning user, the conference and keynote also had a ton to offer their graphics, robotics, and self-driving car users, as well as users from industries like healthcare, financial services, oil and gas, and others. Nvidia is not without challengers in the deep learning hardware space, as I've previously written, but the company seems to be doing all the right things. I'm already looking forward to next year's GTC and seeing what the company is able to pull off in the next twelve months.

Sign up for our Newsletter to receive this weekly to your inbox.
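As referenced above, here is a minimal, hypothetical sketch of scheduling a GPU workload against the device plugin's nvidia.com/gpu extended resource, using the official Kubernetes Python client. The image, pod name, and command are placeholders, not anything Nvidia shipped.

```python
from kubernetes import client, config

# Assumes a cluster with the NVIDIA device plugin installed and a local kubeconfig.
config.load_kube_config()

container = client.V1Container(
    name="trainer",
    image="example.com/dl-training:latest",   # placeholder image
    command=["python", "train.py"],           # placeholder command
    resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
)
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-train"),
    spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```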
Hey there! This week's main article is a bit longer than usual, but I hope you'll find it both interesting and thought provoking.

Google's New Cloud AutoML: What it is, and the broader implications

Developing machine learning systems is an inherently iterative process, and one which can be, as a result, tedious, time consuming, and expensive. In addition to the data science work that needs to be done, the goal of which is the development of a predictive model, there are also a host of IT issues that need to be addressed to put ML and AI models into production. To help organizations overcome these issues, the major cloud vendors, as well as many independents, offer cloud-based prediction APIs that developers can easily integrate into their applications. The benefit of these APIs is in their ease of use and the fast time-to-market they enable. In most cases, developers are only minutes away from retrieving predictions using these APIs.

AI-as-a-Service Challenges

The fly in the ointment, so to speak, of these APIs has traditionally been the fact that they really only work well in fairly generic use cases. Consider the case of an object detector for photos and videos. That object detector is based on deep learning and was trained using millions of labeled example images or video samples. But if the objects you want to identify in your videos are not well represented in the labeled dataset, the neural network can't really hope to offer accurate predictions.

[A fly in the ointment, according to Google Cloud Vision API]

As a result, developers using AI-as-a-Service offerings face challenges relating to:

Domain specificity. If a given service's training dataset doesn't span my domain, it can't help me. For example, if I need to visually identify when toy cars are present in my images, but the training set only includes a few images containing toy cars, the service can't really be expected to do a very good job with them.

Fine-grained detection. What if I need to not just identify the presence of toy cars in my images, but also distinguish between different types of toy cars? Now I not only need a lot of them present in the training dataset, with more fine-grained labeling (which requires more expertise to develop, by the way), but I also need a network architecture that is robust enough to capture the fine-grained distinctions among different types of toy cars.

Bias avoidance. Even if our object detector has plenty of toy cars with granular labels, the performance of the detector for our application can be impacted by a variety of other factors: the distribution of the cars in the training set, their orientation, the backgrounds of the images, and the lighting, quality, and resolution of the images, to name a few. If any of these factors is poorly aligned with the types of images I need to analyze, my predictor is likely to underperform.
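To ground the "generic prediction API" point above: the appeal is that a few lines of code get you predictions, but those predictions are only as good as the labels represented in the service's own training data. Here's a minimal sketch using recent versions of the google-cloud-vision client; the image path is a placeholder and credentials setup is omitted.

```python
from google.cloud import vision

# Assumes GOOGLE_APPLICATION_CREDENTIALS is configured for the project.
client = vision.ImageAnnotatorClient()

with open("toy_car.jpg", "rb") as f:      # placeholder image path
    image = vision.Image(content=f.read())

# Ask the generic pre-trained model for labels; fine-grained or out-of-domain
# categories may simply not appear in the results.
response = client.label_detection(image=image)
for label in response.label_annotations:
    print(label.description, round(label.score, 3))
```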
Enter Cloud AutoML

Google Cloud AutoML is a new service aimed at helping developers overcome these challenges. Cloud AutoML operates similarly to other AI-as-a-Service APIs; however, Cloud AutoML users can upload their own training datasets to augment those already collected by Google. The first service to be released under the Cloud AutoML brand is Cloud AutoML Vision, which allows users to train custom vision models. Cloud AutoML Vision is currently in alpha release, meaning the service is in testing and access to it is limited. Interested customers must apply for access, agree to applicable terms, and have their projects cleared for use against the API. Notable features include:

Powered by transfer learning. We've talked about transfer learning a bit on the podcast (e.g. TWIML Talk # 62, 88). It's a methodology for training a neural network on one dataset and then refining its training with another dataset. The advantage of transfer learning is that the first training, which typically uses a much larger training dataset, does much of the heavy lifting of teaching the network to make coarse-grained distinctions. This allows the second training dataset to be much smaller. In this case the first dataset is Google's own labeled training data and the second training dataset is the customer's. Google claims that transfer learning allows custom AutoML Vision models to achieve high domain-specific performance with as few as "a few dozen photographic samples." This is sure to be highly use-case dependent. (A generic sketch of the transfer learning pattern appears a bit further down.)

Automated model optimization. Cloud AutoML is powered by a number of advanced optimization techniques, like neuroevolution (discussed in detail in TT # 94), reinforcement learning (which we've discussed extensively, though in other contexts, such as TT # 24, 28, 43), and other automated techniques. This helps ensure that the models created by the service perform well.

Human labeling. If you've got data, but it's not labeled, Google can help you with this as part of your use of the service. It will be interesting to see how their offering in this area compares to that of more specialized providers like MightyAI (TT # 6, 57) or CrowdFlower.

For sure, Cloud AutoML is an exciting addition to the Google Cloud Platform, and to my knowledge they're the first major cloud vendor to announce support for customer-provided training data for cognitive APIs. This is the natural direction for these kinds of services, though, and I'd expect all major cloud vendors to announce similar capabilities within 12-18 months.

On Automated Machine Learning

While this article has already gotten fairly long for this newsletter, I wanted to comment briefly on the broader notion of automated machine learning. Google's clearly trying to stake some thought-leadership ground here with its naming choice for this service. I get this, and it may prove to be effective short-term positioning for them, but ultimately all users of AI-as-a-Service, particularly the less sophisticated users (from a data science perspective) that these services target, expect high levels of automation. And, as expressed above, I think the ability to bring your own training data (which kind of assumes transfer learning and automated model search/optimization) will be table stakes within a year or two. A broader challenge posed by the AutoML name is furthering the idea that magic, black-box services can get you all the way there, and that some degree of data science or statistical knowledge isn't necessary. Google and others in this space are certainly providing a valuable service in lowering the barrier to entry for developers and enterprises interested in using machine learning, but often the issue is that without a certain degree of savvy, you don't know what you don't know. This is particularly true with the bias-related challenges this offering is meant to address in the first place.
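As referenced in the transfer learning note above, here's a minimal, generic sketch of the pattern: a network pre-trained on a large dataset, frozen, with a small custom head fine-tuned on your own much smaller dataset. This uses Keras with an ImageNet-pretrained MobileNetV2 purely as an illustration; it is not how AutoML Vision is implemented, and the class count and dataset are hypothetical.

```python
import tensorflow as tf

# Pre-trained base: the "first training" on a large, generic dataset (ImageNet).
base = tf.keras.applications.MobileNetV2(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3), pooling="avg"
)
base.trainable = False  # keep the coarse-grained features learned on ImageNet

# Small custom head: the "second training" on your own, much smaller dataset.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(5, activation="softmax"),  # e.g. 5 hypothetical toy-car classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(small_custom_dataset, epochs=5)  # a few dozen labeled images per class
```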
Further, there are complaints that Google is attempting to hijack the term AutoML, which has existing meaning within the data science community and which has for years been the name of a workshop on the topic held at ICML and other academic conferences. The broader field of AutoML encompasses the automation of a wide range of data science tasks, including data preprocessing, model selection, hyperparameter optimization, performance analysis, and more. Google Cloud AutoML, while powerful, doesn't quite live up to the broader vision of AutoML: well-understood tools, algorithms, and methodologies that increase the throughput and effectiveness of data scientists. This may be splitting hairs, but I do agree that the distinction between closed, black-box automation and open, transparent tools that can be integrated into a user's ML pipeline is an important one. Examples of projects that have evolved out of these broader efforts include the open source auto-sklearn, AutoWEKA, and Auto Tune Models (ATM). Commercial offerings like Bonsai (TT # 43), SigOpt (TT # 50), h2o's Driverless AI, and DataRobot also exist, falling in varying places along the transparency spectrum. (For a feel of what the open-tool flavor looks like in practice, see the short auto-sklearn sketch at the end of this piece.) To Google's credit, they've published extensively in this area and in the academic literature generally. Further, the company is certainly heavily invested in open source tools in this domain (e.g. TensorFlow). And Cloud AutoML Vision is but a first installment toward a broader vision of an automated ML platform. It wouldn't surprise me at all to see Google Cloud AutoML technologies eventually surface as open source projects within the TensorFlow ecosystem over time.

Finally, there's an interesting conversation to be had about the impact of automated ML tools on the space: Will lowering the barrier to entry result in a flood of faulty, poorly understood models entering the market via internal and public-facing enterprise systems? How will these systems impact their users? Will users accept the (relative) lack of transparency offered by these systems for production workloads? Will Google and others develop tools that help users understand the statistical biases of their datasets? What do you think the biggest issues will be? If you made it this far, I'd be very interested in hearing your thoughts on the above, so please don't hesitate to reply! In any case, what's clear and most exciting here is that powerful tools and technologies are rapidly becoming more accessible, and this will have a huge impact on how, and how quickly, machine learning and AI are adopted across many industries.

Sign up for our Newsletter to receive this weekly to your inbox.
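As referenced above, a quick addendum: a minimal auto-sklearn example showing the broader, more transparent flavor of AutoML, with automated model selection and hyperparameter tuning running inside your own pipeline. The time budgets are arbitrary and the dataset is just scikit-learn's built-in digits sample.

```python
import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection
import autosklearn.classification

X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=1
)

# Search over models and hyperparameters within a fixed time budget.
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300,  # seconds for the whole search (arbitrary)
    per_run_time_limit=30,        # seconds per candidate model (arbitrary)
)
automl.fit(X_train, y_train)

print(sklearn.metrics.accuracy_score(y_test, automl.predict(X_test)))
print(automl.leaderboard())  # inspect which pipelines were tried
```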
Bits & bytes

End of an era. With the increasing traction of PyTorch and the resulting renewed vigor in the deep learning framework wars, Yoshua Bengio and the team at the University of Montreal's MILA are throwing in the towel and terminating Theano development with the upcoming 1.0 release.

Unifying reinforcement learning. Game platform developer Unity Technologies recently introduced Unity Machine Learning Agents, an SDK targeting both researchers and game developers for training intelligent agents using reinforcement learning and other techniques in game environments and virtual worlds. For background on this, check out TWIML Talk #24, where I spoke to Danny Lange, VP for ML & AI at Unity, about the role of RL in gaming and the huge opportunities available to game platforms.

Training on a panoply of panoramic views. Interesting new dataset from 3D camera maker Matterport. The Matterport 3D dataset consists of 10,800 aligned 3D panoramic views (RGB + depth per pixel) from 194,400 RGB + depth images of 90 building-scale scenes, all hand-labeled with instance-level object segmentation.

Trolling AI Twitter. AI luminary Pedro Domingos spent a few days trolling AI Twitter last week, yours truly included, on the topic of algorithmic ethics and discrimination. Personally, I get what he's trying to say but think his comments amount to arguing semantics, and are fairly irresponsible for someone of his stature in the field. Does anyone out there know him and/or can get him on the podcast?

Chip chat. Intel is touting a forthcoming "neuromorphic" chip design that takes inspiration from the human brain. NVIDIA announced the Deep Learning Accelerator, an open source hardware architecture for deep learning inference acceleration. CNBC is spreading rumors that Tesla has tapped AMD to collaborate on an AI chip. Meanwhile, Imagination Technologies will offer an AI chip design that chip makers can embed within their own designs, focused on image and signal processing.

Show me the money. Baidu announced a $1.5 billion fund for investing in self-driving car startups. Video object detection startup Matroid raises a $10 million series A; catch my conversation with founder Reza Zadeh on the pod. TalkIQ raises $14 million to help enterprises analyze (some might say spy on) recorded voice conversations.

Sign up for our Newsletter to receive the Bits & Bytes weekly to your inbox.
This is just heaven, newsletter eleven!

Over the river and through the woods

Just like the journey to grandmother's house, the journey to AI begins with a single step, so just get going! That's where I tend to start when folks ask me how to get started in AI. It sounds like overly simplistic advice, but the reality is that most organizations, and individuals, face a tremendous amount of inertia and distraction when it comes to developing new competencies. This idea, Just Do It!, is just one of the messages I'll have for attendees at the Gartner Symposium next week during a panel I'll be participating in alongside representatives from IBM, Hortonworks, and NVIDIA on the topic of Getting Started with AI. While machine learning and AI can be very complex, I tend to think that, as is the case with most things, the secret to success with ML & AI is very simple. When I'm talking to organizations about building ML/AI teams and products, I tend to start with these "simple truths." There's plenty of time to dive into the details once the basics are understood. Here's my 10+1 list of the top keys to success in Enterprise AI. (There's more detail behind this; it's a recap from my talk at the Future of Data Summit I organized earlier this year.) If you're looking to chart a path for your organization to get up to speed with ML and AI, and you think I might be of service, don't hesitate to reach out.

Sign up for our Newsletter to receive this weekly to your inbox.