Google's premier cloud computing AI conference, Google Cloud Next 2023, took place the last week of August at Moscone Center in San Francisco. I attended the event and had the opportunity to spend several days in a variety of keynotes, briefings, and sessions, as well as explore the event's expo floor. Of course, I shared some of my real-time observations via X (formerly Twitter), which you can check out here. Here, I'll share a few of my key takeaways from the event.

This was the first in-person Google Cloud Next event in three years. While the event felt a lot smaller and more compact than the last one I attended, it was still large for a post-pandemic conference, with approximately 15,000 attendees present.

Generative AI in Focus

No surprise here, but generative AI was very much a key theme flowing throughout the event, though there was plenty of content for folks more interested in traditional cloud computing topics.

In addition to enabling new features and capabilities for the company's core AI stack (AI-oriented infrastructure and accelerators, AI/ML/DS platforms, and AI-powered applications), Google is weaving generative AI into non-AI products through Duet AI, which adds AI-based assistant technologies to a wide range of Google Cloud products.

A good indication of the breadth of work they've done to quickly build generative AI into their product base can be seen in the many AI-related announcements made during the event. Here's a summary of the most interesting AI-focused ones, out of the full list of 161 noted in Alison Wagonfeld's wrap-up post:

- Duet AI in Google Cloud is now in preview with new capabilities, with general availability coming later this year. There were a dozen more announcements covering Duet AI features for specific Google Cloud tools, but you can check out the blog post for a summary.
- Vertex AI Search and Conversation, formerly Enterprise Search on Generative AI App Builder and Conversational AI on Generative AI App Builder, are both now generally available.
- Google Cloud added new models to the Vertex AI Model Garden, including Meta's Llama 2 and Code Llama and Technology Innovation Institute's Falcon LLM, and pre-announced Anthropic's Claude 2.
- The PaLM 2 foundation model now supports 38 languages and 32,000-token context windows that make it possible to process long documents in prompts.
- The Codey chat and code generation model offers up to a 25% quality improvement in major supported languages for code generation and code chat.
- The Imagen image generation model features improved visual appeal, image editing, captioning, a new tuning feature to align images to guidelines with 10 or fewer samples, and visual question answering, as well as digital watermarking functionality powered by Google DeepMind SynthID.
- Adapter tuning in Vertex AI is generally available for PaLM 2 for text. Reinforcement Learning from Human Feedback (RLHF) is now in public preview.
- New Vertex AI Extensions let models take actions, retrieve specific information in real time, and act on behalf of users across Google and third-party applications like DataStax, MongoDB, and Redis. New Vertex AI data connectors help ingest data from enterprise and third-party applications like Salesforce, Confluence, and Jira.
- Vertex AI now supports Ray, an open-source unified compute framework for scaling AI and Python workloads.
- Google Cloud announced Colab Enterprise, a managed service in public preview that combines the ease of use of Google's Colab notebooks with enterprise-level security and compliance capabilities.
- Next month Google will make Med-PaLM 2, a medically tuned version of PaLM 2, available as a preview to more customers in the healthcare and life sciences industries.
- New features to enhance MLOps for generative AI, including Automatic Metrics in Vertex AI to evaluate models based on a defined task and "ground truth" dataset; Automatic Side by Side in Vertex AI, which uses a large model to evaluate the output of multiple models being tested, helping to augment human evaluation at scale; and a new generation of Vertex AI Feature Store, now built on BigQuery, to help avoid data duplication and preserve data access policies.
- Vertex AI foundation models, including PaLM 2, can now be accessed directly from BigQuery. New model inference in BigQuery lets users run model inferences across formats like TensorFlow, ONNX, and XGBoost, and new capabilities for real-time inference can identify patterns and automatically generate alerts. Vector and semantic search for model tuning are now supported in BigQuery, and you can automatically synchronize vector embeddings in BigQuery with Vertex AI Feature Store for model grounding.
- A3 VMs, based on NVIDIA H100 GPUs and delivered as a GPU supercomputer, will be generally available next month.
- The new Google Cloud TPU v5e, in preview, has up to 2x higher training performance per dollar and up to 2.5x higher inference performance per dollar for LLMs and generative AI models compared to Cloud TPU v4. New Multislice technology in preview lets you scale AI models beyond the boundaries of physical TPU pods, with tens of thousands of Cloud TPU v5e or TPU v4 chips.
- Support for Cloud TPUs in GKE is now available for Cloud TPU v5e and Cloud TPU v4, and support for AI inference on Cloud TPUs is also in preview. GKE now supports Cloud TPU v5e, A3 VMs with NVIDIA H100 GPUs, and Google Cloud Storage FUSE on GKE (GA).

Key Takeaways

My takeaways from Google Cloud Next are very much in the same vein as those from my attendance at Google's Cloud Executive Forum held earlier in the summer. I continued to be impressed with Google Cloud's velocity and focus when it comes to attacking the opportunity presented by generative AI. The company clearly sees gen AI as a way to leap ahead of competitors AWS and Microsoft and is taking an "all in" approach.

The company has also been very quick to rally customers around its new gen AI product offerings. In addition to the product announcements noted above, Google Cloud announced and highlighted new and expanded generative-AI-focused collaborations with a wide variety of customers and partners, including AdoreMe, Anthropic, Bayer Pharmaceuticals, Canoo, Deutsche Bank, Dun & Bradstreet, Fox Sports, GE Appliances, General Motors, Ginkgo Bioworks, Hackensack Meridian Health, HCA Healthcare, Huma, Infinitus, Meditech, MSCI, NVIDIA, Runway, Six Flags, eleven generative AI startups, DocuSign, SAP, and more.

Interesting overview of @FOXSports use of Gen AI. Have 27 PB of video, ingest 10k hrs per month. Have custom models for things like celebrity detection, foul ball prediction, and more. Use the tech to allow analysts to more easily search archives. #GoogleCloudNext pic.twitter.com/ea3tQCVXU0— Sam Charrington (@samcharrington) August 29, 2023

AI-Driven Transformation panel at #googlecloudnext Analyst Summit featuring data leaders from @Snap and @Wayfair.
pic.twitter.com/aANlHv6nHT— Sam Charrington (@samcharrington) August 29, 2023

https://twitter.com/samcharrington/status/1696597457134817490
https://twitter.com/samcharrington/status/1696597126090985574

"For the first time, the business is really engaged in transformation... We will figure out hallucinations, omissions, etc., ... but the level of engagement is game changing." - Gil Perez, Chief Innovation Officer, Deutsche Bank

Additionally, Google Cloud continues to grow its generative AI ecosystem, announcing availability of Anthropic's Claude 2 and Meta's Llama 2 and Code Llama models in the Vertex AI Model Garden.

TK highlighting breadth of model catalog in Vertex AI, via new and existing model partners. Announcing support for @AnthropicAI Claude2 and @MetaAI Llama2 and CodeLlama models. #googlecloudnext pic.twitter.com/E1gkpT59UA— Sam Charrington (@samcharrington) August 29, 2023

Opportunities

Numerous opportunities remain for Google Cloud, most notably in managing complexity, both in their messaging and communication and in the products themselves.

From a messaging perspective, with so many new ideas to talk about, it is not always clear what is actually a new feature or product capability versus simply a trendy topic that the company wants to be able to talk about. For example, the company mentioned new grounding features for LLMs numerous times, but I've been unable to find any concrete detail about how new features enable this on the platform. The wrap-up blog post noted previously links to an older blog post on the broader topic of using embeddings to ground LLM output using first-party and third-party products. It's a nice resource but not really related to any new product features.

And since the conference, I've spent some time exploring various Vertex AI features and APIs and generally still find the console and example notebooks a bit confusing to use and the documentation a bit inconsistent. To be fair, these complaints could be leveled at any of Google Cloud's major competitors as well, but coming from an underdog position in the cloud computing race, Google has the most to lose if product complexity makes switching costs too high.

Nonetheless, I'm looking forward to seeing how things evolve for Google Cloud over the next few months. In fact, we won't need to wait a full year for updates, since Google Cloud Next '24 will take place in the spring, April 9-11, in Las Vegas.
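Related to the API exploration and grounding discussion above, here is a minimal sketch of what calling the PaLM 2 text and embeddings models through the Vertex AI Python SDK looked like around the time of the event, used to ground a prompt in a retrieved snippet. The project ID, documents, and question are placeholders, and exact module paths and model names may vary by SDK version; this is an illustration, not Google's grounding feature.

```python
# Minimal sketch: grounding a PaLM 2 prompt with embeddings via the Vertex AI SDK.
# Project ID, documents, and the question are placeholders; verify model names
# and module paths against the SDK version you have installed.
import numpy as np
import vertexai
from vertexai.language_models import TextEmbeddingModel, TextGenerationModel

vertexai.init(project="my-gcp-project", location="us-central1")

docs = [
    "Vertex AI Model Garden includes first-party, open-source, and partner models.",
    "Duet AI adds assistant capabilities across Google Cloud products.",
]

# Embed the documents and the question, then pick the closest document.
embed_model = TextEmbeddingModel.from_pretrained("textembedding-gecko")
doc_vecs = [np.array(e.values) for e in embed_model.get_embeddings(docs)]
question = "What kinds of models are available in Model Garden?"
q_vec = np.array(embed_model.get_embeddings([question])[0].values)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

best_doc = docs[int(np.argmax([cosine(q_vec, d) for d in doc_vecs]))]

# Ask the text model, grounding the prompt in the retrieved snippet.
text_model = TextGenerationModel.from_pretrained("text-bison")
prompt = f"Answer using only this context:\n{best_doc}\n\nQuestion: {question}"
print(text_model.predict(prompt, temperature=0.2, max_output_tokens=128).text)
```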
I recently had the opportunity to attend the Google Cloud Executive Forum, held at Google's impressive new Bay View campus in Mountain View, California. The Forum was an invitation-only event that brought together CIOs and CTOs of leading companies to discuss generative AI and showcase Google Cloud's latest advancements in the domain. I shared my real-time reactions to the event content via Twitter, some of which you can find here. (Some weren't hash-tagged, but you can find most by navigating the threads.) In this post I'll add a few key takeaways and observations from the day I spent at the event.

Key Takeaways

Continued product velocity

Google Cloud has executed impressively against the generative AI opportunity, with a wide variety of product offerings announced at the Google Data Cloud & AI Summit in March and at Google I/O in May. These include new tools like Generative AI Studio and Generative AI App Builder; models like PaLM for Text and Chat, Chirp, Imagen, and Codey; Embeddings APIs for Text and Images; Duet AI for Google Workspace and Google Cloud; new hardware offerings; and more. The company took the opportunity of the Forum to announce the general availability of Generative AI Studio and Model Garden, both part of the Vertex AI platform, as well as the pre-order availability of Duet AI for Google Workspace. Nenshad Bardoliwalla, product director for Vertex AI, delivered an impressive demo showing one-click fine tuning and deployment of foundation models on the platform.

Considering that the post-ChatGPT generative AI wave is only six months old, Google's ability to quickly get gen AI products out the door and into customer hands has been noteworthy.

Customer and partner traction

Speaking of customers, this was another area where Google Cloud's performance has been impressive. The company announced several new generative AI customer case studies at the Forum, including Mayo Clinic, GA Telesis, Priceline, and PhotoRoom. Executives from Wendy's, Wayfair, Priceline, and Mayo participated in an engaging customer panel that was part of the opening keynote session. Several other customers were mentioned during various keynotes and sessions, as well as in private meetings I had with Google Cloud execs. See my Twitter thread for highlights and perspectives from the customer panel, which shared interesting insights about how those orgs are thinking about generative AI.

Strong positioning

While Models Aren't Everything™, in a generative AI competitive landscape in which Microsoft's strategy is strongly oriented around a single opaque model (ChatGPT via its OpenAI investment) and AWS' strategy is strongly oriented around models from partners and open source communities, Google Cloud is promoting itself as a one-stop shop with strong first-party models from Google AI, support for open source models via its Model Garden, and partnerships with external research labs like AI21, Anthropic, and Cohere. The company also demonstrates a strong understanding of enterprise customer requirements around generative AI, with particular emphasis on data and model privacy, security, and governance.
The company's strategy will continue to evolve and unfold in the upcoming months, and much more will be discussed at Google Cloud Next in August, but I liked what I heard from product leaders at the event about the direction they're heading. One hint: they have some strong ideas about how to address hallucination, which is one of the biggest drawbacks to enterprise use of large language models (LLMs). I don't believe that hallucinations by LLMs can ever be completely eliminated, but in the context of a complete system with access to a comprehensive map of the world's knowledge, there's a good chance that the issue can be sufficiently mitigated to make LLMs useful in a wide variety of customer-facing enterprise use cases.

Complex communication environment and need to educate

In his opening keynote to an audience of executives, TK introduced concepts like reinforcement learning from human feedback, low-rank adaptation, synthetic data generation, and more. While impressive, and to some degree an issue of TK's personal style, it's also indicative of where we are in this market that we're talking to CIOs about LoRA and not ROI. This will certainly evolve as customers get more sophisticated and use cases stabilize, but it points to the complex communication challenges Google faces in evangelizing highly technical products in a brand new space to a rapidly growing audience.

This also highlights the need for strong customer and market education efforts to help bring all the new entrants up to speed. To this end, Google Cloud announced new consulting offerings, learning journeys, and reference architectures at the Forum to help customers get up to speed, adding to the training courses announced at I/O. I also got to chat 1:1 with one of their "black belt ambassadors," part of a team they've put in place to help support the broader engineering, sales, and other internal teams at the company. Overall, I think the company's success will depend in large part on their effectiveness at bringing these external and internal communities up to speed on generative AI.

Broad range of attitudes

A broad range of attitudes about generative AI was present at the event. On the one hand, there was what I took as a very healthy "moderated enthusiasm" on the part of some. Wayfair CTO Fiona Tan exemplified this perspective both in her comments on the customer panel and in our lunch discussion. She talked about the need to manage "digital legacy" and the importance of platform investments, and was clear in noting that many of the company's early investments in generative AI were experiments (e.g., a Stable Diffusion-based room designer they're working on). On the other hand, there were comments clearly indicative of "inflated expectations," like those of another panelist who speculated that code generation would allow enterprises to reduce the time it takes to build applications from six weeks to two days, or those of a fellow analyst who proclaimed that generative AI was the solution to healthcare in America. The quicker we get everyone past this stage the better. For its part, Google Cloud did a good job navigating this communication challenge by staying grounded in what real companies were doing with its products.

I'm grateful to the Google Cloud Analyst Relations team for bringing me out to attend the event. Disclosure: Google is a client.
TWIMLcon: AI Platforms starts NEXT WEEK 🎉🎊🙌! I'm obviously super excited for the conference, and I think you should be too! Here are the top 10 reasons why I think you should register for TWIMLcon today!

1. Great Sessions: We've got an incredible session line-up for this year's TWIMLcon. If you care about real-world ML or MLOps, you'll definitely want to catch these great talks on topics like GitOps, Ray, real-time ML, model quality & testing, programmatic labeling, and much more.
2. Top Speakers: At TWIMLcon you'll hear from presenters sharing experiences gained on leading ML/AI projects and teams at companies like Etsy, Spotify, Meta, Square, Reddit, Google, Waymo, NVIDIA, and Capital One.
3. Keynote Interviews: My favorite! Join me for these headliner chats on putting the "Ops" in MLOps, featuring the authors of "Reliable Machine Learning: Applying SRE Principles to ML in Production," and on optimizing your MLOps approach for high-value B2B use cases.
4. Team Teardowns: An attendee fave from the inaugural TWIMLcon, we created this unique panel format to explore how teams organize and work together to get models into production efficiently. This year's panel features teams from LinkedIn.
5. The Great MLOps Debate: Last year's debate was a big hit, so we're excited to have Demetrios Brinkmann, organizer of The MLOps Community, back to moderate another, this time on the merits of end-to-end MLOps platforms versus specialized, best-of-breed tools for supporting the ML lifecycle.
6. Workshops: Expand your knowledge and expertise on the latest and greatest tech in these hands-on workshops presented by Iterative, Weights & Biases, and NVIDIA.
7. Networking: We're taking special care to ensure that TWIMLcon is an engaging virtual event, with plenty of opportunities to network and connect with fellow attendees.
8. Globally accessible and COVID-free: While we're looking forward to seeing everyone in person again soon, the advantages of our virtual format for 2022 include easier global access, no time and cost spent on travel, and no risk of exposure to COVID or monkeypox 🤢.
9. Multiplayer: Attending TWIMLcon with your team offers a great opportunity to come together, get exposed to what leading ML/AI teams are doing, and lay the foundation for taking your efforts to the next level. Be sure to encourage your colleagues to register and attend with you.
10. FREE: For the first time ever, TWIMLcon is free to attend for ML, AI, and data practitioners and leaders, so there's no excuse not to register today!

BONUS: The TWIML Community is really TWIMLcon's secret sauce. This community is like no other – an open community of ML and AI practitioners and leaders who value growing, learning, and sharing together. TWIMLcon presents a great opportunity to connect with others who share your interest in MLOps.

We'd love to have you join us. Registration is free for all qualified attendees. If you haven't already checked out the latest agenda and blocked the time on your calendar, please do that now. You're not going to want to miss this one!

Register Now!
You may know Intuit as the public company (INTU) behind QuickBooks and TurboTax, but thanks to $20B of recent acquisitions, they are also the new owners of the Mailchimp marketing automation company and Credit Karma, a personal finance application. The company invests heavily in machine learning and AI as a way to deliver new features and capabilities in their products, to enhance the customer experience, and to improve operational efficiencies.

In January 2021, during our last TWIMLcon, we were fortunate to hear from Ian Sebanja, a product manager, and Srivathsan Canchi, head of ML platform engineering, about how Intuit was handling cost management internally for the cloud-based (mostly AWS) infrastructure that supports their machine learning efforts. In that talk, Ian and Srivathsan shared how they have grown ML use exponentially while only growing costs linearly. (One of our attendees, James Le, wrote up a nice blog post about that talk here, and you can find the original talk in our TWIMLcon 2021 free on-demand session archive here.)

In October 2021, Srivathsan came on the podcast (episode 438) and we talked about the ML feature store designed and built by the Intuit team. This tool eventually became the foundation of the Amazon SageMaker Feature Store product.

Srivathsan then very kindly made the introduction to Juhi Dhingra and Dunja Panic, product managers on the data platforms team at Intuit, who agreed to chat with us on our webcast about the two data platforms they have built for data processing: one for batch processes and one for streaming applications. They also shared their plans to unify these two platforms over time, and some lessons they learned that could be applied to other organizations. Check out the video below to watch the full conversation with Juhi and Dunja, or scroll down to read our takeaways from this session.

https://www.youtube.com/embed/yVhnGGcOFOc

Learning how to serve your internal customer

Intuit has over 15,000 employees running over 60,000 data processing workloads. Product managers and data engineers across the company are spending up to 35% of their time managing operationalization and infrastructure that is "under the water line," taking away time and energy from focus on the product they are creating. So Juhi and Dunja and their team set out to build easy-to-use batch and streaming data processing platforms to simplify all of this for their internal customers, and they started by focusing on the data engineer persona.

(Image courtesy of Intuit)

The data engineer just wants to build great end-user products and experiences.
If they can focus more time on that, and less time on managing infrastructure, then Intuit and its customers will both win. To that end, they built two data processing platforms, one for batch and one for streaming, that help their internal data engineers do things in "minutes, not months." The developer can tap into data sources like their data lake, data warehouses, and feature store, select the appropriate processor (batch or streaming), access sample code for common use cases, and provision the resulting data pipeline to production cloud infrastructure. The platform provides multi-language support and manages the orchestration, provisioning, and scale needed by the application and data processing approach chosen. It is essentially a one-stop shop that lets the user focus on the customer experience, while the platform team supports the behind-the-scenes infrastructure such as pipelines, runtimes, containers, and databases, as well as site scalability and reliability.

Streaming use case example: QB Assistant

QuickBooks has a self-help function called the QB Assistant. Instead of providing canned self-help options, it is highly personalized based on the activity of the user in QuickBooks itself. By building this personalized self-help system, Intuit was able to increase user engagement, improve click-through to help docs, and reduce customer support contact rates. They did this by pulling user behavior (clickstream data) from their event store, processing it in real time on the streaming platform, and then proposing personalized suggestions back to the user in near real time.

Batch use case example: Cash flow forecasting

Cash flow is absolutely critical for a small business owner. In their mission to become a primary method of cash flow management for their small business customers, Intuit built a new cash flow forecasting function into QuickBooks. To make it work, they ingest the cash flow data into the data lake, enrich it to create features using the batch processing platform, and then push those features to the feature store.

Selecting stream vs. batch processing for a given use case

We discussed how application owners decide which data processing platform they should use for their application. The answers seemed to point mostly towards use case fit and budgets within the business unit. Regarding use case fit, monthly reports, or things that only need to run periodically, are good candidates for batch processing, while use cases like chatbots or clickstream analytics are better candidates for streaming data processing. With respect to budgets and running costs, batch processing is inherently cheaper. In batch processing, infrastructure can be spun up (including things like Amazon EMR), run only for the duration of the batch job, and then be turned off. Streaming applications are "always on" by their nature, so they will constantly be running some level of infrastructure to support base demand, and that will generally have a higher cost than a batch processing operation. (A minimal sketch contrasting the two patterns appears at the end of this post.)

What does the future hold for this platform?

With so much of the platform built and operational, we wondered what was next. Were there still challenges? Juhi and Dunja quickly pointed out that they were in no way done building out the platform.
They addressed issues such as having too many tools for users, still having issues onboarding data, getting clean data, having good data governance, enforcing ownership and stewardship of data, and enabling trust in the data for their internal customers. Their goal going forward is to build a completely unified "data mesh" where people can discover data products, explore them, and leverage them or build on top of them. They want to provide a simple, well-integrated, easy-to-use data processing experience with:

- Integrated access, discovery, exploration, model development, and analytics
- One unified experience for stream and batch processing
- Low-code/no-code options for less technical personas, including helpers for debugging issues on the fly
- High reusability of data products by recommending aggregates and transformations of existing data products
- Built-in governance, data quality, and lineage

Lessons learned in their data journey

Not every company is as mature as Intuit when it comes to their data infrastructure, systems, and practices. So we asked Juhi and Dunja to share their thoughts on some things that any company can take away from their experience and apply to their own situation, and they shared a number of great principles:

- Understand YOUR business. You can use overarching design principles, but your business should ultimately dictate what you do in terms of designing and building your own platform.
- Understand your own data, data lake, data warehouse, and tools (including which ones work for your users and which ones don't).
- Keep the solutions as simple and broad-based as possible; do not deploy a tool for a single use case or solution.
- Try to serve multiple personas even if you just start with one. Intuit started with the hardest one (data engineer) and is now aiming to support other types of engineers (specifically, ML and software engineers), as well as moving towards less technical personas such as data analysts and business analysts.
- Remember to balance priorities: are speed and agility more important, or governance? Only you and your business know what the right answer is.

There was so much more covered in the Q&A, where we discussed feature storage for batch vs. streaming, data lineage, multi-cloud infrastructure management, and how they keep their users on the rails without breaking data schemas or standards. We invite you to listen to the entire episode here. We want to thank Juhi and Dunja for coming on the webcast, and we'll look forward to another update soon!
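As referenced above, here is a minimal, purely illustrative sketch of the batch-versus-streaming contrast discussed in this post. It is not Intuit's platform or API; the enrichment function, data sources, and sinks are hypothetical placeholders.

```python
# Illustrative only: contrasts a scheduled batch enrichment job with an
# "always on" streaming consumer. All helpers and sources are hypothetical.
import time
from datetime import date

def enrich(record: dict) -> dict:
    # Hypothetical feature enrichment shared by both paths.
    return {**record, "cashflow_7d_avg": sum(record["cashflow"][-7:]) / 7}

def run_batch_job(day: date, load_records, write_features):
    # Batch: spin up, process one day's worth of data, write features, shut down.
    features = [enrich(r) for r in load_records(day)]
    write_features(features)
    # Infrastructure only runs for the duration of the job.

def run_streaming_consumer(poll_events, push_suggestion, stop_after_s=60):
    # Streaming: a long-lived loop that reacts to each event in near real time.
    start = time.time()
    while time.time() - start < stop_after_s:
        for event in poll_events():         # e.g., clickstream events
            push_suggestion(enrich(event))  # respond to the user immediately
        time.sleep(0.1)  # always-on infrastructure keeps polling
```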
Today we continue our CVPR series joined by Kate Saenko, an associate professor at Boston University and a consulting professor for the MIT-IBM Watson AI Lab. In our conversation with Kate, we explore her research in multimodal learning, which she spoke about at the Multimodal Learning and Applications Workshop, one of a whopping six workshops she spoke at. We discuss the emergence of multimodal learning, the current research frontier, and Kate's thoughts on the inherent bias in LLMs and how to deal with it. We also talk through some of the challenges that come up when building out applications, including the cost of labeling, and some of the methods she's had success with. Finally, we discuss Kate's perspective on the monopolization of compute resources for "foundational" models, and her paper Unsupervised Domain Generalization by Learning a Bridge Across Domains.
Sam Charrington: [00:00:00] All right, everyone. I am here with Jabran Zahid, Senior Researcher with Microsoft Research. Jabran, welcome to the TWIML AI Podcast.

Jabran Zahid: [00:00:09] Thank you very much, Sam. It's a pleasure to be here.

Sam Charrington: [00:00:11] Great to have you on the show. I'm really looking forward to digging into our conversation. To get us started, I'd love to have you share a little bit about your background and how you came to work at the confluence of biology and artificial intelligence.

Jabran Zahid: [00:00:26] Oh, thank you very much for this opportunity to share with you what we've been working on here at Microsoft. By training, I'm an astrophysicist, and prior to coming to Microsoft a year and a half ago, I was working on understanding galaxy evolution and cosmology. The most recent stuff I was working on was looking at galaxies and trying to develop techniques to tie those galaxies to the dark matter distribution in the universe. I was interested in mapping the dark matter in the universe using the galaxies as little beacons of light in this sea of dark matter. It was a real privilege to be able to study astrophysics. It's a beautiful subject, but as I've gotten older, one of the things that started to become a higher priority for me, personally, was to be able to have a greater impact with the work I was doing, and by impact, I mean the ability to impact people's lives on the day to day. While astronomy is a beautiful subject, it's not the most practical in terms of people's day-to-day lives. It has important cultural impact, but it doesn't have that impact on everyone's lives from day to day, so I started to look for opportunities, and one place that made perfect sense to look towards was industry, where not only are there interesting projects and interesting things being done, but there's also the opportunity and ability to have reach, if you work at the right place that has reach to individuals. Then one of my former colleagues, also an astrophysicist, went to Microsoft Research. She told me about the position within the Immunomics group and told me a little bit about the details. It was just my bread and butter: a science project that, if successful, could potentially have a huge impact, could even change the world, if we succeed at what we're doing. That just really got me excited. Once I had learned more about the project and brought my skills to the table, it made sense. I was a good fit for the role, and I ended up at Microsoft Research at the end of January last year, six weeks before the pandemic hit.

Sam Charrington: [00:02:26] Wow. Did you say Immunomics?

Jabran Zahid: [00:02:30] That's what we call it. It's immunology mixed with genomics, basically. Our project, essentially, is that we're trying to map the immune system, and the way we do that is we genetically sequence the T-cells of the human immune system, which we'll go into details on. We're essentially trying to learn how to read the immune system from the genes themselves.

Sam Charrington: [00:02:50] You mentioned that you started just before the pandemic. Did that influence the evolution of the project at all?

Jabran Zahid: [00:02:57] Absolutely. We have been engaged in helping Adaptive Biotechnologies. The project I worked on, The Antigen Map Project, is a collaboration between Microsoft Research and Adaptive Biotechnologies.
We've been helping them make diagnostics, and when COVID hit, it presented a very unique opportunity for us to turn all of our efforts, or a big fraction of our efforts, towards trying to diagnose COVID, which we did successfully. Adaptive Biotechnologies has an FDA-authorized diagnostic on the market, which you could order today if you wanted to. COVID not only provided a very strong impetus, in that it was one of the most pressing human problems we were facing, but it also provided a unique opportunity to really bring together many, many aspects of our project. It's a great test case for understanding what we do in our project, what the antigen map is. It really accelerated our research. I anticipate that when we look back at last year, it will be seen as a watershed moment in our project, simply because of the accelerant that COVID was for our work.

Sam Charrington: [00:04:15] Awesome. We'll dig into the machine learning aspect of the project and how you apply ML, but I think I'd like to hear a bit more about the biology and understand the problem that you're trying to solve at a more fundamental level. Immunomics, how does it work? What specifically are you trying to do with The Antigen Map Project?

Jabran Zahid: [00:04:39] Yeah. Thank you for asking about that, Sam. I'm really happy to share this, and I should, first of all, say that what I'm discussing now is a representation of 50 or so people's work. It's not just me who's carrying this out. This is a large collaboration. It really is an effort that spans multiple companies and builds on decades of research in immunology. The human immune system is an amazing system. The adaptive immune system specifically is something that started evolving about 300 million years ago. The adaptive immune system is the part of the human immune system that has a memory. When you're a kid and you get sick with, let's say, measles or something, your immune system will eventually respond to that, and the adaptive immune system will retain a memory of having seen the measles. You will not get sick with the measles again if you've had it in the past, because the second your body gets exposed to the measles, your adaptive immune system is ready to go. It remembers what the pathogens from measles look like, and it springs into action. A big part of that immune system is the T-cells. The T-cells essentially are floating around in your blood and in some of your organs. They have a little receptor on their surface, and that's actually what we sequence, the T-cell receptor. We get a genetic sequence of the T-cell receptor, and that genetic sequence encodes, more or less, the shape of that receptor. Like a key fitting into a lock, if that T-cell's receptor finds the lock it fits into, if it finds the pathogen that it binds, it'll basically trigger an immune response. After that immune response, the virus or bacteria is cleared from the body, and it will remember. That special T-cell will stick around in your body for much longer than the rest of the T-cells. These T-cells, the adaptive immune system itself, are produced by a stochastic, quasi-random process in which different combinations of amino acids are put together, producing a huge number of possible shapes for the T-cell receptor. That's where the complexity of the problem comes in, and that's where machine learning is required.
The space of possible T-cells is something like 10 to the 15, and you yourself have hundreds of billions of these things in your body. Sequencing is Adaptive Biotechnologies' secret sauce, their ability to genetically sequence a large number of T-cells. For an individual, from a vial of blood, you can sequence something like 500,000 to a million T-cells, and then we can read those in our computer, and we have that for tens of thousands of individuals. You can imagine, now you have all these strings of letters floating around that represent T-cells. You want to read what those letters mean, because those T-cells encode the memory of all the things you've been exposed to in your past. If we can successfully read that book of your immune system, we will be able to tell you all the things you've been exposed to in the past, and things you may be actively fighting, which is the area we've been mostly focused on: building diagnostics for things you're actively fighting now.

Sam Charrington: [00:07:53] A couple of questions based on your explanation. The first is, you mentioned that T-cell production is, in many ways, random, the result of some stochastic process. So the 500,000 T-cells that you mentioned you might pull from a vial of my blood aren't some historical DNA record of 500,000 diseases. There's some number of diseases that have created T-cells, but then there's a lot of randomness built in. Am I getting that right?

Jabran Zahid: [00:08:23] That's a wonderful question. The process by which these T-cells are produced is called VDJ recombination. Essentially, in your thymus, different groupings of amino acids are inserted to create the T-cell receptor. Now, those are naive T-cells; they don't know what their cognate pathogen is. You just have a huge number of them. This is the beauty of the adaptive immune system: it just creates a huge number of them. It's only when those random, naive ones encounter a pathogen to which they latch, that key fitting into the lock, that they proliferate. They clonally expand, they start reproducing themselves, they retain a memory, and they become what are called memory cells. This is a very simplified version of it, but essentially what happens at that stage is those will stick around in your blood far longer than the ones that are naive. To your question specifically, when we draw the vial of blood, we have a huge number of these naive cells. The vast majority are naive cells, actually, but not all of them. Somewhere from one to five percent are these memory cells, and discriminating between the memory and naive cells is one of the major challenges of our project, and that's something we're very actively engaged in.

Sam Charrington: [00:09:38] We'll come back to that in a second. I want to ask another question I had about this, and maybe it is the same question. When you're doing the sequencing, is the sequence of proteins directly telling you the receptor, or something about the receptor, or is there something more fundamental about a T-cell that is coming out of the sequencing?

Jabran Zahid: [00:10:02] That's a great question. What's sequenced is what's known as the CDR3 region, which encodes the receptor itself. The sequence is just amino acids, 20 different possibilities, A's, C's, T's, G's, whatever, but amino acids are encoding for proteins, which then make up the structure of the receptor. In your mind, the picture you should have is literally the lock and key picture.
There is a structure to this receptor. It has to physically fit the pathogen that it's trying to bind, in a way that it binds through a physical chemical bond, essentially. If the shape is right, then those two things will come together and it'll be a good fit, and that's when the immune response starts. Otherwise, nothing happens. Those cells just float around.

Sam Charrington: [00:10:51] When you're using machine learning to distinguish between the random T-cells and the ones that are activated and have identified their pathogen, it's not within that protein sequence, because the receptors are the same. Is there some other flag or characteristic that distinguishes the two?

Jabran Zahid: [00:11:12] Generally, if one really wanted to get the ground truth, you would go and look at surface markers on the T-cell, so not the receptor itself but markers on the T-cell, that would help you distinguish whether it's a memory or naive cell. The way we go about understanding that issue is by looking at other characteristics. One of the primary characteristics is what's known as the publicity of the T-cells. These T-cells have a range of probabilities of occurring in any individual, which is referred to as a generation probability. The probability is generated by this random process of VDJ recombination, and for ones that have reasonably high generation probabilities, there's a good chance you'll see them in a number of individuals. One of the standard ways we set up our experiments, the method by which we arrive at the collection of T-cells that are both memory and specific to a disease, is, COVID's a great example. You have a thousand individuals that have COVID; we've drawn their blood and sampled their T-cells. We compare that against a thousand people, a control sample, that don't have COVID, and we simply ask the question: which T-cells appear at a statistically significantly higher frequency amongst the individuals that have COVID as compared to the individuals that don't? That gives you your set of T-cells that may potentially be actively fighting COVID, and then you do all your machine learning and things like that from there. That's the starting point of our diagnostic procedure.

Sam Charrington: [00:12:47] Got it. It sounds like a great application for some pattern matching.

Jabran Zahid: [00:12:51] Yeah, absolutely. You can really imagine some of the tools of natural language processing coming in here, because these are literally just strings, but you've got to throw in a little bit of physics too, because they're encoding for physical properties of a thing. It's a complicated problem, which we're just scratching the surface of right now, but we've made enough progress that it's clear to us this is going to yield very important techniques for understanding human health.

Sam Charrington: [00:13:18] Before we dig into the technology aspect, I just want to hit pause briefly and ask you: you talked about your background as an astrophysicist and cosmologist. I did not hear doctor, biologist, any of that, but yet you're speaking very fluently about the biology. I'm just curious about that process for you, coming up to speed in this domain, and how you approached it, and if there's anything interesting from your background that you brought to this problem area?
Jabran Zahid: [00:13:52] Starting out on this project, I had a high school biology understanding of the immune system, and then whatever Wikipedia told me. I didn't have any sophisticated knowledge. That was the primary challenge. The tools that I had learned along the way for studying galaxies and cosmology were very applicable and translated very straightforwardly to the problem, as did the techniques, the training, and the craft of doing research. I had been doing research for 20 years. I understood it and had great mentorship that really gave me those skills, but the domain-specific knowledge was the greatest challenge, and remains my greatest challenge to this day. You may say I speak of it fluently, but in my mind I feel that ignorance is outweighing the knowledge that I have on this subject. I appreciate you saying that, but the reality is that that's been the challenge. Basically, the way you approach a science problem is you've got to start playing with the data, but at the same time, you've got to contextualize that exploration of the data in what is known in the field. The way I've gone about doing that is, of course, reading a huge number of the papers covering the 30 or 40 years of immunological research on the subject, and going to conferences when possible. That's been a little bit more difficult these days, but scientists have made huge strides in virtual conferences. One of the most important things is talking to my colleagues who are immunologists and just asking questions. Sometimes it may seem like a stupid question or a dumb question, but it's really just a reflection of my own ignorance and trying to fill that in. That's what's gotten me this far, and I feel that filling in those gaps, combined with the techniques that we're developing as a team using tools of machine learning, are really the things that are going to be required to take this project to the next level.

Sam Charrington: [00:15:50] Let's talk about some of those techniques. You described the setup, at least at a high level, of this pattern matching problem. You've got your folks with an identified disease. You've got your control group. You take a bunch of T-cells from all of them, and you're trying to figure out which T-cells are more significantly evidenced in your exposed group. What machine learning approaches do you apply to a problem like that? Even the step before that, what does the data collection, and, since many of these are supervised techniques, the labeling process look like for this kind of problem?

Jabran Zahid: [00:16:32] It varies from disease to disease, but we can take COVID as an example. It encapsulates much of the process, which is, in some sense, ubiquitous in any machine learning project. You collect your data, which is drawing vials of blood. For COVID, the way we did that was through Adaptive's partners throughout both industry and academia, so the ground truth, oftentimes, not always, but most of the time, was taken as a PCR test. If someone had a PCR-positive test, we know this person has the virus in their body, and therefore they're not only exposed but infected. Let's draw their blood; that's where the labels are typically coming from. There are other subtleties involved, which we don't need to go into. Then you get your labeled data, and now we have a huge number of...

Sam Charrington: [00:17:25] If I can jump in quickly there. These PCR tests aren't perfect. They have whatever the false positive rate is for the PCR test, false negative rate.
Do you try to adjust for that in the process, either by some kind of quorum technique, multiple tests, or mathematically somewhere?

Jabran Zahid: [00:17:48] Yeah, in different ways, depending on the circumstances in which we address that issue. Oftentimes what we see is that these false negatives, which are somewhere at the level of 5% or so, I think that's typically the number, show up as outliers, but we have large enough samples, and that's just part of the game. There's always going to be...

Sam Charrington: [00:18:06] Another source of noise.

Jabran Zahid: [00:18:08] Yeah. There's always noise and you just deal with it, and it depends on the circumstances and how it's affecting your system, so it's certainly an issue, but we are well equipped to handle that.

Sam Charrington: [00:18:16] Okay.

Jabran Zahid: [00:18:17] Yeah. Then we have our labeled data. In any machine learning project, one of the things you really want to do next, once you collect the data, is determine your features. At the highest level, our features are these public sequences, the sequences of the T-cells that appear in multiple individuals at a statistically higher frequency in the individuals who have whatever endpoint we care about, in the case of COVID, people who have COVID versus individuals in our control sample. Then we just count those sequences, how many of those are occurring in an individual, and do a simple logistic regression model, and that gets you pretty far. It's impressive how far that can get you. Just like in any machine learning application, usually the simplest models get you 90% of the way there. You have to start with the simplest models because you have to have a baseline, and you can interpret them much more easily, so that's where we're at in terms of our diagnostic. We have the simple model that we can submit to the FDA, and it has been authorized by the FDA, but of course you want to extend on that. We have this enormous data set, and how do you push that further? We don't care just about whether you have COVID or not. We want to know other things that we can learn from this data. One interesting application: in addition to these tests where we just sequence what we call the repertoire, so the T-cells, there are laboratory experiments in which we take the actual pieces of the COVID virus, put them in test tubes, throw a bunch of T-cells at them, and see what sticks to what. One of the issues with the diagnostic approach that I described is that you see these T-cells occurring at a higher statistical frequency in the cases versus the controls, but you don't really know for sure whether they're specifically attacking COVID. These laboratory experiments allow us to make that test. When the virus enters your body, the way your immune system responds is it chops up the virus and then presents it, essentially, on the surface of a cell for a T-cell to come along and grasp onto. There's a presentation step, and that presentation is usually about 10 or so amino acids of the virus. We chop up the virus, throw it in a test tube, throw a bunch of T-cells at it, figure out which ones stick, and then ask the question: of the ones that are sticking, how many do we see in our diagnostic, in the public cells that comprise our diagnostic?
The upshot of all of this is that now we not only know that the T-cells in our diagnostic are attacking COVID, but also what they are attacking in COVID, what part of the virus they are attacking.

Sam Charrington: [00:21:06] Meaning which 10-amino-acid sequence the receptor is latching onto in particular?

Jabran Zahid: [00:21:13] Exactly. 10-ish. That's just a rough number. One upshot of this is we can now distinguish whether a T-cell is hitting the spike protein, which is the protein that forms the spikes on the surface of the coronavirus, or the envelope protein, which creates something else. If you follow vaccine development, one thing you'll note is that almost all the vaccines, certainly all the ones that have been approved in the United States, target the spike protein. They don't introduce the whole coronavirus. They just cut out the spike protein, whether it's an mRNA vaccine where they indirectly introduce that RNA into your body, or something like the Johnson & Johnson vaccine, where they attach it to a vector like a common cold virus. In any case, that's what your body is building its immune response to, and the fact that we can discriminate what the T-cells are responding to means that our diagnostic has the power, and we're working on this very diligently, to discriminate whether you have had a vaccine or a natural infection. That has important implications for things like trying to understand people who get reinfected after a vaccine, for example, and vaccine manufacturers will really care about that. COVID, whether we like it or not, is going to be here for a while, so this is really providing an ability for us to begin to understand and dissect the disease at a level of resolution that hasn't previously been possible.

Sam Charrington: [00:22:49] I'm not sure I'm following that. How does this technique allow you to differentiate between folks that have T-cells because they were vaccinated versus the naturally occurring virus? Before you do that, I love that you refer to the set of T-cells that a person has as a repertoire, like it's a certain set of skills.

Jabran Zahid: [00:23:15] That's what the field calls them. It's a bit of jargon, but I love that too. I'm glad you picked up on that. That's cool, right? That's the technical term for it. Again, the diagnostic that we built works by counting up the T-cell response. You count up the different T-cells that we now think are specific to COVID, and we can say which subset of all the T-cells in our diagnostic each of those is specific to. Let's say we have 10,000 T-cells in our diagnostic; some fraction of those are attacking the spike protein, and some fraction of those are not attacking spike, they're attacking the envelope, and the spike protein is a small fraction of the genome of the coronavirus. There's something like 10,000 amino acids, and the spike is only a few hundred to a few thousand, I don't remember the exact number, but we know which T-cell is attacking what. In people who have been vaccinated, we only observe those T-cells that are targeting spike. It's actually amazing how robustly we can do that, whereas someone who has a natural infection will have a response that covers a much broader range of the T-cells.
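[Editor's note: To make the counting approach described above a bit more concrete, here is a minimal, hypothetical sketch of a public-sequence-count diagnostic: count how many of a repertoire's sequences fall in a disease-associated set (and in a spike-specific subset), then fit a simple logistic regression on those counts. The sequence sets and data are invented placeholders, not Adaptive's or Microsoft's actual pipeline.]

```python
# Hypothetical illustration of the count-based diagnostic described above.
# The sequence sets and repertoires are made-up placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Disease-associated "public" sequences and a spike-specific subset (placeholders).
covid_associated = {"CASSLGQGAEQFF", "CASSPDRGGYEQYF", "CASSQETQYF"}
spike_specific = {"CASSLGQGAEQFF"}

def featurize(repertoire: set) -> list:
    """Count hits against the associated set and the spike-specific subset."""
    hits = repertoire & covid_associated
    return [len(hits), len(hits & spike_specific)]

# Toy repertoires: cases tend to contain associated sequences, controls don't.
repertoires = [
    {"CASSLGQGAEQFF", "CASSPDRGGYEQYF", "CASRANDOM1F"},  # case
    {"CASSQETQYF", "CASRANDOM2F"},                        # case
    {"CASRANDOM3F", "CASRANDOM4F"},                       # control
    {"CASRANDOM5F"},                                      # control
]
labels = [1, 1, 0, 0]  # e.g., PCR-positive vs. control

X = np.array([featurize(r) for r in repertoires])
clf = LogisticRegression().fit(X, labels)

new_repertoire = {"CASSPDRGGYEQYF", "CASRANDOM6F"}
print(clf.predict_proba([featurize(new_repertoire)])[0, 1])  # P(COVID signature)
```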
Sam Charrington: [00:24:26] It's really speaking to both the granularity of the problem, and I'll elaborate on this in a second, but also the diversity of T-cells that you are speaking to. It's not the case that there is a Coronavirus T-cell and there's one and only one. It's that there's a family of T-cells that attack different aspects of the Coronavirus, and maybe even multiple that attack the spike, and the population that someone has of each of the, possibly many, in this family can tell you a lot about how they acquired the virus. Jabran Zahid: [00:25:04] Absolutely. That's partly where the machine learning comes in: how the immune response was triggered. That's really where the machine learning comes in, finding those deep, deep patterns encoded in those receptors. What makes these T-cells specific to COVID, and what's similar about these two that we know are hitting the spike protein, and things like that. That's really where the next step of the project comes in, and it really requires this very sophisticated modeling. A problem we haven't cracked, by the way, despite many, many, many different attempts, so it's a very difficult problem and can only be addressed with the tools and sophistication of machine learning algorithms. Sam Charrington: [00:25:52] We started out talking about logistic regression and the supervised problem where you've got the test results as labels, and now you're starting to talk about things that sound like clustering and unsupervised types of problems. Is that the general direction that you're heading with this kind of analysis? Jabran Zahid: [00:26:11] Absolutely. The unsupervised techniques provide a means for clustering, for example, and dimensionality reduction. Those are the standard approaches that one would throw at any problem with very, very high dimensionality and a large parameter space, but that's only the first step. The real question, the heart of it all, is we want to read the immune system. What we call the antigen map is: I give you a T-cell and its receptor and you tell me what antigen that T-cell will bind to, because it's only then that we can read off your immune history. When we draw your blood, we may know this T-cell is a memory cell, but we won't know if it's a memory cell to the common cold or to Coronavirus or to some bacteria. We won't know that just from looking at it. We'll have to use the sequence and understand how that sequence encodes the information about what it has attached to in the past, what it's bound to in the past. That's where the machine learning really comes in and you can imagine the complexity of the problem. We're literally trying to read the immune system in a way that allows us to read your immune history. It's just a bunch of strings when you look at it on the computer screen, and so the challenge is going from that bunch of strings on your computer screen to a physical mechanism and physical system and the physical properties of that T-cell that really give us the information about what it's binding to. Sam Charrington: [00:27:47] You've tried a lot of things and have a list of things that haven't worked. What are some of those things? Jabran Zahid: [00:27:54] That's a great question. It's pretty interesting because a few researchers have come onto this problem since I have and everyone treads the same path in some sense, which is, you come in and you say, "Logistic regression? How are you still using logistic regression to do this?" That's the naivete that's required to really try some interesting, crazy things in science.
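The unsupervised first pass mentioned here might look something like the following toy sketch: crude k-mer features, dimensionality reduction, then clustering. The sequences are invented, and real repertoire analysis is far more involved than this.

```python
# Sketch: turn receptor strings into simple 3-mer count vectors, reduce
# dimensionality, and cluster. Illustrative only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

sequences = ["CASSLGQAYEQYF", "CASSLGQSYEQYF", "CASSPGTGELFF",
             "CASSPGTGALFF", "CASSIRSSYEQYF", "CASSIRSTYEQYF"]

# Character 3-mer counts as crude sequence features.
vec = CountVectorizer(analyzer="char", ngram_range=(3, 3))
X = vec.fit_transform(sequences).toarray()

X2 = PCA(n_components=2).fit_transform(X)              # dimensionality reduction
clusters = KMeans(n_clusters=3, n_init=10).fit_predict(X2)
for seq, c in zip(sequences, clusters):
    print(c, seq)                                       # sequences grouped by similarity
```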
One of the obvious things is how far could we carry this analogy of we're trying to read the immune system. One of the things I tried was to take BERT, which is a well-known natural language processing model. It's called a transformer. It's a model that's essentially used in natural language processing tasks, for questions and answers in a bot, or for translation. It's a very multi-faceted tool. Natural language processing is a field in which machine learning has really matured and they have techniques and approaches for what they call transfer learning, where you can take a model trained in one domain, this happens in image analysis as well, but you take the model trained in one domain, let's say all of the web pages of Wikipedia, and then apply it in another domain. You do this training on this huge data set, and then you fine tune it to your specific problem. It works to varying degrees depending on the nature of the problem, but that's beside the point. The question I asked is, can we just use this model, this transformer type, natural language processing model to read the sequences and see if we can get somewhere? It turns out it just doesn't work. It doesn't work, at least in the way that we set it up. It's not surprising. With these sequences, the analogy breaks down between natural language and biophysics and biochemistry. Understanding where that breakdown happens is one of the most critical questions to really figuring out the right set of algorithms and the right set of constraints and the right data. In some ways, the right setup of the problem is one of the most difficult tasks in machine learning, setting up the problem appropriately. Hopefully these failures will help guide us to the path that's going to lead us to success. Sam Charrington: [00:30:14] Are there specific things that you can share that you learned in attempting to apply BERT to this problem, or specific places that it broke down? Jabran Zahid: [00:30:24] I didn't push it too far. I would say that the one thing that immediately stood out to me was it worked to a degree. At first, I was very excited. I was like, "Wow, this has predictive power on specific tasks," and so, "Hey, let's publish this or let's use it," but it turned out, BERT is something like a hundred-million-parameter model. It's a really, really huge model, which, unless you have a lot of data, is not really justified. The reason it was working is basically the way BERT is designed. As I understand it, typically you have an embedding layer that does all the embedding, and then you have this layer that you attach on the end that does essentially the decoding slash whatever task you care about, and more or less most of the interesting stuff was happening in those surface layers. You could really reduce the model down, take away the 700-odd hidden layers, and still get the same level of accuracy, and in fact, what that led me to realize was there are actually even simpler models, like Random Forest, embarrassingly, that will get you that same level of accuracy that we were getting with BERT, and one of the lessons I honestly took away from that was don't rush to the most complicated models, start with simple models and build up from there. That's what we've been doing.
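In that spirit, here is a minimal sketch of the kind of simple baseline being described: a random forest over crude per-position amino-acid features. The sequences and labels below are synthetic; in practice the labels would come from binding experiments like the ones described earlier.

```python
# Sketch of "try the simple model first": random forest on one-hot positional features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

AAS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seq, length=12):
    """Pad/trim to a fixed length and one-hot encode each position."""
    seq = seq[:length].ljust(length, "X")          # "X" padding matches no amino acid
    return [1.0 if aa == s else 0.0 for s in seq for aa in AAS]

seqs   = ["CASSLGQAYEQY", "CASSLGQSYEQY", "CASSPGTGELFF",
          "CASSPGTGALFF", "CASSIRSSYEQY", "CASSIRSTYEQY"]
labels = [1, 1, 0, 0, 1, 0]                        # e.g. binds / does not bind an antigen

X = np.array([one_hot(s) for s in seqs])
clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, labels, cv=2).mean())  # toy cross-validated accuracy
```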
One of the ways we've been approaching this problem, and one of the things we've learned by going to this approach, is that you have these strings of amino acids, and you cannot just substitute new amino acids in random positions and think that it will bind to the same thing. The places where substitutions can happen are very specific places, and the changes are only from very specific amino acids to other specific amino acids, and this of course begs the question: why is that the case? We suspect this has to do with the physical properties of the amino acids themselves. Some are interchangeable. This is known because the physical, chemical properties of these amino acids have been measured in the laboratory. Putting that physical picture together, which came into sharp relief when we started by using complex models but understood that actually simpler models can get us there, has really guided us on the path of understanding the problem. When we're dealing with human health, it's not enough just to predict things. We need to understand why those predictions are happening the way they are, otherwise we run a serious risk of producing essentially a black box, and we have found in human health, often you have confounding signals. You think you're seeing one thing, but it's actually being caused by something completely unrelated, and when you don't fully understand what your model is doing, you can fall into those types of traps. Sam Charrington: [00:33:15] With regard to BERT, you mentioned transfer learning, and it sounds like you were using a pre-trained BERT model and trying to fine tune it. Did you also try to train from the ground up? Jabran Zahid: [00:33:31] Yeah, we did. The thing that we took from BERT was the unsupervised training step, which was, what BERT does is it would take a sentence and it would mask out random words in that sentence and then try to reproduce what was masked out, and that's unsupervised because it... Sam Charrington: [00:33:47] It would seem to preserve some of the positionality that you require for proteins? Jabran Zahid: [00:33:54] Exactly. We would mask out random amino acids and then try to reproduce the sequence on the other side. You start with that unsupervised task. That's how you do the pre-training, so to speak, and then you slap on a layer for a classifier or whatever your specific task is. We definitely tried that and it was successful, as I said, but what we came to learn was something like a Random Forest is a lot easier to interpret. What is it that we're learning? Through that procedure we learned that, "Oh, it's actually positional information and very specific types of substitutions that are allowed." It was a lesson that I've learned many times doing machine learning, which is don't go to the complex models. Don't go to what's sexy necessarily right away, unless it's warranted. But we also follow our passions, and sometimes you see the shiny new model and you want to try it. BERT makes it easy, and the natural language processing community in general makes it very easy, to take models out of the box and use them. Something I think that the rest of the sciences, and certainly immunology, would benefit greatly from is making progress in that way as well. Sam Charrington: [00:35:01] Awesome. Tell us about where you are with this project relative to where you want to be, and what the future path is. Jabran Zahid: [00:35:11] Yeah.
We have made significant progress in the last year, driven by COVID, not only the fact that it was, and remains, one of the greatest immediate challenges facing humanity, but also that it provided an accelerant for us to bring together all the techniques that we've been working on. I described, for example, these laboratory techniques where we throw a bunch of T-cells at the pieces of the virus. Bringing that together with our diagnostic approaches has demonstrated this application I was describing for discriminating between vaccine versus natural infection, et cetera. We really brought together a lot of the different techniques and demonstrated the power of these techniques, not only to ourselves, which is one of the most important things, but to the world, by having these diagnostics that are authorized by the FDA, and I may be wrong about this, but I'm pretty confident that these are some of the very first, if not the first, machine learning COVID diagnostics authorized by the FDA. That in and of itself is an amazing accomplishment, and there's a lot of back and forth on how you do that, how it gets validated, et cetera. That's an interesting side note. We made enormous progress. The ultimate goal is the antigen map, as I described in the beginning, which is this ability to take any T-cell and understand what it's meant to target. My hope is that five years from now, when we look back at this moment, we'll see it as a watershed moment. We will have arrived at a firm understanding of whether that is even possible, whether the antigen map is possible, because the reality is, we often refer to it internally as a moonshot. It's a high risk, high reward venture, but if we are able to succeed in this, we will have the ability to understand immune risk to human health in a way that we have never had before. It will impact therapeutics, diagnostics, every aspect of how we treat human health. I'm excited to be a part of this. I hope we succeed. I hope we are able to provide this great benefit to the world, and we'll see if we can succeed or not. That's the question that we've set out to answer, and hopefully in five years, we'll have an answer to that question. Sam Charrington: [00:37:30] Awesome. Well, Jabran, thanks so much for doing the work, but also coming on the show to share a bit of it with us. Jabran Zahid: [00:37:38] Sam, thank you so much for this opportunity to share the amazing work we're doing on our team. Thank you. Sam Charrington: [00:37:43] Thank you. All right, everyone. That's our show for today. To learn more about today's guest or the topics mentioned in this interview, visit TWIMLAI.com. Of course, if you like what you hear on the podcast, please subscribe, rate and review the show on your favorite podcatcher. Thanks so much for listening and catch you next time.
Sam Charrington: [00:00:00] Welcome to the TWIML AI podcast. I'm your host, Sam Charrington. Hey, what's up, everyone. Before we jump into today's interview, I'd like to give a huge thanks to our friends at Microsoft for their continued support of the podcast. Microsoft's mission is to empower every single person on the planet to achieve more, to inspire customers to reimagine their businesses and the world. Learn more at Microsoft.com/AI and Microsoft.com/innovation. And now, onto the show. All right, everyone. I am here with David Carmona. David is the general manager of artificial intelligence and innovation at Microsoft. David, welcome to the TWIML AI podcast. David Carmona: [00:01:01] Thank you, Sam. Pleasure to be here with you. Sam Charrington: [00:01:04] It is great to have you on the show. And I'm looking forward to digging into our conversation, which will focus on AI at scale and large scale language models, and a bunch of really interesting things you're doing there. Before we jump into the topic, though, I'd love to have you share a little bit about your background and how you came to work on all this cool stuff. David Carmona: [00:01:25] Yeah. Well, I've been in Microsoft for almost 20 years, 19 and a half. Sam Charrington: [00:01:30] Wow. David Carmona: [00:01:30] So, almost getting to that magical [laughs], magical moment. And it's funny because my beginning with Microsoft, I was [inaudible 00:01:37] to Microsoft. That was 20 years ago. So, that was the big Windows moment. Right? But actually, I didn't come to Microsoft because of Windows. I came to Microsoft because of, … At that time, my favorite product, which was Visual Studio. So, I was a developer. I still am a developer. I will always be a developer no matter what I am. Sam Charrington: [00:01:57] [laughs]. David Carmona: [00:01:58] And for me, working in Visual Studio has been like my entire career. So, [inaudible 00:02:04] I started with AI and, and VR probably way too early [laughs]. That didn't end well. So, I ended in traditional development. And I had a ton of fun with that. And I, when I move … I'm originally from Spain. When I moved here to the US [inaudible 00:02:17], I worked in, in, in Visual Studio. So, I ended managing the business for Visual Studio and all our tools like .NET and, and all of that. It was a super fun time because it was that big transition in Microsoft to open development. So, I was lucky to do things like launching TypeScript. Right? Or- Sam Charrington: [00:02:36] Oh, wow. David Carmona: [00:02:36] … open-sourcing .NET or making it cross-platform, or releasing Visual Studio code. Right? So, super fun stuff. But then like five years ago, this AI thing started to become super real. So, [laughs] I was, I was offered to lead a new team in Microsoft, focused on the business, on creating a new business for AI. And I, I didn't think about it twice. So, yeah, that's where I am. So, it's interesting … So, as you can see, my career is always like, between technology and businesses. I think … I, I mean, knock on wood, but I think I'm in, in that great balance right now [laughs]. So, I have both. I'm super fortunate to have both because I work, connecting with Microsoft research and, and the entire organization of technology and research in, Microsoft. My goal, my team's goal is really to connect that with the business. So, we work on … We define it as themes, like bigger themes of innovation in Microsoft. 
And then we connect those themes to actual real products and technologies that we can take to market. it's super cool. And one of those things … We have many, but one of them … I think like, probably the start of the themes is, is AI at scale. Sam Charrington: [00:03:46] Okay. And so is the role primarily focused on taking innovations that are happening in research to existing Microsoft products? Or is it more focused on creating new business opportunities? Or is there some balance between the two? David Carmona: [00:04:01] Yeah. It's a balance. So, we have … The way that we work in Microsoft on our framework for innovation is based on Horizon. So, we have … We refer to them as the three [inaudible 00:04:10] Horizon. Right? So, we have Horizon 1, two, and three. Three, Horizon 3 are the like, the moonshots, right? Like, longer-term new business creation, new category creation for Microsoft. A lot of that is, driven by curiosity, in most cases, in research. So, we leave a lot of room for researchers to work on those themes. But then we go all the way to Horizon 2, which are things that are really about opening new opportunities or creating new opportunities for existing products. And you can go to Horizon 1 even, which is extending existing products. Right? So, making them better. So, we work in that, in that balance, between the three. Sam Charrington: [00:04:52] Nice. And so you mentioned AI at scale as being one of your big focus areas. What exactly does that mean at Microsoft? David Carmona: [00:05:00] Yeah. So, AI at scale, I mean, we, we named that as a new category. So, it's not that it's a product or anything like that. So, it's how we refer to what we believe is a huge change in the way that we are going to see people developing AI. And it's driven by m- many different things, many different trends and technology breakthroughs. But I think the most important one is this concept of massive models and, and what they mean. Right? So, this, this ability to create now, like, this huge [laughs], massive models with billions of, of parameters. And beyond the technical achievement, the reality is that those massive models are opening new opportunities that go beyond the technology and get into the business. Right? So, we can discuss it today. So, [inaudible 00:05:47] … So, we can spend a lot of time on the technology behind it. And then- Sam Charrington: [00:05:47] Mm-hmm [affirmative]. David Carmona: [00:05:47] … we can, we can focus a little bit on, "Hey, but what does it really mean?" So, how is this going to change the way that any company can develop AI? Right? And, and [inaudible 00:05:59] it's really interesting. And then there's a whole ecosystem around this concept like, that, that you need to, for example, train these models, you need an AI supercomputer. So, that's another piece of the puzzle, right, for AI at scale. Sam Charrington: [00:06:14] So, we talk a lot about the increasing size of models and, you know, particularly in the context of NLP and language models. But help us contextualize that. You know, we throw around, you know, millions of parameters and, you know, hundreds of layers, and things like that. How is it shaking out? Or how do you think of this progression towards larger-size models? David Carmona: [00:06:41] Yeah. I think in, in a sense, you probably remember [laughs] [inaudible 00:06:45] ImageNet moment for, [laughs]- Sam Charrington: [00:06:46] [laughs]. David Carmona: [00:06:47] … for [inaudible 00:06:48] learning. Right? 
So eh- Sam Charrington: [00:06:49] Uh-huh [affirmative]. David Carmona: [00:06:49] That was, … I mean, [inaudible 00:06:51] many people refer to this moment as the ImageNet moment for NLP. Right? So, because we get to a point that there's something that allows us to increase the size of the model. So, we go for it. And then we see, "Hey, wait a second. This is getting better. So, the more parameters that I add, the better that this is getting." Right? So, that was the moment in ImageNet with ResNet, for example. Right? That we added so many layers, and, "Hey, this, this image classifier is, is working so much better." So, we are kind of in the same place, but at a totally different scale, right, or order of magnitude. Right? For example, that model, the ResNet model for ImageNet, I think had like 60 million parameters. I mean, a completely different domain. That was computer vision. Now, we're talking about billions of parameters. And, and, and when we see the progression, it's been very [laughs], very quick. So, [crosstalk 00:07:44]- Sam Charrington: [00:07:46] Mm-hmm [affirmative]. David Carmona: [00:07:46] I don't know. GPT, the first version, was like 100 million parameters. Then, I think BERT was like 300 million. Then you have Turing NLR. I think it, at that time, was like 1.2 billion. Then you have GPT-2 at 1.5 billion. Then you have Turing NLG. That was 17 billion parameters. That was last year [laughs]. We're talking months ago. We're not talking about, about years ago. And then we had just, just a couple of months after that, GPT-3 with 175 billion [laughs] parameters. Right? So- Sam Charrington: [00:08:18] Yeah. David Carmona: [00:08:18] Every step is 10 times [laughs] [inaudible 00:08:21]. It's a new order of magnitude [crosstalk 00:08:22]- Sam Charrington: [00:08:22] Mm-hmm [affirmative]. David Carmona: [00:08:22] … which is super impressive [laughs]. Sam Charrington: [00:08:24] So, we've kind of transitioned from … In the domain of vision, you know, we would always talk about the number of layers as an indication of the size and complexity of the model. And now, when we talk about these language models, we tend to talk about parameters. What is that? And how does that tie to the architecture of these models? David Carmona: [00:08:45] Yeah. I mean, behind … It's not that we didn't want to build these massive models before. It's that we couldn't [laughs]. That's the reality. Sam Charrington: [00:08:52] Mm-hmm [affirmative]. David Carmona: [00:08:52] And I think the big breakthrough to really enable these, these sizes of the model is the transformer architecture. And yeah, definitely a lot to say about that. But, yeah, the transformer architecture, it has … I mean, it's also based on layers. In this case, they are symmetric. So, it scales very well because it always has the same number of inputs and outputs. So, you can stack up all the layers. And, and it was a huge change because that removed the blocker that we had before with scaling these NLP models, which is that we were using techniques, as you know, like recurrent neural networks. Right? Like, LSTM and things like those. And those things are great because they allow you to connect, for example, in a text, words to other words. You can have some kind of memory. So, a word right now can be impacted by words in the text before. Right? And, and you keep that memory. The problem is that the way that we were doing that was very sequential.
So, and I mean, by definition, a recurrent neural network takes the previous step as an input. So, you need to finish that step to go to the next one. So, that impacted the scalability of the models. So, I think with the transformer architecture, we kind of broke that ceiling because now, suddenly, we don't have an architecture that is [inaudible 00:10:05]. So now, in this case, it's all in parallel. We take the, all the inputs in parallel and with some techniques, in particular, … I think the most important ones [inaudible 00:10:16] I would highlight two. But definitely, for that to work, two things have to happen. One, it's the concept of the positional embedding, so how every word needs to get, as an input in the, in the model, the position somehow, a flag or an indication of where that word is because that's [laughs], of course, important [laughs]. It's very important- Sam Charrington: [00:10:36] Mm-hmm [affirmative]. David Carmona: [00:10:37] … Where a word is in a sentence to understand the sentence. But then the second thing is this concept of attention or, in this case, self attention, which is a way to kind of replicate that concept of connecting or changing the meaning of words, depending on the words that were happening before, or even in the case of bidirectional [inaudible 00:10:56] words that are happening after that. Right? And that's, that's a whole new construct applied to NLP that is proving to be, not only super scalable, but even, performing even better [inaudible 00:11:08] the traditional approach to NLP. Sam Charrington: [00:10:43] Hmm. And so how should we think about how attention works in these kinds of models? David Carmona: [00:10:43] So, I, I, I mean, it's a very simplistic view, but I like to think of it … Because attention is not new. So, we've been using attention- Sam Charrington: [00:10:44] Mm-hmm [affirmative]. David Carmona: [00:11:23] … in, in others … Even in other domains. Right? Like, vision or i- image generation, or … I mean, the most simple example that I use all the time is movie recommendation. Right? So, how do you know if, if a user is gonna like a movie or not? So, the way that you do that is that you take a vector defining the movie in, you know, in any dimensional space. And then you take another vector defining the taste of the user. And then you multiply those vectors, right, to get the distance, the, like, the cosine distance or similarity between those two vectors. And that's an indication of how much the user will like the movie. That's, that's attention, but in the case of two different entities. Right? My taste and the movie. In this case, self attention is like doing something similar, but with a sentence with itself or with a text with itself. Right? So, but in this case, the w- the attention that we want to measure is the connection between the words. So, how one word is related or connected to the rest of the words. And at the end, you're gonna have like, a heat map, right, so, where every word is connected in some manner with other words. So, if you're saying, "The kid hit the ball, and he was happy," then "he" will be super connected with "the kid." Right? So, I mean, super simple because at the end, you have multi [inaudible 00:12:42] attention blocks. And, and then you have all these different layers. It's like trying to understand [inaudible 00:12:49] networks. After three layers, you're lost [laughs]. You are completely lost on [crosstalk 00:12:53]. Sam Charrington: [00:12:53] [laughs].
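As a toy numerical sketch of the self-attention idea just described: word vectors, pairwise dot products, and a softmax that turns them into the "heat map" of how much each word attends to every other word. It omits the learned projections, multiple heads, and positional embeddings of a real transformer, and the vectors are random stand-ins.

```python
# Core of self-attention: similarity between word vectors, softmaxed into weights.
import numpy as np

words = ["the", "kid", "hit", "the", "ball", "and", "he", "was", "happy"]
d = 8
rng = np.random.default_rng(0)
E = rng.normal(size=(len(words), d))            # stand-in word vectors

scores = E @ E.T / np.sqrt(d)                   # pairwise similarity (attention scores)
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)   # softmax per word: the "heat map"

attended = weights @ E                          # each word becomes a weighted mix of all words
print(np.round(weights[words.index("he")], 2))  # how much "he" attends to each other word
```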
David Carmona: [00:12:53] But I mean, that's the core principle of it. Sam Charrington: [00:12:56] Mm-hmm [affirmative]. Part of what's interesting here is that, you know, we've transitioned from an approach to NLP that was, like you mentioned … Prior to capturing positionality, you know, we'd take a bag of words at the document level, didn't capture where those words were, didn't really do a good job of capturing the relationships, but was just looking at the statistical properties of a document or sentence or- David Carmona: [00:13:22] Yeah. Sam Charrington: [00:13:23] … corpus to now looking at the relationships between all of these entities that make up language. Is that part of the power of this [crosstalk 00:13:31]? David Carmona: [00:13:32] Yeah. Yeah. E- exactly. I would say that and then the concept of, of training these models with self supervised algorithms. Right? So- Sam Charrington: [00:13:42] Mm-hmm [affirmative]. David Carmona: [00:13:42] [inaudible 00:13:43] supervised training. I think that's the other thing that, that … It was the explosion in all these models, is how now, … Because this scales amazingly well, now, you can afford training these things with huge amounts of data. Like, for example, the entire internet [inaudible 00:14:00] kind of. Right? Which is kind of what we're doing with this model. So, we take the text on the internet. And then depending on the model we can go in, in a little more detail in there if it's a [inaudible 00:14:10] model or representation model. With smart techniques, you take that. You take … You mask that text, so the, so the model can try to guess either the missing words or the words that are happening after a given text. And by training with that input, that you are almost not touching at all. Right? So, it's all self supervised, [inaudible 00:14:31] and, and all of that. The model can actually learn very complex concepts and relationships. Sam Charrington: [00:14:37] Mm-hmm [affirmative]. You mentioned different types of models. Elaborate on that a bit. David Carmona: [00:14:41] Yeah. So, I think, the way that … And, and we can talk more about that because at the end, these same concepts can apply beyond NLP. But if we focus just on NLP, there are two main families of models. One is the one that I think people are super excited about because of Turing NLG and because of GPT-3. Those models are generation models. So, they are natural language generation models, so NLG. And in that case, what … The way that that model is trained, they are called autoregressive models because you train the model with a lot of text. But then you train it to guess what is gonna happen, what text goes after a particular text. Right? So, they generate … They are super good at generating text, like guessing the end of a sentence or guessing an entire document, or guessing how a movie will, will end, or whatever [laughs] we want to, to guess or [inaudible 00:15:37] text, things, things like those. And that's one big family of models. You have em … Again, like, GPT-3 is an example of that. Turing NLG is an example of that. And then you have another family, which is more about representation, so natural language representation models. And the goal of those is more like, representing the text. So, in that case, the architecture that is used, or the way that it's trained: instead of trying to guess what's next, what you do is mask some words in the text, and then the model will try to guess them.
And they are called bidirectional because in that case, not only do they look at what happened before a certain moment, but also after that. So, they will look at the words before and after a particular word to understand the context there. Right? So, those are really good to map, like, text to a representation, and then I fine tune to do whatever I want. Right? So, from super basic sentiment analysis to question answering, or whatever I want to fine tune the model for. So, those are like, the two big blocks. Then I like to go a little bit deeper 'cause for each of them, there are two other families that I think are very relevant to understand, which is how, … So, then there's more than one language in the world [laughs]. Right? So- Sam Charrington: [00:16:58] [crosstalk 00:16:59]. David Carmona: [00:16:59] You need to address that. Right? So, in particular, when you are creating real products. So, we are using these models in, in Office, for example. Office is working [inaudible 00:17:07], I feel like, 100 languages. So, imagine doing this for every language would be very [crosstalk 00:17:13]. Sam Charrington: [00:17:13] Mm-hmm [affirmative]. David Carmona: [00:17:13] And that would be the traditional approach of, of doing this. So, we, … And, and Microsoft has been a big believer in the need to do this thing in a universal way. So, that creates a new family of models that are universal models, right, universal language models. And in the case of Turing, for example, we have both. We have a regular model. And then we have the universal language representation, ULR, so Turing ULR, universal language representation. And that is super powerful 'cause what it allows us, for example, in, in Microsoft, is to implement features in Word using this, like, … I don't know. Em, semantic search. We don't need to train that feature or that model for every language. We just need to fine tune it for one language. And then you have the feature for free in 100 languages. Right? Sam Charrington: [00:18:03] [crosstalk 00:18:04]. David Carmona: [00:18:03] Which is super cool. So, I very, very much recommend using those models for that. Th- this is, by the way, for people who want to go deeper: there's a paper that I like a lot, [inaudible 00:18:14] 2017, where it explains this, this concept. And, the example that it uses is how you learn math. Right? So, you look at … Well, not me. I wouldn't consider myself bilingual. I speak Spanish and a little bit of English, but [laughs] my kids are truly bilingual. And when they learn math, they don't need to learn that two plus two equals four in English, and then again [Spanish 00:18:39] in Spanish. Right? So, they just need to learn math once. And then- Sam Charrington: [00:18:43] [crosstalk 00:18:44]. David Carmona: [00:18:43] … they can apply that in different languages. So- Sam Charrington: [00:18:46] Mm. David Carmona: [00:18:46] It's the same thing for models. So you can focus on teaching or training the core concepts, fine tuning for the concept. And then you have it for free in all the languages. Sam Charrington: [00:18:56] Mm-hmm [affirmative]. Yeah. [inaudible 00:18:57] I wanna dig into transfer learning and multitask. These are all things that are coming to mind as you're explaining this. But before we do that, we started out talking about language models as an example of these massive models that require a new way of thinking about, you know, AI at scale. And you mentioned, you know, the progression of the sizes of these models … And you know, it's 10X each time.
GPT-3 is, you know, 10X Turing. And one question that occurs to me is, you know, is size the, you know, the most important or the only factor? You know, does it mean that each time we jump a generation, you know, "Let's just forget about the, you know … We shouldn't be using Turing anymore. Let's just use GPT-3 because it's 10X better." I think, you know, there are some obvious reasons why that might be the case, like if they're trained on, on different corpuses. Like, we know that GPT-3 has kind of a very broad public internet. And at least with GPT-2, like, there was a lot of critique about, you know, Reddit, you know, and, and the biases that get introduced there. So, the training set is going to be an obvious differentiator that separates from the size. But I'm wondering if there are other things that we need to be thinking about beyond just the size of the model. David Carmona: [00:20:24] Yeah. Yeah. No, you are right. And I think … So, it's a very simplistic thing to just discuss the models of … Or the parameters of a, of a model. [crosstalk 00:20:35]. Sam Charrington: [00:20:32] Mm-hmm [affirmative]. David Carmona: [00:20:33] There's way more. I have say, though, that the one thing that we are, we are seeing is that the more parameters that you add … Right now, we are not seeing the ceiling of this. So, we keep improving the accuracy and the generality of the, of the model. So, hey, parameters are important. But then at the same time, it is true that it really … So, there's not one model for everything. So, different models are good for different things. Right? And in our case, for example, we, we … Turing, our family of models. It's actually a family because of that. So, we don't believe that one model will … At least right now, will be useful for every single scenario that you are targeting. Right? So, in, in our case, we created that, that family of model, which are inclusive of, of many things, including many different language, like, this basic [inaudible 00:21:27] that I was providing before or, or this, these metrics- Sam Charrington: [00:21:30] Mm-hmm [affirmative]. David Carmona: [00:21:30] … of, of different models. You're gonna need a model for each of them, depending on what you want to accomplish. But then even beyond that, 'cause not everything that you do is NLP. So, in the family of Turing in Microsoft, we have models that are even multi-modal, that include image and text or that are focused on image. And that thing will keep growing. So, that's something important to keep in mind. The other thing is, of course, the eternal debate on the importance of the architectures, right, that, that you're using. So, I think there's a … And I don't have a super strong opinion. I think it's like everything. It will go through phases. It will get to a moment that just by adding brute force parameters, the thing will be very difficult to improve. And we'll need to be a little bit smarter on how we can improve those models. We can optimize those models in, in another different way. But again, I don't want to diminish the fact that we keep seeing that we add more parameters and, and we get more power. Right? One thing that you said, though, Sam, I, I want to, I want to double click on that 'cause it's super important. So, it's the responsible AI implications of the model. 
I think that will be an area for models to differentiate and to keep in, in mind when you're using a model 'cause the reality is that, right now, these models, they have a lot of challenges around bias, transparency, and, and, and others that, that we need to keep in mind. So, just as we innovate on the power, accuracy and, you know, the multitask aspect, the generality of these models, we also need to innovate on the responsible side of them. And eh- Sam Charrington: [00:23:08] [crosstalk 00:23:09]. David Carmona: [00:23:09] As, as you said, the training corpus, that's important. I think right now, we are probably way too late in the pipeline to apply responsible AI principles to these models, meaning that we create things with these models. And then, just then, we apply those things like … I don't know. Like, you know, filtering or many, many other techniques that you can use there. I think we need to go earlier in the process, even at the point of the training, so we can make those models responsible by design. Sam Charrington: [00:23:41] Do you have a sense for how we can do that? A lot of the power of these models comes from, essentially, taking the entire internet and building a language model based on it or, you know, large parts of the internet. How do you apply the, you know, how … What are the techniques that we can use to build responsibility earlier at that scale? David Carmona: [00:24:08] So just as an example, but one example in Microsoft could be the Office or the Outlook auto reply. Right? So, what is … So, that is the typical example of a massive NLP model that is taking as an input, an email and, as an output, is creating a likely reply that you want to, that you want to send. Right? So- Sam Charrington: [00:24:28] Mm-hmm [affirmative]. David Carmona: [00:24:28] That scenario, on paper, it looks so simple [laughs], extremely simple. But when you get into the responsible side of [inaudible 00:24:37] extremely complex. And you need to, you need to pay a lot of attention. And it's not like a one-shot thing that you do, and you're done, you are golden. The reality is that you need to apply that across the entire lifecycle of the model from, as you said … So, you mentioned one that is important, which is the training data. So yes, of course, we need to get a subset of the training data to make sure that there's no toxic data that is training the model. But that is not, that is not enough. So, we need to keep in mind things like the privacy of the user. Right? So, think of, "How can we … " So, actually, for this feature, we use differential privacy to make sure that the instances that we use [inaudible 00:25:20] surface, they are not … They cannot identify a user or things like those. And you can also think of the input as something that we also manage, that we make sure that they are short answers, that they are not like, long emails [laughs], of course, things like those. So, it's something that you need to do at every stage. There's a ton of research, active research happening right now to really tackle this super complex challenge that we have with these models. Sam Charrington: [00:25:47] Mm-hmm [affirmative]. So, before we jump into how we achieve this kind of scale, you mentioned something in our pre-call that really stuck with me, is this idea that models are becoming a platform. And you know, transfer is a piece of that. Fine tuning is a piece of that. I'd love to hear you riff on, on that idea.
I think it's a really interesting way to think about models. David Carmona: [00:26:14] Yeah, yeah. It's not a new concept. So definitely, we've been, seeing … So, you see our services [inaudible 00:26:23] services in Azure. And they support the concept of transfer learning. So, you don't need to train a model from scratch. Right? So, it's … But the reality is that a lot of what we do in AI is training models from scratch for your particular scenario. So, we're doing everything that we can to try to simplify that process because if we don't simplify that process, it's gonna be very difficult to really scale AI in an organization, in a, in a company. So, there are definitely many techniques to do that. I think in the area of NLP, fine tuning is the most relevant now. And then we can talk about some emerging ones that are super interesting and cool. But with the fine tuning process, the idea is that you pre-train … You can use a model that is pre-trained, like our Turing model, pre-train on that [inaudible 00:27:10] information from the internet, multi domain, totally general. And then you fine tune that model. So, fine tuning, meaning adding something to it. Like, for example, you want to fine tune the model to do a sentiment analysis. So, you would add then like, a classifier or something like that, a binary classifier. And then you use label data. In this case, you use like, sentences that are, you know, positive, negative sentiment. And then you fine tune. So, you train additionally. It's like extra steps of training that entire thing with your added classifier, in this case, for example, which is gonna update the weight. But it's not starting from scratch, meaning that you don't need that massive data and the skills because you don't need to change the architecture. You don't need to compute because it's not that much compute needed. So, that is certainly a huge step into democratizing these models. Right? So, that's, that's super important. And not only you can do that for fine tuning for specific tasks, you can also fine tune it for your domain. So, if you work in finance, or you work in health, or you are in any industry, and you want to find a law company … So, you want a law firm. You want to fine tune that model for the domain of your vertical. So, you don't need to train the whole thing. You just need to train for that particular domain. So, super, super important, but then what we're seeing is these models can go even beyond that. And that's a super interesting area. Right now, it's still in the beginnings. But what is the big difference with that approach? So, in this first approach, with fine tuning, you are training the model at some point. I mean- Sam Charrington: [00:28:51] Mm-hmm [affirmative]. David Carmona: [00:28:52] Not from scratch, but you're training it. You are changing the weight of, of the model. You're- Sam Charrington: [00:28:56] Mm-hmm [affirmative]. David Carmona: [00:28:56] You're updating that model. You need [inaudible 00:28:58] to train it. But then we have these other techniques. They are called like, zero-shot or few-shot, where you don't do that. So, the model can learn in [inaudible 00:29:08] time. So, you don't need to change the [inaudible 00:29:11] of the model. You have only a model. You don't change that model. 
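Before getting to the few-shot and zero-shot techniques that come up next, here is a minimal sketch of the fine-tuning path just described: keep a pretrained encoder, add a small classification head, and train on a modest labeled set. The `encoder` below is a stand-in module with made-up sizes, not an actual pretrained Turing or BERT model.

```python
# Sketch of fine-tuning: pretrained encoder + new classifier head, small labeled set.
import torch
import torch.nn as nn

hidden = 256
# Stand-in "pretrained" encoder: embeds 16 tokens and maps them to one vector.
encoder = nn.Sequential(nn.Embedding(30522, hidden), nn.Flatten(1),
                        nn.Linear(hidden * 16, hidden))

class SentimentHead(nn.Module):
    def __init__(self, encoder, hidden, n_classes=2):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(hidden, n_classes)   # the layer you "add" for your task
    def forward(self, token_ids):
        return self.head(self.encoder(token_ids))

model = SentimentHead(encoder, hidden)
for p in model.encoder.parameters():               # optionally freeze pretrained weights
    p.requires_grad = False

opt = torch.optim.AdamW([p for p in model.parameters() if p.requires_grad], lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

token_ids = torch.randint(0, 30522, (8, 16))       # toy batch of tokenized sentences
labels = torch.randint(0, 2, (8,))                 # positive / negative sentiment labels
loss = loss_fn(model(token_ids), labels)
loss.backward()
opt.step()
print(float(loss))
```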
Now, in [inaudible 00:29:15] time, where you are doing the inference of the model, you can … If you are doing a few-shot, then what you do is just provide a few examples of the task that you want to do, and then directly, the one that you want to solve. And the model will do it, which is mind blowing [laughs] that it can do that. But then you have zero-shot, which is like, the mind blowing times three [laughs], which is that you don't even need to provide examples. So, you can ask one of these models, "Hey, I want to translate this to French." And you provide the sentence. And the model will know how to do that. It will identify patterns in the corpus data that it was trained on. And it will know what it means to be, to do a translation. And it will do that translation. So, those techniques, what they are really doing, from fine tuning to few-shot to zero-shot, is making it much easier to really use these models in your particular scenarios for your particular domain, your particular task, or your particular modality. Super cool. Sam Charrington: [00:30:18] Mm. Awesome, awesome. We've talked about different kinds of models. Uh, just a few quick words on applications. Like, you know, what do you think are the most exciting applications of language models generally or, or Turing in particular, you know, within and outside of Microsoft? David Carmona: [00:30:38] Yeah. So what, what I can do because it's a [laughs], it's a big one. We can, we can talk for a long time. I can give you an overview of how we are using it in Microsoft. And then you can get a sense of, of the usages that, that it can have. So, in Microsoft, the way we look at this is like … We always look at these things, any technology is a stack. So, our goal always is to deliver a full stack. So, you just … And that's our approach to any technology. So, we do the research. But then we want to make sure that that research is available for others to, to use. And then we want to make sure that we keep adding layers [inaudible 00:31:19]. for example, the first one would be releasing that as open source. Right? So, we add another layer. We want that to be part of Azure, so you can train those models yourselves, which is the AI supercomputer that we are, providing in Azure to train those models. But then we keep building on that. On top of that, we have things like Azure machine learning. So, you have another abstraction layer that can improve your productivity, fine tuning those models, like [inaudible 00:31:44] mentioned before. But then we put another layer on top of that, which is [inaudible 00:31:49] services, which are end to end out-of-the-box services that you can use as [inaudible 00:31:54] points. And you can infuse directly into your application without worrying about doing anything with, with those models. And then on top of that, we build applications. So, we make them part of our products, like, Office, Dynamics. Or we create new products that were impossible before. So, that's the [inaudible 00:32:11] approach. I think if we focus on the application side, just to give you some, some examples of things that are already available, that people can use that are powered by these massive models [inaudible 00:32:21] a lot in Office. A lot of things in Office are powered by these models. So, you can think of, for example, semantic search in Office [inaudible 00:32:30] you open a Word document, you search for something in that Word document. And that is not the traditional find and replace [laughs] that we had before. 
This is semantic search. So, you can even ask questions to the document. And [laughs] the document will answer those, those questions. That is all powered by, by Turing. You have things like document summarization. So, you go to SharePoint, and you hover on a document. And you will see a summary of the document in there. That is a … It's an abstraction. So, it's not just taking parts of the document. That is generated with, with Turing. Things in Outlook, like Outlook auto-reply that I was mentioning before, or things like, … There's something meeting, Meeting Insights, that before a meeting, it will give you all the relevant information about that meeting. So, those are like, … In the taxonomy that we were talking about before, those would be Horizon 1. It's about making those applications better. But then we have these Horizon 2 things that are [inaudible 00:33:24] new opportunities that these models can open. And I think a good example of that would be Project Cortex. So, Project Cortex is part of the Microsoft 365 family. And the goal of that project is super cool. So, what it does is that it's able to get all your internal knowledge in your organization by looking at both the structure and the, and structure data in your organization. So, think of documents, meetings, PowerPoints, anything that you have in there, even images 'cause it's able to scan and do OCR on, on images. So, it's able to crawl all that information for your company, and then to extract knowledge out of that. So, what we do is that we create this concept of a knowledge entity. Like, imagine that, … I, I don't know. You are in a law firm. Imagine international, whatever, commerce. I don't know. I have no idea of, of law. But it's like a topic- Sam Charrington: [00:34:23] [crosstalk 00:34:24]. David Carmona: [00:34:23] … that then AI system was able to extract from your information. And it can, it can help you a lot. So, it can give you … It can provide you with a summary. It can give you, what are the most relevant documents for that particular subject in the company, what are the experts, so, who you should talk with about, about those topics. So, it's mind blowing [inaudible 00:34:45] knowledge basis. Right? So that, that you can get … It's extracting the DNA of your company. So, you can really make it available for the, for the rest of the employees. And like, those, I mean, I can [inaudible 00:34:57]. So, every, any product that you can mention [inaudible 00:35:00] use Bing. So, it's another, of course, super important one. Things like question and answer in Bing [inaudible 00:35:05] even the universal search. So, we use this trick of universal language representation in Bing. And those are all available in there as well. Yeah. So, we use it [inaudible 00:35:16]. But more on the business side, I would mention, in Dynamics 365, we use these models for a lot of different things. Very obvious one, of course, is anything that has to do with customer service understanding or, you know, sentiment analysis. All of that in customer service that is- Sam Charrington: [00:35:33] Mm-hmm [affirmative]. David Carmona: [00:35:33] … powered by these models. But then things that are more visionary. So, think of, for example … In Dynamics 365, one of the things that we can provide is suggestions to sellers in your company by looking at any interaction with that customer before, like emails or documents, phone calls, whatever. Right? 
So, it's able to understand that and structure information, and give you … It's like language generation. But in this case, to take the next steps with your, with your customers. Sam Charrington: [00:36:01] Hmm. David Carmona: [00:36:02] So, yeah. Super, super broad. We could talk for a while. Yeah [laughs]. Sam Charrington: [00:36:04] [laughs]. So, you know, let's maybe jump into what's happening that's enabling all of this to take place now. One of the things that … You know, when we think about kind of the scale and size of these models … You know, we've talked about the scale of the compute that has been required to enable it. You know, how do you thi- … And you mentioned AI supercomputers. Like, what's that all about? How do you think about, you know, building out the infrastructure to scale and train these models? David Carmona: [00:36:36] Yeah. Le- let's say that to train a model like this on your laptop would probably take thousands of centuries [laughs]. So, definitely, you need a lot of scale to train [crosstalk 00:36:48]. Sam Charrington: [00:36:48] Yeah. David Carmona: [00:36:48] And you need … I mean, it's amazing, the kind of challenges that you get when you grow a model like this. Like, fundamental challenges like, "Hey, the model doesn't fit in your GPU." [laughs] That's- Sam Charrington: [00:37:02] Mm-hmm [affirmative]. David Carmona: [00:37:03] … Something that we wouldn't see before. Right? So, I think it is like … If you pass 1.3 billion parameters, something like that, then the model is not gonna fit. So, you better find new ways. But then it's just a computer. So, the time- Sam Charrington: [00:37:15] [crosstalk 00:37:16]. David Carmona: [00:37:16] … required to train one of these models, you need like, ultra [inaudible 00:37:19]. I, and, and I think … So, that's the main reason why we focus on … And like, always, like I was saying, in the beginning, we try to have a platform approach to it. So, not thinking of fixing this problem for Turing, for our models, but fixing this problem for our customers, so they can use this infrastructure as well. Sam Charrington: [00:37:38] Mm-hmm [affirmative]. David Carmona: [00:37:38] So, the approach that we took was building this massive infrastructure in Azure. So, these are massive clusters that you can spin up directly in Azure. And not only can you spin them up, then, of course, you have the complexity when you have … These are … I mean, imagine … For example, the one that we announced a year ago, that is a massive cluster of like, 10,000 GPUs. You have more than 200,000 CPUs. So, it's massive scale. So, how do you manage that? You need things that allow you to manage that in a distributed way. And then what is even more challenging is, "Okay. So, I have my infrastructure completely managed. I can [inaudible 00:38:15]." It is integrated with Azure machine learning. So, you can like, launch like, jobs in that massive infrastructure. But then how would you actually do it? So, you have a model that is by definition, huge. So, how do you train that thing? How do you divide this task, this super complex task, into individual [inaudible 00:38:36] in your, in your massive cluster? And that's, that's the other side of the coin, which is our work on these like, software systems that are meant to help you in that process. So, this was … At the same time that we announced the AI supercomputer, we also announced … It's called DeepSpeed. It's open source. So you can use it on, on top of anything. And it will help do that for you.
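To make the "doesn't fit in one GPU" point concrete before the description of how the work gets split up below, here is rough back-of-the-envelope arithmetic. The 18-bytes-per-parameter figure is only a rule of thumb for mixed-precision Adam-style training (weights, gradients, and optimizer state), not an exact number for any particular system.

```python
# Rough rule-of-thumb arithmetic for training-time memory, excluding activations.
def training_memory_gb(n_params, bytes_per_param=18):
    return n_params * bytes_per_param / 1e9

for n in [1.3e9, 17e9, 175e9]:
    print(f"{n/1e9:>6.1f}B params -> ~{training_memory_gb(n):,.0f} GB "
          "(vs. tens of GB on a single GPU)")
```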
So, what it will do is that it will take this training. And it will distribute that training across a massive infrastructure. So, it will know how to do that in an efficient way. And it does it basically … It's like a three … We call a 3D distribution because it takes like three different [inaudible 00:39:18] to, let's say, chunk this task. Right? One, which is the most basic one, is the data distribution. So, you just [inaudible 00:39:27] your data in smaller chunks. And then you have [inaudible 00:39:30] each node is gonna take one of those chunks. But that is not enough. You need to go further than that. So, the other level of distribution that we use is [inaudible 00:39:39] distribution, which is [inaudible 00:39:41] because of the transformer architecture, that [inaudible 00:39:44] symmetry is [inaudible 00:39:46] to split the [inaudible 00:39:49] layers. So [inaudible 00:39:50] each node will take a different layer [inaudible 00:39:54] communication and optimization going on there that [inaudible 00:39:57] you need to take care. And then the last one is the [inaudible 00:40:00] which [inaudible 00:40:01] even for each of those layers, we can divide [inaudible 00:40:04] smaller chunk [inaudible 00:40:07] a different GPU. So [inaudible 00:40:09] what that allows you, it [inaudible 00:40:11] a lot of research involved [inaudible 00:40:13] this framework. [inaudible 00:40:14] you almost get like, a linear distribution, like, a linear growth in your model. So, you can [inaudible 00:40:20] number of parameters … And by the way, [inaudible 00:40:23] is able [inaudible 00:40:24] more than one [inaudible 00:40:25] parameters. So huh, you can train models that are not even [inaudible 00:40:29] existing today. And you see the line, and it's almost linear. So, it's exactly what you're, you are looking for in these systems. Sam Charrington: [00:40:35] Oh, wow. Wow. And what about on the hardware side? Microsoft announced this Brainwave Project some time ago to bring new hardware architectures to bear this problem. Can you share a little bit about that? David Carmona: [00:40:50] Yeah. So, yeah. We announced the [inaudible 00:40:53] maybe a little bit more ago. But it's fully available now. So, you go to Azure. And you go to Azure machine learning. And one of the options that you have to deploy your model is[inaudible 00:41:02]. And what, what that is gonna give you, especially [inaudible 00:41:05] inference time, is very low latency and a lot of, you know, efficiency in cost. Right? So, it's perfect for massive … I mean, I, I always use the same example. So, this feature in Word, one of the features powered in Word by Turing, is called predictive text. So, that means that, when you type, it's gonna give you suggestion, how the text will continue. Right? So [inaudible 00:41:29] think of [inaudible 00:41:30] intelligence, but, but for Word. 300 million users of Word. Imagine doing the inference of that model in every keystroke [laughs]. So, that's the- Sam Charrington: [00:41:39] Mm-hmm [affirmative]. David Carmona: [00:41:40] That's the scale that we're talking here. it's huge. So, you better optimize that a lot if you want to scale it to that, to that number. And we do that … I mean, you have to do it in, … Again, it's like a game that you have to tweak every single step. Of course, we don't go with this m- multi billion models on inference time. So, there's a lot of optimization to do there to reduce the number of parameters, to even using techniques to make it more efficient. 
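The three levels of splitting described a moment ago can be pictured with the following toy sketch, which uses plain Python lists in place of GPUs. Real systems such as DeepSpeed do this with actual devices, overlapping communication and careful scheduling; nothing here is their actual API.

```python
# Conceptual sketch of "3D" distribution: data, layers, and within-layer splits.
batch = list(range(16))                       # training examples
layers = ["layer0", "layer1", "layer2", "layer3"]

# 1) Data parallelism: each worker gets a slice of the batch.
data_shards = [batch[i::4] for i in range(4)]

# 2) Pipeline / layer parallelism: each worker owns a contiguous set of layers.
layer_shards = [layers[:2], layers[2:]]

# 3) Tensor (within-layer) parallelism: one layer's weight matrix is split too.
weight_rows = list(range(8))                  # stand-in for one layer's weight rows
tensor_shards = [weight_rows[:4], weight_rows[4:]]

print(data_shards)
print(layer_shards)
print(tensor_shards)
```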
And then there's the hardware. Right? So, we use the ONNX Runtime thing in Microsoft. That can optimize not only for the CPU … So, it has optimization for CPUs, but also for FPGAs [00:42:21]. So, it's a way of [inaudible 00:42:23] from the hardware that you have underneath. And it really allows you to bring all these things that are great to talk about from the research point of view. But then putting [inaudible 00:42:33] in action, it requires all this level of detail that is a new level of complexity. Sam Charrington: [00:42:38] Mm. So, this is primarily focused on the inference side. Do you see any … Are there any particular innovations you're excited about on the hardware side for training? Or do you see it primarily being evolutions of today's GPUs? David Carmona: [00:42:55] I mean, when we see … I mean [inaudible 00:42:57] super evolving. So, we'll see … The reality right now is that you have to be flexible. So, we are not- Sam Charrington: [00:43:02] Mm-hmm [affirmative]. David Carmona: [00:43:02] … discarding any approach, any at all. Right? So, the reality is that the FPGA for inference was super efficient because it allows you to change it. Right? So, it's programmable. So, that was very, very efficient [inaudible 00:43:16] and very agile. The combination of agility and efficiency was, was the right thing. But that may change at, at any moment. And as these things get more stable, then ASICs may be the way to go. And, and, yeah, of course, we are, we are not discarding any, any of those approaches. Sam Charrington: [00:43:32] So, how do you see this level of scale that we're dealing with today impacting the world for kind of users of AI? What, what changes? David Carmona: [00:43:43] I think that the main thing maybe bringing, bringing all of this together is how this will change the way that you develop AI. So, how this will open new ways of developing AI that we can, that we can use right now. So, that whole concept of creating more general multitask, multi-domain, multi-modality models, that then you can customize for your particular task, that has huge implications on how you can … One, how you can scale AI in your organization and how AI can scale to other organizations, like smaller organizations. Right? So, that for us, it's a, it's a huge aspect of, of all of this. And the way that I see it is, is that uh, it's kind of what we experienced in the last 20 years for software. And this is very similar. So- Sam Charrington: [00:44:38] Mm-hmm [affirmative]. David Carmona: [00:44:38] Software at some moment, we had the hard lesson that software has to be super connected to [laughs] the business. So, if you have a team of software developers in a basement [laughs] not connected to the- Sam Charrington: [00:44:51] [laughs]. David Carmona: [00:44:51] … business, that is not gonna work. I think we are ki- … AI is in a basement right now, kind of. Right? So, it's- Sam Charrington: [00:44:57] [laughs]. David Carmona: [00:44:57] We are not fully connected to the business [inaudible 00:45:01] because it requires so many skills and expertise that, that it's a very technical domain right now. We need to change that. So, we need to make sure that the business and AI come together. And, we learned that with software. It's called DevOps. It's about bringing the two together, and then doing a small iteration [inaudible 00:45:22]. It's coming to AI. We are all talking about MLOps now. It's a huge area. 
It's our [inaudible 00:45:28] definitely in Microsoft to provide the platform to empower that collaboration and that continuous iteration, and trackability of everything that you do in your AI development cycle. [crosstalk 00:45:37] And that will be massively empowered by AI at scale. So, you have models that can really empower like, a more dynamic way, so you don't have to create these models from scratch. You can iterate on them with the business and just focus on teaching your domain to the model instead of starting from scratch. That goes in that direction. We do think that there's one step beyond that. We are also seeing … We also saw it with software. That also needs to happen with AI, which is really going beyond the technology and the businesses, and getting to every employee. So, how every employee in an organization should be empowered with AI, just like they can use Excel right now to [inaudible 00:46:21] numbers [inaudible 00:46:21] that for AI. So, every employee can apply AI, and not only apply it, but also create, consume, mix and match [inaudible 00:46:31] of having some level of freedom to really apply AI to, to what they do. That's another huge area, like the augmented intelligence area. Sam Charrington: [00:46:41] Mm-hmm [affirmative]. David Carmona: [00:46:41] That [inaudible 00:46:42] models, we, we may see it happening sooner rather than later. Sam Charrington: [00:46:45] Awesome. Well, David, it's been wonderful to catch up with you and to dig into some of the work you're doing around AI at scale. Thanks so much for taking the time to chat with us. David Carmona: [00:46:58] Thank you so much, Sam. It was a pleasure. Sam Charrington: [00:47:00] My pleasure. David Carmona: [00:47:01] Thank you. Sam Charrington: [00:47:02] All right, everyone. That's our show for today. To learn more about today's guest or the topics mentioned in this interview, visit TWIMLAI.com. Of course, if you like what you hear on the podcast, please subscribe, rate, and review the show on your favorite podcatcher. Thank you so much for listening, and catch you next time.
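For readers who want to see the partitioning scheme David describes in more concrete terms, here is a minimal, single-machine sketch of the three levels of distribution: splitting the data, splitting the layers across pipeline stages, and splitting within a layer. The "workers" and "devices" are just Python loops and all names are illustrative; this is an assumption-laden toy, not DeepSpeed's actual API.

```python
# Toy illustration of the three levels of partitioning described above:
# data parallelism, layer (pipeline) parallelism, and intra-layer (tensor)
# parallelism. Everything runs on one CPU with NumPy; the "workers" and
# "devices" are just Python loops, and the names are illustrative -- this
# is not DeepSpeed's actual API.
import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer MLP: x -> W1 -> relu -> W2 -> y
W1 = rng.standard_normal((8, 16))
W2 = rng.standard_normal((16, 4))

def forward_full(x):
    return np.maximum(x @ W1, 0.0) @ W2

# 1) Data parallelism: split the batch across workers, each worker runs the
#    same full model on its shard, and the results are concatenated
#    (gradients would be averaged in real training).
def forward_data_parallel(x, num_workers=2):
    shards = np.array_split(x, num_workers, axis=0)
    return np.concatenate([forward_full(shard) for shard in shards], axis=0)

# 2) Layer / pipeline parallelism: each "stage" owns one layer and passes its
#    activations to the next stage.
def forward_pipeline(x):
    stage0 = np.maximum(x @ W1, 0.0)   # stage 0 owns layer 1
    stage1 = stage0 @ W2               # stage 1 owns layer 2
    return stage1

# 3) Intra-layer (tensor) parallelism: a single layer's weight matrix is split
#    column-wise across devices; each device computes a slice of the output,
#    which is then concatenated (an all-gather in a real system).
def forward_tensor_parallel(x, num_devices=2):
    w1_slices = np.array_split(W1, num_devices, axis=1)
    h = np.concatenate([np.maximum(x @ w, 0.0) for w in w1_slices], axis=1)
    return h @ W2

x = rng.standard_normal((6, 8))
ref = forward_full(x)
assert np.allclose(forward_data_parallel(x), ref)
assert np.allclose(forward_pipeline(x), ref)
assert np.allclose(forward_tensor_parallel(x), ref)
print("all three partitionings reproduce the unpartitioned forward pass")
```

In a real system each split runs on separate accelerators and the concatenations become collective communication operations, which is where much of the engineering effort David alludes to goes.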
We had a solid kick-off today at TWIMLcon 2021: AI Platforms. The conference started today and runs through January 29, 2021. It’s not too late to join in! Use discount code GREATCONTENT for 25% off registration. We started off the day talking to Solmaz Shahalizadeh, VP of Commerce Intelligence, at Shopify. During her time there, she implemented the company’s first ML products, built their financial data warehouse, led multiple cross-functional teams, and played a critical role in their IPO. In our discussion today, she said something that set the tone for the rest of the conference: “If you’re serious about your data, you want to invest in your platforms.” We could not have said it better ourselves. She also shared lessons learned from building a team of hundreds of data scientists, for example, paying attention to how well each of the team members can articulate the real world impact or how the model will solve a specific business problem. Next up, Aman Khan (Product Manager) and Josh Baer (ML Platform Product Lead) talked about how Spotify built its ML platform to provide service to over 300 million customers. They shared a few of the key tenets that now guide their approach to delivering ML infrastructure: Build infrastructure together: Having your infrastructure teams and your ML teams collaborate to build a common platform serves the organization best. Be opinionated: Having more tools is not better. Fewer tools leads to less custom code, less technical debt, and less confusion for the development team. Make difficult trade-offs: They focused hard on building a platform that served their ML Engineers first and foremost, with the idea that once they nailed that, they could extend it to other roles in the organization. We then shifted gears (pun completely intended) and talked with Sudeep Pillai, ML engineering team lead at Toyota Research Institute. Sudeep shared an overview of the MLOps environment developed at TRI and discussed some of the key ways MLOps techniques must be adapted to meet the needs of high-stakes environments like robotics and autonomous vehicles. He noted that early autonomous driving systems were strongly rule-based and rigid but that there has been a major shift away from rules-based systems and in his words: “ML is eating the Autonomous Driving Stack.” He further shared how ML moved into the Perception, Prediction, Planning, and Control aspects of Autonomous vehicle design. MLOps is sometimes thought of as “DevOps for Machine Learning” it was clear from Sudeep’s presentation that it needs to be more. MLOps at TRI is a complete set of processes specifically adapted not only for ML but also for the AD domain. It feels like the MLOps conversation is truly evolving and maturing when you see conversations like this one. After speaking with Sudeep, we chatted with Mike Del Balso, CEO and Co-Founder of Tecton. He walked us through the issues of feature development, management, and deployment. It seems hard to believe but he noted that just getting a few features into production can delay a project by months or even a year because of the hand-off between the data science team and the data engineering teams. He made an observation which is important to highlight here: “Feature stores are some of the highest value data we have in our organizations and we don’t manage them as such.” Mike went on to share with us some customer success stories (like Atlassian reducing model deployment times from months to days while increasing accuracy by up to 20%). 
Overall, it was a great discussion and I think we’re going to be all hearing a lot more about feature stores in the year ahead. As we rolled towards the end of Day 1, we had the great opportunity to hear from Dr. Jennifer Prendki, the founder and CEO of Alectio. Before founding Alectio, Jennifer was the VP of Machine Learning at Figure Eight, she built the first ML department from scratch at Atlassian, and she pioneered MLOps at Walmart Labs. Dr. Prendki and her team are challenging the long held belief that more data is a prerequisite to increasing the performance of an ML model. In order to break down her thesis that more data is not necessarily better, she unpacked what she refers to as “Data Prep Ops.” After unpacking Data Prep Ops in great detail which is too long to cover here, she summarized with a few major points: Good data preparation is a prerequisite for doing ML well; There is a Data Prep Ops market that is misunderstood and we as community members need to make it a first-class citizen in our MLOps practices; Data Preparation is more than labeling - It is a multi-faceted set of complex operational processes that are effectively their own discipline; Data prep can not be separated from the machine learning process - these two processes are related. It was a great discussion, with lots of food for thought for practitioners out there wrestling with the “more data = better predictions” status quo. In keeping with the TWIMLconnect theme at this year’s event, attendees had an opportunity to participate in a networking activity towards the end of the day. Attendees were randomly grouped into small breakout rooms for four lightning getting-to-know-you rounds. With smiles all the way around and folks complaining that the rounds were too short, it was clear everyone had a great time. We wrapped up the day with Jeff Fletcher, a Cloud Machine Learning Specialist from Cloudera walking everybody through a workshop exploring how ML can be done on the Cloudera Data Platform, including data preparation, pipelines, and production deployment. Jeff was clearly in his element and happy to show off the power of their platform. Tomorrow, we have a full schedule with: A keynote interview with Faisal Siddiqi, Director of Engineering from Netflix; Todd Underwood, an Engineering Director from Google will discuss what happens when “Good Models Go Bad”; Dotan Asselman, Co-Founder/CTO of theator, and Ariel Biller, Evangelist for ClearML will talk about continuous training; Chip Huyen (who wrote multiple amazing surveys of the MLOps market) will talk about the move to real-time ML; Monte Zweben, CEO of Splice Machine will discuss how you can scale models by moving beyond the traditional database architectures and by combining operational, analytical and feature store databases onto a common platform; Jeff Fletcher from Cloudera will close out the day with a continued look into the power of the Cloudera Data Platform. If this sounds interesting, it’s not too late to register! There are still seven more days of sessions, including Friday’s Executive Summit. Pro Plus and Executive passes provide ongoing access to the conference recordings so that you can catch up after the event.
To hear about Josh Tobin’s various projects in robot learning, check out the full interview! The discussion on projects outside of the NeurIPS paper picks up at 15:58 in the podcast. Enjoy! Robots have a particularly hard time with “reading a room.” In order to know how to act, they first need to understand their environments. It’s like learning where everything is when you enter a grocery store for the first time. For machines, processing the world in terms of three dimensional spatial awareness–figuring out where objects are, how they are positioned, and even the robot’s own placement in the environment–is incredibly complex. In the real-world, scenarios can be unpredictable and challenging to simulate, so how can we improve perception so machines can successfully understand the world around them? Josh Tobin has dedicated his research to improving robots’ ability to accomplish real-world tasks. He finished his PhD at UC Berkeley under Pieter Abbeel, who we had the opportunity to interview for a Reinforcement Learning Deep Dive. Josh recently took a break from his role at OpenAI and joins us from NeurIPS 2019, where he presented his paper on Geometry Aware Neural Rendering, where they successfully improve 3D modeling to more complex, higher dimensional scenes. Neural Rendering and Generative Query Networks Neural rendering is the practice of observing a scene from multiple viewpoints and having a neural network model render an image of that scene from a different, arbitrary viewpoint. For example, if you have three cameras taking in different perspectives of an environment, the model would use that information to reconstruct a fourth viewpoint. The intuition behind this process is that the model has demonstrated a good understanding of the space if it can predict a strong representation of that view. Tobin’s work is an extension of the ideas presented in a paper from DeepMind on Neural Scene Representation and Rendering. The paper introduces Generative Query Networks (GQN), which the DeepMind paper refers to as “a framework within which machines learn to represent scenes using only their own sensors.” The significance of GQNs is that they do not rely on a massive amount of human-labeled data to produce scene representations. Instead, the model gleans the “essentials” of a scene from images, “constructs an internal representation, and uses this to predict the appearance of that scene.” In GQN, they take the problem of neural rendering and set up a model structure that works with an encoder-decoder architecture. As Josh describes, “The encoder takes each of the viewpoints and maps them through a convolutional neural network independently, so you get a representation for each of those viewpoints. Those representations are summed.” This creates a representation for the entire scene which is then passed on to the decoder. “The decoder’s job is to…go through this multi-step process of turning [the representation] into what it thinks the image from that viewpoint should look like.” GQNs are a powerful development, but there is still a bottleneck that occurs when the representation produced by the encoder is passed to the decoder. This is where the geometry aware component (the main contribution of Tobin’s paper) comes in. 
Geometry Awareness: Attention Mechanism and Epipolar Geometry Josh’s primary goal “was to extend GQN to more complex, more realistic scenes” meaning, “higher-dimensional images, higher-dimensional robot morphologies, and more complex objects.” Their approach was to use a scaled dot-product attention mechanism. “The way the attention mechanism works is by taking advantage of this fact of 3D geometry called epipolar geometry,” which refers to viewing something from two points and defining the relationship between them. In this case, epipolar geometry refers to knowing “the geometry of the scene, so where the cameras are relative to one another.” If you’re a machine trying to render an image from a particular viewpoint, you want to “go back and look at all of the images that you’ve been given as context, and search over those images for relevant information. It turns out, if you use the geometry of the scene [epipolar geometry]… then you can constrain that search to a line in each of the contexts viewpoints” and attend to the pixels that are most relevant to the image you’re constructing. “For each pixel, we’re constructing this vector that represents a line. When you aggregate all of those you have two spatial dimensions for the image. So you get this 3D tensor and you’re dot-producting the image that you’re trying to render…and that’s the attention mechanism.” The New Data Sets In order to evaluate the performance of the new model they developed several new data sets: In-Hand OpenAI Block Manipulation. This is a precursor to Open AI’s Rubik’s Cube project. In this data set, “You have a bunch of cameras looking at a robot hand that’s holding a block. The colors of the cube, the background, and the hand are randomized and the hand and the cube can be in any pose.” Disco Humanoid. This is Josh’s term for the data set because it looks like a “humanoid model that’s doing crazy jumping-in-the-air dance moves.” It’s similar to the MuJoCo humanoid model except that the colors, poses of the joints, and the lighting are completely randomized. It’s meant to test “whether you can model robots that have complex internal states rates with this high-dimensional robot that you need to model in any pose.” Rooms-Random-Objects. The most challenging data set they introduced involved simulations of a room with objects taken from ShapeNet, a data set with over 51,000 3D models. “Each of the million scenes that we generated had a different set of objects placed in the scene. It’s really challenging because the model needs to understand how to render essentially any type of object.” Randomization is a key part in each data set. As Josh believes “If you want to train a model and simulation that generalizes the real world, one of the most effective ways of doing that is to massively randomize every aspect of the simulator.” Evaluating Performance and Results To evaluate their results, the team compared their model with GQN using several metrics, including the lower bound on the negative log likelihood (the ELBO), per-pixel mean absolute error (L1 and L2), and by qualitatively reviewing rendered images of actual scenes. “The main result of the paper is we introduced a few new data sets that capture some of those properties, and then we showed that our attention mechanism produces qualitatively and quantitatively much better results on new, more complex datasets.” They have yet to test the system in the real world. 
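To make the attention mechanism described above a bit more concrete, here is a minimal NumPy sketch of scaled dot-product attention between a rendered viewpoint and pooled context-view features. It omits the epipolar constraint (in the paper, each query only attends to pixels along the corresponding epipolar line in each context view), and the shapes and names are illustrative rather than taken from the paper's code.

```python
# Minimal NumPy sketch of scaled dot-product attention for neural rendering:
# features of the viewpoint being rendered act as queries, and features
# gathered from the context views act as keys/values. In the paper, epipolar
# geometry restricts which context pixels each query attends to; here we
# attend over all of them to keep the example short.
import numpy as np

rng = np.random.default_rng(0)

def scaled_dot_product_attention(q, k, v):
    # q: (num_queries, d), k/v: (num_context, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over context pixels
    return weights @ v, weights

d = 32
num_query_pixels = 4 * 4          # pixels of the viewpoint being rendered
num_context_pixels = 3 * 8 * 8    # pixels pooled from three context views

query_feats = rng.standard_normal((num_query_pixels, d))
context_feats = rng.standard_normal((num_context_pixels, d))

attended, weights = scaled_dot_product_attention(query_feats, context_feats, context_feats)
print(attended.shape)             # (16, 32): one attended feature per rendered pixel
print(weights.sum(axis=-1)[:3])   # each row of attention weights sums to 1
```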
Josh confirms “right now, it only captures 3D structure, it doesn’t capture semantics, or dynamics, or anything like that, but I think it’s an early step along that path.” Josh is working on various related projects around data efficiency in reinforcement learning and some fascinating work on sim-to-real applications. To hear about them, check out the full interview! The discussion on projects outside of the NeurIPS paper picks up at 15:58 in the podcast. Enjoy!
Ken Goldberg is involved in several projects in collaboration with multiple organizations at UC Berkeley, including some technology-based art projects. To hear about all of them, check out the recent TWIML AI Podcast interview, The Third Wave of Robotic Learning with Ken Goldberg. Ever thought you had a good grip on your phone and then watched in slow motion as it fell to the floor? Generally, as humans we’ve learned to gauge how to pick something up, and we usually don’t have to think about the microdecisions and movements involved. But even for us, grasping objects and maintaining stability can be difficult at times. It turns out the seemingly simple task of grasping an object is an even bigger challenge for robots, because they have to learn the physical dexterity grasping requires from zero prior knowledge. So how do we efficiently teach machines this skill? Ken Goldberg is an engineering professor at the University of California, Berkeley where he runs the Laboratory for Automation Science and Engineering (AUTOLAB). The lab is focused on several forms of robotic learning including imitation, deep, and reinforcement learning for a variety of applications spanning surgery to agriculture. One of their major contributions in recent years is the development of the Dexterity Network (Dex-Net), a project that generates datasets for training robust grasping models. The Challenge of Robotic Grasping Researchers have been studying the problem of grasping for decades, but as Ken states, “Robots remain incredibly clumsy today. They’re much better than they were, but industrial arms, if you give them novel objects, they will drop them with a fairly high frequency.” The topic has warranted more attention in recent years with the rapid growth of e-commerce. Training robots to handle packages of various sizes and weights has massive potential for the industry, and large retailers are eager to find a solution, inspiring efforts like the Amazon Picking Challenge in 2017. The act of picking something up sounds fairly simple, but because robots lack physical and perceptual context, it’s a much harder problem than it looks. “Humans and animals… seem to cope very well with a problem like grasping, and interacting with the physical world, because we bring to it a sort of inherent understanding, a deeper understanding about the nature of objects. This is very subtle. I can’t describe this exactly. It’s intuitive to us how to pick things up, but it’s very hard for us to formalize that intuition and give that to a robot.” According to Ken, there are three fundamental elements of uncertainty that make robot grasping extremely difficult: Perception. Understanding the precise geometry of where everything is in a scene can be a complex task. There have been developments in depth sensors like LIDAR, “but they still don’t completely solve this problem because if there’s anything reflective or transparent on the surface, that causes the light to react in unpredictable ways, it doesn’t register as a correct position of where that surface really is.” Adding additional sensors doesn’t help much because they often create contradictions, “[the agent] doesn’t know what to trust” in order to act correctly. Perception is especially important in grasping because “a millimeter or less can make the difference between holding something and dropping it.” Control. 
The robot has to maintain control of its grasp meaning, “The robot has to now get its gripper to the precise position in space, consistent with what it believes is happening from its sensors.” If the gripper moves slightly or holds it too tight, the object can drop or break. Physics. This has to do with choosing the right place to grasp the object, understanding friction and mass are significant unknowns. To demonstrate how difficult this is, Ken gives the example of pushing a pencil across the table with your finger. We can estimate the pencil’s center of mass, but we ultimately do not know the frictional properties at play. It’s almost impossible to predict the trajectory because even “one microscopic grain of sand, anything under there is going to cause it to behave extremely differently.” What Makes a Grasp “Robust”? For the robustness of a grasp, we want to consider what happens even when the perception, control, and understanding of the physics are slightly off. “If you pick up a glass of wine, for example…Even if the glass isn’t quite where you thought it was, even if your hand isn’t quite where you thought it was, and even if the thing is slippery, you’re still going to be able to pick it up. That’s a robust grasp.” Robust grasps are not uniform because objects vary incredibly. “It turns out that for most objects, there are grasps that are more or less robust. What we’re trying to do is get a robot to learn that quality, that robustness.” “We can generate that by using [physics and mechanics]. Actually it goes all the way back to centuries of beautiful mechanics of understanding the physics and forces and torques, or wrenches, in space that characterize what happens if we know everything. But then what we do is perturb that statistically and if it’s robust it works for all these statistical perturbations with high probability then we say it’s a robust grasp.” Physics vs Statistics and The Third Wave of Robot Learning There’s some debate in the community around the best approaches to robotic learning, which Ken breaks up into three waves of robotic learning. The first wave is the “classic physics” approach which prioritizes traditional understandings of physics in terms of forces, and torques, friction, mass — all that good stuff. The second wave is the more modern, “data-driven approaches that say: ‘Forget about the physics, let’s just learn it from observation purely’” and assume the physics will be learned naturally in the process. Then there’s what Ken advocates for, which is the third wave of robot learning that combines the two fields of thought. The goal is to synthesize the knowledge from both perspectives to optimize performance. However, “figuring out where that combination is is the challenge. And that’s really the story of Dex-Net.” The Dexterity Network The thinking behind Dex-Net was to do for robotics what the development of ImageNet did for computer vision. “ImageNet really transformed machine learning by having a very large data set of labeled images.” By providing a large dataset of labeled images, ImageNet helped spur on the development of deep learning in general, and machine vision in particular. “The question for us was, could we do something analogous in grasping by assembling a very large data set of three-dimensional objects, three-dimensional CAD models, and then labeling them with robust grasps.” To create Dex-Net they used a combination of both physics and statistical-based deep learning techniques. 
They first applied “that whole first wave [of physics], all that beautiful theory” to loads of simulated models to find which grasps were robust to noise and perturbations. The Use of Depth Sensors to Produce Simulations Pure depth sensors were used to create three-dimensional models and map the objects in space. All other information was stripped away, “I don’t care about the color of things or the texture on things. In fact, that’s a distraction.” Depth sensing makes for nice simulations and perfect models that perturbations and noise could be applied to. In the perfect model, “I have an arrangement of points in space, and then I know when that arrangement corresponds to a successful grasp or not because I’m using the physics and statistical model of the sensor.” After the perturbations, “you have a noisy pattern of points in space, and you know what the true, robust grasp was for that pattern of points…The output is just a scalar or number from zero to one, which is the quality, we call it, the probability that that grasp will succeed.” They’re able to generate millions of these examples fairly quickly (overnight), producing a solid data set to train with. When the machine is shown objects that it has never seen before, it can evaluate the quality of the grasps. “Then what I do is I try a number of different grasps synthetically on that depth map, and it tells me this is the one with highest quality…we consider that the optimal grasp and we execute it. Here’s the thing: It works remarkably well, far better than we thought.” Limitations, Improvements and Applications Those robust examples were then used to train a deep learning system that could generalize to new examples. The system generalized surprisingly well, but as Ken points out, it’s not perfect. The team was able to reach over a 90% success rate, but that was subject to the nature of the objects. “If the objects are all fairly well-behaved like cylinders and cuboids, then it’s fairly easy to do well, but when you have more complex geometries many systems have trouble.” The system still performed well with irregular objects, but did not get close to 100% success. Another limitation is that if you were to change the gripper or sensor, the framework would still apply, but you would have to retrain the system for a new neural network. This is where providing an open dataset and code examples comes in. These can be used to train new grasping models specific to new types of grippers or objects. For an example of Dex-Net in action, check out this video Sam shot at last year’s Siemens Spotlight on Innovation event: In the full interview, Sam and Ken discuss the wide variety of projects he and his lab are working on, from telemedicine to agriculture to art. The conversation on applications picks up at 24:51 in the podcast. Enjoy!
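The "perturb and score" idea behind robust grasp evaluation is easy to sketch. The snippet below is a deliberately simplified stand-in, not Dex-Net's actual wrench-based metric: it treats a grasp as a 2D point, a "success" as a simple distance-and-friction test, and estimates robustness as the fraction of random pose and friction perturbations under which the grasp still succeeds.

```python
# Toy sketch of "robustness as probability of success under perturbation".
# The quality test below is a stand-in (distance of the grasp from the
# object's center of mass plus a friction threshold), not Dex-Net's analytic
# metric; the point is the Monte Carlo averaging over perturbed poses and
# friction coefficients.
import numpy as np

rng = np.random.default_rng(0)

def grasp_succeeds(grasp_center, object_center, friction):
    # Stand-in physics: a grasp "works" if it is close to the center of mass
    # and friction is high enough to hold the part.
    return np.linalg.norm(grasp_center - object_center) < 0.02 + 0.05 * friction

def robustness(grasp_center, object_center, n_samples=5000,
               pose_noise=0.01, friction_mean=0.5, friction_std=0.1):
    successes = 0
    for _ in range(n_samples):
        # Perturb where we think the object is (sensor noise) and how grippy
        # it is (unknown friction), then check whether the grasp still works.
        perturbed_center = object_center + rng.normal(0.0, pose_noise, size=2)
        friction = max(0.0, rng.normal(friction_mean, friction_std))
        successes += grasp_succeeds(grasp_center, perturbed_center, friction)
    return successes / n_samples   # estimated probability the grasp succeeds

object_center = np.array([0.0, 0.0])
centered_grasp = np.array([0.0, 0.005])   # grasp near the center of mass
edge_grasp = np.array([0.06, 0.0])        # grasp near the edge of the part

print("centered grasp robustness:", robustness(centered_grasp, object_center))
print("edge grasp robustness:    ", robustness(edge_grasp, object_center))
```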
How does LinkedIn allow its data scientists to access aggregate user data for exploratory analytics while maintaining its users' privacy? That was the question at the heart of our recent conversation with Ryan Rogers, a senior software engineer in data science at the company. The answer, it turns out, is through differential privacy, a topic we've covered here on the show quite extensively over the years. Differential privacy is a system for publicly sharing information about a dataset by describing patterns of groups within the dataset; the catch is that you have to do this without revealing information about individuals in the dataset (hence the privacy). Ryan currently applies differential privacy at LinkedIn, but he has worked in the field, and on the related topic of federated learning, for quite some time. He was introduced to the subject as a PhD student at the University of Pennsylvania, where he worked closely with Aaron Roth, whom we had the pleasure of interviewing back in 2018. Ryan later worked at Apple, where he focused on the local model of differential privacy, meaning differential privacy is performed on individual users' local devices before being collected for analysis. (Apple uses this, for example, to better understand our favorite emojis 🤯 👍👏). Not surprisingly, they do things a bit differently at LinkedIn. They utilize a central model, where the user's actual data is stored in a central database, with differential privacy applied before the data is made available for analysis. (Another interesting use case that Ryan mentioned in the interview: the U.S. Census Bureau has announced plans to publish 2020 census data using differential privacy.) Ryan recently put together a research paper with his LinkedIn colleague, David Durfee, that they presented as a spotlight talk at NeurIPS in Vancouver. The title of the paper is a bit daunting, but we break it down in the interview. You can check out the paper here: Practical Differentially Private Top-k Selection with Pay-what-you-get Composition. There are two major components to the paper. First, they wanted to offer practical algorithms that you can layer on top of existing systems to achieve differential privacy for a very common type of query: the "Top-k" query, which helps answer questions like "What are the top 10 articles that members are engaging with across LinkedIn?" Secondly, because privacy is reduced when users are allowed to make multiple queries of a differentially private system, Ryan's team developed an innovative way to ensure that their systems accurately account for the information the system returns to users over the course of a session. It's called Pay-what-you-get Composition. One of the big innovations of the paper is discovering the connection between a common algorithm for implementing differential privacy, the exponential mechanism, and Gumbel noise, which is commonly used in machine learning. One of the really nice connections that we made in our paper was that actually the exponential mechanism can be implemented by adding something called Gumbel noise, rather than Laplace noise. Gumbel noise actually pops up in machine learning. It's something that you would do to report the category that has the highest weight, [using what is] called the Gumbel Max Noise Trick. It turned out that we could use that with the exponential mechanism to get a differentially private algorithm. [...] 
Typically, to solve top-k, you would use the exponential mechanism k different times — you can now do this in one shot by just adding Gumbel noise to [existing algorithms] and report the k values that are in the top […] which made it a lot more efficient and practical. When asked what he was most excited about for the future of differential privacy, Ryan cited the progress in open source projects. This is the future of private data analytics. It's really important to be transparent with how you're doing things, otherwise if you're just touting that you're private and you're not revealing what it is, then is it really private? He pointed out the open-source collaboration between Microsoft and Harvard's Institute for Quantitative Social Science. The project aims to create an open-source platform that allows researchers to share datasets containing personal information while preserving the privacy of individuals. Ryan expects such efforts to bring more people to the field, encouraging applications of differential privacy that work in practice and at scale. Listen to the interview with Ryan to get the full scope! And if you want to go deeper into differential privacy, check out our series of interviews on the topic from 2018. Thanks to LinkedIn for sponsoring today's show! LinkedIn Engineering solves complex problems at scale to create economic opportunity for every member of the global workforce. AI and ML are integral aspects of almost every product the company builds for its members and customers. LinkedIn's highly structured dataset gives their data scientists and researchers the ability to conduct applied research to improve member experiences. To learn more about the work of LinkedIn Engineering, please visit engineering.linkedin.com/blog.
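Here's a minimal sketch of the Gumbel-noise construction Ryan describes: scale each item's count by the privacy parameter, add independent Gumbel noise, and report the k largest. Under the exponential mechanism's usual scoring this reproduces k sequential draws in one shot; the exact noise calibration and the pay-what-you-get accounting in the paper are more involved, so treat the function below as an illustration rather than a production-ready private release.

```python
# Minimal sketch of the Gumbel-noise trick: adding Gumbel noise (scaled by the
# privacy parameter epsilon and the sensitivity of the counts) to each item's
# score and reporting the k largest reproduces, in one shot, what k sequential
# draws of the exponential mechanism would give you. The exact scaling and
# privacy accounting in the paper are more involved; this is an illustration.
import numpy as np

rng = np.random.default_rng(0)

def gumbel_top_k(counts, k, epsilon, sensitivity=1.0):
    """Return indices of k items chosen with exponential-mechanism-like
    probabilities, using the Gumbel-max trick."""
    counts = np.asarray(counts, dtype=float)
    # Score each item, then perturb with Gumbel noise; larger epsilon means
    # less noise relative to the scores, i.e. less privacy but more accuracy.
    noisy = epsilon * counts / (2.0 * sensitivity) + rng.gumbel(size=counts.shape)
    return np.argsort(noisy)[::-1][:k]

# e.g. engagement counts for 8 articles
article_counts = [5200, 4800, 4750, 900, 870, 300, 120, 50]
print(gumbel_top_k(article_counts, k=3, epsilon=1.0))   # usually the true top 3
print(gumbel_top_k(article_counts, k=3, epsilon=0.01))  # much noisier ranking
```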
Over the past couple of weeks I got to sit on the other side of the (proverbial) interview table and take part in a few fantastic podcasts and video conversations about the state of machine learning in the enterprise. We also cover current trends in AI, and some of the exciting plans we have in store for TWIMLcon: AI Platforms. Each of these chats has its own unique flavor and I'm excited to share them with you. The New Stack Makers Podcast. I had a great chat with my friend, Alex Williams, founder of The New Stack, a popular tech blog focused on DevOps and modern software development. We focused on MLOps and the increasingly significant convergence of software engineering and data science. Minter Dialogue. I spoke with Minter Dial, host of the popular podcast, Minter Dialogue, and author of the book Heartificial Empathy: Putting Heart into Business and Artificial Intelligence. We had a wide-ranging conversation in which we talked about the future of AI, AI ethics, and the state of AI in businesses. Datamation. In this video chat with James Maguire for Datamation, we discuss some of the key trends surrounding AI in the enterprise, and the steps businesses are taking to operationalize and productionalize machine learning. Hope you enjoy the talks! If you're not already registered for TWIMLcon, we'd love to have you join us! Register now!
I had the chance to sit down with Scott Clark, Founder & CEO of SigOpt, a Founding sponsor of the upcoming TWIMLcon: AI Platforms! Scott discusses what SigOpt has been up to, the unique value he sees #TWIMLcon bringing to the ML/AI industry, and what you can expect from the expert-driven SigOpt session and booth! Sam Charrington: [00:00:00] All right everyone, I am excited to have Scott Clark, founder and CEO of SigOpt. If you know Scott's name, it's because he's one of the few who has been on the TWIML AI podcast multiple times. Scott, welcome once again. Scott Clark: [00:00:13] Thanks, Sam. Always a pleasure to chat. Sam Charrington: [00:00:16] For those who haven't heard one of the previous episodes, why don't we get started by having you give us a really brief overview of your background. Scott Clark: [00:00:23] Yep. So I did my PhD in optimization in applied mathematics at Cornell. Spent a couple of years at Yelp on their advertising team, helping them tune the various aspects of it, and working on releasing things like the Yelp academic dataset challenge. And then about five years ago, started SigOpt. Sam Charrington: [00:00:42] And so what is SigOpt? Scott Clark: [00:00:45] We're a software company. We build tools to help people build better models. We do this via an experimentation and optimization platform that bolts on top of any model or AI platform, allowing people to tune and tweak all the various configuration parameters of their models better, faster, and cheaper than alternative methods. We do that today with asset managers with over $300 billion of combined assets under management, Fortune 500 firms with $500 billion of total market cap, several dozen universities and research institutes around the world, as well as the US intelligence community, and many, many more. Basically, anyone who has a model, we help them configure it and experiment with it to get it to the best performance. Sam Charrington: [00:01:29] So, Scott, SigOpt, and you personally as well, have been huge supporters of everything that we've done here at the podcast in terms of AI platforms, from the e-book that we recently published, The Definitive Guide to Machine Learning Platforms, to the upcoming conference, TWIMLcon: AI Platforms. Tell us a little bit about why you're so excited about the conference, TWIMLcon: AI Platforms, and the space in general around machine learning platforms. Scott Clark: [00:02:03] Definitely. We're super excited about this because we have the privilege of working with some of the most advanced firms in the world when it comes to AI and ML, and we've noticed that a lot of them have started to build these platforms over the last few years. As you start to productionalize AI, as you start to solve some of the low-hanging-fruit problems, you start to notice areas of overlap. Areas where engineers can build tools to make the entire modeling process a little bit more efficient, a little bit more scalable, a little bit more robust, et cetera. So a lot of our customers have been building these over the last few years, and SigOpt is kind of a seamless component, via our REST API, to bolt into them and help supercharge that experimentation and configuration tuning aspect of modeling.  So we're very excited to have someone be shining a light on the problem, and helping those firms that may not have been doing modeling in production for the last decade kind of get a leg up and skip over some of the trials and tribulations that those that went before them have already solved. 
Sam Charrington: [00:03:04] That's awesome. And so you're personally going to be delivering a session at the conference. What can attendees expect to get out of your session? Scott Clark: [00:03:15] Yeah. I'll be building upon some of the stuff in your ebook, talking about how people can make different trade-offs as they look to standardize various components of their machine learning platforms. How they think about what things need to be bespoke and built for their very specific use cases, and other things that can be standardized. Whether that's using open source tools or partnering with firms like SigOpt. I'll talk about the different trade-offs there, and how you can have standardization without necessarily constraining what your researchers and modelers are doing as well. Sam Charrington: [00:03:51] Yeah, I'm really glad that you're going to be talking about that because I think one of the things that I try to convey in the e-book is that there's really no one size fits all. Everyone's coming from a different technology legacy, solving a different set of problems, have a different set of skill sets, and it is important that everyone that starts off on this journey or, you know, is just taking the next step on their journey to figure out, you know, what are the things that make the most sense for them, and how to evaluate the increasing number of options that are available in this space, from open source to commercial products, to as-a-service offerings.   It sounds like you get a pretty broad view of that across the different customers that you get a chance to work with.  Scott Clark: [00:04:38] Definitely, and we see that every customer has a unique, domain they're working in, a unique set of problems, a unique context that they need to be aware of, and what might work well for market making high frequency trading firm is fundamentally different than an oil and gas company, which is fundamentally different than a credit card company. And by making sure that you leverage your expertise where- ... it can make a difference, and then use the best tools in the world where it's an orthogonal approach can really allow you to accelerate and amplify what you can do individually with constrained resources. Sam Charrington: [00:05:15] And so your session is called Exploring Trade-Offs and Experiment Management as Part of AI Platforms, and it'll be on Tuesday at 10:50 on our technology track. SigOpt is also going to have a presence in our community hall. Can you tell us a little bit about what folks can expect to find when they show up to your booth? Scott Clark: [00:05:37] Definitely. We'll have a handful of experts from our team on site walking through all the different lessons we've learned from working with these leading firms for many years now. We'll talk about different trade-offs they found, different pitfalls they found, and how they leverage experimentation to really empower their experts to get the most out of their modeling. Sam Charrington: [00:05:57] Awesome. Awesome. Well, I'm really looking forward to seeing you and the team at TWIMLcon, and I really can't express enough my thanks to you and the company for supporting the conference. Really, really excited about this. Scott Clark: [00:06:11] Likewise, Sam. It's always a pleasure to chat, and we're really looking forward to seeing you and this amazing conference that you put together. Sam Charrington: [00:06:17] Awesome. Thanks, Scott. Scott Clark: [00:06:19] Cheers.  
TWIMLcon: AI Platforms will be held on October 1st and 2nd at the Mission Bay Conference Center in San Francisco. Click here to learn more  
As teams scale their AI platforms, they must decide which capabilities to build versus buy. Whether balancing standards and flexibility or differentiation and scale, there is a playbook teams should run to make these decisions effectively. Join SigOpt Co-Founder & CEO Scott Clark’s session at TWIMLcon to learn how AI leaders weigh these tradeoffs. During this talk, Scott will draw on experience implementing SigOpt with cross sections of large companies who represent over $500B in market capitalization and nimble algorithmic trading firms with over $300B assets under management. This talk will touch on the end-to-end modeling process, but focus on Experiment Management. In particular, Scott will apply this decision making framework to different parts of Experiment Management in a customer case study format and discuss tradeoffs in methodologies for technical tasks like cluster management, distributed tuning and hyperparameter optimization in greater detail.
A few weeks ago I had the opportunity to visit Siemens’ Spotlight on Innovation event in Orlando, Florida. The event aimed to bring together industry leaders, technologists, local government leaders, and other innovators for a real-world look at the way technologies like AI, cybersecurity, IoT, digital twin, and smart infrastructure are helping businesses and cities unlock their potential. Siemens put together a nice pre-event program the day before the formal conference which included a tour of their Gamesa Wind Turbine Training Center. We got a peek into the way these machines are assembled, repaired, and managed. As expected, wind turbines are increasingly being fitted with sensors that, when coupled with machine learning algorithms, allow the company to optimize their performance and do predictive maintenance. AI figured prominently into the discussions at the main conference and the highlight for me was Norbert Gaus, head of R&D at Siemens, presenting an overview of the four main AI use cases that the company is interested in: Generative product design Automated product planning Adaptable autonomous machines Real-time simulation and digital twin He covered, though not in much detail, examples in each of these areas. (My Industrial AI ebook is a good reference for more on the opportunities, challenges, and use cases in this space.) Gaus also provided an interesting overview of the systems and software tools the company was building for internal and customer/partner use. These spanned AI-enabled hardware, industry-specific algorithms and services, AI development tools and workflows, pretrained AI models and software libraries, and industrial knowledge graphs. I was able to capture a couple of really interesting conversations with Siemens applied AI research engineers about some of the things the company is up to. Over on Twitter you can check out a short video I made with Siemens engineer Ines Ugalde where she demonstrates a computer vision powered robot arm that she worked on that uses the deep learning based YOLO algorithm for object detection and the Dex-Net grasp quality prediction algorithm designed in conjunction with Ken Goldberg’s AUTOLAB at UC Berkeley, with all inference running in real time on an Intel Movidius VPU. I also had an opportunity to interview Batu Arisoy for Episode 281 of the podcast. Batu is a research manager with the Vision Technologies & Solutions team at Siemens Corporate Technology. Batu’s research focus is solving limited-data computer vision problems. We cover a lot of ground in our conversation, including an interesting use case where simulation and synthetic data are used to recognize spare parts in place, in cases where the part cannot be isolated. “The first way we use simulation is actually to generate synthetic data and one great example use case that we have developed in the past is about spare part recognition. This is a problem if you have a mechanical object that you deploy in the field and you need to perform maintenance and service operations on this mechanical functional object over time. In order to solve this problem what we are working on is using simulation to synthetically generate a training data set for object recognition for large amount of entities. In other words, we synthetically generate images as if these images are collected in real world from an expert and they’re annotated from an expert and this actually comes for free using the simulation. 
[…]We deployed this for the maintenance applications of trains and the main goal is a service engineer goes to the field, he takes his tablet, he takes a picture, then he draws a rectangle box and the system automatically identifies what is the object of interest that the service engineer would like to replace and in order to make the system reliable we have to take into consideration different lighting conditions, texture, colors, or whatever these parts can look like in a real world environment.” There’s a ton of great detail in this conversation. In particular, we dive into quite a few of the details behind how this works, including a couple of methods that they apply which were published in his group’s recent CVPR papers, including Tell Me Where to Look, which introduced the Guided Attention Inference Network, and Learning Without Memorizing. Definitely check out the full interview! Thanks once again to Siemens for hosting this event and for sponsoring my visit, this post, and my conversation with Batu.
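The general recipe Batu describes (render a part under randomized conditions and keep the annotations the generator already knows) is easy to illustrate even without a 3D renderer or CAD models. The toy sketch below just paints randomly sized, randomly colored rectangles on randomly lit backgrounds and records their bounding boxes; the function names and parameters are hypothetical, not taken from Siemens' pipeline.

```python
# Toy sketch of synthetic training data with "free" labels: place a "part" at
# a random position with randomized nuisance factors (color, brightness, size)
# and record the bounding box, since the generator knows exactly where it put
# the part. A real pipeline would use a 3D renderer and CAD models; this NumPy
# version just paints rectangles.
import numpy as np

rng = np.random.default_rng(0)

def synth_example(img_size=64):
    # Random background brightness stands in for lighting variation.
    img = np.full((img_size, img_size, 3), rng.uniform(0.1, 0.9), dtype=np.float32)
    # Randomize the "part": its size, position, and color.
    w, h = rng.integers(8, 24, size=2)
    x0 = int(rng.integers(0, img_size - w))
    y0 = int(rng.integers(0, img_size - h))
    color = rng.uniform(0.0, 1.0, size=3)
    img[y0:y0 + h, x0:x0 + w] = color
    # The annotation comes for free because we placed the part ourselves.
    bbox = (x0, y0, x0 + w, y0 + h)
    return img, bbox

dataset = [synth_example() for _ in range(1000)]
images, boxes = zip(*dataset)
print(len(images), images[0].shape, boxes[0])   # 1000 synthetic images + boxes
```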
Talk 228 – AI for Earth Interview Transcript Sam Charrington: Today's episode is part of a series of shows on the topic of AI for the benefit of society that we're excited to have partnered with Microsoft to produce. In this show, we're joined by Lucas Joppa and Zack Parisa. Lucas is the chief environmental officer at Microsoft, spearheading the company's five-year, $50 million AI for Earth commitment, which seeks to apply machine learning and artificial intelligence across four key environmental areas: agriculture, water, biodiversity, and climate change. Zack is co-founder and president of SilviaTerra, a Microsoft AI for Earth grantee, whose mission is to help people use modern data sources to better manage forest habitats and ecosystems. In our conversation we discussed the ways that machine learning and AI can be used to advance our understanding of forests and other ecosystems and support conservation efforts. We discuss how SilviaTerra uses computer vision and data from a wide array of sensors like LiDAR, combined with AI, to yield more detailed small-area estimates of the various species in our forests. And we also discuss another AI for Earth project, WildMe, a computer vision-based wildlife conservation project that we discussed with Jason Holmberg back in episode 166. Before diving in, I'd like to thank Microsoft for their support of the show and their sponsorship of this series. Microsoft is committed to ensuring the responsible development and use of AI and is empowering people around the world with this intelligent technology to help solve previously intractable societal challenges spanning sustainability, accessibility, and humanitarian action. Learn more about their plan at Microsoft.ai. Enjoy the show. Sam Charrington: [00:02:17] All right, everyone. I am here with Lucas Joppa and Zack Parisa. Lucas is the CEO of Microsoft, no, not that CEO, but the Chief Environmental Officer. Zack is the Co-Founder and President of Silvia Terra. Lucas and Zack, welcome to This Week in Machine Learning and AI. Lucas Joppa: [00:00:22] Thanks for having us here. It's a huge pleasure. Zack Parisa: [00:00:24] Great to be here. Sam Charrington: [00:00:25] Awesome. Let's dive right in. We'll be talking about Microsoft's AI For Earth Initiative, but before we jump into that, Lucas, as the CEO of Microsoft … I think I'm going to run with this one all day. Tell me a little bit about your background and how you came to be the CEO of Microsoft. Lucas Joppa: [00:00:48] Yeah, sure. I would say I never dreamed of being the CEO of anything, that's for sure. Particularly in the standard context of it, much less what it means in my specific title, which is Chief Environmental Officer. I mean, I grew up in far northern rural Wisconsin. I was obsessed with being outside. My approach to school and life in general was, how can I get done with anything that I need to get done so I can go play out in the woods? I think I thought I was going to grow up to be a game warden or something similar to that. Technology was not a big factor in my life either. I mean, I never had a computer growing up, or a TV, or anything else. I eventually found my way into university, started discovering that I was really interested in thinking about a career in environmental science, studied Wildlife Ecology. Again, not the traditional career path for somebody at Microsoft. 
Went off and spent a little time, in the United States Peace Corps in Malawi, working for the Department of National Parks and wildlife and then came back and did my PhD in Ecology. It was really then that I started to put together this, the two kind of incredible ages that I think we’re alive in today and the way I see our world. Which is that we’re doing business here at the intersection of the information age, and then this also incredible age of negative human impacts on earth’s natural systems. It was during my PhD, I just was really struggling with what’s the right way to do science at a way that scales with the scale of the problem. That’s when computing, programming, Machine Learning all kind of came flooding into my life at the same time. Ended up at Microsoft and Microsoft Research leading programs and environmental and computer science, and then things just progressed from there. Sam Charrington: [00:02:41] You’re actively involved in academic research and a number of organizations. Can you share a little bit about that? We talked about it a bit earlier. Lucas Joppa: [00:02:51] Sure. I mean, once you, live long enough in the academic world, the Pavlovian response stored some of the rewards that, that environment installs. I mean, I’m not proud to say it, but since I’m not proud, I should just say it. I am still that academic that checks their citations every day when I wake up over breakfast. While I definitely have a much larger and more expanded per view of roles and responsibilities here at Microsoft. I still think, science is important. Science is what drives all of the environmental sustainability decisions that we make here at this company. It’s what ultimately led to why we invested in this program AI For Earth. I firmly believe that, you have to understand the details, if you’re going to try to lead an organization somewhere with a big picture vision, if you don’t understand the details, if you don’t understand the science and then it’s difficult to do that. Just the way my brain works, the easiest way to understand the details is to get your hands dirty and be in there with the rest of the world trying to build the solutions of the future. That’s where the academic research for me comes in. It’s just that opportunity to actually like go really deep and work on both sides of the equation. I still publish in the environmental science literature. I still publish in the computer science literature, and the most depressing thing about that is how few of us there are that do both of those things. It’s one of the things that I spend a lot of my time every day doing is just trying to bring those two worlds together, and publishing is a fantastic way to do that. Sam Charrington: [00:04:35] Zach, you’re a forester. Zack Parisa: [00:04:37] Yeah, yeah. Sam Charrington: [00:04:38] I didn’t know that was a thing beyond the Subaru. Zack Parisa: [00:04:40] Right, right, sure enough. It’s absolutely a thing and an exciting, I think, there’s a rebirth in forestry now. I’m hoping that it’ll become a more broadly known thing here, before too long. Sam Charrington: [00:04:56] Tell us about your background and about Silvia Terra. Zack Parisa: [00:04:59] Yeah, sure. The start of my story actually isn’t terribly dissimilar than Lucas’s. I grew up in North Alabama though not Wisconsin, but in this funny place that was like North Alabama’s, covered in woods, but it also has NASA installation, in Huntsville, Alabama. My youth was basically just spend in the woods. 
When I was in first grade, I wanted to be an entomologist. When I was in third grade, I wanted to be a zoologist. I went through geology and so on and so forth until I finally met somebody who was a forester. Until you meet somebody and you have somebody walk you through what that is, it's an obscure field. What that is to me is the confluence of economics and ecology. It was this brilliant opportunity at the time, and that's the way that I saw it, because it brought together everything that I cared about. From the ecology side, insects and soils, geology, the interconnected nature of all of those systems, but also the economic side. Not only what the forest is, but also what we want it to be and how we value that as a society, and how we mean to take it from one place now, which is where we find it today, to where we want it to be, and what we believe we need. That was my entrance into it. I believed I would carry that out. I would live and work as a forester by managing some tract of land for some owner, whether that's public or private, but that I would be focused on that landscape. Going through undergrad, what I became really interested in, oddly and as a surprise to me, was the quantitative aspects of certain problems, like insects in a forest. When I first got into forestry, my freshman year, there was a massive outbreak of southern pine beetle in the U.S. South, and it was killing lots of pine trees. That was a really compelling problem to me because it relates so much not only to the trees themselves and the beetle, but also to how we've managed them historically and how that impacts local economies and that type of thing. I started into pheromone plume modeling, of all things, in a forest system, trying to take measurements of concentrations of pheromones in locations and backtrack to where that originated from in the winter, to try and deal with these beetles more effectively. What I learned from that, or what I gathered, was that there's this incredible ability to scale up my interests. To still focus on the things that I loved the most, but to look at them with a different lens and to potentially effect change in a different way than I had conceived of before. I wound up doing work in Brazil, I was really interested in Tropical Forestry. I took some time off from undergrad to do that, and worked in other areas, Bolivia in South America. There I got to see situations where people were dependent on different aspects of land, in different ways and more direct ways than I think I was familiar with from my youth in the U.S. South. Where they were hunting animals, they were collecting nuts, fruits, things like that. They were collecting fuel wood to stay warm, to cook. They were also wanting to sell wood into a market, and to develop as communities. Forestry is about trade-offs. There are a lot of things that we can do, and there are a lot of potential futures that we have before us, but we have to address the complexity of those systems in more comprehensive ways than we have in the past. There's far more than just a timber market now, there's far more than just a concern for delivery of wood to build houses. As we spoke about just a little bit before, that was experienced very acutely here in the Pacific Northwest, when people were confronting the issue of whether we had enough spotted owl habitat, or spotted owls themselves, or not. 
Whether we had managed appropriately in the past to accommodate those and everything that's related to that species, or the habitats and other species that are related, or whether we hadn't, whether we'd failed, and whether we needed to go back and reconsider the ways that we make decisions. That was a really fraught conversation, it brought people to boiling points, and that was before my time really, before I really entered into the profession in any meaningful way. That type of conversation goes on now and it's even more complicated, and there are more issues and more dimensions that we have to consider than there were then. To have constructive conversations, we have to have information to inform those discussions, to facilitate the communication that yields solutions that people can live with. Sam Charrington: [00:10:40] I'm presuming that that need is what led you to found Silvia Terra? Zack Parisa: [00:10:45] It is. Yeah. Absolutely. Sam Charrington: [00:10:47] What is Silvia Terra, what is the company? Zack Parisa: [00:10:48] Right, what do we do here? I'm failing to answer your questions here. At Silvia Terra we provide information, just like what I was speaking about there. Our objective is to help people use modern data sources, like remotely sensed information from satellites, from aerial platforms, from UAVs, and modern modeling techniques to help get more resolution and more accuracy and precision on information. Not just about trees, but about habitats and beyond. That's the focus of our company. We've been at this for about nine years. A lot of the folks that we work with are timber companies; we also work with environmental NGOs, and we work with government agencies. All of them have effectively the same questions, very similar needs. Up until now we've been providing data project-to-project to help them answer those critical questions that they confront on a regular basis. I guess the reason I'm in this room with you all here today is that we were able to start working with Microsoft AI For Earth to begin to scale and expand that work, to build a foundational data set that we can start to use to answer these questions and to build on, to improve our ability to manage for the future. Sam Charrington: [00:12:21] This may be a good segue to taking a step back. Lucas, what is AI For Earth? Lucas Joppa: [00:12:29] Sure. Well, I think in the context of this conversation, you can think about it this way. What is AI For Earth? It's why a reformed forester, who's now the co-founder of a startup, and a reformed wildlife ecologist, who's now the Chief Environmental Officer at Microsoft, are at a table talking with you on TWIML. Sam Charrington: [00:12:44] I feel like we're in this recursive loop. Lucas Joppa: [00:12:46] That's right. I know, exactly, I can't even see you guys anymore. I'm just staring at myself in an Infinity Mirror here. What AI For Earth is, is, as of Tuesday of this week, a one-year-old program. Sam Charrington: [00:13:00] Happy birthday. Lucas Joppa: [00:13:01] Thank you. Thank you. It was fantastic. We spent it celebrating with our colleagues at National Geographic in Washington, D.C. Sam Charrington: [00:13:08] In the woods? Lucas Joppa: [00:13:10] Unfortunately no, but at the founders table of one of the most iconic and exploration-driven organizations in the world. It was an incredible time. What AI For Earth is, is a five-year, $50 million commitment on behalf of Microsoft to deploy our 35 years.
Actually a little bit more than 35 years of fundamental research in the core fields of AI and Machine Learning, and to deploy those to effect change in these four key areas of the environment that we care deeply about, which are agriculture, water, biodiversity, and climate change. The reason that we're doing that is because we recognize, at Microsoft, and I already spoke about this tale of two ages, this time of the information age and this time of incredible negative impacts of human activities on earth's natural systems. You look and you realize that as a society we're facing an almost unprecedented challenge. We somehow have to figure out how to mitigate and adapt to changing climates, ensure a resilient water supply, and sustainably feed a human population rapidly growing to 10 billion people, all while stemming this ongoing and catastrophic loss of biodiversity that we see around the world. We've got to do that while ensuring that the human experience continues to improve all around the world for everybody, that economic growth and prosperity continue to grow. That's why I say it's an unprecedented challenge. I mean, the scope and the scale are just incredible. If you look at the scope and scale of the problem and you step back and you ask yourself, as a company, the same question that I asked during my PhD, which is, "Well, what are the things that are growing in the same exponential fashion as the scale and complexity of that environmental challenge?" Well, pretty much the only trends that are happening in an analogous fashion are in the tech sector, and particularly in the broader field of AI and the more narrow Machine Learning approaches that are getting a lot of attention today. That's when we decided to put together this program, to actually say, "Hey, we've been investing as a company for over a decade at the intersection of environmental science and computer science." I led research programs in our blue-sky research division, Microsoft Research, for a fair number of years on that. But then the technology reached a point, and the criticality of the societal challenge, I think, reached a point, that it was time for a company like Microsoft to step in and actually start to deploy some of those resources. Deploy them in ways that ensure that we ultimately change the way that we monitor, model, and then ultimately manage earth's natural systems in a way that we've never been able to before. We started out, as I said, a year ago with basically nothing but aspiration. We looked back this past Tuesday, at this event that we had at National Geographic where we inducted a new set of grantees into our portfolio, and realized that in that short year we'd set up relationships with organizations all over the world. Over 200 organizations all over the world, each of which is dedicated to taking a Machine Learning-first approach to solving challenges in these four domain areas that we focus on. They're working on all seven continents now, in over 50 countries in the world and 34 states here in the United States. Today, I get the opportunity to sit down with one of the grantees, right? To hear a little bit more about their particular experience, and talk about the ways that Machine Learning in particular can fundamentally change our ability to understand what's going on on planet earth.
Because I think most people don't take the time to step back and realize, when they hear terms like information age, just how narcissistic that really is: almost every bit of information that we've been collecting is about ourselves, right? It's about where the nearest Starbucks is, it's about what people who searched for this also searched for, right? It's at the peril of ignoring the rest of life on earth and the ways that it supports us and our economies. That's what Silvia Terra, I think, is so focused on: using vast amounts of data and new approaches in Machine Learning to ask seemingly simple questions like, where are all the trees in the United States? We don't know the answers to things like that. I mean, that just blows my mind, and so that's where a lot of this came from. It's just a fundamental desire to change our ability to monitor and model life on earth. I guess that isn't all that simple, but I also think it's completely and totally doable, right? I mean, look at where we've come from, from an information processing capacity, over the past 25 years to where we are today. If you would've tried to predict every little bit of it, it would have been impossible, but it seems preordained now that you look back at it. Sam Charrington: [00:18:38] When I think about the types of systems that we've been talking about thus far, both the economic and political systems as well as the biological systems, it jumps out at me that there's a tremendous amount of complexity in those systems, and Machine Learning, deep learning in particular, has this great ability to pick out patterns and abstract away from complexity, which kind of says to me, "Oh, it's a no-brainer to apply Machine Learning to this." But we're still very early on in our ability to put these Machine Learning techniques to work. I guess I'm curious, maybe for you, Zack, where you think the opportunity is with applying Machine Learning and AI for the types of problems that concern you, in particular with regard to forests? Zack Parisa: [00:19:43] Yeah, yeah, absolutely. I guess, listening to Lucas there, one thing that jumps out at me from when you first spoke, and your response to the second question, is that there are lots of people that are very interested in natural resources and there are lots of people that are very interested in Machine Learning and AI, but it is a very small community of people who do both. I think it's rare that you ... it's uncommon to start out believing you're going to spend all your time outside and then find yourself curled up in front of some code. The first thing is, I think there's a lot of opportunity for people to make that leap and just begin to see that as a more natural thing, because the questions are very complex. Again, just like Lucas said, most of our focus has been on how to market to somebody to buy a cup of coffee here versus there. How to think about social networks and how to think about marketing networks and transportation networks. I think it's exciting to see that begin to percolate down and transition to the story behind how all of those materials come into our world and life. The fact is, and I think the surprising fact is, that everything around us, every little bit of technology and everything that built this room that we're in or that your listeners are in, those things were either grown or mined. Every piece of that, every little bit, has some geographic story, some physical story, some environmental story.
If we were to be confronted with all of those stories, just from one day of our consumption, one day of us interacting as we normally do, it would take us years to even sift through all of those stories. There's no way, there's no way, but those stories all amass to have a very large impact on how we all live. To me, that is the huge opportunity here. With Microsoft AI For Earth we have worked on this data set for the continental U.S. at high resolution, to inform, down to species and diameters, where trees are, what those structures and compositions are, and, moving forward, what they could be. That's not going to stop the fact that we are all consumers, and that while we have a conservation need, we also have a consumptive need. I think there's so much opportunity to begin to investigate how we balance that and how we feel about that, and to engage in a meaningful conversation, at multiple levels in society, about how that can best be done. You asked about opportunities. I mean, I was never excited about AI or stats or Machine Learning for their own sake. I mean, it is awesome, I now understand that, and I do get jazzed about exciting advances there, but it's about what it can answer. I mean, that's what drew me out of the woods and put me in front of a computer: the ability to start to even think about those big questions and distill it all to something simple and right in front of us. That's the opportunity. It allows us to know more about our world and ourselves, and to create a better world and a better image of ourselves. Sam Charrington: [00:23:34] Can we maybe dig into a little bit more detail on either the data set that you just mentioned or another project, and talk through the process by which Silvia Terra uses Machine Learning and the challenges that you run into? Maybe walk us through a scenario. Zack Parisa: [00:23:54] Sure. Absolutely. I'll just briefly tell you where we're coming from. People have been managing forests for a couple hundred years, and in the U.S. for about 100-plus. They needed information then, as they do now, but to get that they would do a statistical survey: they would go and put measurements in, and you work up an average and you make a plan based on that average. That has been effective, and it's what people use a lot still today, but what we're focused on doing is bringing imagery to bear, and model-assisted and model-based methods, to yield small area estimates. For us it's at a 15 meter resolution, and for a 15 meter pixel, what we're predicting is the number of stems, their sizes and species. When I say size, I mean the diameter of the trunk of the tree at four and a half feet off the ground. From there, in a hierarchical context, we predict maybe the height of the tree or the ratio of crown to clear bole at the bottom. From there, since we can infer or predict the light conditions under that forest, maybe how much herbaceous plant matter there may be. Carrying that forward, how many herbivores could that support? Scaling that up, how many large carnivores could that support? For now, the primary piece, this foundational data set that we've worked with Microsoft on, is that tree list information for each one of those pixels, which hasn't existed before, but that opens up so many doors for what we can begin to build onto and model further down the line. Sam Charrington: [00:25:46] At a resolution of 15 meters, a single pixel might contain how many trees?
Zack Parisa: [00:25:54] It could contain an awful lot. Easily, and this is the tricky thing, because the tree could be as small as a seedling or as large as a sequoia. You could have less than one, right? Or you could have 300 small, tiny little trees packed in tight. This is the fundamental difference, to me, about what we're working on here versus where we're coming from. Which is, we need to transition away from the binary or basically qualitative classifications, forest versus non-forest. That's not actually that informative about what that forest can ... what habitat it can provide, or what maybe we need to do or not do to ensure that it's the type of forest that's going to continue providing the things we care about. Clean water, carbon out of the atmosphere, wood to build this table. Those are the types of things. Beginning to quantify those aspects is very important. When I began working with this, everything was on the table. I mean, there was a potential to use LiDAR and neural nets to try and delineate discrete trees. We do not do that, for various reasons, largely bias in results. For us, parting out species became a massive problem. If you have, let's say, 40 trees of multiple species in one pixel, how do you begin to differentiate those when you're looking at one pixel of data from lots of imagery sources? That was a technical challenge. Lucas Joppa: [00:27:40] One of the things that I think is interesting about this is, you're talking about forestry, right? Whether or not people know it's a profession, it's an extremely old one. You don't think that you're going to be talking about Machine Learning. You also don't think that you're necessarily going to be talking about philosophy or existential questions, but you asked a question about 15 meter resolution, right? When you work with organizations like Silvia Terra that are looking down at the world and asking what is there, you end up having these existential conversations about what is a thing, right? At what level should we be taking data points to be able to feed into these Machine Learning algorithms? Because when you incorporate the zed dimension, or the Z dimension, or whatever you want to call it, whatever part of this planet earth we're from, you can be looking down at a multitude of different objects, right? Depending on what sensor you're using, you may only see one of them or you may see many of them, if you're using something like LiDAR and you're able to get enough laser returns to see enough of those things. You start struggling with all of these questions that are actually fairly unarticulated in the modern Machine Learning literature, quite frankly. Where all the standard libraries take in a 300 by 300 pixel image, and they all have these harsh expectations, and sure, maybe we think we all left the world of frequentist statistics behind, but we still carry over the ghosts of a lot of those harsh binary classification results. It's just fascinating, I think, to think about not just what's hard in the forestry space and how modern Machine Learning techniques can help transform that, but also what the problems and the applications of an organization like Silvia Terra, and then the rest of our AI For Earth grantees, bring to the Machine Learning community, which is: what's hard here? Why can't we just take all the deep neural network advances that we've made and, voila, we've solved all the world's problems, right?
It's because, as you said, we're still at the infancy of a lot of what we hope to achieve in Machine Learning. We just also recognize the severely short amount of time that we have to answer some of these bigger environmental questions. We have got to take everything that we have at our disposal and start to deploy it. Sam Charrington: [00:30:18] You mentioned sensors and LiDAR, so a very specific curiosity question. I've always associated LiDAR with, like, a very short-range local sensing mechanism. Is that not the case? Can you do LiDAR from satellites? Lucas Joppa: [00:30:34] Yes, yes. Sam Charrington: [00:30:35] Talking about satellites or planes- Lucas Joppa: [00:30:36] Planes. Sam Charrington: [00:30:37] What are all the sensors that come into play here? Zack Parisa: [00:30:38] A new sensor was just launched a couple weeks ago. Lucas Joppa: [00:30:42] Something like that. Zack Parisa: [00:30:43] There's the GEDI sensor, it's called GEDI. I'm used to it now. Lucas Joppa: [00:30:48] I was going to say it. Sam Charrington: [00:30:49] Use the LiDAR? Zack Parisa: [00:30:50] Use the LiDAR, Lucas. Lucas Joppa: [00:30:52] GEDI, here's a ... Zack Parisa: [00:30:55] Well, it's worth [crosstalk 00:30:56]. They're strapping this thing onto the space station. It's going to be pulsing down, not the poles, but basically everything in between. I think it's full-waveform LiDAR, so absolutely. Even historically there was ICESat, which was a satellite-based LiDAR sensor. More commonly in forestry, and even in a lot of urban areas, they're collecting LiDAR information from airplanes at different altitudes and different point densities. A common one might be like 12 or 24 points per square meter. When you see that over a forest canopy, some of those pulses reach the ground. The best elevation models that you see in the U.S. right now are LiDAR-derived elevation models. That's the source of a lot of the information that we're getting. You see it in a lot of floodplain areas, the Mississippi Delta area, so that we can better understand how flooding may occur or may not occur in certain areas. Lucas Joppa: [00:32:02] One more thing that I'm always struck by, when you start thinking about remote sensing, and just sensing in general as applied to environmental systems, is that as we start to take a more digital or computational approach to sensing, we almost by definition have got to start taking a more Machine Learning approach to driving insights. Because, and I don't know, maybe I'm just missing the conversation or maybe the conversation isn't as fully articulated as it could be, but computers are able to sense the world in so many more dimensions than people are. Why do we model? Well, we model because we need a simplifying function to help us understand an already complex world. What was already complex according to our five senses has now become exponentially more complicated with things like hyperspectral resolution monitoring, where you're getting thousands of bands of imagery back, plus things like LiDAR that are getting 24 points per square meter. Humans don't even know ... It's interesting, people always complain that they don't understand what the layers in a deep neural network do. We also have no idea how to even interpret most of the signals that are coming back from the most advanced sensors in the world, because they don't correspond to the dimensionality that we live in.
Sam Charrington: [00:33:22] I was just going to ask that. When I've talked to folks that are using LiDAR in the context of self-driving vehicles, this whole idea of sensor fusion comes into play, and making sense of all these disparate data sources. Those examples are very local, and now we're talking about global data sources, or at least much larger scale, and with overlapping tiles and capabilities. There's a ton of complexity. Is that type of complexity some of the complexity that your company is working on managing, or do you count on upstream providers to sort a lot of that out for you? Zack Parisa: [00:34:09] That's exactly the type of complexity that we deal with. I mean, there is an enormous pool of potential data sources that exist, and they all have potentially very useful attributes. Some of them less so. They have different timestamps associated with them, and there's one very nice thing about measuring forests, which is that, as long as you don't mess with them, they tend not to move too much. Trees are pretty willing subjects to be measured, but they are always changing. There's growth, there's naturally occurring disturbance, there is human-caused disturbance, and both of those we want to keep track of. What I see our role right now as being is taking that massive pool of potential sources of remotely sensed data, and the very small and often underappreciated pool of field measurements, the things that we actually might care about, and translating between those things, creating something that is more highly resolved, more accurate, more precise, and more useful than what could otherwise be achieved. So yeah, drawing the signal out of the noise, the classic tale. Lucas Joppa: [00:35:24] If I look at the full portfolio of AI For Earth grantees, well over 200, you see that, at least in my mind, Silvia Terra as an organization is one of the most mature, right? They're actually out of the lab, they're a startup with a business model, et cetera, et cetera. When I think about why that is in the context of Machine Learning, why they're able to take advantage of it, it's because of one thing that we just heard, which is that they're taking advantage of these ground-based data points that they can use to train their models, right? That's because forestry is something that is so inherently tied to our broader economy, here in the United States and all around the world. There's a history of going out, boots on the ground, and putting a tape measure around a tree and a GPS signal next to it and saying, "This tree is here, it's this height, and it's of this species." That's so rare in the broader environmental space. It's one of the reasons that, I think, organizations like Silvia Terra are unfortunately standing alone in many respects: there are so few data sets like that. It's called Machine Learning because we're teaching computers, right? To teach, or to be taught, you need to be shown examples. It's why we've seen such significant advances in some fields of Machine Learning but not in others. There are just so few annotations in our space. When you come into a forestry space, where the U.S. government has paid money for the past hundred years to go out and figure all this out, companies like Silvia Terra can stand on top of that and really just kind of zoom off ahead. But they are in many ways the exception to the rule, which is unfortunate, I think.
Sam Charrington: [00:37:18] Do you find that the kind of work that you're doing, we talked about the sensing and pulling all that information together, does this put you at the research frontier of using Machine Learning techniques, or are you able to use off-the-shelf types of models? Where does your work fall on the spectrum of complexity? Zack Parisa: [00:37:45] Boy. Sam Charrington: [00:37:46] Or maybe complexity is not the right word. Just in terms of the innovation cycle, are you able to apply things that people are doing in other fields pretty readily? Or are you having to push the limits and pull right out of academic research, or things like that? Zack Parisa: [00:38:05] It's a little bit of both. I mean, our core algorithm has matured over the last nine years of doing the work that we have, and we're a small team, we're 10 people effectively. I guess when I got into this, when I realized this quant path was something that really resonated with me, that I connected with and that I saw value in, I originally thought I was going to be a professor. I would be a researcher somewhere. I would be putting papers out, because that must be how change happens. My path changed when I went around to people that I'd worked with in industry and asked them what papers they were reading that changed the way that they worked. What were the most influential journals that they were reading? The answer was that they weren't reading the journals, they were busy managing land, and that they wanted a tool, not a publication. I mean, that was a little eye-opening, and that's what Max Nova, my co-founder, and I set about to do: build tools. I don't really accept a full dichotomy between, is it research or is it just off-the-shelf type stuff? I mean, we pride ourselves on our ability not only to understand the systems that we're working in, but also to be abreast of what's happening in modern computational techniques and modeling efforts and modeling tools. Which I imagine everybody would probably say, right? Everybody would tell you, "No, we're right on the edge." The funny thing that I learned when I got into this, and I'm on the applied side, I talk with people that are trying to figure out wildfire modeling and how to pick which communities to allocate funds and efforts to, to help manage a forest to prevent catastrophic fires. I work with people that are trying to figure out how to manage for forest carbon. I work with people that try to figure out how to manage forests to deliver wood to a mill to make paper. What's striking to me, I guess, from where I started to now, is that I thought that what people needed to see was the math. I thought I would show up at their offices and be like, "Good news. We figured it out. Check this new method out. We pipe in this data. We put in these measurements from the ground. We're able to model this more effectively now." What I learned is that if I can't communicate effectively about what we've done, if it really truly seems like magic, then it is by definition incredible in the truest sense of the word: it is not credible, and credibility counts. In some cases, when we're working with people, we may not use the most fantastic new thing. We may use something that is slightly more costly in terms of the input data that it requires, or costly in terms of model fit, but that is more easily understood and explained and more robust to, like, the boot test. You go out and it just makes sense.
Sam Charrington: [00:41:36] Lucas, does that experience ring true for the other grantees that you work with, or is there a spectrum of experiences there in terms of where they are in applying this? Lucas Joppa: [00:41:47] Some of our grantees are using almost commodity services at this moment. I mean, Microsoft for instance has a service called Custom Vision AI, sorry, Custom Vision API. Some of our grantees want to do simple image recognition tasks, and the service works for them. They literally just drag and drop a whole bunch of photos of one type and a whole bunch of photos of another type, and the system learns it and produces a result for them, and that's fine. Right? That's pretty far on the one side, of just commoditized services. Then there are other grantees that are out there creating exceptionally custom algorithms for their work. We've got a grantee called Wild Me that does basically facial recognition for species, so that they can provide better wildlife population estimates of species like giraffe and zebra, things like that. Everybody knows a giraffe, or everybody has heard that every giraffe's pattern is unique, but look at a couple of photos of giraffes and you realize just how hard it is for the human eye to spot those differences. Right? They're building algorithms to differentiate any particular zebra or giraffe and then plug those into statistical models for estimating populations. There's nothing off the shelf that does that. In fact, they have to go back and modify the core code of most of the main libraries, so it's a full, full spectrum. We're willing to support all of it, right? Because what we're trying to get people to understand is, well, first and foremost, we're just trying to break down the access barrier, right? We want to ensure that budget isn't a barrier to getting this stuff done. Because, as I'm sure you and many of your listeners are aware, sometimes the latest Machine Learning approaches can be fairly expensive. It might be an open source library, but somebody needs 1,000 GPUs to run the thing, right? We make sure that the infrastructure gets into the hands of folks, et cetera, but it's also just awareness that you could be thinking about this, even if you're not an expert. We want the world's leading Machine Learning scientists to be thinking about what they could be doing, but we don't want the rest of the world to think that they have to be one of the world's Machine Learning experts to have a crack at this, right? There's software and services that can help them as well. We see the full spectrum, and I think it's super healthy. We also see the full spectrum of, if I would encapsulate what Zack was saying there in just two words, interest in what we would call Explainable AI, right? Do people really care why an algorithm said that this was a giraffe and that was a zebra? Not really. You don't have to explain that to them. Right? Do they want to understand why some decision support algorithm, like a land-use or spatial optimization algorithm, assigned this part of the country or this part of the county to protected land and this part to industrial use and this part to urban growth and expansion? How that works, and why people thought that this was the better policy than that? Sam Charrington: [00:45:14] Probably so. Lucas Joppa: [00:45:15] Yes, they do. I think there's a lot of hand-wringing and angst right now around conversations like Explainable AI and whatever.
I think it's no different than the conversation we've always had about modeling, which is: it's a model of a complex system, so why are you building it? If it's being built to just do a simple classification task, and it's easy for a human to go and check the accuracy one way or the other, then great. You can use some really advanced statistical techniques. If instead that model is a model of, for instance, a human decision process, then I think the onus on explainability is much higher. Sam Charrington: [00:46:03] Along those lines, we've used computation to understand the environment and climate for a very long time. Weather, for example, has been a great focus of high performance computing. Taking a step back from the fact that we're all really excited about AI, where do you think AI offers unique opportunities relative to the things that we've done for a long time? Lucas Joppa: [00:46:31] Sure. Well, I think that the answer to that will be super complex, but I'll try to make it simple. You mentioned weather. Sure, there's no question that statistics and math, and then the computational platforms that started to support them over recent decades, have been used for environmental monitoring. I mean, Fisher, it goes all the way back; some of these guys were biologists. Right? The bigger question is, why are we excited about this today? For me it really is the full, broad definition of what we mean by AI. It's the recognition that we're finally deploying computing systems that can collect unprecedented amounts of data, and not just amounts, but we were talking about the full crazy dimensionality of the data that we're starting to take on. We've got this breakthrough in data, we've got this breakthrough in infrastructure, where you can ... I made a joke about needing 1,000 GPUs. Well, if you need one, 1,000, 10,000, you just have to turn a knob these days and get access to it. Sam Charrington: [00:47:43] Wherever you are on the invoice, still a lot cheaper than a supercomputer. Lucas Joppa: [00:47:47] Extremely. We have made crazy advances in a whole plethora of algorithms, and for a lot of the most important ones, we've directly accelerated the compute from the perspective of those algorithms. For the first time, of course, we've made it so easy to deploy these algorithms as web-based services, as APIs, right? Then, of course, the software infrastructure stack and all of that is incredible. We've made it commodity-level infrastructure; anybody can get access to this stuff. You hear this term Democratizing AI: what we mean by that is bringing it all into a stack that anybody can use. You don't need access to a government-run supercomputer anymore. That's all one side of it. The other thing is, and weather is a great example here, traditional weather forecasting was strongly numerical simulation. That's one type of math, right? But there wasn't a lot of learning in real time about what was going on. We took a physical process, we built a model that we thought strongly corresponded with it, and then we ran numerical simulations of it. Fast forward, and yeah, just from the simulation perspective, you need a lot of compute. The thing is, all sorts of crazy things happen when we do that that we don't quite understand. Right? Little eddy fluxes happen in some atmospheric layer or whatever, and we don't really know why.
Then the weather community started using Machine Learning to not necessarily learn why, but to be able to predict, for one reason or another, when those things were going to come, and weather forecasting got a lot better. The same thing is happening now in climate modeling as well. We know there are things that we just can't do with our traditional approach to climate modeling. There's a whole new group that just spun out that's taking a purely Machine Learning-first approach to building a new climate model for the world, and not positioning themselves as better, but positioning themselves as complementary. I think there's a lot of work that's just happened in commoditizing all of this stuff, as well as recognizing that while we've taken a hugely mathematical, statistical, and computational approach to doing some of this stuff in the past, Machine Learning is a different approach, right? It's a data-driven approach, and that can be very complementary, and we've seen it accelerate extremely economically important things like weather forecasting, forestry, agriculture, and on and on. Sam Charrington: [00:50:31] As we wind up, Zack, can you share something that you're particularly excited about, looking forward, in terms of the application of AI to forestry? Zack Parisa: [00:50:42] Yeah, absolutely. I mean, obviously we're excited to be releasing this data set, but it's really about what it enables. We're excited to see more nuanced and reactive markets around environmental services like species habitat, carbon, and water be informed by these types of data, and to play a part in that process to integrate these concerns into ongoing management decisions. That's the biggest piece. It's what you can do with this information as you move it from data to information to decisions. Sam Charrington: [00:51:29] Lucas, how about from your perspective, as you look at this from both a very technical and research perspective, but also as someone managing and interacting with this portfolio of innovators that are working in this space. What are you excited about? Lucas Joppa: [00:51:48] Well, ultimately, the future I see, and the way that we've structured the whole program, is that what we think the world fundamentally needs, what society needs, is the ability to query the planet by X, Y, and T. We need to be able to ask questions just like we ask some potentially- Sam Charrington: [00:52:10] No zed? Lucas Joppa: [00:52:10] What's that? Sam Charrington: [00:52:11] No zed? Lucas Joppa: [00:52:12] No zed. Well, I was actually speaking with my team the other day and I had sent a slide that said X, Y, T, apostrophe Z, and I said, "Stretch goal." So, yeah, once we get the zed dimension then I can retire. But no, I think ultimately that's where we need to go. We need to be able to allow people to ask, for any particular piece of land or water, what was there? What's there now? What could be there? Empower policy makers to figure out what should be there. We're far from that. Now, Microsoft has always had an approach of empowering an ecosystem of customers and partners. We don't look at the world and say, "Oh, buy into my X, Y, T vision." We don't see that as some fantastical crystal ball that the world spins around and taps on; we see it as a constellation of services and products and solutions brought by all sectors. What we're looking to do is engage with the Silvia Terras of the world, and unfortunately, there are far too few at the moment.
Engage with those that are there, bring up the next generation and the next and the next, until eventually there's a self-supporting community of Machine Learning. We talk about born digital; I think about born Machine Learning: these organizations where it's just baked into their DNA, but the organization doesn't exist because of Machine Learning. It exists because of the challenges that we face in the environmental space. They just are capable of ingesting Machine Learning approaches natively and efficiently, and treat space and time as first-class data citizens in this world of Machine Learning. Sam Charrington: [00:54:07] Fantastic. Well, Lucas and Zack, thanks so much for taking the time to chat with me. Lucas Joppa: [00:54:13] Thank you. It was a pleasure. Zack Parisa: [00:54:14] Yeah. Thanks Sam. Appreciate it.
Bits & Bytes Microsoft leads the AI patent race. As per EconSight research findings, Microsoft leads the AI patent race going into 2019, with 697 patents that the firm classifies as having a significant competitive impact as of November 2018. Out of the top 30 companies and research institutions as defined by EconSight in their recent analysis, Microsoft has created 20% of all patents in the global group of patent-producing companies and institutions. AI hides data from its creators to cheat at its appointed task. Research from Stanford and Google found that an ML agent intended to transform aerial images into street maps and back was hiding information it would need later. Tech Mahindra launches GAiA for enterprises. GAiA is the first commercial version of the open source Acumos platform, explored in detail in my conversation with project sponsor Mazin Gilbert about a year ago. Taiwan AI Labs and Microsoft launch AI platform to facilitate genetic analysis. The new AI platform "TaiGenomics" utilizes AI techniques to process, analyze, and draw inferences from vast amounts of medical and genetic data provided by patients and hospitals. Google to open AI lab in Princeton. The AI lab will comprise a mix of faculty members and students. Elad Hazan and Yoram Singer, who both work at Google and Princeton and are co-developers of the AdaGrad algorithm, will lead the lab. The focus of the group is developing efficient methods for faster training. IBM designs AI-enabled fingernail sensor to track diseases. This tiny, wearable fingernail sensor can track disease progression and share details on medication effectiveness for Parkinson's disease and cardiovascular health. ZestFinance and Microsoft collaborate on AI solution for credit underwriting. Financial institutions will be able to use the Zest Automated Machine Learning (ZAML) tools to build, deploy, and monitor credit models using the Microsoft Azure cloud and ML Server. Dollars & Sense Swiss startup Sophia Genetics raises $77M to expand its AI diagnostic platform Baraja, a LiDAR start-up, has raised $32M in a Series A round of funding Semiconductor firm QuickLogic announced that it has acquired SensiML, a specialist in ML for IoT applications Donnelley Financial Solutions announced the acquisition of eBrevia, a provider of AI-based data extraction and contract analytics software solutions Graphcore, a UK-based AI chipmaker, has secured $200M in funding; investors include BMW Ventures and Microsoft Dataiku Inc, offering an enterprise data science and ML platform, has raised $101M in Series C funding Ada, a Toronto-based company focused on automating customer service, has raised $19M in funding To receive the Bits & Bytes to your inbox, subscribe to our Newsletter.
Bits and Bytes This week, news from Google I/O and Microsoft Build dominated the headlines. Here are the highlights: Oracle rolling out AI applications for manufacturing. The applications leverage machine learning and AI to sift through large amounts of data from production environments to identify and trace issues from production through to customer delivery. IBM granted patent for AI-powered traffic management. The system would use computer vision powered cameras instead of timers to manage the flow of traffic. My friends over at SWIM are also doing interesting work in this area with one of their customers. Top Baidu AI executive stepping down. The top executive behind Baidu's artificial intelligence programs, Lu Qi, is stepping down. Lu is a former Microsoft executive and AI expert and has been responsible for day-to-day operations at Baidu's AI unit. Boston Dynamics announces plans to sell SpotMini robot. The announcement came from Boston Dynamics founder Marc Raibert at the TC Sessions: Robotics conference at Berkeley. The robots are currently in pre-production but could be available for sale by the middle of 2019. Researchers pair AI and drones to help manage agriculture. The University of South Australia system allows farmers to pinpoint areas that need more nutrients and water. This potentially improves crop outcomes and reduces resource mismanagement. Intel launches OpenVINO to accelerate computer vision development. The new toolkit, already in use at customers Agent Vi, Dahua, Dell, Current by GE, GE Healthcare, Hikvision, and Honeywell, includes three major APIs: the Deep Learning Deployment toolkit, a common deep learning inference toolkit, as well as optimized functions for OpenCV and OpenVX. Dollars & Sense Primal, an AI consumer and enterprise company, raises $2.3M BrainQ Technologies, a developer of AI to treat neuro-disorders, raises $8.8M in funding Motorleaf, a startup focused on data-driven insights for greenhouse and indoor operators, raises $2.85M Dialpad acquires TalkIQ to bring voice-driven AI to communications. Competitor 8x8 acquires MarianaIQ to strengthen AI capabilities as well. Oracle buys DataScience.com for data science platform Microsoft acquires Semantic Machines, to advance conversational AI Sign up for our Newsletter to receive the Bits & Bytes weekly to your inbox.
I’m still a bit high off of the energy from this week’s TWIML AI Summit at the Interop ITX conference. What a great event! I’m very grateful for a super-engaged audience that made moderating the two days a true pleasure. Fun fact: I would never have believed this if you’d told me in advance, but we had folks join us from as far away as Denmark, Finland, and New Zealand! As was nicely captured in this InformationWeek article about the Summit, the idea behind the event was to create a learning experience for IT and business leaders who will play a role in making ML and AI a reality in their enterprises. Specifically, I aimed to make sure that attendees left the event with the knowledge they’d need in order to: Understand key ML and AI technologies and ideas Identify opportunities to apply them Chart a strategy for their organizations Interface with implementers, like data scientists Expand their own career opportunities in this area And, most importantly, I wanted to offer attendees two key super-powers in this space: a decoder ring and a well-tuned BS meter. To achieve these goals I offered an ML/AI 101 session, three sessions on key AI technologies (NLP/conversational, computer vision, and IoT/edge applications) and three focused on more strategic aspects of enterprise AI (data collection and annotation, creating an ML-friendly culture and operationalizing ML, and enterprise AI strategy). We finished the day with a Q&A session, which I’m really glad I included. It went the entire two hours, was great fun, and the audience got to work out some of the key challenges presented by their own companies and industries—live, and in real time—with an expert panel of event speakers. Without further ado, here are some of the key ideas from each session: Intro to ML and AI. After reviewing our plan for the Summit, I dove right into this session first thing Monday morning. One of my main points throughout the talk was that while there are a lot of fancy-sounding algorithms out there, machine learning–unlike human intelligence–is really made up of fairly simple and single-purpose tools. Most business applications of ML are simply trying to predict values, classify things, or group things. When you understand these key ideas, you have the foundations for reasoning through how ML and AI can be applied in your business. After explaining these concepts, we walked through a bunch of common industry and functional use cases and decomposed them into their basic primitives. NLP and conversational applications. Next up, David Karandish, founder and CEO of Ai Software gave a presentation focused on his top seven lessons-learned from building an NLP/conversational software product: (1) conversational experiences involve a lot of nuances, (2) NLP itself is actually hundreds of interconnected subproblems, (3) integrating with other applications is a necessary challenge to overcome, (4) extracting meaning from otherwise unstructured documents offers tremendous value, (5) most of their successes involve a combination of ensembles of simpler models, active learning, and humans-in-the-loop, (6) plugging into existing interfaces is important, and (7) administrative interfaces are key. I found the discussion around points #2, 4 and 5 to be particularly interesting. Computer vision. Next, Siddha Ganju, data scientist from DeepVision and TWIML Talk #95 guest, provided a very thorough overview of computer vision research and applications. 
What I liked best about this talk–besides a great overview of the history of the field–was that Siddha helped attendees gain an intuitive feel for convolutions and the other types of layers that make up convolutional neural networks, and how these are combined to perform image recognition and other CV applications. Her presentation of a bunch of novel use cases and how they were created was also really interesting. Data collection and annotation. Kiran Vajapey's (TWIML Talk #130) talk helped ground attendees on the importance of data and the relationship between it and the knowledge and predictions we seek. His discussion around the role of humans in labeling data and how to build interfaces to allow them to do so effectively was particularly insightful. AI for IoT and at the Edge. Janakiram MSV's talk really appealed to the geek in me with nice live demos involving a bunch of hardware and (literal) toys. He did a nice job articulating some of the challenges in working with current technologies, including the need to integrate multiple systems and the fragility of some IoT hardware and tools. His overview of the IoT ecosystem was also solid, and he delivered one of my favorite quotes from the session. ML operationalization and culture. Jennifer Prendki's (TWIML Talk #46) talk, which included a bunch of hand-drawn sketches (!), was fun and super informative. She really hammered home the differences between software engineering and data science, emphasizing that the latter is inherently exploratory and requires a different set of skills, inclinations, and processes than the former. One of her slides depicted the ROI of the machine learning process as an S-curve, indicating that because of the heavy up-front exploratory phase, value tends to come later in the process than with traditional engineering efforts–an important concept for stakeholders to understand. Another favorite quote: "Science is the important word in 'data science'." AI strategy. Finally, Friederike Schüür–research engineer, Cloudera Fast Forward Labs–delivered a great presentation on crafting an enterprise AI strategy. I loved her slide (with credit to Domino Data Lab) on all the ways AI projects can fail: solve the wrong problem, solve right problem but already solved, solve right problem with wrong tools, solve right problem but too slowly to matter, solve right problem wrong way, solve right problem but world changes, solve a few good problems but not enough at once. I also liked her emphasis on managing uncertainty and optimizing for knowledge gain in the data science process, and she provided a nice template that attendees could use for tracking project goals, assumptions, and approaches. To close, she did a really nice job articulating the importance of ethical considerations for organizations undertaking AI projects, and offered tips for increasing awareness of bias in data and models, collecting better data, and de-biasing models. A few of you reached out to me and expressed regret that you couldn't make it to Vegas for the Summit. Several expressed interest in seeing a private version of this program put on for their own companies or industry groups. If you think a program like this would be a good fit for your group, or you'd be interested in attending a similar Summit at a later date, please email me or comment below and let me know!
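As a small addendum to the computer vision session notes above: for readers who want a concrete picture of how convolution, pooling, and dense layers are combined for image recognition, here is a minimal sketch in Keras. The layer sizes, the 28x28 single-channel input, and the 10-class output are illustrative assumptions on my part, not anything presented at the Summit.

```python
# Minimal sketch of a small convolutional network for image classification.
# Input shape, layer sizes, and the 10-class output are illustrative assumptions.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),  # learn local visual features
    layers.MaxPooling2D((2, 2)),                                            # downsample the feature maps
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                                                       # feature maps -> flat vector
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),                                 # per-class probabilities
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

The point of the sketch is simply the pattern Siddha described: stacked convolution and pooling layers build up increasingly abstract features, and a small dense head turns those features into a classification.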
Bits & Bytes Intel open sources nGraph neural network compiler. The newly open-sourced compiler, originally announced last summer and discussed on TWIML Talk #31, provides support for multiple deep learning frameworks while optimizing models for different hardware solutions. It supports six deep learning frameworks: TensorFlow, MXNet, neon, PyTorch, CNTK, and Caffe2. Google unveils augmented reality microscope. The prototype, which can detect cancer in real time, was unveiled at an event organized by the American Association for Cancer Research. The new tool relays its predictions directly into the field of view of the user and can be retrofitted into existing microscopes. Google extends semantic language capabilities. Building on the hierarchical vector models at the heart of Gmail's Smart Reply feature, the new work extends these ideas by creating vectors for larger chunks of language such as full sentences and small paragraphs. The company published a paper on its Universal Sentence Encoder and launched the Semantic Experiences demonstration site. A pre-trained TensorFlow model was also released. IBM releases Adversarial Robustness Toolbox. The open-source software library aims to support researchers and developers in defending deep neural nets against adversarial attacks. The software, which currently works with TensorFlow and Keras, can assess a DNN's robustness, increase robustness as needed, and offer runtime detection of potential threats. MATLAB 2018a adds deep learning features. Many self-taught data scientists were initially exposed to MATLAB via Octave, the open source clone Andrew Ng used in his original Stanford machine learning online course. Well, the commercial software continues to evolve, with its latest version adding a host of new deep learning–related features including support for regression and bidirectional LSTMs, automatic validation of custom layers, and improved hardware support. Dollars & Sense Sword Health, a Portuguese medtech company, raises $4.6 million LawGeex, a contract review automation business, raises $12 million XpertSea, applying computer vision to aquaculture, raises $10 million Konux, a sensor and AI analytics startup, raises $20 million Citrine, materials data and AI platform, raises $8 million Eightfold.ai launches talent intelligence platform, closes $18 million round Voicera, the AI-powered productivity service, announces acquisition of Wrappup Adobe announces acquisition of voice technology business, Sayspring Sign up for our Newsletter to receive the Bits & Bytes weekly to your inbox.
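To make the Universal Sentence Encoder item above a bit more concrete, here is a minimal sketch of loading a released pre-trained model from TensorFlow Hub and comparing sentence embeddings. The module URL and the hub.load interface shown are the later TF2-style API, which differs from the TF1-era API available at the time of the original announcement, so treat this as an assumption-laden illustration rather than the exact workflow Google documented then.

```python
# Minimal sketch: sentence embeddings with the Universal Sentence Encoder.
# Assumes the TF2-style tensorflow_hub API and the public module URL below.
import numpy as np
import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

sentences = [
    "How do I reset my password?",
    "I forgot my login credentials.",
    "What is the weather like today?",
]
embeddings = embed(sentences).numpy()  # shape: (3, 512)

def cosine(a, b):
    # Cosine similarity: semantically related sentences should score higher.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embeddings[0], embeddings[1]))  # related support questions
print(cosine(embeddings[0], embeddings[2]))  # unrelated question
```

This is the kind of "larger chunks of language as vectors" idea the post describes: whole sentences map to fixed-length vectors whose distances reflect semantic similarity.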
My travel comes in waves centered around the spring and fall conference seasons. A couple of weeks ago, in spite of there being no signs of a true springtime here in St. Louis, things shifted into high gear with me attending the Scaled ML conference at Stanford and Nvidia GTC over the course of a few days. Following me on Twitter is the best way to stay on top of the action as it happens, but for those who missed my live-tweeting, I thought I’d reflect a bit on Nvidia and GTC. (You’ll need to check out my #scaledmlconf tweets for my fleeting thoughts on that one.) In many ways, Nvidia is the beneficiary of having been in the right place at the right time with regards to AI. It just so happened that (a) a confluence of advances in computing, data, and algorithms led to explosive progress and interest in deep neural networks, and (b) that our current approach to training these depends pretty heavily on mathematical operations that Nvidia’s graphics cards happened to be really efficient at. That’s not to say that Nvidia hasn’t executed extremely well once the opportunity presented itself. To their credit, they recognized the trend early and invested heavily in it, before it really made sense for them to do so, besting the “innovator’s dilemma” that’s caused many a great (or formerly great) company to miss out. Nvidia has really excelled in developing software and ecosystems that take advantage of their hardware and are deeply tailored to the different domains in which it's being used. This was evidenced in full at GTC 2018, with the company rolling out a number of interesting new hardware, software, application, and ecosystem announcements for its deep learning customers.   A few of the announcements I found most interesting were: New DGX-2 deep learning supercomputer After announcing the doubling of the V100 GPU memory to 32GB, Nvidia unveiled the DGX-2, a deep-learning optimized server containing 16 V100s and a new high-performance interconnect called NVSwitch. The DGX-2delivers 2 petaFLOPS of compute power and offers significant cost and energy savings relative to traditional server architectures. For a challenging representative task like training a FAIRSeq neural machine translation (NMT) model, the DGX-2 completed the task in a day and a half, versus the previous generation DGX-1’s 15 days. Deep learning inference and TensorRT 4 Inference (using DL models, versus training them) was a big focus area for Nvidia CEO Jensen Huang. During his keynote, Jensen spoke to the rapid increase in complexity of AI models and offered a mnemonic for thinking about the needs of inference systems both in the datacenter and at the edge–PLASTER, for Programmability, Latency, Accuracy, Size, Throughput, Energy Efficiency, and Rate of Learning. To meet these needs, he announced the release of TensorRT 4, the latest version of its software for optimizing inference performance on Nvidia GPUs. The new version of TensorRT has been integrated with TensorFlow and also includes support for the ONNX deep learning interoperability framework, allowing it to be used with models developed with the PyTorch, Caffe2, MxNet, CNTK, and Chainer frameworks. The new version's performance was highlighted, including an 8x increase in TensorFlow performance when used with TensorRT 4 vs TensorFlow alone and 45x higher throughput vs. CPUs for certain network architectures. New Kubernetes support Kubernetes (K8s) is an open source platform for orchestrating workloads on public and private clouds. 
New Kubernetes support

Kubernetes (K8s) is an open source platform for orchestrating workloads on public and private clouds. It came out of Google and is growing very rapidly. While the majority of Kubernetes deployments are focused on web application workloads, the software has been gaining popularity among deep learning users. (Check out my interviews with Matroid's Reza Zadeh and OpenAI's Jonas Schneider for more.) To date, working with GPUs in Kubernetes has been pretty frustrating. According to the official K8s docs, "support for NVIDIA GPUs was added in v1.6 and has gone through multiple backwards incompatible iterations." Yikes! Nvidia hopes its new GPU Device Plugin (confusingly referred to as "Kubernetes on GPUs" in Jensen's keynote) will allow workloads to more easily target GPUs in a Kubernetes cluster. (A minimal sketch of what this looks like from the user's side appears at the end of this post.)

New applications: Project Clara and DRIVE Sim

Combining its strengths in both graphics and deep learning, Nvidia shared a couple of interesting new applications it has developed. Project Clara is able to create rich, cinematic renderings of medical imagery, allowing doctors to more easily diagnose medical conditions. Amazingly, it does this in the cloud, using deep neural networks to enhance traditional images, without requiring updates to the three million imaging instruments currently installed at medical facilities.

DRIVE Sim is a simulation platform for self-driving cars. There have been many efforts to train deep learning models for self-driving cars using simulation, including using commercial games like Grand Theft Auto. (In fact, the GTA publisher has shut several of these efforts down for copyright reasons.) Training a learning algorithm on synthetic roads and cityscapes hasn't been the big problem, though. Rather, the challenge has been that models trained on synthetic roads haven't generalized well to the real world. I spoke to Nvidia chief scientist Bill Dally about this, and he says they've seen good generalization by incorporating a couple of techniques proven out in their research, namely combining real and simulated data in the training set and using domain adaptation techniques, including this one from NIPS 2017 based on coupled GANs. (See also the discussion around a related Apple paper presented at the very first TWIML Online meetup.)

Impressively, for as much as Nvidia announced for the deep learning user, the conference and keynote also had a ton to offer their graphics, robotics, and self-driving car users, as well as users from industries like healthcare, financial services, oil and gas, and others. Nvidia is not without challengers in the deep learning hardware space, as I've previously written, but the company seems to be doing all the right things. I'm already looking forward to next year's GTC and seeing what the company is able to pull off in the next twelve months.

Sign up for our Newsletter to receive this weekly to your inbox.
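As promised in the Kubernetes section above, here's a minimal sketch of what targeting a GPU looks like from the user's side once the NVIDIA device plugin is installed in a cluster, using the official Kubernetes Python client. The pod name and container image tag are placeholders of my choosing, not anything from Nvidia's announcement.

```python
from kubernetes import client, config

# Assumes a local kubeconfig pointing at a cluster with the NVIDIA device plugin installed.
config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-smoke-test"),  # placeholder name
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="cuda",
                image="nvidia/cuda:9.0-base",  # placeholder image tag
                command=["nvidia-smi"],
                # The device plugin exposes GPUs as a schedulable resource,
                # so workloads request them the same way they request CPU or memory.
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"}
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

The key line is the nvidia.com/gpu resource limit; the scheduler then handles placing the pod on a node with a free GPU.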
Bits and Bytes

Apple hires Google's AI head; Google forms A.I. business unit. The latest in the AI talent wars: John Giannandrea, previously Google's chief of search and AI, was hired to run Apple's "machine learning and A.I. strategy." It's an important victory for Apple, which has lagged behind in AI. Google took the change as an opportunity to put AI into its own business unit under recent TWIML guest Jeff Dean. As the AI "arms race" intensifies, larger players are putting ever more resources into solidifying their positions. Last week we shared a similar story from Microsoft on its own reorg to better focus on AI.

Researchers at MIT-IBM Watson AI Lab train models to recognize dynamic events. It's easy for humans to recognize dynamic events, for example, opening a door, a book, or a bottle. MIT-IBM researchers hope to train models to recognize these types of dynamic events. They've released a Moments in Time dataset and are hosting a Moments in Time competition at CVPR. Note: I recently discussed similar work from the University of Montreal and startup Twenty Billion Neurons with its chief scientist Roland Memisevic.

GridGain's newest release includes continuous learning framework. The company's in-memory computing framework, based on Apache Ignite, now includes machine learning and a multilayer perceptron (MLP) neural network, enabling companies to run ML and deep learning algorithms against petabyte-scale operational datasets in real time.

Amazon SageMaker update. Amazon has added support for more instance sizes and open sourced its MXNet and TensorFlow containers. The updated containers can be downloaded to support local development.

Data scientist uses cloud ML to classify bowls of ramen. Never mind hot dog/not hot dog... Data scientist Kenji Doi used Google Cloud AutoML Vision to successfully identify the exact shop each bowl of ramen was made at. A very impressive feat when you consider how similar the bowls of ramen actually look.

Dollars and Sense

Insider, an AI-enabled growth marketing platform, raises $11 million
Comet.ml, a platform for managing AI projects, raises $2.3 million
Audioburst, an AI-enabled audio search platform, raises $4.6 million from Samsung
Conga to acquire Counselytics, the contract discovery and analytics company, to bolster its AI strategy and document automation capabilities

Sign up for our Newsletter to receive the Bits & Bytes weekly to your inbox.
Bits & Bytes

Google AI to predict heart disease with eye scans. The tech is being developed by Google's health subsidiary Verily. It works by scanning the back of a patient's eye, then using that image to deduce the patient's age, blood pressure, smoking status, and risk of heart attack. It's still in its early stages, though, and is not ready for clinical use.

Google debuts 'auto ads' for intelligent ad placement. While Google has long used machine learning to determine the best ads to show on a web page, this new feature reads the target page and selects the best ad placements on the page. Google claims that participating publishers saw ad revenue increases of 10-15%; however, some beta users were not happy about the number of ads being placed on their pages.

IBM partners with game dev platform Unity to create IBM Watson Unity SDK. I've had my eye on Unity since my interview with Danny Lange, their VP for ML and AI. The new SDK is being launched on the Unity Asset Store and will allow developers to more easily integrate visual recognition, speech to text, and language classification features into their games and AR/VR applications.

Qualcomm adds AI engine to Snapdragon mobile platform. The Qualcomm AI Engine consists of software, hardware, and APIs meant to support efficient neural network inference on client devices running Snapdragon processors.

Accenture launches AI testing service. Accenture is taking a "Teach and Test" approach to the service, with the former focused on the choice of data, models, and algorithms used to train ML models, and the latter on up-front and ongoing evaluation of model performance, explainability, and bias.

MindBridge adds NLP to its AI-powered auditing software. The update allows audit professionals to query transactional data in natural language and gain insight into potential errors and risky transactions.

Dollars & Sense

Vectra, a cybersecurity startup, raises $36 million for global expansion of its AI-based security platform
SparkCognition, an AI solutions startup, raises $56.5 million Series B for international expansion
StatusToday, an employee productivity startup, raises $3.91 million to improve employee productivity with AI
Prophesee, a machine vision startup, raises $19 million for its machine vision technology
Agent IQ, an AI customer service bot startup, raises $6.3 million
BenevolentAI acquires Cambridge research facility to accelerate AI-enabled drug development

Sign up for our Newsletter to receive the Bits & Bytes weekly to your inbox.
Bits and Bytes

Interesting tidbits from recent news:

Microsoft develops AI-powered sketch artist. The new bot, based on recent GAN research, is capable of generating "drawings" from caption-like text descriptions. Applications for this technology include the arts, design, and perhaps at some point, police sketches. Overall very cool.

IBM and Salesforce announce Watson + Einstein collaboration. The two tech giants are teaming up to integrate their two eponymously named, over-marketed, poorly understood machine learning products. Oh boy! Although it's not immediately obvious in what ways Watson and Einstein are "combining," Salesforce and IBM are making it clear that they are prioritizing AI and fleshing out their offerings. #SnarkLevelHigh

Baidu grows AI research team. The new hires are Dr. Kenneth Church, a pioneer in natural language processing; Dr. Jun Huan, a big data and data mining expert; and Dr. Hui Xiong, who specializes in data and knowledge engineering.

Dating services firm Lunch Actually to launch ICO for Viola.AI. The dating service aims to not only match couples but also track their relationships, suggest date venues, remind them of new dates, and advise them on relationship problems. Potentially a very interesting AI application, but one with tons of potential privacy implications.

UC Berkeley & Facebook introduce House3D for reinforcement learning. The two teamed up to enable more robust intelligent agents by publishing a new dataset called "House3D." House3D contains 45,622 3D scenes of houses, ranging from single-room studios to multi-storied houses, fully equipped with labeled 3D objects. In doing so, the groups aim to push RL research toward tasks that are more easily applicable to the real world.

App claims to predict if an image will "go viral." ParallelDots released the app with an open API that allows users to upload images and receive a "virality" score. It's no secret that viral sharing is the dream of many marketers, so it'll be interesting to see if this type of service can provide useful insights when planning ad campaigns.

Amazon launches SageMaker BlazingText. BlazingText is an unsupervised learning algorithm for generating word2vec (see TT # 48) embeddings and is the latest addition to Amazon SageMaker's suite of built-in algorithms. (A tiny word2vec illustration follows at the end of this issue.)

Deal Flow

There seemed to be an abundance of deals last week:

Smartphone-maker Coolpad has raised $300 million from Chinese property mogul Chen Hua-backed Power Sun Ventures to enhance its artificial intelligence capabilities.
Understand.ai, a Karlsruhe, Germany-based machine learning startup for training and validation data in autonomous vehicles, raised $2.8 million in seed funding.
C3 IoT, a provider whose software offerings include AI-for-IoT tools, announced a $100 million new round of financing.
Data Nerds, a Canada-based developer of data products, raised $3 million in Series A funding.
Techcyte, Inc. closed a $4.3 million funding round to commercialize its digital pathology platform.
Babblabs, a fresh start-up in advanced speech processing, announced a Series Seed investment of $4 million.
Owkin, a NYC-based predictive analytics company that utilizes transfer learning to accelerate drug discovery and development, raised $11 million in Series A funding.
Pony.ai, a year-old California-based self-driving car startup, announced it recently completed a $112 million Series A funding round.
Smartsheet, which builds software for corporate process management, acquires business automation chatbot startup Converse.AI.
Workday, the cloud HR and financials SaaS provider, buys SkipFlag to bolster machine learning capabilities.

Sign up for our Newsletter to receive the Bits & Bytes weekly to your inbox.
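As referenced in the BlazingText item above, here's a tiny illustration of what word2vec embeddings are. It uses the open-source gensim library as a stand-in rather than BlazingText itself, and the corpus and hyperparameters are toy values for illustration only.

```python
from gensim.models import Word2Vec

# Toy corpus: each "sentence" is a list of tokens.
sentences = [
    ["machine", "learning", "models", "need", "data"],
    ["deep", "learning", "models", "need", "gpus"],
    ["word", "embeddings", "capture", "meaning"],
]

# Train a small skip-gram (sg=1) word2vec model; parameter names follow gensim 4.x.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1, epochs=50)

vector = model.wv["learning"]                    # the 50-dimensional embedding for "learning"
print(model.wv.most_similar("models", topn=3))   # nearest neighbors in embedding space
```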
Hey there! This week's main article is a bit longer than usual, but I hope you'll find it both interesting and thought-provoking.

Google's New Cloud AutoML: What is it and broader implications

Developing machine learning systems is an inherently iterative process, and one which can be, as a result, tedious, time-consuming, and expensive. In addition to the data science work that needs to be done, the goal of which is the development of a predictive model, there are also a host of IT issues that need to be addressed to put ML and AI models into production. To help organizations overcome these issues, the major cloud vendors, as well as many independents, offer cloud-based prediction APIs that developers can easily integrate into their applications. The benefit of these APIs is their ease of use and the fast time-to-market they enable. In most cases, developers are only minutes away from retrieving predictions using these APIs.

AI-as-a-Service Challenges

The fly in the ointment, so to speak, of these APIs has traditionally been the fact that they really only work well in fairly generic use cases. Consider the case of an object detector for photos and videos. That object detector is based on deep learning and was trained using millions of labeled example images or video samples. But if the objects you want to identify in your videos are not well represented in the labeled dataset, the neural network can't really hope to offer accurate predictions.

[A fly in the ointment, according to Google Cloud Vision API]

As a result, developers using AI-as-a-Service offerings face challenges relating to:

Domain specificity. If a given service's training dataset doesn't span my domain, it can't help me. For example, if I need to visually identify when toy cars are present in my images, but the training set only includes a few images containing toy cars, the service can't really be expected to do a very good job with them.

Fine-grained detection. What if I need to not just identify the presence of toy cars in my images, but also distinguish between different types of toy cars? Now I not only need a lot of them in the training dataset, with more fine-grained labeling (which, by the way, requires more expertise to develop), but I also need a network architecture robust enough to capture the fine-grained distinctions among different types of toy cars.

Bias avoidance. Even if our object detector has plenty of toy cars with granular labels, the performance of the detector for our application can be impacted by a variety of other factors: the distribution of the cars in the training set, the orientation of the cars in the training set, and the backgrounds, lighting, quality, and resolution of the images in the training set, to name a few. If any of these factors is poorly aligned with the types of images I need to analyze, my predictor is likely to underperform.

Enter Cloud AutoML

Google Cloud AutoML is a new service aimed at helping developers overcome these challenges. Cloud AutoML operates similarly to other AI-as-a-Service APIs; however, Cloud AutoML users can upload their own training datasets to augment those already collected by Google. The first service to be released under the Cloud AutoML brand is Cloud AutoML Vision, which allows users to train custom vision models. Cloud AutoML Vision is currently in alpha release, meaning the service is in testing and access to it is limited.
Interested customers must apply for access, agree to applicable terms, and have their projects cleared for use against the API. Notable features include:

Powered by transfer learning. We've talked about transfer learning a bit on the podcast (e.g. TWIML Talk # 62, 88). It's a methodology for training a neural network on one dataset and then refining its training with another dataset. The advantage of transfer learning is that the first training, which typically uses a much larger training dataset, does much of the heavy lifting of teaching the network to make coarse-grained distinctions. This allows the second training dataset to be much smaller. In this case, the first dataset is Google's own labeled training data and the second is the customer's. Google claims that transfer learning allows custom AutoML Vision models to achieve high domain-specific performance with as few as "a few dozen photographic samples." This is sure to be highly use-case dependent. (A minimal illustrative sketch of this recipe appears further down in this post.)

Automated model optimization. Cloud AutoML is powered by a number of advanced optimization techniques, like neuroevolution (discussed in detail in TT # 94), reinforcement learning (which we've discussed extensively, though in other contexts, such as TT # 24, 28, 43), and other automated techniques. This helps ensure that the models created by the service perform well.

Human labeling. If you've got data, but it's not labeled, Google can help you with this as part of your use of the service. It will be interesting to see how their offering in this area compares to that of more specialized providers like MightyAI (TT # 6, 57) or CrowdFlower.

For sure, Cloud AutoML is an exciting addition to the Google Cloud Platform, and to my knowledge they're the first major cloud vendor to announce support for customer-provided training data for cognitive APIs. This is the natural direction for these kinds of services, though, and I'd expect all major cloud vendors to announce similar capabilities within 12-18 months.

On Automated Machine Learning

While this article has already gotten fairly long for this newsletter, I wanted to comment briefly on the broader notion of automated machine learning. Google is clearly trying to stake some thought-leadership ground here with its naming choice for this service. I get this, and it may prove to be effective short-term positioning for them, but ultimately all users of AI-as-a-Service, particularly the less sophisticated users (from a data science perspective) that these services target, expect high levels of automation. And, as expressed above, I think the ability to bring your own training data (which essentially assumes transfer learning and automated model search/optimization) will be table stakes within a year or two.

A broader challenge posed by the AutoML name is that it furthers the idea that magic, black-box services can get you all the way there, and that some degree of data science or statistical knowledge isn't necessary. Google and others in this space are certainly providing a valuable service in lowering the barrier to entry for developers and enterprises interested in using machine learning, but often the issue is that without a certain degree of savvy, you don't know what you don't know. This is particularly true of the bias-related challenges this offering is meant to address in the first place.
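As an aside, and as referenced in the transfer learning item above, here's roughly what that two-stage recipe looks like in plain Keras: a base network pre-trained on a large, generic dataset is frozen, and only a small new head is trained on the customer's much smaller dataset. This is a generic sketch of the technique, not Google's AutoML pipeline; the choice of MobileNetV2 and the three-class head are arbitrary placeholders.

```python
import tensorflow as tf

# Stage 1 stand-in: a base network pre-trained on a large generic dataset (ImageNet).
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet", pooling="avg")
base.trainable = False  # keep the coarse-grained features fixed

# Stage 2: a small custom head trained on the customer's (much smaller) labeled set.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(3, activation="softmax"),  # e.g. three toy-car classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(small_custom_dataset, epochs=5)  # a few dozen labeled images per class
```

With the base frozen, only the small head's weights are learned, which is part of why relatively few labeled examples can go a long way on coarse tasks.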
There are also complaints that Google is attempting to hijack the term AutoML, which has an existing meaning within the data science community and which has for years been the name of a workshop on the topic held at ICML and other academic conferences. The broader field of AutoML encompasses the automation of a wide range of data science tasks, including data preprocessing, model selection, hyperparameter optimization, performance analysis, and more. Google Cloud AutoML, while powerful, doesn't quite live up to this broader vision of AutoML: well-understood tools, algorithms, and methodologies that increase the throughput and effectiveness of data scientists. This may be splitting hairs, but I do agree that the distinction between closed, black-box automation and open, transparent tools that can be integrated into a user's ML pipeline is an important one. Examples of projects that have evolved out of these broader efforts include the open source auto-sklearn, Auto-WEKA, and Auto Tune Models (ATM). Commercial offerings like Bonsai (TT # 43), SigOpt (TT # 50), H2O's Driverless AI, and DataRobot also exist, falling in varying places along the transparency spectrum.

To Google's credit, they've published extensively in this area and in the academic literature generally. Further, the company is certainly heavily invested in open source tools in this domain (e.g. TensorFlow). And Cloud AutoML Vision is but a first installment toward a broader vision of an automated ML platform. It wouldn't surprise me at all to see Google Cloud AutoML technologies eventually surface as open source projects within the TensorFlow ecosystem.

Finally, there's an interesting conversation to be had about the impact of automated ML tools on the space: Will lowering the barrier to entry result in a flood of faulty, poorly understood models entering the market via internal and public-facing enterprise systems? How will these systems impact their users? Will users accept the (relative) lack of transparency offered by these systems for production workloads? Will Google and others develop tools that help users understand the statistical biases of their datasets? What do you think the biggest issues will be?

If you made it this far, I'd be very interested in hearing your thoughts on the above, so please don't hesitate to reply! In any case, what's clear and most exciting here is that powerful tools and technologies are rapidly becoming more accessible, and this will have a huge impact on how, and how quickly, machine learning and AI are adopted across many industries.

Sign up for our Newsletter to receive this weekly to your inbox.
AI Networking Happy Hour

Join us for a night of networking during the happiest hours of Thursday, 6/29/17 at The Ainsworth Midtown after the O'Reilly AI Conference. This is an informal night of drinks for all those interested in AI & Machine Learning. We look forward to seeing you there!

We've partnered up with our friends over at NYAI to make this event happen! NYAI is an events-based networking group based out of New York. They are 2,300+ researchers, founders, data scientists, investors, and generalists who gather to learn and discuss new trends in AI & Machine Learning. Be sure to check them out here, and if you're in NY, join the group!

We're grateful for the support of Clarifai, who's sponsoring drinks at the event. Check them out at clarifai.com and be sure to check out Sam's interview with their founder Matt Zeiler (TWIML Talk #22). If you plan on coming to hang with us for the evening, let us know below!

Update: This meetup has passed. Thanks to everyone who came. It was great!