Google’s premier cloud computing and AI conference, Google Cloud Next 2023, took place the last week of August at Moscone Center in San Francisco. I attended the event and had the opportunity to spend several days in a variety of keynotes, briefings, and sessions, as well as explore the event’s expo floor. Of course, I shared some of my real-time observations via X (formerly Twitter), which you can check out here. Below, I’ll share a few of my key takeaways from the event.

This was the first in-person Google Cloud Next event in three years. While the event felt a lot smaller and more compact than the last one I attended, it was still large for a post-pandemic conference, with approximately 15,000 attendees present.

Generative AI in Focus

No surprise here, but Generative AI was very much a key theme flowing throughout the event, though there was plenty of content for folks more interested in traditional cloud computing topics.

In addition to enabling new features and capabilities for the company’s core AI stack (AI-oriented infrastructure and accelerators, AI/ML/DS platforms, and AI-powered applications), Google is weaving generative AI into non-AI products through Duet AI, which adds AI-based assistant technologies to a wide range of Google Cloud products.

A good indication of the breadth of work they’ve done to quickly build generative AI into their product base can be seen in the many AI-related announcements they made during the event. Here’s a summary of the most interesting AI-focused ones, out of the full list of 161 noted in Alison Wagonfeld’s wrap-up post:

- Duet AI in Google Cloud is now in preview, with new capabilities and general availability coming later this year. There were a dozen more announcements covering Duet AI features for specific Google Cloud tools, but you can check out the blog post for a summary.
- Vertex AI Search and Conversation, formerly Enterprise Search on Generative AI App Builder and Conversational AI on Generative AI App Builder, are both now generally available.
- Google Cloud added new models to Vertex AI Model Garden, including Meta’s Llama 2 and Code Llama and Technology Innovation Institute's Falcon LLM, and pre-announced Anthropic’s Claude 2.
- The PaLM 2 foundation model now supports 38 languages and 32,000-token context windows that make it possible to process long documents in prompts.
- The Codey chat and code generation model offers up to a 25% quality improvement in major supported languages for code generation and code chat.
- The Imagen image generation model features improved visual appeal, image editing, captioning, a new tuning feature to align images to guidelines with 10 or fewer samples, and visual question answering, as well as digital watermarking functionality powered by Google DeepMind SynthID.
- Adapter tuning in Vertex AI is generally available for PaLM 2 for text. Reinforcement Learning from Human Feedback (RLHF) is now in public preview.
- New Vertex AI Extensions let models take actions, retrieve specific information in real time, and act on behalf of users across Google and third-party applications like DataStax, MongoDB, and Redis. New Vertex AI data connectors help ingest data from enterprise and third-party applications like Salesforce, Confluence, and Jira.
- Vertex AI now supports Ray, an open-source unified compute framework for scaling AI and Python workloads.
- Google Cloud announced Colab Enterprise, a managed service in public preview that combines the ease of use of Google’s Colab notebooks with enterprise-level security and compliance capabilities.
- Next month Google will make Med-PaLM 2, a medically tuned version of PaLM 2, available as a preview to more customers in the healthcare and life sciences industry.
- New features enhance MLOps for generative AI, including Automatic Metrics in Vertex AI, which evaluates models based on a defined task and “ground truth” dataset; Automatic Side by Side in Vertex AI, which uses a large model to evaluate the output of multiple models being tested, helping to augment human evaluation at scale; and a new generation of Vertex AI Feature Store, now built on BigQuery, to help avoid data duplication and preserve data access policies.
- Vertex AI foundation models, including PaLM 2, can now be accessed directly from BigQuery. New model inference in BigQuery lets users run model inferences across formats like TensorFlow, ONNX, and XGBoost, and new capabilities for real-time inference can identify patterns and automatically generate alerts.
- Vector and semantic search for model tuning are now supported in BigQuery. You can also automatically synchronize vector embeddings in BigQuery with Vertex AI Feature Store for model grounding.
- A3 VMs, based on NVIDIA H100 GPUs and delivered as a GPU supercomputer, will be generally available next month.
- The new Google Cloud TPU v5e, in preview, offers up to 2x higher training performance per dollar and up to 2.5x higher inference performance per dollar for LLMs and generative AI models compared to Cloud TPU v4.
- New Multislice technology, in preview, lets you scale AI models beyond the boundaries of physical TPU pods, to tens of thousands of Cloud TPU v5e or TPU v4 chips.
- Support for Cloud TPUs in GKE is now available for Cloud TPU v5e and Cloud TPU v4, and support for AI inference on Cloud TPUs is in preview. GKE also now supports A3 VMs with NVIDIA H100 GPUs and Google Cloud Storage FUSE (GA).

Key Takeaways

My takeaways from Google Cloud Next are very much in the same vein as those from my attendance at Google’s Cloud Executive Forum held earlier in the summer. I continued to be impressed with Google Cloud’s velocity and focus when it comes to attacking the opportunity presented by generative AI. The company clearly sees gen AI as a way to leap ahead of competitors AWS and Microsoft and is taking an “all in” approach.

The company has also been very quick to rally customers around its new gen AI product offerings. In addition to the product announcements noted above, Google Cloud announced and highlighted new and expanded generative-AI-focused collaborations with a wide variety of customers and partners, including AdoreMe, Anthropic, Bayer Pharmaceuticals, Canoo, Deutsche Bank, Dun & Bradstreet, Fox Sports, GE Appliances, General Motors, Ginkgo Bioworks, Hackensack Meridian Health, HCA Healthcare, Huma, Infinitus, Meditech, MSCI, NVIDIA, Runway, Six Flags, eleven generative AI startups, DocuSign, SAP, and more.

Interesting overview of @FOXSports use of Gen AI. Have 27 PB of video, ingest 10k hrs per month. Have custom models for things like celebrity detection, foul ball prediction, and more. Use the tech to allow analysts to more easily search archives. #GoogleCloudNext — Sam Charrington (@samcharrington) August 29, 2023

AI-Driven Transformation panel at #googlecloudnext Analyst Summit featuring data leaders from @Snap and @Wayfair. — Sam Charrington (@samcharrington) August 29, 2023
https://twitter.com/samcharrington/status/1696597457134817490
https://twitter.com/samcharrington/status/1696597126090985574

"For the first time, the business is really engaged in transformation... We will figure out hallucinations, omissions, etc., ... but the level of engagement is game changing."
- Gil Perez, Chief Innovation Officer, Deutsche Bank
Additionally, Google Cloud continues to grow its generative AI ecosystem, announcing the availability of Anthropic’s Claude 2 and Meta’s Llama 2 and Code Llama models in the Vertex AI Model Garden.

TK highlighting breadth of model catalog in Vertex AI, via new and existing model partners. Announcing support for @AnthropicAI Claude2 and @MetaAI Llama2 and CodeLlama models. #googlecloudnext — Sam Charrington (@samcharrington) August 29, 2023

Opportunities

Numerous opportunities remain for Google Cloud, most notably in managing complexity, both in their messaging and communication and in the products themselves.

From a messaging perspective, with so many new ideas to talk about, it is not always clear what is actually a new feature or product capability versus simply a trendy topic that the company wants to be able to talk about. For example, the company mentioned new grounding features for LLMs numerous times, but I’ve been unable to find any concrete detail about how new features enable this on the platform. The wrap-up blog post noted previously links to an older blog post on the broader topic of using embeddings to ground LLM output using first-party and third-party products. It’s a nice resource but not really related to any new product features.

And since the conference, I’ve spent some time exploring various Vertex AI features and APIs, and I generally still find the console and example notebooks a bit confusing to use and the documentation a bit inconsistent, even for a basic text-generation call like the one sketched below. To be fair, these complaints could be leveled at any of Google Cloud’s major competitors as well, but coming from an underdog position in the cloud computing race, Google has the most to lose if product complexity makes switching costs too high.

Nonetheless, I’m looking forward to seeing how things evolve for Google Cloud over the next few months. In fact, we won’t need to wait a full year for updates, since Google Cloud Next ‘24 will take place in the spring, April 9-11, in Las Vegas.
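For readers who want to kick the tires themselves, here is roughly what a minimal Vertex AI text-generation call looked like around the time of the event. This is a sketch, not official sample code: it assumes the google-cloud-aiplatform Python SDK with application default credentials configured, and the project ID, region, and prompt are placeholders you would replace with your own.

```python
# Minimal sketch of calling a PaLM 2 text model on Vertex AI.
# Assumes: `pip install google-cloud-aiplatform` and application
# default credentials; project and region below are placeholders.
import vertexai
from vertexai.language_models import TextGenerationModel

vertexai.init(project="your-gcp-project", location="us-central1")

model = TextGenerationModel.from_pretrained("text-bison@001")
response = model.predict(
    "Summarize the key AI announcements from Google Cloud Next 2023.",
    temperature=0.2,        # low temperature for a factual summary
    max_output_tokens=256,
)
print(response.text)
```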
I recently had the opportunity to attend the Google Cloud Executive Forum, held at Google’s impressive new Bay View campus in Mountain View, California. The Forum was an invitation-only event that brought together CIOs and CTOs of leading companies to discuss Generative AI and showcase Google Cloud’s latest advancements in the domain. I shared my real-time reactions to the event content via Twitter, some of which you can find here. (Some weren't hashtagged, but you can find most by navigating the threads.) In this post I’ll add a few key takeaways and observations from the day I spent at the event.

Key Takeaways

Continued product velocity

Google Cloud has executed impressively against the Generative AI opportunity, with a wide variety of product offerings announced at the Google Data Cloud & AI Summit in March and at Google I/O in May. These include new tools like Generative AI Studio and Generative AI App Builder; models like PaLM for Text and Chat, Chirp, Imagen, and Codey; Embeddings APIs for Text and Images; Duet AI for Google Workspace and Google Cloud; new hardware offerings; and more.

The company took the opportunity of the Forum to announce the general availability of Generative AI Studio and Model Garden, both part of the Vertex AI platform, as well as the pre-order availability of Duet AI for Google Workspace. Nenshad Bardoliwalla, product director for Vertex AI, delivered an impressive demo showing one-click fine-tuning and deployment of foundation models on the platform.

Considering that the post-ChatGPT Generative AI wave is only six months old, Google’s ability to get Gen AI products out the door and into customer hands quickly has been noteworthy.

Customer and partner traction

Speaking of customers, this was another area where Google Cloud’s performance has been impressive. The company announced several new Generative AI customer case studies at the Forum, including Mayo Clinic, GA Telesis, Priceline, and PhotoRoom. Executives from Wendy’s, Wayfair, Priceline, and Mayo participated in an engaging customer panel that was part of the opening keynote session. Several other customers were mentioned during various keynotes and sessions, as well as in private meetings I had with Google Cloud execs. See my Twitter thread for highlights and perspectives from the customer panel, which shared interesting insights about how those orgs are thinking about generative AI.

Strong positioning

While Models Aren’t Everything™, in a generative AI competitive landscape in which Microsoft’s strategy is strongly oriented around a single opaque model (ChatGPT via its OpenAI investment) and AWS’ strategy is strongly oriented around models from partners and open source communities, Google Cloud is promoting itself as a one-stop shop with strong first-party models from Google AI, support for open source models via its Model Garden, and partnerships with external research labs like AI21, Anthropic, and Cohere. The company also demonstrates a strong understanding of enterprise customer requirements around generative AI, with particular emphasis on data and model privacy, security, and governance.
The company’s strategy will continue to evolve and unfold in the upcoming months, and much more will be discussed at Google Cloud Next in August, but I liked what I heard from product leaders at the event about the direction they’re heading. One hint: they have some strong ideas about how to address hallucination, which is one of the biggest drawbacks to enterprise use of large language models (LLMs). I don’t believe that hallucinations by LLMs can ever be completely eliminated, but in the context of a complete system with access to a comprehensive map of the world’s knowledge, there’s a good chance that the issue can be sufficiently mitigated to make LLMs useful in a wide variety of customer-facing enterprise use cases.

Complex communication environment and need to educate

In his opening keynote to an audience of executives, TK introduced concepts like reinforcement learning from human feedback, low-rank adaptation, synthetic data generation, and more. While impressive, and to some degree an issue of TK’s personal style, it’s also indicative of where we are in this market that we’re talking to CIOs about LoRA and not ROI. This will certainly evolve as customers get more sophisticated and use cases stabilize, but it illustrates the complex communication challenges Google faces in evangelizing highly technical products in a brand-new space to a rapidly growing audience.

This also highlights the need for strong customer and market education efforts to help bring all the new entrants up to speed. To this end, Google Cloud announced new consulting offerings, learning journeys, and reference architectures at the Forum to help customers get up to speed, adding to the training courses announced at I/O. I also got to chat 1:1 with one of their “black belt ambassadors,” part of a team they’ve put in place to help support the broader engineering, sales, and other internal teams at the company. Overall, I think the company’s success will depend in large part on their effectiveness at bringing these external and internal communities up to speed on Generative AI.

Broad range of attitudes

A broad range of attitudes about Generative AI was present at the event. On the one hand, there was what I took as a very healthy “moderated enthusiasm” on the part of some. Wayfair CTO Fiona Tan exemplified this perspective, both in her comments on the customer panel and in our lunch discussion. She talked about the need to manage “digital legacy” and the importance of platform investments, and was clear in noting that many of the company’s early investments in generative AI were experiments (e.g., a Stable Diffusion-based room designer they’re working on). On the other hand, there were comments clearly indicative of “inflated expectations,” like those of another panelist who speculated that code generation would allow enterprises to reduce the time it takes to build applications from six weeks to two days, or those of a fellow analyst who proclaimed that generative AI was the solution to healthcare in America. The quicker we get everyone past this stage the better. For its part, Google Cloud did a good job navigating this communication challenge by staying grounded in what real companies were doing with its products.

I’m grateful to the Google Cloud Analyst Relations team for bringing me out to attend the event. Disclosure: Google is a client.
Dr. Jilei Hou is a Vice President of Engineering at Qualcomm Technologies, Inc. and the Head of Qualcomm AI Research initiative at Qualcomm Research. Jilei obtained his Ph.D. from the University of California, San Diego and joined Qualcomm in 2003. He made substantial contributions to technology innovation, standardization, and product commercialization across wireless 3G/4G/5G standards. In 2011, he moved to Beijing and became the Head of Qualcomm Research China. In this role, he developed the China R&D team into a local research powerhouse, initiating 5G research and ground robotics programs that benefit Qualcomm business interests in Greater China. In 2017, he moved back to San Diego and currently leads the AI Research Program at Qualcomm Research. He is responsible for building the machine learning research infrastructure, driving technical innovations for next-gen hardware and software platforms, and leading forward looking research to benefit technology verticals. He is an IEEE Senior Member and participated in several Frontiers of Engineering (FOE) Symposia organized by US and/or China National Academies of Engineering.    
A recent New Yorker article, "ChatGPT Is a Blurry JPEG of the Web,” has been making the rounds.

After sharing an anecdote about a Xerox copier with an overly aggressive compression system, author Ted Chiang constructs an analogy between ChatGPT and compression technology, comparing the AI-powered chatbot to a Photoshop-esque blur tool applied to paragraphs from the internet.

With the blurry JPEG analogy in place, the author extends it to ask, and answer, the question of whether ChatGPT is a useful tool for writers. His conclusion: No, it is not.

"Given that stipulation, can the text generated by large language models be a useful starting point for writers to build off when writing something original, whether it's fiction or nonfiction?"

"Obviously, no one can speak for all writers, but let me make the argument that starting with a blurry copy of unoriginal work isn't a good way to create original work."

"The hours spent choosing the right word and rearranging sentences to better follow one another are what teach you how meaning is conveyed by prose."

While it's certainly true that time spent writing is what makes you a good writer, the author's comments strike me as a bit like those of a 19th-century writer arguing that the feel of the plume, the act of dipping it into the inkwell, and the need to repeatedly rewrite pages by hand during editing are all essential to the craft of writing. Such a writer might have dismissed typewriters, claiming that "real writers" would never use one.

Here's where the article's core analogy and argument miss:

ChatGPT isn't just lossy compression. The author overlooks an undeniably creative aspect of ChatGPT. Xerox bug notwithstanding, lossy compression is generally a one-way road to worse. ChatGPT isn't like that. When was the last time you compressed a JPEG picture of your mom in the kitchen and opened it to find a picture of her in another world slaying dragons? Probably never; you just got a worse picture of your mom in the kitchen. But while there's (probably) no "original" paragraph on the internet about your mom slaying dragons, I bet ChatGPT can come up with one for us.

Human creativity is in the loop. ChatGPT is not just randomly serving up low-quality versions of paragraphs that already exist; it is creating new paragraphs a word at a time based on a prompt provided by the user. This prompt is the creative spark that gets the large language model (LLM) engine going. It discounts human ingenuity to conclude that savvy writers won't be able to use these prompts to coax the tool into producing useful output.

ChatGPT will evolve. The product is still only a preview, and OpenAI has already made improvements since its initial release. While some of ChatGPT's flaws may prove insurmountable within its current architecture, the rapid advancement of generative AI technologies makes it unwise to suggest that writers ignore this tool as if it will forever be fossilized in its current state.

I am a writer. Perhaps more of a utilitarian writer than the artisanal variety contemplated by Chiang, but I do write professionally and consider it something I'm good at. I am also a ChatGPT user, having applied it to a wide variety of writing tasks since it became available.

It's early days yet, but I already see great potential in LLMs like ChatGPT as a writing and editing partner. As is the case with most tools, it will become more useful as I become more skilled in using it, and as its capabilities and user interface improve.
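To give a concrete sense of what "editing partner" means in practice, here's a minimal sketch of the kind of task I have in mind: handing a model a rough paragraph and asking for a tighter version. It assumes the OpenAI Python SDK of the time (the pre-1.0 openai package) and an API key in the OPENAI_API_KEY environment variable; the draft text, prompt wording, and model choice are just placeholders for whatever you'd actually use.

```python
# A minimal "editing partner" sketch, using the pre-1.0 openai package.
# Assumes OPENAI_API_KEY is set in the environment.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

draft = (
    "Our team met on tuesday and we talked about the launch and decided "
    "that we would push it back because testing wasn't done yet."
)

response = openai.Completion.create(
    model="text-davinci-003",   # a then-current instruction-following model
    prompt="Edit the following paragraph for clarity and concision, "
           "keeping the author's voice:\n\n" + draft,
    max_tokens=200,
    temperature=0.3,            # low temperature: we want edits, not invention
)

print(response.choices[0].text.strip())
```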
And in a year or two I'll be way further ahead than those writers who heeded this article's bad advice and sat out the first few rounds.What do you think? Is ChatGPT a useful tool for writing? Have you already added it to your toolbox? Let me know with a reply, or join the conversation over on LinkedIn.
Today we conclude our coverage of the 2022 NeurIPS series joined by Catherine Nakalembe, an associate research professor at the University of Maryland, and Africa Program Director under NASA Harvest. In our conversation with Catherine, we take a deep dive into her talk from the ML in the Physical Sciences workshop, Supporting Food Security in Africa using Machine Learning and Earth Observations. We discuss the broad challenges associated with food insecurity, as well as Catherine’s role and the priorities of Harvest Africa, a program focused on advancing innovative satellite-driven methods to produce automated within-season crop type and crop-specific condition products that support agricultural assessments. We explore some of the technical challenges of her work, including the limited, but growing, access to remote sensing and earth observation datasets and how the availability of that data has changed in recent years, the lack of benchmarks for the tasks she’s working on, examples of how they’ve applied techniques like multi-task learning and task-informed meta-learning, and much more.
With the widespread news of tech company layoffs, it looks like the industry is entering the inevitable downward phase in the economic cycle. This is the second-and-a-half time I’ve experienced this in my career. The first was the dot-com bust in the early 2000s. The second was the global financial crisis of 2007-2008. The "half" was the blip at the beginning of the pandemic, which while it impacted many, ended up being more of a realignment for those of us fortunate enough to work in technology, with many more opportunities created than destroyed. While I remain optimistic for our field, this will be a painful period for many, both those directly affected by layoffs as well as those for whom uncertainty and the specter of a broader recession loom. And all before we’ve really had an opportunity to mentally recover from 2020 and 2021. If you have been impacted by layoffs, please know that there are a lot of people who want to help, me included. One of the best ways I think I can help is by connecting those who are hiring with those who are looking, and those looking to help in other ways with those who might benefit. Though the current macroeconomic climate is one that will leave many organizations cautious about hiring, savvy organizations with solid business fundamentals will recognize it as a hiring opportunity. If you are in this position and you are hiring data and AI talent, including AI-savvy marketing and business roles, please reach out to me and I'll do my best to connect you with quality candidates. And if you are looking for a job, please reach out as well and I'll do my best to connect you with companies that I learn are hiring. If you’re neither hiring nor looking but would like to help in other ways, consider mentoring or coaching someone, or leading a study group in the TWIML Community for those who want to learn new skills. Let me know if any of those are of interest. If you have other ideas for how I might be able to help out, please let me know. Thank you for considering how you can help those in our community who have been impacted by layoffs. I look forward to hearing from you soon.
Today we’re joined by Arash Behboodi, a machine learning researcher at Qualcomm Technologies. In our conversation with Arash, we explore his paper Equivariant Priors for Compressed Sensing with Unknown Orientation, which proposes using equivariant generative models as a prior and shows that signals with unknown orientations can be recovered with iterative gradient descent on the latent space of these models, with additional theoretical recovery guarantees. We discuss the differences between compression and compressed sensing, how he was able to evolve a traditional VAE architecture to capture equivariance, and some of the research areas where he’s applying this work, including cryo-electron microscopy. We also discuss a few of the other papers that his colleagues have submitted to the conference, including Overcoming Oscillations in Quantization-Aware Training, Variational On-the-Fly Personalization, and CITRIS: Causal Identifiability from Temporal Intervened Sequences.
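For intuition about the recovery idea at the heart of the paper (leaving aside the equivariance machinery that handles unknown orientations), here's a toy sketch of compressed-sensing recovery by gradient descent on a generative model's latent space. The "decoder" below is a random linear stand-in for a trained VAE decoder, so the setup is illustrative only, not the paper's implementation.

```python
# Toy sketch: recover a signal from compressed measurements by
# optimizing the latent code z of a generative model G, minimizing
# the measurement-consistency loss ||A G(z) - y||^2.
import torch

torch.manual_seed(0)
n, m, k = 128, 32, 8                       # signal dim, measurement dim, latent dim

decoder = torch.nn.Linear(k, n, bias=False)  # stand-in for a trained decoder G
A = torch.randn(m, n) / m**0.5               # random measurement matrix

z_true = torch.randn(k)
x_true = decoder(z_true).detach()            # ground-truth signal in the model's range
y = A @ x_true                               # compressed measurements y = A x

z = torch.zeros(k, requires_grad=True)       # latent estimate to optimize
opt = torch.optim.Adam([z], lr=0.05)
for step in range(500):
    opt.zero_grad()
    loss = ((A @ decoder(z) - y) ** 2).sum()
    loss.backward()
    opt.step()

print("relative recovery error:",
      ((decoder(z) - x_true).norm() / x_true.norm()).item())
```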
Today we wrap up our coverage of the 2022 CVPR conference joined by Aljosa Osep, a postdoc at the Technical University of Munich & Carnegie Mellon University. In our conversation with Aljosa, we explore his broader research interests in achieving robot vision, and his vision for what it will look like when that goal is achieved. The first paper we dig into is Text2Pos: Text-to-Point-Cloud Cross-Modal Localization, which proposes a cross-modal localization module that learns to align textual descriptions with localization cues in a coarse-to-fine manner. Next up, we explore the paper Forecasting from LiDAR via Future Object Detection, which proposes an end-to-end approach for detection and motion forecasting based on raw sensor measurement as opposed to ground truth tracks. Finally, we discuss Aljosa’s third and final paper Opening up Open-World Tracking, which proposes a new benchmark to analyze existing efforts in multi-object tracking and constructs a baseline for these tasks.
Today we continue our CVPR series joined by Kate Saenko, an associate professor at Boston University and a consulting professor for the MIT-IBM Watson AI Lab. In our conversation with Kate, we explore her research in multimodal learning, which she spoke about at the Multimodal Learning and Applications Workshop, one of a whopping 6 workshops she spoke at. We discuss the emergence of multimodal learning, the current research frontier, and Kate’s thoughts on the inherent bias in LLMs and how to deal with it. We also talk through some of the challenges that come up when building out applications, including the cost of labeling, and some of the methods she’s had success with. Finally, we discuss Kate’s perspective on the monopolizing of compute resources for “foundational” models, and her paper Unsupervised Domain Generalization by learning a Bridge Across Domains.
Peter Skomoroch is an entrepreneur, investor, and the former Head of Data Products at Workday and LinkedIn. He was Co-Founder and CEO of SkipFlag, a venture backed deep learning startup acquired by Workday in 2018. Peter is a senior executive with extensive experience building and running teams that develop products powered by data and machine learning. He was an early member of the data team at LinkedIn, the world's largest professional network with over 500 million members worldwide. As a Principal Data Scientist at LinkedIn, he led data science teams focused on reputation, search, inferred identity, and building data products. He was also the creator of LinkedIn Skills and Endorsements, one of the fastest growing new product features in LinkedIn's history. Before joining LinkedIn, Peter was Director of Analytics at Juice Analytics and a Senior Research Engineer at AOL Search. In a previous life, he developed price optimization models for Fortune 500 retailers, studied machine learning at MIT, and worked on Biodefense projects for DARPA and The Department of Defense. Peter has a B.S. in Mathematics and Physics from Brandeis University and research experience in Machine Learning and Neuroscience.
Nick has been a data scientist since the early 2000s. After obtaining an undergraduate degree in geology at Cambridge University in England (2000), he completed Masters (2001) and PhD (2004) degrees in Astronomy at the University of Sussex, then moved to North America, completing postdoctoral positions in Astronomy at the University of Illinois at Urbana-Champaign (2004-9, joint with the National Center for Supercomputing Applications) and the Herzberg Institute of Astrophysics in Victoria, BC, Canada (2009-2013). He joined Skytree, a startup company specializing in machine learning, in 2012, and in 2017 the Skytree technology and team were acquired by Infosys. Machine learning has been part of his work since 2000, first applied to large astronomical datasets, followed by a wide range of applications as a generalist data scientist at Skytree, Infosys, Oracle, and now Dotscience.
Jack Ploshnick is a Customer Data Scientist at Splice Machine. His work focuses on using analytics to support the sales and marketing teams, as well as onboarding new customers. Prior to Splice Machine, Jack worked in politics as a data scientist. Jack received his undergraduate degree from Washington University in St. Louis.  
As a Principal Architect of Ethical AI Practice at Salesforce, Kathy develops research-informed best practice to educate Salesforce employees, customers, and the industry on the development of responsible AI. She collaborates and partners with external AI and ethics experts to continuously evolve Salesforce policies, practices, and products. Prior to Salesforce, she worked at Google, eBay, and Oracle in User Experience Research. She received her MS in Engineering Psychology and BS in Applied Psychology from the Georgia Institute of Technology. The second edition of her book, "Understanding your users," was published in May 2015. You can read about her current research at einstein.ai/ethics.
Today we're joined by return guest Xavier Amatriain, Co-founder and CTO of Curai. With the goal of providing the world's best primary care to patients via their smartphone, Xavier turned to machine learning and AI to bring down costs and make Curai accessible and scalable. In our conversation, we touch on the shortcomings of traditional primary care and how Curai fills that role, and some of the unique challenges his team faces in applying this use case in the healthcare space. We also discuss the use of expert systems, how they develop and train these systems with synthetic data through noise injection, and how NLP projects like BERT, Transformer, and GPT-2 fit into what Curai is building.
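As a rough illustration of the noise-injection idea (not Curai's actual pipeline, which we discuss in the episode), the sketch below synthesizes variants of training cases by randomly flipping a small fraction of binary symptom flags; all names and numbers here are made up.

```python
# Toy sketch of noise-injection data augmentation: create noisy
# copies of each case by randomly flipping a few binary features.
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 20)).astype(float)  # 100 cases, 20 symptom flags

def augment(X, n_copies=5, flip_prob=0.05):
    """Return n_copies noisy variants of each row of X."""
    copies = np.repeat(X, n_copies, axis=0)
    flips = rng.random(copies.shape) < flip_prob      # which flags to flip
    return np.abs(copies - flips)                     # flip where mask is True

X_aug = augment(X)
print(X.shape, "->", X_aug.shape)  # (100, 20) -> (500, 20)
```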
Inspired by the article Geometric Foundations of Deep Learning, and the fact that three of the authors are former podcast guests, we present TWIML’s Geometry in Machine Learning playlist:
I am a cognitive science PhD researcher at the Complex Software Lab in the School of Computer Science at University College Dublin, Ireland. My interdisciplinary research sits at the intersection of complex adaptive systems, machine learning, and critical race studies. On the one hand, complexity science tells us that people, as complex adaptive systems, are inherently indeterminable. On the other, machine learning systems that claim to predict human behaviour are becoming ubiquitous in all spheres of social life. Machine prediction, when deployed in high-stakes situations, is not only erroneous but also presents real harm to those at the margins of society. I examine questions of this nature in my PhD. I co-lead the Data Economies and Data Governance working group, one of the Mechanism Design for Social Good (MD4SG) working groups. I am also a member of the Coalition for Critical Technology group. I am currently working as a Research Scientist intern at DeepMind with the Ethics Research team. I have numerous ongoing projects and I look forward to sharing them as they come to completion. Alongside my full-time intern position at DeepMind, I am in the final year of my PhD, which means I am unable to accept invitations to speak at conferences, workshops, or similar events.
I am currently a post-doctoral researcher at MIT, and an affiliate assistant professor in computer science at the Paul G. Allen School at the University of Washington. I will join the department as an assistant professor in Fall 2022. At MIT, I am collaborating with Professor Russ Tedrake and Professor Pulkit Agrawal. I am currently looking for highly motivated students interested in pushing the frontier of robotics and machine learning problems to work with me at UW. Please apply directly through the UW admissions portal, and drop me a line so I can look out for your application. I completed my PhD in machine learning and robotics at BAIR at UC Berkeley, where I was advised by Professor Sergey Levine and Professor Pieter Abbeel and funded by the NSF GRFP. In a previous life, I completed my bachelor's degree at UC Berkeley. My main research goal is to develop algorithms that enable robotic systems to learn how to perform complex tasks in a variety of unstructured environments. To that end, I work on building deep reinforcement learning algorithms that can learn in the real world. Recently, I have been focusing on the problems of reward specification, continual real-world data collection and learning, offline reinforcement learning for robotics, multi-task learning, and dexterous manipulation with robotic hands.
Arul Menezes is Distinguished Engineer at Microsoft and the founder of Microsoft Translator. He has grown it from a small research project in Microsoft Research into one of Microsoft’s most successful flagship AI services within the Azure Cognitive Services family, translating 90+ languages and dialects, used by hundreds of millions of consumers, and tens of thousands of developers and businesses worldwide. It is also embedded in Microsoft products such as Office, Bing, Windows, Skype. Arul has 30+ years of deep experience in computer science, software development, and 20+ years in natural language processing and artificial intelligence. In building Microsoft Translator, Arul followed the model of a startup embedded in Microsoft, owning Translation from basic research to technology productization, data acquisition, model training, web service and API (99.95% SLA), as well as consumer-facing mobile and PC applications. Neural Machine translation is one of the most advanced and demanding of the current wave of AI technologies, regularly modelling terabytes of data. Arul's team recently announced several major breakthroughs. In March 2018, the Translator team announced it had reached parity with professional human translators, a first for MT technology. This was demonstrated using a standard research community test set of Chinese news (translated into English) and all data and evaluation results were released to the research community. In April 2018, the team announced neural offline-translation on Android and iOS with translation quality almost matching the Cloud. This is the first availability of neural MT models running locally on regular phones. In May 2018, the team announced Custom Translator, enabling, for the first time in the industry, self-service customization of neural machine translation models to customer data and domains. His team has also applied the same technology to a wide variety of AI tasks, including grammatical error correction, and natural language understanding.
I was a senior scientist at like.com (formerly Riya.com) from 2005 through 2007, where I worked on face recognition in consumer photos and on the company's visual search engine. Since 2007 I have been working at Google. Experience leading engineering and applied research teams to solve open-ended problems and ship products. Expertise in designing large-scale data processing systems that have at their core an optimization or machine learning component. Expertise in machine learning, with applications to computer vision, computer graphics, and robotics. For the last ~5 years, leading teams that innovate and apply machine learning techniques to all major autonomous driving systems: mapping, perception, behavior prediction, planning, and simulation. Technical depth in image understanding (object detection and classification), 3D perception (3D reconstruction, 3D object pose/shape estimation, and tracking), and SLAM (mapping, localization, sensor calibration). Hands-on experience with image, lidar, radar, IMU, wheel odometry, and GPS data. Expertise in behavior prediction, planning, and simulation for autonomous vehicles.
Since September 2016, I have been a University Lecturer (equivalent to US Assistant Professor) in Machine Learning at the Department of Engineering at the University of Cambridge, UK. Previously, I was a postdoctoral fellow in the Harvard Intelligent Probabilistic Systems group at the School of Engineering and Applied Sciences of Harvard University, working with the group leader, Prof. Ryan Adams. This position was funded through a postdoctoral fellowship from the Rafael del Pino Foundation. Before that, I was a postdoctoral research associate in the Machine Learning Group at the Department of Engineering at the University of Cambridge (UK) from June 2011 to August 2014, working with Prof. Zoubin Ghahramani. During my first two years in Cambridge I worked on a collaboration project with the Indian multinational company Infosys Technologies. I also spent two weeks giving lectures on Bayesian Machine Learning at Charles University in Prague (Czech Republic). From December 2010 to May 2011, I was a teaching assistant at the Computer Science Department of Universidad Autónoma de Madrid (Spain), where I completed my Ph.D. and M.Phil. in Computer Science in December 2010 and June 2007, respectively. I also obtained a B.Sc. in Computer Science from this institution in June 2004, with a special prize for the best academic record on graduation. My research revolves around model-based machine learning, with a focus on probabilistic learning techniques and a particular interest in Bayesian optimization, matrix factorization methods, copulas, Gaussian processes, and sparse linear models. A general feature of my work is an emphasis on fast methods for approximate Bayesian inference that scale to large datasets. The results of my research have been published in top machine learning journals (Journal of Machine Learning Research) and conferences (NIPS and ICML).
Roland Memisevic received a Ph.D. in Computer Science from the University of Toronto in 2008 under the supervision of Prof. Geoffrey Hinton. In 2021, he joined Qualcomm AI research via the acquisition of startup Twenty Billion Neurons, which he co-founded in 2015. Roland has previously been a faculty member at MILA (University of Montreal) and at the University of Frankfurt (Germany), as well as a postdoc at ETH Zurich (Switzerland). Roland is interested in human-like AI and in the emergence of common sense in neural networks.
Sergey Levine received a BS and MS in Computer Science from Stanford University in 2009, and a Ph.D. in Computer Science from Stanford University in 2014. He joined the faculty of the Department of Electrical Engineering and Computer Sciences at UC Berkeley in fall 2016. His work focuses on machine learning for decision making and control, with an emphasis on deep learning and reinforcement learning algorithms. Applications of his work include autonomous robots and vehicles, as well as applications in other decision-making domains. His research includes developing algorithms for end-to-end training of deep neural network policies that combine perception and control, scalable algorithms for inverse reinforcement learning, deep reinforcement learning algorithms, and more.
Pursuing principles of computational intelligence. Exploring opportunities to leverage complementarities of human and machine reasoning. Passionate about harnessing computing advances to enhance the quality of people's lives. Before moving into the role of Chief Scientific Officer, Eric served as director of Microsoft Research Labs, including labs in Redmond, New York, New England, Montreal, Bangalore, and Cambridge, UK. The Microsoft Research home page is a starting point for learning about people and projects at Microsoft Research.
Cynthia Bennett is a postdoctoral researcher at Carnegie Mellon University's Human-Computer Interaction Institute, and a researcher at Apple, supervised by Jeffrey Bigham. Her research sits at the intersection of Human-Computer Interaction, accessibility, and Disability Studies. Cynthia's work spans from the critique and development of HCI theory and methods to designing emergent accessible interactions with technology. Irrespective of the project, her aim is to inform what she does as much as possible with the lived experiences and creativity of people with disabilities.
Chancey Fleet is a 2018-19 Data & Society Fellow and current Affiliate-in-Residence whose writing, organizing and advocacy aims to catalyze critical inquiry into how cloud-connected accessibility tools benefit and harm, empower and expose communities of disability. Chancey is the Assistive Technology Coordinator at the New York Public Library where she founded and maintains the Dimensions Project, a free open lab for the exploration and creation of accessible images, models and data representations through tactile graphics, 3d models and nonvisual approaches to coding, CAD and "visual" arts. Chancey was recognized as a 2017 Library Journal Mover and Shaker.
Ya Xu leads the LinkedIn data science practice, consisting of hundreds of researchers distributed across the USA (Sunnyvale, Mountain View, San Francisco, New York), India and Dublin. The team touches every aspect of the organization, helping to inform decisions about new product features, business investments, surfacing economic insights for policy makers, and much more. Prior to LinkedIn, Ya was an Applied Researcher at Microsoft in Washington. Ya is also committed to helping lead LinkedIn's efforts to practice responsible data science, tackling tough challenges like "How do we maintain members' privacy when working with their data?" and "How can we ensure that product updates benefit all members equally?" Her team works with cutting-edge techniques in areas like differential privacy, equality A/B testing, and others in order to uphold LinkedIn's ethical values in their work to use LinkedIn's vast data to create economic opportunity. She has a BS in Economics and Mathematics from Williams College, and PhD in Statistics from Stanford University.
Venkatesh is a PhD student in Computer Science and Engineering at the Paul G. Allen School of Computer Science and Engineering, University of Washington, advised by Prof. Jennifer Mankoff. His research focus is making software engineering inclusive to developers who are blind or visually impaired (BVI) through accessible programming tools and efficient screen reader interactions. His current research at UW aims to make Graphical User Interface (GUI) development nonvisually accessible. Venkatesh is a Google Lime Scholar. He spends his time outside of research working to improve STEM education and career opportunities for people with disabilities through I-Stem, an organization he co-founded.
Srivathsan is the Dev Manager for the ML Platform at Intuit. Prior to Intuit, he was responsible for building the cloud platform at eBay. His team built the cloud platform-as-a-service, enabling eBay to run thousands of services handling tens of billions of calls every day. The journey of building a cloud platform took eBay from a non-cloud environment, to a private cloud on VMware, to a private cloud running on OpenStack, to adopting Kubernetes at scale. His team was one of the early contributors to and adopters of Kubernetes in an on-premise environment.
Founder of the inclusive design firm Prime Access Consulting (PAC), Sina Bahram is an accessibility consultant, computer scientist, researcher, speaker, and entrepreneur. In 2012, Sina was recognized as a White House Champion of Change by President Barack Obama for his doctoral research work enabling users with disabilities to succeed in Science, Technology, Engineering, and Math (STEM) fields. Believing that accessibility is sustainable when adopted as a culture, not just a tactic, Sina and his team work with executive management, policy makers, engineering teams, content creators, designers, and other stakeholders within institutions to promulgate accessibility and inclusive design throughout the fabric of an organization. Under Sina's direction, PAC has helped over 100 organizations to meet and exceed their inclusivity goals, from the creation of accessible websites and mobile apps to achieving a comprehensive inclusive design methodology across the enterprise. In addition to serving on and chairing various boards, conferences, committees, and working groups across corporate, non-profit, and research entities, Sina serves as an invited expert on the World Wide Web Consortium (W3C) Accessible Rich Internet Applications (ARIA) working group where he helps shape the next generation of digital accessibility standards and best practices.
Shalini Kantayya is a Brooklyn based filmmaker and activist known for her debut feature film Catching the Sun, which focuses on the race for a clean energy future. Catching the Sun premiered at the Los Angeles Film Festival and was named a New York Times Critics' Pick. Most recently Shalini directed the critically acclaimed film Coded Bias, which explores the fallout of MIT Media Lab researcher Joy Buolamwini's startling discovery that facial recognition does not see dark-skinned faces accurately, and her journey to push for the first-ever legislation in the U.S. to govern against bias in the algorithms that impact us all. Shalini Kantayya's production company 7th Empire Media works to create a culture of human rights and a sustainable planet through imaginative film and television that makes real impact.
Dr. Sameer Singh is an Associate Professor of Computer Science at the University of California, Irvine (UCI). He works primarily on robustness and interpretability of machine learning algorithms, along with models that reason with text and structure for natural language processing. Sameer was a postdoctoral researcher at the University of Washington and received his PhD from the University of Massachusetts, Amherst, during which he interned at Microsoft Research, Google Research, and Yahoo! Labs. He has received the NSF CAREER award, been selected as a DARPA Riser, and received the UCI ICS Mid-Career Excellence in Research award and the Hellman and Noyce Faculty Fellowships. His group has received funding from the Allen Institute for AI, Amazon, NSF, DARPA, Adobe Research, Hasso Plattner Institute, NEC, Base 11, and FICO. Sameer has published extensively at machine learning and natural language processing venues, including paper awards at KDD 2016, ACL 2018, EMNLP 2019, AKBC 2020, and ACL 2020.
Hi, I am Pavan Turaga. I am an Associate Professor jointly between the departments of Arts, Media, and Engineering and Electrical Engineering (ECEE). I direct the Geometric Media Lab (GML) at ASU, with Max Bernstein. Our work spans tools for representation drawn from statistics, optimization, geometry, and topology, and applied to the areas of computer vision, machine learning, immersive technologies, health-analytics, public understanding of science, arts and performance, and more. Please browse the above tabs for a sampling of our research themes, and publications.
I was born and raised in Quito, Ecuador, and moved to Montreal after high school to study at McGill. I stayed in Montreal for the next 10 years, finished my bachelor's, worked at a flight simulator company, and then eventually obtained my masters and PhD at McGill, focusing on Reinforcement Learning under the supervision of Doina Precup and Prakash Panangaden. After my PhD I did a 10-month postdoc in Paris before moving to Pittsburgh to join Google. I have worked at Google for close to 9 years, and am currently a staff research software developer in Google Brain in Montreal, focusing on fundamental Reinforcement Learning research, Machine Learning and Creativity, and being a regular advocate for increasing LatinX representation in the research community. Aside from my interest in coding/AI/math, I am an active musician, love running (6 marathons so far, including Boston!), and enjoy discussing politics and activism.
Dr. Morris is a computer scientist conducting research in the areas of human-computer interaction (HCI), computer-supported cooperative work (CSCW), social computing, and accessibility. She is the Research Area Manager for Interaction, Accessibility, and Mixed Reality at Microsoft Research, where she founded the Ability research team. She is also an Affiliate Professor at the University of Washington in the Allen School of Computer Science and Engineering and in The Information School.
Meredith Broussard is an associate professor at the Arthur L. Carter Journalism Institute of New York University and the author of Artificial Unintelligence: How Computers Misunderstand the World. Her research focuses on artificial intelligence in investigative reporting, with a particular interest in using data analysis for social good. Follow her on Twitter @merbroussard.
Jeremy Howard is co-founder of fast.ai, and researcher in residence on medical data science at the University of San Francisco. He is Chief Scientist at platform.ai, and held this role previously at doc.ai and Kaggle, where he was also President.
Dr. Priestley is a Professor of Statistics and Data Science. Since 2004, she has served as the Associate Dean of the Graduate College and as the Executive Director of the Analytics and Data Science Institute at Kennesaw State University. In 2012, the SAS Institute recognized Dr. Priestley as the 2012 Distinguished Statistics Professor of the Year. She served as the 2012 and 2015 Co-Chair of the National Analytics Conference. Datanami recognized Dr. Priestley as one of the top 12 "Data Scientists to Watch in 2016." She architected the first Ph.D. Program in Data Science, which launched in February 2015. Dr. Priestley has been a featured international speaker at The World Statistical Congress, The South African Statistical Association, SAS Global Forum, Big Data Week, Technology Association of Georgia, Data Science ATL, The Atlanta CEO Council, Predictive Analytics World, INFORMS and dozens of academic and corporate conferences addressing issues related to the evolution of data science. She has authored dozens of articles on Binary Classification, Risk Modeling, Sampling, Statistical Methodologies for Problem Solving and Applications of Big Data Analytics. Prior to receiving a Ph.D. in Statistics, Dr. Priestley worked in the Financial Services industry for 11 years. Her positions included Vice President of Business Development for VISA EU in London, as well as for MasterCard US and an analytical consultant with Accenture's strategic services group. Dr. Priestley received a Ph.D. from Georgia State University, an MBA from The Pennsylvania State University, and a BS from Georgia Institute of Technology.
Gurdeep Singh Pall is the corporate vice president for the Information Platform & Experience team at Microsoft Corp., and is part of the Online Services Division's senior leadership team. He is responsible for vision, product strategy and R&D for the Bing services and platform that includes mobile, mapping, and speech. Pall joined Microsoft in January 1990 as a software design engineer. He has worked on many breakthrough products in his tenure, starting with LAN Manager Remote Access Service. Pall was part of the Windows NT development team, working on the first version of Windows NT 3.1 in 1993 as a software design engineer, all the way through Windows XP in 2001 as general manager of Windows Networking. During his work on Windows, he led design and implementation of core networking technologies such as PPP, TCP/IP, UPnP, VPNs, routing and Wi-Fi, and parts of the operating system. Pall co-authored the first VPN protocol in the industry - Point-to-Point Tunneling Protocol (PPTP) - which received the prestigious Innovation of the Year award from PC Magazine in 1996. He also authored several documents and standards in the networking area in the Internet Engineering Task Force (IETF) standards body in the mid-1990s. Pall was previously the corporate vice president for the Office Lync & Speech Group, overseeing the vision, product strategy and R&D for the Microsoft Lync family of products and Microsoft Tellme services. He was appointed general manager of Windows Real-Time Communications efforts in January 2002 and helped develop a broad strategy that led to the formation of the Real Time Collaboration division and acquisition of PlaceWare Inc. (now called Microsoft Office Live Meeting). Since then, Pall has led acquisitions of media-streams.com AG and Parlano, and key industry partnerships with HP, Polycom and Aspect. Microsoft's Unified Communications efforts have received many technical and design industry awards. Pall was named one of the "15 Innovators & Influencers Who Will Make A Difference" in 2008 by Information Week. He co-authored "Institutional Memory Goes Digital," which was published by Harvard Business Review as part of "Breakthrough Ideas for 2009" and subsequently presented at the World Economic Forum 2009 in Davos. Pall has more than 20 patents (in process or approved) in networking, VoIP and collaboration areas. He holds a master's degree in computer science from the University of Oregon and an undergraduate degree in computer engineering from Birla Institute of Technology in India.
Deb Raji is a 2020 Mozilla Fellow who has worked closely with the Algorithmic Justice League initiative, founded by Joy Buolamwini of the MIT Media Lab, on several projects to highlight cases of bias in computer vision. Her first-author work with Joy has been featured in the New York Times, Washington Post, The Verge, VentureBeat, National Post, Engadget, and the Toronto Star, and won the Best Student Paper Award at the AAAI/ACM Conference on AI, Ethics, and Society. She is an enthusiastic student eager to make constructive contributions to society while investing heavily in self-improvement along the way; a self-starter personally responsible for projects with notable impact at the local, regional, national, and global scale; and a determined social entrepreneur and volunteer committed to helping her surrounding community. Feel free to contact her about internship and full-time opportunities: deborah.raji@mail.utoronto.ca
General Manager, #AI and Innovation at Microsoft; author of O'Reilly's book "The AI Organization"; passionate about technology and business, and fortunate to drive Microsoft AI's business, thought leadership, and incubation motions across enterprise and developer products and services. There are three things to know about me:
- What defines me: a passion for both technology and business. Across my career in business and technical leadership roles, I'm usually seen as the technical guy in business and the business guy in engineering, and I'm proud of it!
- My motto: "every developer, every app, every platform." For years I was obsessed with making Microsoft a developer company for all developers and all platforms, and I spent most of my energy transforming our developer products and bringing them to millions of developers. .NET open sourcing and cross-platform support, the Xamarin acquisition, Visual Studio Code on Mac and Linux, the TypeScript language, and Visual Studio Team Services are my proudest moments from those years.
- My purpose: Always attracted to AI, I was honored to be asked to lead a new mission at Microsoft: to bring AI to every developer and every organization. Working on the strategy and execution to make that happen has been my purpose in recent years, and it keeps me jumping out of bed every morning!
Charlene Chambliss is a Machine Learning Engineer at Primer AI, where she enjoys working on the broad class of tasks she likes to call "helping people find what they need." So far, that has included building a state-of-the-art training and evaluation pipeline for relation extraction models, as well as streamlining the data science, data versioning, and data labeling process on the Applied Research team. Previously, she did data science at Curology, completed 3/4 of an MS in Statistics, and studied psychology at Stanford.
I'm a computational linguist working on language in brains and machines, currently an Assistant Professor at the University of Chicago. My work spans natural language processing systems and computational cognitive modeling.
Today we’re joined by Jonathan Le Roux, a senior principal research scientist at Mitsubishi Electric Research Laboratories (MERL). At MERL, Jonathan and his team are focused on using machine learning to solve the “cocktail party problem”, working not only on the separation of speech from noise but also on the separation of speech from speech. In our conversation with Jonathan, we focus on his paper The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks, which looks to separate and enhance a complex acoustic scene into three distinct categories: speech, music, and sound effects. We explore the challenges of working with such noisy data, the model architecture used to solve this problem, how ML/DL fits into solving the larger cocktail party problem, future directions for this line of research, and much more!
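For readers curious what three-stem separation looks like in code, here is a minimal, illustrative PyTorch sketch of a generic mask-based separator. To be clear, this is not the architecture from the paper; the class name, layer sizes, and tensor shapes are all invented for this example. The idea it demonstrates is common in source separation: a network predicts one soft mask per stem over the mixture's magnitude spectrogram, with the masks constrained to sum to one across stems.

```python
import torch
import torch.nn as nn

class ThreeStemMasker(nn.Module):
    """Toy mask-based separator: predicts one soft mask per stem
    (speech, music, sound effects) over a magnitude spectrogram.
    Hypothetical architecture for illustration only."""

    def __init__(self, n_freq=513, hidden=256, n_stems=3):
        super().__init__()
        self.rnn = nn.LSTM(n_freq, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, n_freq * n_stems)
        self.n_freq, self.n_stems = n_freq, n_stems

    def forward(self, mag):
        # mag: (batch, frames, freq) magnitude spectrogram of the mixture
        h, _ = self.rnn(mag)
        masks = self.proj(h).view(mag.size(0), mag.size(1),
                                  self.n_stems, self.n_freq)
        masks = torch.softmax(masks, dim=2)  # masks sum to 1 across stems
        return mag.unsqueeze(2) * masks      # (batch, frames, stems, freq)

# Usage: one mixture spectrogram in, three per-stem estimates out.
model = ThreeStemMasker()
mixture = torch.rand(1, 100, 513)            # 100 frames, 513 freq bins
speech, music, sfx = model(mixture).unbind(dim=2)
```

In a real system the per-stem magnitudes would be recombined with the mixture phase and inverted back to waveforms, and the stems themselves are far harder to disentangle than this sketch suggests; the episode digs into those challenges.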
There are few things I love more than cuddling up with an exciting new book. There are always more things I want to learn than time I have in the day, and I think books are such a fun, long-form way of engaging (one where I won’t be tempted to check Twitter partway through). This book roundup is a selection from the last few years of TWIML guests, counting only the ones related to ML/AI published in the past 10 years. We hope that some of their insights are useful to you! If you liked their book or want to hear more about them before taking the leap into longform writing, check out the accompanying podcast episode (linked on the guest’s name). (Note: These links are affiliate links, which means that ordering through them helps support our show!)

Adversarial ML
Generative Adversarial Learning: Architectures and Applications (2022), Jürgen Schmidhuber

AI Ethics
Sex, Race, and Robots: How to Be Human in the Age of AI (2019), Ayanna Howard
Ethics and Data Science (2018), Hilary Mason

AI Sci-Fi
AI 2041: Ten Visions for Our Future (2021), Kai-Fu Lee

AI Analysis
AI Superpowers: China, Silicon Valley, And The New World Order (2018), Kai-Fu Lee
Rebooting AI: Building Artificial Intelligence We Can Trust (2019), Gary Marcus
Artificial Unintelligence: How Computers Misunderstand the World (The MIT Press) (2019), Meredith Broussard
Complexity: A Guided Tour (2011), Melanie Mitchell
Artificial Intelligence: A Guide for Thinking Humans (2019), Melanie Mitchell

Career Insights
My Journey into AI (2018), Kai-Fu Lee
Build a Career in Data Science (2020), Jacqueline Nolis

Computational Neuroscience
The Computational Brain (2016), Terrence Sejnowski

Computer Vision
Large-Scale Visual Geo-Localization (Advances in Computer Vision and Pattern Recognition) (2016), Amir Zamir
Image Understanding using Sparse Representations (2014), Pavan Turaga
Visual Attributes (Advances in Computer Vision and Pattern Recognition) (2017), Devi Parikh
Crowdsourcing in Computer Vision (Foundations and Trends® in Computer Graphics and Vision) (2016), Adriana Kovashka
Riemannian Computing in Computer Vision (2015), Pavan Turaga

Databases
Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases (2021), Xin Luna Dong
Big Data Integration (Synthesis Lectures on Data Management) (2015), Xin Luna Dong

Deep Learning
The Deep Learning Revolution (2016), Terrence Sejnowski
Dive into Deep Learning (2021), Zachary Lipton

Introduction to Machine Learning
A Course in Machine Learning (2020), Hal Daume III
Approaching (Almost) Any Machine Learning Problem (2020), Abhishek Thakur
Building Machine Learning Powered Applications: Going from Idea to Product (2020), Emmanuel Ameisen

ML Organization
Data Driven (2015), Hilary Mason
The AI Organization: Learn from Real Companies and Microsoft’s Journey How to Redefine Your Organization with AI (2019), David Carmona

MLOps
Effective Data Science Infrastructure: How to make data scientists productive (2022), Ville Tuulos

Model Specifics
An Introduction to Variational Autoencoders (Foundations and Trends® in Machine Learning) (2019), Max Welling

NLP
Linguistic Fundamentals for Natural Language Processing II: 100 Essentials from Semantics and Pragmatics (2013), Emily M. Bender

Robotics
What to Expect When You’re Expecting Robots (2021), Julie Shah
The New Breed: What Our History with Animals Reveals about Our Future with Robots (2021), Kate Darling

Software How To
Kernel-based Approximation Methods Using Matlab (2015), Michael McCourt