Technical Program Manager with over 20 years of experience in the field as a developer, development lead, and program manager. I have experience presenting to large audiences, I enjoy technically challenging management roles, and I thrive in a fast-paced culture focused on building high-quality products. I started my career in the US in 1998 working on Windows 2000 as a developer. I shipped Windows 2000, Windows XP, Windows Server 2003, and Windows 7, working on technologies ranging from the Print Spooler service, Active Directory, 64-bit interop and RPC, TCP/IP and USB device connectivity, and XPS document printing to Win32 and .NET system APIs. I progressed from developer to development lead and led a team of 6 developers for 3 years. From 2009 to 2014 I worked on Windows Phone as a Senior Program Manager, driving multiple key projects, from defining the app certification process for the Windows Phone Store to Fast Application Switching and Fast Application Resume, app pre-compilation, deployment of phone apps in the Windows Store, and multitasking for location apps, VoIP, audio, etc. As a Principal Program Manager Lead, I've led a team of 5 Program Managers, driving the feature definition for the Windows Phone Execution Model and App Lifecycle, Resource Management, Multitasking, App-to-App and Page Navigation Model. From 2014 to 2018 I worked on speech recognition for Cortana on PC and on Windows Virtual Reality and HoloLens experiences. I worked on improving speech recognition accuracy via personalization of users' language models and acoustic models. I was on point for enabling 3rd-party skills for Cortana via voice commands and for enabling voice input for chat bots built with the Microsoft Bot Framework. Since January 2019 I have been working on Artificial Intelligence for Cognitive Services and Computer Vision at Microsoft, driving product requirements and working with engineering and research teams through the product lifecycle. I drove the product requirements and the engineering release for Computer Vision for Spatial Analysis. For more on Spatial Analysis and Azure Cognitive Services see https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/
There are few things I love more than cuddling up with an exciting new book. There are always more things I want to learn than time I have in the day, and I think books are such a fun, long-form way of engaging (one where I won't be tempted to check Twitter partway through). This book roundup is a selection from the last few years of TWIML guests, counting only the ones related to ML/AI published in the past 10 years. We hope that some of their insights are useful to you! If you liked their book or want to hear more about them before taking the leap into longform writing, check out the accompanying podcast episode (linked on the guest's name). (Note: These links are affiliate links, which means that ordering through them helps support our show!)

Adversarial ML
Generative Adversarial Learning: Architectures and Applications (2022), Jürgen Schmidhuber

AI Ethics
Sex, Race, and Robots: How to Be Human in the Age of AI (2019), Ayanna Howard
Ethics and Data Science (2018), Hilary Mason

AI Sci-Fi
AI 2041: Ten Visions for Our Future (2021), Kai-Fu Lee

AI Analysis
AI Superpowers: China, Silicon Valley, And The New World Order (2018), Kai-Fu Lee
Rebooting AI: Building Artificial Intelligence We Can Trust (2019), Gary Marcus
Artificial Unintelligence: How Computers Misunderstand the World (The MIT Press) (2019), Meredith Broussard
Complexity: A Guided Tour (2011), Melanie Mitchell
Artificial Intelligence: A Guide for Thinking Humans (2019), Melanie Mitchell

Career Insights
My Journey into AI (2018), Kai-Fu Lee
Build a Career in Data Science (2020), Jacqueline Nolis

Computational Neuroscience
The Computational Brain (2016), Terrence Sejnowski

Computer Vision
Large-Scale Visual Geo-Localization (Advances in Computer Vision and Pattern Recognition) (2016), Amir Zamir
Image Understanding using Sparse Representations (2014), Pavan Turaga
Visual Attributes (Advances in Computer Vision and Pattern Recognition) (2017), Devi Parikh
Crowdsourcing in Computer Vision (Foundations and Trends® in Computer Graphics and Vision) (2016), Adriana Kovashka
Riemannian Computing in Computer Vision (2015), Pavan Turaga

Databases
Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases (2021), Xin Luna Dong
Big Data Integration (Synthesis Lectures on Data Management) (2015), Xin Luna Dong

Deep Learning
The Deep Learning Revolution (2016), Terrence Sejnowski
Dive into Deep Learning (2021), Zachary Lipton

Introduction to Machine Learning
A Course in Machine Learning (2020), Hal Daume III
Approaching (Almost) Any Machine Learning Problem (2020), Abhishek Thakur
Building Machine Learning Powered Applications: Going from Idea to Product (2020), Emmanuel Ameisen

ML Organization
Data Driven (2015), Hilary Mason
The AI Organization: Learn from Real Companies and Microsoft's Journey How to Redefine Your Organization with AI (2019), David Carmona

MLOps
Effective Data Science Infrastructure: How to make data scientists productive (2022), Ville Tuulos

Model Specifics
An Introduction to Variational Autoencoders (Foundations and Trends® in Machine Learning) (2019), Max Welling

NLP
Linguistic Fundamentals for Natural Language Processing II: 100 Essentials from Semantics and Pragmatics (2013), Emily M. Bender

Robotics
What to Expect When You're Expecting Robots (2021), Julie Shah
The New Breed: What Our History with Animals Reveals about Our Future with Robots (2021), Kate Darling

Software How To
Kernel-based Approximation Methods Using Matlab (2015), Michael McCourt
Today we close out AI Rewind 2019 joined by Amir Zamir, who recently began his tenure as an Assistant Professor of Computer Science at the Swiss Federal Institute of Technology. Amir joined us on the podcast back in 2018 to discuss his CVPR Best Paper winner, and in today's conversation, we continue with the thread of Computer Vision. In our conversation, we discuss quite a few topics, including Vision-for-Robotics, the expansion of the field of 3D Vision, Self-Supervised Learning for CV Tasks, and much more! Check out the rest of the series at twimlai.com/rewind19! We want to hear from you! Send your thoughts on the year that was 2019 below in the comments, or via Twitter at @samcharrington or @twimlai!
Over the past couple of weeks I got to sit on the other side of the (proverbial) interview table and take part in a few fantastic podcasts and video conversations about the state of machine learning in the enterprise. We also cover current trends in AI, and some of the exciting plans we have in store for TWIMLcon: AI Platforms. Each of these chats has its own unique flavor and I'm excited to share them with you. The New Stack Makers Podcast. I had a great chat with my friend, Alex Williams, founder of The New Stack, a popular tech blog focused on DevOps and modern software development. We focused on MLOps and the increasingly significant convergence of software engineering and data science. Minter Dialogue. I spoke with Minter Dial, host of the popular podcast, Minter Dialogue, and author of the book Heartificial Empathy: Putting Heart into Business and Artificial Intelligence. We had a wide-ranging conversation in which we talked about the future of AI, AI ethics, and the state of AI in businesses. Datamation. In this video chat with James Maguire for Datamation, we discuss some of the key trends surrounding AI in the enterprise, and the steps businesses are taking to operationalize and productionize machine learning. Hope you enjoy the talks! If you're not already registered for TWIMLcon we'd love to have you join us! Register now!
Welcome to #TWIMLcon Shorts - a series where I sit down with some of our awesome Founding Sponsors and talk about their ML/AI journey, current work in the field and what we can expect from them at TWIMLcon: AI Platforms! First up is Luke Marsden, Founder & CEO of Dotscience. Based in Bristol, UK, Luke joins me to share the Dotscience story and why he is most excited for #TWIMLcon next month! From a stellar breakout session featuring the Dotscience manifesto to live demos at their booth, we can't wait! Sam Charrington: [00:00:00] All right everyone, I am on the line with Luke Marsden. Luke is the founder and CEO of Dotscience, a founding sponsor for TWIMLcon: AI Platforms. So Luke, we go back a little bit from your involvement in the Docker space. I remember introducing you at a session at DockerCon quite a few years back, but for those who aren't familiar with your background, who are you? Luke Marsden: [00:00:51] So hey Sam, and thanks for having me on. My name is Luke Marsden, I'm the founder and CEO of Dotscience and I come from a DevOps background. My last startup was called ClusterHQ, and we were solving the problem of running stateful containers in Docker. And so I'm a sort of serial entrepreneur based out of the UK. I live in the beautiful city of Bristol in the southwest and very excited to be involved with TWIML. Sam Charrington: [00:01:28] Awesome. So tell us a little bit about Dotscience and what the company is up to in the AI platform space. Luke Marsden: [00:01:36] Yeah, sure. So we started Dotscience a couple of years ago. Initially, we were targeting the area of data versioning and DevOps, but we quickly realized that the tool that we built, which is an open source project called dotmesh, was actually much more relevant and important to the world of AI and machine learning, which has big data versioning and reproducibility problems. So we pivoted to that about a year in, and we've been building an AI platform around that core concept of data versioning. Sam Charrington: [00:02:13] So tell me a little bit more about that. How are you taking on data versioning? And why is that an important element of the puzzle for folks that are doing AI? Luke Marsden: [00:02:25] Absolutely. So there are really four main pieces of the puzzle that I believe need to be solved to achieve DevOps for AI, DevOps for machine learning, and number one is reproducibility - and that's where the data versioning piece comes in. So what we've seen is that there's a lot of chaos and pain that happens when AI or ML teams start trying to operationalize the models that they're developing. And one of the big pain points is if you can't actually get back to the exact version of the data that you used to train your model, then you can't go back and solve problems with it. You can't fix bugs in the model or really reliably understand exactly where that model came from. So that's that fundamental problem of which version of the data is this model trained on, and that's what we solve with Dotscience. Every time you train a model in Dotscience, you are automatically versioning all of the dependent data sets that the model training happens on. And by using copy-on-write technology, which is a file system technology in dotmesh, which is part of the Dotscience platform, it does that very efficiently, using no more disk space than is required to achieve reproducibility. Sam Charrington: [00:03:52] Awesome. So tell me why are you excited about TWIMLcon: AI Platforms? 
Luke Marsden: [00:03:59] TWIMLcon looks to be an awesome event. We were actually planning on hosting our own event around the same time in San Francisco to promote Dotscience, but TWIML was such a good fit for what we're trying to do, and the themes and the topics that are being discussed in the space, that we decided to join forces with you guys and become a Founding Sponsor rather than running our own thing. So yeah, really, really excited and looking forward to it. Sam Charrington: [00:04:34] That's fantastic and we are super appreciative to have you on board as a Founding Sponsor, it is great to have your support in that way. When folks come to your breakout session at TWIMLcon, tell us a little bit about what you'll be covering there, who will be presenting, and what attendees can expect to learn from the breakout session. Luke Marsden: [00:04:57] Yes, so the session will be run by my colleague Nick, who's our principal data scientist, and the basic premise of the talk really touches on some of the things I mentioned earlier. There's a lot of chaos and pain trying to operationalize AI, and we have this manifesto of things that we believe are needed to move beyond, sort of, the "no-process" process that is the default. So when you start an AI or machine learning project and you have maybe a small number of data scientists or machine learning engineers doing that work, they'll invent a process, right? Any technical group that's doing technical work will make up a process as they go based on the tools that they're familiar with and they'll do their best. But the point of the talk is that the "no-process" process gets your first model into production when your team is small, but that's really where the problems begin and (Laughter) you end up with this kind of mess of models and data sets and deployments and hyperparameters and metrics and all these different things flying around, because machine learning is fundamentally more complicated than software engineering. And so, by just sort of doing things in an ad-hoc way, you get yourself into this sort of mess quite quickly, and this is something we've seen across hundreds of companies that we've spoken to in the industry. And so basically what we're proposing is a manifesto: that you should make your machine learning process, the whole process of building, training, deploying, and monitoring machine learning models, reproducible, accountable, collaborative, and continuous. And so what I mean by reproducible is that somebody else should be able to come and reproduce the model that I trained now, like 9 or 12 months later, without me still needing to be there, without me needing to have kept meticulous manual documentation. Somebody else should be able to go and rerun that model training against the same version of the data, with the same version of TensorFlow, with the same code, with the same hyperparameters, and get the same accuracy score to within a few percent. If your development environment isn't reproducible, then you won't be able to do that, but we believe that that is key to achieving DevOps for ML. So anyway, that's kind of a snapshot of some of the things we'll be talking about in the session. So yeah, please, please come along. Sam Charrington: [00:08:00] Awesome. You'll also be present in TWIMLcon's Community Hall, what can attendees expect to see at the company's booth? Will they be able to get hands-on?
Luke Marsden: [00:08:15] Absolutely, so we'll have live demos at the booth. You can see the full end-to-end platform, and our engineers, as I speak in early September, are busily working on the latest features that we're going to have ready in time for the conference, in true startup conference-driven development mode. (Laughter) So, we will have the deploy-to-production and statistical monitoring pieces ready in time for the conference. So TWIML is probably going to be the first place you can come and see those pieces of the product and get hands-on with them, so please come and check it out. Sam Charrington: [00:09:00] Fantastic. Luke, thanks so much for chatting with me about what you're up to and what you'll be showing at the event, we are super excited to have you on board with us for TWIMLcon: AI Platforms. Luke Marsden: [00:09:10] Awesome. Thank you, Sam. TWIMLcon: AI Platforms will be held on October 1st and 2nd at the Mission Bay Conference Center in San Francisco.
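The data versioning and reproducibility ideas Luke describes in this conversation can be illustrated with a short sketch. The following is a minimal, hypothetical example (not Dotscience's actual implementation; the helper names, file layout, and run-record format are assumptions for illustration) of how a training run might pin the exact dataset and code version it used:

```python
# A minimal sketch of the run-tracking idea described above: each training run
# records a content hash of the exact dataset it used, plus the code version and
# hyperparameters, so the model can be reproduced months later. This is not
# Dotscience's implementation; the file layout and helper names are hypothetical.
import hashlib
import json
import time
from pathlib import Path

def dataset_fingerprint(path: str) -> str:
    """Content-hash a dataset file so a run can pin the exact version it trained on."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def record_run(data_path: str, code_version: str, hyperparams: dict, metrics: dict,
               log_dir: str = "runs") -> Path:
    """Write a run record tying a model's metrics to its exact inputs."""
    record = {
        "timestamp": time.time(),
        "data_sha256": dataset_fingerprint(data_path),
        "code_version": code_version,  # e.g. a git commit hash
        "hyperparams": hyperparams,
        "metrics": metrics,
    }
    Path(log_dir).mkdir(exist_ok=True)
    out = Path(log_dir) / f"run-{int(record['timestamp'])}.json"
    out.write_text(json.dumps(record, indent=2))
    return out

if __name__ == "__main__":
    # Stand-in dataset so the sketch runs end to end; in practice this would be
    # the real training data file.
    Path("train.csv").write_text("x,y\n1,2\n")
    record_run("train.csv", code_version="abc1234",
               hyperparams={"lr": 0.01, "epochs": 10},
               metrics={"accuracy": 0.93})
```

With records like this, a colleague rerunning the training 9 or 12 months later can confirm they are using the same data version, code version, and hyperparameters before checking that the accuracy matches to within a few percent, which is the reproducibility bar Luke describes.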
Bits & Bytes

Microsoft open sources Bing vector search. The company published its vector search toolkit, Space Partition Tree and Graph (SPTAG) [Github], which provides tools for building, searching and serving large scale vector indexes.

Intel makes progress toward optical neural networks. A new article on the Intel AI blog (which opens with a reference to TWIML Talk #267 guest Max Welling's 2018 ICML keynote) describes research by Intel and UC Berkeley into new nanophotonic neural network architectures. A fault-tolerant architecture is presented, which sacrifices accuracy to achieve greater robustness to manufacturing imprecision.

Microsoft Research demonstrates realistic speech with little labeled training data. Researchers have crafted an "almost unsupervised" text-to-speech model that can generate realistic speech using just 200 transcribed voice samples (about 20 minutes' worth), together with additional unpaired speech and text data.

Google deep learning model demonstrates promising results in detecting lung cancer. The system demonstrated the ability to detect lung cancer from low-dose chest computed tomography imagery, outperforming a panel of radiologists. Researchers trained the system on more than 42,000 CT scans. The resulting algorithms turned up 11% fewer false positives and 5% fewer false negatives than their human counterparts.

Facebook open-sources Pythia for multimodal vision and language research. Pythia [Github] [arXiv] is a deep learning framework for vision and language multimodal research that helps researchers build, reproduce, and benchmark models. Pythia is built on PyTorch and designed for Visual Question Answering (VQA) research, and includes support for multitask learning and distributed training.

Facebook unveils what its secretive robotics division is working on. The company outlined some of the focus areas for its robotics research team, which include teaching robots to learn how to walk on their own, using curiosity to learn more effectively, and learning through tactile sensing.

Dollars & Sense

Algorithmia raises $25M Series B for its AI platform
Icometrix, a provider of brain imaging AI solutions, has raised $18M
Quadric, a startup developing a custom-designed chip and software suite for autonomous systems, has raised $15M in a funding round
Novi Labs, a developer of AI-driven unconventional well planning software, has raised $7M

To receive the Bits & Bytes to your inbox, subscribe to our Newsletter.
Bits & Bytes

Google scraps controversial AI ethics council days after it was announced. Google has shuttered its new artificial intelligence ethics council, a little more than a week after announcing it, in large part due to employee reactions to the appointment of a conservative think tank leader to the group.

WIPO launches AI-based image search tool for brands. The World Intellectual Property Organization (WIPO) has launched a new AI-powered image search technology that uses deep learning to identify combinations of concepts within an image, thus making it faster and easier to establish a trademark's distinctiveness.

Tim Apple poaches Ian GANfellow. Apple has poached another of Google's top AI researchers. This time it's Ian Goodfellow, best known as the creator of GANs, or generative adversarial networks, who has joined Apple in a director role.

FDA Proposes Regulatory Framework for AI- and Machine Learning-Driven SaMD. The US Food and Drug Administration (FDA) requested feedback on a new discussion paper that proposes applying a "focused review" approach to premarket assessments of software as a medical device (SaMD) technologies that are powered by AI and ML. [Paper]

Qualcomm Reveals "Cloud AI 100" AI Inference Accelerator. Qualcomm announced its first discrete dedicated AI processors, the Qualcomm Cloud AI 100 family. The chip, which it expects to begin producing in 2020, is designed for use in datacenters to meet increasing demand for AI inference processing.

Google launches an end-to-end AI platform. At Google Next '19, Google announced the beta version of its end-to-end AI platform, which aims to offer developers and data scientists a one-stop shop for building, testing and deploying models. At Next, the company made several additional announcements as well, including updates to its suite of Cloud AI Solutions, AutoML, BigQuery ML, its pre-trained ML APIs, and more.

Petuum Unveils Industrial AI Product for Complex Manufacturing Operations. Petuum announced the Petuum Industrial AI Autopilot product, which enables optimization of complex manufacturing operations with modules that continuously learn and adapt.

Dollars & Sense

Intel Capital announced investments in five AI companies at its annual Global Summit. Intel Capital led a $13M round in AI chip startup Untether AI and a $150M round in AI systems company SambaNova Systems. In addition, the company invested undisclosed amounts in Andrew Ng's Landing AI and China-based CloudPick and Zhuhai Eeasy Technology.
Run.AI, a startup building a new virtualization and acceleration platform for deep learning, announced that it has raised $13M
Enlitic, a San Francisco, CA-based company leveraging AI for medical imaging, raised $15M in Series B financing
Boston Dynamics announced that it is acquiring California-based startup Kinema Systems, which makes computer-vision and ML systems for warehouse robots
Evolve IP announced that it has acquired Jog.ai, a speech analytics and natural language technology firm based in Austin, Texas
Rasa, a San Francisco-based open source company that enables developers to build contextual AI assistants, has secured $13M in Series A funding
Labelbox, a collaborative training data platform for machine learning applications, has raised $10M in Series A funding
Observe.AI has secured $8M in Series A funding

To receive the Bits & Bytes to your inbox, subscribe to our Newsletter.
Bits & Bytes

AWS offers Nvidia's Tesla T4 chip for AI inference. The new EC2 instances, with Tesla T4 GPUs, will provide AWS customers with a new option for reduced-cost AI inference. T4 will also be available through the Amazon Elastic Container Service for Kubernetes.

Qualcomm launches new, AI-enabled smart speaker platform. The new Qualcomm QCS400 SoC series brings its high-performance, low-power compute capabilities and audio technology together to help deliver highly optimized, AI-enabled solutions designed for smarter audio and IoT applications.

Expert System announces the release of its new AI platform, Cogito. The new version of Cogito (14.4) delivers key updates in the areas of knowledge graphs, ML and RPA to help organizations accelerate and simplify the adoption of AI in their business workflows.

New Falkonry product seeks to simplify the creation and deployment of predictive models at the edge. The product, the Falkonry Edge Analyzer, is a portable self-contained engine that lets customers deploy predictive analysis on edge devices for low-latency applications in disconnected, industrial environments or close to data sources.

IBM and Intel outline quantum AI advances. In a recent study, IBM researchers shared a quantum supervised learning algorithm with the potential to enable ML on quantum computers in the near future. Meanwhile, Intel and Hebrew University researchers unveiled a proof of deep learning's superior ability to simulate the computations involved in quantum computing – revealing problems deep learning can excel at and proposing a promising way forward in quantum computing.

Neurala launches Brain Builder platform to accelerate the creation of custom vision AI solutions. The SaaS platform, Brain Builder, streamlines the creation of custom computer vision solutions by offering an all-in-one tool for data tagging, training, deployment, and analysis. I saw a demo of the product at GTC and it looked pretty interesting in its ability to quickly create an image classifier with just a few labeled data points, but I have questions about how it works (I was told it didn't work based on transfer learning).

Facebook open-sources hardware designs for AI model training and inference. Facebook is donating its three hardware platforms for AI to the Open Compute Project. These include Zion for training workloads, Kings Canyon for AI inference, and Mount Shasta for video transcoding.

Google announces an all-neural on-device speech recognizer. Google rolled out an end-to-end, all-neural, on-device speech recognizer to power speech input in Gboard. In a research paper, Google presented a model trained using RNN transducer (RNN-T) technology that is compact enough to reside on a phone. [Paper]

Dollars & Sense

AmplifAI Solutions, a company which makes AI-enabled software for call center operations, has raised $3.9M
Evolv Technology Solutions, a San Francisco-based autonomous optimization platform for the digital era, closed a $10M Series A financing round
Skymind, a San Francisco-based open-core data science company, secured $11.5M in Series A financing
TartanSense, a Bengaluru-based AI-powered robot maker, has raised $2M from Omnivore and Blume
Determined AI, a San Francisco-based deep learning management startup, raised $11M in funding
Automation Hero, a company which helps sales organizations automate back-office processes, has secured $14.5M in new funding led by Atomico
Sonasoft Corp. announced that it has jointly signed a definitive purchase agreement to acquire the AI company Hotify, Inc
Apple has acquired Laserlike, a startup that uses ML-based personalization to let users follow news and other topics

To receive the Bits & Bytes to your inbox, subscribe to our Newsletter.
Bits & Bytes

Google announces TensorFlow 2.0 Alpha, TensorFlow Federated, TensorFlow Privacy. At the 3rd annual TensorFlow Developer Summit, Google announced the first alpha release of TensorFlow 2.0 and several other new releases, such as: TensorFlow Federated – a new open-source framework that allows developers to use all the ML-training features from TF while keeping the data local; TensorFlow Privacy – which uses differential privacy to process data in a private manner; extensions to TensorFlow Extended (TFX), a platform for end-to-end machine learning; and Activation Atlases – which attempts to visualize and explain how neural networks process images.

Google open sources GPipe, a library for parallel training of large-scale neural networks. GPipe, which is based on Lingvo (a TensorFlow framework for sequence modeling), is applicable to any network consisting of multiple sequential layers and allows researchers to "easily" scale performance. [Paper]

Facebook AI researchers create a text-based adventure to study how AI agents speak and act. Researchers from Facebook and University College London specifically investigated the impact of grounding dialogue – a collection of mutual knowledge, beliefs, and assumptions essential for communication between two people – on AI agents.

Google announces Coral platform for building IoT hardware with on-device AI. Coral targets developers creating IoT hardware from prototyping to production. It is powered by a TPU that is specifically designed to run at the edge and is available in beta.

Google and DeepMind are using AI to predict the energy output of wind farms. Google announced that it has made energy produced by wind farms more viable using DeepMind's ML algorithms to better predict the wind output.

Ben-Gurion U. develops new AI platform for ALS care. Researchers at Ben-Gurion University have used ML models to develop a new method of monitoring and predicting the progression of neurodegenerative diseases and to help identify markers for personalized patient care and improve drug development.

Google rolls out AI grammar checker for G Suite users. Google applies ML techniques to understand complex grammar rules and identify "tricky" grammatical errors made by G Suite users. 
Dollars & Sense

PolyAI, a London, UK-based platform for conversational AI, raised $12M in Series A funding
Wade & Wendy, a NYC-based AI recruitment platform, closed $7.6M in Series A funding
Brodmann17, a Tel Aviv-based provider of vision-first technology for automated driving, raised $11M in Series A funding
Paradox.ai, a Scottsdale-based assistive intelligence platform, raised $13.34M in Series A funding
Apple acquires patents from AI security camera maker Lighthouse
Horizon Robotics, a China-based AI chip maker, raises $600M
ELSA, a US-based AI language learning app, raised $7M
Modulate, a Cambridge-based ML startup, raised $2M in seed funding
Zone7, which uses AI to predict injuries in sports, has secured $2.5M
DataRobot acquires Cursor, a data collaboration platform company
Splice Machine announced that it has raised $16M for its unified ML platform
Senseon has raised $6.4M to tackle cybersecurity threats with an AI 'triangulation' approach
Ctrl-labs, a New York startup, announced that it has raised $28M in a funding round led by GV, Google's venture capital arm
Armorblox, a Sunnyvale, CA-based provider of a natural language understanding platform for cybersecurity, raised $16.5M in Series A funding
ViSenze, a Singapore-based AI startup, has raised $20M in Series C funding
BlackBerry announces the acquisition of Cylance, a cybersecurity and AI firm

To receive the Bits & Bytes to your inbox, subscribe to our Newsletter.
Sam Charrington: Today we're excited to continue the AI for the benefit of society series that we've partnered with Microsoft to bring to you. Today we're joined by Hanna Wallach, principal researcher at Microsoft Research. Hanna and I really dig into how bias and a lack of interpretability and transparency show up across machine learning. We discuss the role that human biases, even those that are inadvertent, play in tainting data, whether deployment of fair ML algorithms can actually be achieved in practice and much more. Along the way, Hanna points us to a ton of papers and resources to further explore the topic of fairness in ML. You'll definitely want to check out the show notes page for this episode, which you'll find at twimlai.com/talk/232. Before diving in I'd like to thank Microsoft for their support of the show and their sponsorship of this series. Microsoft is committed to ensuring the responsible development and use of AI and is empowering people around the world with this intelligent technology to help solve previously intractable societal challenges, spanning sustainability, accessibility and humanitarian action. Learn more about their plan at Microsoft.ai. Enjoy. Sam Charrington: [00:02:18] All right everyone, I am on the line with Hanna Wallach. Hanna is a principal researcher at Microsoft Research in New York City. Hanna, welcome to This Week in Machine Learning and AI. Hanna Wallach:[00:00:11] Thanks, Sam. It's really awesome to be here. Sam Charrington: [00:00:14] It is a pleasure to have you on the show, and I'm really looking forward to this conversation. You are clearly very well known in the machine learning and AI space. Last year, you were the program chair at one of the largest conferences in the field, NeurIPS. In 2019, you'll be its general chair. But for those who don't know about your background, tell us a little bit about how you got involved and started in ML and AI. Hanna Wallach:[00:00:48] Sure. Absolutely. So I am a machine learning researcher by training, as you might expect. I've been doing machine learning for about 17 years now. So since way before this stuff was even remotely fashionable, or popular, or cool, or whatever it is nowadays. In that time, we've really seen machine learning change a lot. It's sort of gone from this weirdo academic discipline only of interest to nerds like me, to something that's so mainstream that it's on billboards, it's in TV shows, and so on and so forth. It's been pretty incredible to see that shift over that time. I got into machine learning sort of by accident, I think that's often what happens. I had taken some undergrad classes on information theory and stuff like that, found that to be really interesting, but thought that I was probably going to go into human computer interaction research. But through a research assistantship during the summer between my undergrad degree and my Master's degree, I ended up discovering machine learning, and was completely blown away by it. I realized that this is what I wanted to do. I've been focusing on machine learning in various different forms since then. My PhD was specifically on Bayesian latent variable methods, typically for analyzing text and documents. So topic models, that kind of thing. But during my PhD, I really began to realize that I'm not particularly interested in analyzing documents for the sake of analyzing documents, I'm interested in analyzing documents because humans write documents to communicate with one another. 
It's really that underlying social process that I'm most interested in. So then during my postdoc, I started to shift direction from primarily looking at text and documents to thinking really about those social processes. So not just what are people saying, but also who's interacting with whom, and thinking about machine learning methods for analyzing the structure and content of social processes in combination. I then dove into this much more when I got a faculty job, because I was hired as part of UMass Amherst's Computational Social Science Initiative. So at that point I started focusing really in depth on this idea of using machine learning to study society. I established collaborations with a number of different social scientists, focusing on a number of different topics. Over the years, I've mostly ended up working with political scientists, and often study questions relating to government transparency, and still looking at this whole idea that a social process consists of individuals, or groups of individuals interacting with one another, information that might be used in or arising from these interactions, and then the fact that these things might change over time. I often use one or two of these modalities, so structure, content, or dynamics, to learn about one or more of the other ones as well. As I continued to work in this space, I started to think more, not just about how we can use machine learning to study society, but the fact that machine learning is becoming much more prevalent within society. About four years ago, I started really thinking more about these issues of fairness, accountability, transparency, and ethics. It was a pretty natural fit for me to start moving in this direction. Not only was I already thinking about questions to do with people, but I've done a lot of diversity and inclusion work in my non-research life. So I'm one of the co-founders of the Women in Machine Learning workshop, and I also co-founded two organizations to get more women involved in free and open source software development. So issues related to fairness and stuff like that are really something that I tend to think about a lot in general. So I ended up making sort of this shift a little bit in my research focus. That's not to say that I don't still work on things to do with core computational social science, but increasingly my research is focusing on the ways that machine learning impacts society. So fairness, accountability, transparency, and ethics. Sam Charrington: [00:05:53] We will certainly dive deep into those topics. But before we do, you've mentioned a couple of times the term computational social science. That's not a term that I've heard before, I don't believe. Can you ... Is that ... I guess I'm curious how established that is as a field, or is it something that is specific to that institution that you were working at? Hanna Wallach:[00:06:19] Sure. So this is really a discipline that started emerging in maybe sort of 2009, 2008, that kind of time. By 2010, which is when I was hired at UMass, it really was sort of its own little emerging field with a bunch of different computer scientists and social scientists really committed to pushing this forward as a discipline. The basic idea, of course, is that social scientists study society and social processes, and they've been doing this for decades. But often using qualitative methods. 
But of course, as more of society moves towards digitized interaction methods, and online platforms, and other kinds of things like that, we're beginning to see much more of this sort of digital data. At the same time, we've seen this massive increase, as I've said, in the popularity of machine learning and machine learning methods that are really suitable for analyzing data about social processes in society. So computational social science is really the sort of emerging discipline at the intersection of computer science, the social sciences, and statistics as well. The real goal is to develop and use computational and statistical methods, so machine learning methods, for example, to understand society, social processes, and answer questions that are substantively interesting to social scientists. At this point, there are people at a number of different institutions focusing on computational social science. So yes, of course, UMass, as I've mentioned before. But also Northwestern, Northeastern, University of Washington, in fact have been doing this for years, and of course, Microsoft Research is no exception in this regard. Part of the reason why I joined Microsoft Research was that we have a truly exceptional group of researchers in computational social science here. That was really very appealing to me. Sam Charrington: [00:08:31] Oh, awesome, awesome. So you talked about your transition to focusing on fairness, accountability, transparency, and ethics in machine learning and AI. Can you talk a little bit about what those terms mean to you, and your broader research? Hanna Wallach:[00:08:54] Yeah, absolutely. So I think the bulk of my own research in that sort of broad umbrella falls within two categories. So the first is fairness, and the second is what I would sort of describe as interpretability of machine learning. So in that fairness bucket, really, much of my research is focused on studying the ways in which machine learning can inadvertently harm or disadvantage groups of people or individual people in various different, usually unintended, ways. I'm interested in understanding not only why this occurs, but what we can do to mitigate it, and what we can do to really develop fairer machine learning systems. So systems that don't inadvertently harm individuals or groups of people. In the intelligibility bucket, so there, I'm really interested in how we can make machine learning methods that are interpretable to humans in different roles for particular purposes. There has been a lot of research in this area over the past few years, focusing on oftentimes developing simple machine learning models that can be easily understood by humans simply by exposing their internals, and also on developing methods that can generate explanations for either entire models or the predictions of models. Those models might be potentially very complex. My own work typically focuses really more on the human side of intelligibility, so what is it that might make a system intelligible or interpretable to a human trying to carry out some particular task? I do a lot of human subjects experiments to really try and understand some of those questions with a variety of different folks here at Microsoft Research. Sam Charrington: [00:11:01] On the topic of fairness and avoiding inadvertent harm, there are a lot of examples that I think many of our audience would be familiar with, the ProPublica work into the use of machine learning systems in the justice process, and others. 
Are there examples that come to mind for you that are maybe less well known, but that illustrate for you the importance of that type of work? Hanna Wallach:[00:11:36] Yes. So when I typically think about this space, I tend to think about this in terms of the types of different harms that can occur. I have some work with Aaron Shapiro, Solon Barocas, and Kate Crawford on the different types of harms that can occur. Kate Crawford actually did a fantastic job of talking about this work in her invited talk at the NeurIPS conference in 2017. But to give you some concrete examples, so many of the examples that people are most familiar with are these scenarios as you mentioned where machine learning systems are being used to allocate or withhold resources, opportunities, or information. So one example would be the COMPAS recidivism prediction system being used to make decisions about whether people should be released on bail. Another example would be from a news story that happened in November, where Amazon revealed that it had abandoned an automated hiring tool because of fears that the tool would reinforce existing gender imbalances in the workplace. So there you're looking at these existing gender imbalances, and seeing that this tool is perhaps withholding opportunities from women in the tech industry in an undesirable way. There was a lot of coverage about this very sensible decision that Amazon made to abandon that tool. Some other examples would be more related to quality of service issues even when no resources or opportunities are being allocated or withheld. So a great example there would be the work that Joy Buolamwini and Timnit Gebru did focusing on the ways that commercial gender classification systems might perform less well, so less accurately, for certain groups of people. Another example you might think of is, let's say, speech recognition systems. You can imagine systems that work really well for people with certain types of accents, or for people with voices at certain pitches. But less well for other people, certainly for me. I'm British, and I have a lisp. I know that oftentimes speech recognition systems don't do a great job of understanding what I'm saying. This is much less of an issue nowadays, but you know, five or so years ago, this was really frustrating for me. Some other examples are things like stereotyping. So here the most famous example of stereotyping in machine learning is Latanya Sweeney's work from 2013, where she showed that advertisements that were being shown on web searches for different people's names would more typically be advertisements that reinforced stereotypes about black criminality when people searched for sort of black-sounding names than when people searched for stereotypically white-sounding names. So there the issue is this sort of reinforcement of these negative stereotypes within society by the placement of particular ads for particular different types of searches. So another example of stereotyping in machine learning would be the work done by Joanna Bryson and others at Princeton University on stereotypes in word embeddings. There has also been some similar work done by my colleague, Adam Kalai, here at Microsoft Research. 
Both of these groups of researchers showed that if you train word embedding methods, so things like Word2Vec, that try and identify a low-dimensional embedding for word types based on the surrounding words that are typically used in conjunction with them in sentences, you end up seeing that these word embeddings reinforce existing gender stereotypes. For example, the word man ends up being embedded much closer to programmer, and similarly woman ends up being embedded much closer to homemaker, than vice versa. So that would be another kind of example. Then we see other kinds of examples of unfairness and harms within machine learning as well. So for example, over- and under-representation. So Matthew Kay and some others at the University of Washington have this really nice paper where they show that for professions with an equal or higher percentage of men than women, the image search results are much more heavily skewed towards images of men than reality. So that would be another kind of example. What you'll see from all of these examples that I've mentioned is that they affect a really wide range of systems and types of machine learning applications. The types of harms or unfairness that might occur are also pretty wide-ranging as well, going from, yes, sure, allocation and withholding of resources, opportunities, or information, but moving beyond that to stereotyping and representation and so on. Sam Charrington: [00:17:02] So often when thinking about fairness and bias in machine learning and the types of harm that can come about when unfair systems are developed, all roads kind of lead back to the data itself, and the biases that are inherent in that data. Given that machine learning and AI are so dependent on data, and often much of the data that we have is biased, what can we do about that, and what are the kinds of things that your research is exploring to help us address these issues? Hanna Wallach:[00:17:41] Absolutely. Yeah, so you've hit on a really important point there, which is that in a lot of the sort of public discourse about fairness in machine learning, you have people making comments about algorithms being unfair, or algorithms being biased. Really, I think this misses some of the most fundamental points about why this is such a challenging landscape. So I want to just emphasize a couple of those here in response to your question. So the first thing is that machine learning is all about taking data, finding patterns in that data, and then often training systems to mimic the decisions that are represented within that data. Of course, we know that the society we live in is not fair. It is biased. There are structural disadvantages and discrimination all over the place. So it's pretty inevitable that if you take data from a society like that, and then train machine learning systems to find patterns expressed in that data, and to mimic the decisions made within that society, you will necessarily reproduce those structural disadvantages, that bias, that discrimination, and so on. So you're absolutely right that a lot of this does indeed come from data. But the other point that I want to make is that it's not just from data and it's not from algorithms per se. The issue is really, as I see it, and as my colleagues here at Microsoft Research see it, about people and people's decisions at every point in that machine learning life cycle. 
So I've done some work on this with a number of people here at Microsoft; most recently I put together a tutorial on machine learning and fairness in collaboration with my colleague Jenn Wortman Vaughan. The way we really think about this is that you have to prioritize fairness at every stage of that machine learning lifecycle. You can't think about it as an afterthought. The reason why is that decisions that we make at every stage can fundamentally impact whether or not a system treats people fairly. So I think it's really important when we're thinking about fairness in machine learning to not just sort of make general statements about algorithms being unfair, or systems being unfair, but really to go back to those particular points and think about how unfairness can kind of creep in at any one of those stages. That might be as early as the task definition stage, so when you're sitting down to develop some machine learning system, it's really important to ask the question of who does this take power from, and who does this give power to? The answers to that question often reveal a lot about whether or not that technology should even be built in the first place. Sometimes the answer to addressing fairness in machine learning is simply, no, we should not be building that technology. But there are all kinds of other decisions and assumptions at other points in that machine learning life cycle as well. So the way we typically like to think about it is that a machine learning model, or method, is effectively an abstraction of the world. In making that abstraction, you necessarily have to make a bunch of assumptions about the world. Some of these assumptions will be more or less justified, and some of these assumptions will be a better fit for reality than others. But if you're not thinking really carefully about what those assumptions are when you are developing your machine learning system, this is one of the most obvious places that you can inadvertently end up introducing bias or unfairness. Sam Charrington: [00:21:42] Can you give us some concrete examples there? Hanna Wallach:[00:21:45] Yeah. Absolutely. One common example of this form would be stuff to do with teacher evaluation. So there have been a couple of high-profile lawsuits about this kind of thing. But I think it illustrates the point nicely. So it's common for teachers to be evaluated based on a number of different factors, including their students' test scores. Indeed, many of the methods that have been developed to analyze teacher quality using machine learning systems have really focused predominantly on students' test scores. But this assumes that students' test scores are in fact an accurate predictor of teacher quality. This isn't actually always the case. A good teacher should obviously do more than test prep. So any system that really looks just at test scores when trying to predict teacher quality is going to do a bad job of capturing these other properties. So that would be one example. Another example involves predictive policing. So a predictive policing system might make predictions about where crimes will be committed based on historic arrest data. But an implicit assumption here is that the number of arrests in an area is an accurate proxy for the amount of crime. It doesn't take into account the fact that policing practices can be racially biased, or that there might be historic over-policing in less affluent neighborhoods. I'll give you another example as well. 
So many machine learning methods work by defining some objective function, and then learning the parameters of the model so as to optimize that objective function. So for example, if you define an objective function in the context of, let's say, a search engine that prioritizes user clicks, you may end up with search results that don't necessarily reflect what you want them to. This is because users may click on certain types of search results over other search results, and that might not be reflective of what you want to be showing when you show users a page of search results. So as a concrete example, many search engines, if you search for the word boy, you see a bunch of pictures of male children. But if you search for the word girl, you see a bunch of pictures of grown-up women. These are pretty different to each other. This probably comes from the fact that search engines typically optimize for clicks among other metrics. This really shows how hard it can be to even address these kinds of fairness issues, because in different circumstances the word girl may be referring to a child or a woman, and users search for this term with different intentions. In this particular example, as you can probably imagine, one of these intentions might be more prevalent than the other. Sam Charrington: [00:24:57] You've identified lots of opportunities for pitfalls in the process of fielding systems, going all the way back to the way you define your system, and state your intentions, and formulate the problem that you're going after. Beyond simply being mindful of the potential for bias and unfairness, and just saying that, I realize that it's not simple, that it's work to be mindful of this. But beyond that, what does your research offer in terms of how to overcome these kinds of issues? Hanna Wallach:[00:25:43] Yeah, this is a really good question. It's a question that I get a lot from people: what can we actually do in practice? There are a number of things that can be done in practice. Not all of them are easy things to do, as you say. So one of the most important things is that issues relating to fairness in machine learning are fundamentally socio-technical. They're not going to be addressed by computer scientists or developers alone. It's really important to involve a range of diverse stakeholders in these conversations when we're developing machine learning systems so that we have a bunch of different perspectives represented. So moving beyond just involving computer scientists and developers on teams, it's really important that we involve social scientists, lawyers, policy makers, end users, people who are going to be affected or impacted by these systems down the line, and so on and so forth. That's one really concrete thing you can do. There is a project that came out of the University of Washington called the Diverse Voices project. It provides a way of getting feedback from stakeholders on tech policy documents. It's really good; they have a great how-to guide that I definitely recommend checking out. But many of the things that they recommend doing there, you can also think about when you're trying to get feedback from stakeholders on, let's say, the definition of a machine learning system. So that task definition stage. Some of these could even potentially be expanded to consider other stages of that machine learning pipeline as well. So there are a number of things that you can do at every single stage of the machine learning pipeline. 
In fact, this tutorial that I mentioned earlier, that I worked on with my colleague Jenn Wortman Vaughan, actually has guidelines for every single step of the pipeline. But to give you examples, here are some things, for instance, that you can do when you're selecting a data source. So for example, it's really important to think critically before even collecting any data. It's often very tempting to say, oh, there is already some dataset that I can probably repurpose for this. But it's really important to take that step back and, before immediately acting based on availability, to actually think about whether that data source is appropriate for the task you want to use it for. There are a number of reasons why it might not be; it could be to do with biases in the data source selection process. There might be societal biases present in the data source itself. It might be that the data source doesn't match the deployment context; that's a really important one that people really should be taking into account. Where are you thinking about deploying your machine learning system, and does the data you have available for training and development match that context? As another example, still related to data, it's really important to think about biases in the technology used to collect data. So as an example here, there was an app released in the city of Boston back in 2011, I think it was called Street Bump. The way it worked is it used iPhone data, and specifically the sort of positional movement of iPhones as people were driving around, to gather data on where there were potholes that should be repaired by the city. But pretty quickly, the city of Boston figured out that this actually wasn't a great way to get that kind of data, because back in 2011, the people who had iPhones were typically quite affluent and only lived in certain neighborhoods. So that would be an example about thinking carefully about the technology even used to collect data. It's also really important to make sure that there is sufficient representation of different subpopulations who might be ultimately using or affected by your machine learning system, to make sure that you really do have good representation overall. Moving on to things like the model, there are a number of different things that you can do there, for instance, as well. So in the case of a model, I mentioned a bit about assumptions being really important. It's great to really clearly define all of your assumptions about the model, and then to question whether there might be any explicit or implicit biases present in those assumptions. That's a really important thing to do when you're thinking about choosing any particular model or model structure. You could even, in some scenarios, include some quantitative notion of parity, for instance, in your model objective function as well. There have been a number of academic papers that take that approach in the literature over the past few years. Sam Charrington: [00:30:43] Can you give an example of that last point? Hanna Wallach:[00:30:46] Yeah, sure. So imagine you have some kind of a machine learning classifier that's going to make decisions of the form, let's say, loan, no loan, hire, no hire, bail, no bail, and so on. The way we normally develop these classifiers is to take a bunch of labeled data, so data points labeled with, let's say, loan, no loan, and then we train a model, a machine learning model, a classifier, to optimize accuracy on that training data. 
So you end up setting the parameters of that model such that it does a good job of accurately predicting those labels from the training data. So the objective function that's typically used is one that usually considers only accuracy. But something else you can do is define some quantitative definition of fairness, some quantitative fairness metric, and then try to simultaneously optimize both of these objectives: classifier accuracy and whatever your chosen fairness metric is. There are a number of these quantitative metrics that have been proposed out there, and they typically all look at parity across groups of some sort. So I think it's really important to remember that even though these are often referred to as fairness metrics, they're really parity metrics. They neglect many of the really important other aspects of fairness, like justice, and due process, and so on and so forth. But it is absolutely possible to take these parity metrics and incorporate them into the objective function of, say, a classifier, and then try to prioritize satisfying and optimizing that fairness metric at the same time as optimizing classifier accuracy. There have been a number of papers that focus on this kind of approach; many of them focus on one particular type of classifier, like SVMs or neural networks, and one particular fairness metric. There are a bunch of standard fairness metrics that people like to look at. I actually have some work with some colleagues here at Microsoft where we have a slightly more general way of doing this that will work with many different types of classifiers, and many different types of fairness metrics. So there is no reason to start again from scratch if you want to switch to a different classifier or a different fairness metric. We actually have some open source Python code available on GitHub that implements our approach.

Sam Charrington: [00:33:27] So you've talked about the idea that people are fundamentally the root of the issue, that these are societal issues, that they're not going to be solved by technological advancements or processes alone. At the same time, there has been a ton of new research happening in this area by folks in your group and elsewhere. Does that lead to a mismatch between what's happening in academia and on the technical side, and the way this stuff actually gets put into practice?

Hanna Wallach: [00:34:11] That's an awesome question. The simple answer is yes. This actually relates to one of my most recent research projects, which I'm really, really excited about. So last summer, some of my colleagues and I, specifically Jenn Wortman Vaughan, Miro Dudík, and Hal Daumé, along with our incredible intern, Ken Holstein from CMU, conducted the first systematic investigation of industry practitioners' challenges and needs for support relating to developing fairer machine learning systems. This work actually came about because we were thinking about ways of developing interfaces for that fair classification work that I mentioned a minute ago. Through a number of conversations with people in different product groups here at Microsoft and people at other companies, we realized that these kinds of classification tasks, while they're incredibly well studied within the fairness and machine learning literature, are maybe less common than we had thought in practice within industry.
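To make the joint objective described above concrete (optimizing classifier accuracy together with a quantitative parity metric), here is a minimal sketch of a logistic regression trained with a demographic-parity penalty added to the usual cross-entropy loss. This is an illustrative example only, not the open source code mentioned in the conversation; the function name, the synthetic data, and the choice of demographic parity as the metric are all assumptions made for the sketch.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_fair_logreg(X, y, group, lam=1.0, lr=0.1, epochs=500):
        # Hypothetical sketch: minimize cross-entropy plus
        # lam * |mean score in group 0 - mean score in group 1|
        # (a demographic-parity penalty), via plain gradient descent.
        n, d = X.shape
        w, b = np.zeros(d), 0.0
        g0, g1 = (group == 0), (group == 1)
        for _ in range(epochs):
            p = sigmoid(X @ w + b)                 # predicted scores
            grad_w = X.T @ (p - y) / n             # cross-entropy gradient
            grad_b = np.mean(p - y)
            gap = p[g0].mean() - p[g1].mean()      # parity gap between groups
            s = p * (1.0 - p)                      # d(sigmoid)/d(input)
            dgap_w = (X[g0] * s[g0][:, None]).mean(axis=0) - (X[g1] * s[g1][:, None]).mean(axis=0)
            dgap_b = s[g0].mean() - s[g1].mean()
            grad_w += lam * np.sign(gap) * dgap_w  # subgradient of |gap|
            grad_b += lam * np.sign(gap) * dgap_b
            w -= lr * grad_w
            b -= lr * grad_b
        return w, b

    # Hypothetical usage on synthetic data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))
    group = rng.integers(0, 2, size=500)
    y = (X[:, 0] + 0.8 * group + rng.normal(scale=0.5, size=500) > 0).astype(float)
    w, b = train_fair_logreg(X, y, group, lam=2.0)

Increasing lam pushes the two groups' average scores closer together at some cost in accuracy, which is exactly the trade-off between the two objectives being discussed.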
So that got us thinking about whether there might be, actually, a mismatch between the academic literature on fairness and machine learning, and practitioners' actual needs. What we ended up doing was this super interesting research project that was a pretty different style of research for me and for my colleagues. So I am a machine learning researcher, and so are Jenn, Hal, and Miro; Ken, our intern, is an HCI researcher. What we ended up doing was this qualitative HCI work to really understand what it is that practitioners are facing in reality when they try to develop fairer machine learning systems. To do this, we conducted semi-structured interviews with 35 people, spanning 25 different teams, in 10 different companies. These people were in a number of different roles, ranging from social scientist, data labeler, product manager, and program manager, to data scientist and researcher. Where possible, we tried to interview multiple people from the same team in order to get a variety of perspectives on that team's challenges and needs for support. We then took our findings from these interviews and developed a survey, which was then completed by another 267 industry practitioners, again in a variety of different companies and a variety of different roles. What we found, at a high level, was that yes, there is a mismatch between the academic literature on fairness in machine learning and industry practitioners' actual challenges and needs for support on the ground. So firstly, much of the machine learning literature on fairness focuses on classification, and on supervised machine learning methods. In fact, what we found is that industry practitioners are grappling with fairness issues in a much wider range of applications, beyond classification or prediction scenarios. Many times the systems they're dealing with involve these really rich, complex interactions between users and the system. So for example, chat bots, or adaptive tutoring, or personalized retail, and so on and so forth. So as a result, they often struggle to use existing fairness research from the literature, because the things that they're facing are much less amenable to these quantitative fairness metrics. Indeed, very few teams have fairness KPIs or automated tests that they can use within their domain. One of the other things that we found is that the machine learning literature typically assumes access to sensitive attributes like race or gender, for the purpose of auditing systems for fairness. But in practice, many teams have no access to these kinds of attributes, and certainly not at the level of individuals. So they expressed needs for support in detecting biases and unfairness with access only to coarse-grained, partial, or indirect information. This is something that we've seen much less focus on in the academic literature.

Sam Charrington: [00:38:41] That last point is an interesting one, and one that I've brought up on the podcast previously. In many of the places you might want to use an approach like that, it's forbidden, from a regulatory perspective, to use the information that you want to use in your classifier to achieve fairness in any part of the decisioning process.

Hanna Wallach: [00:39:04] Exactly. This sets up this really difficult tension between doing the right thing in practice from a machine learning perspective, and what is legally allowed.
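To illustrate what one of the automated fairness tests mentioned above might look like in the simple case where a sensitive attribute is available, here is a small sketch that audits a classifier's predictions for parity across two groups. The function name and the two metrics chosen (demographic parity difference and true positive rate difference) are assumptions for illustration; as discussed, these capture parity only, not broader notions of fairness, and they presume access to group labels that many teams do not have.

    import numpy as np

    def parity_audit(y_true, y_pred, group):
        # Hypothetical audit: compare groups 0 and 1 on selection rate
        # (demographic parity) and true positive rate (equal opportunity).
        y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
        rates = {}
        for g in (0, 1):
            mask = group == g
            sel = y_pred[mask].mean()                   # selection rate in group g
            tpr = y_pred[mask & (y_true == 1)].mean()   # true positive rate in group g
            rates[g] = (sel, tpr)
        return {
            "demographic_parity_difference": abs(rates[0][0] - rates[1][0]),
            "true_positive_rate_difference": abs(rates[0][1] - rates[1][1]),
        }

    # Hypothetical usage, e.g. as a gate in a deployment pipeline:
    # audit = parity_audit(y_true, y_pred, group)
    # assert audit["demographic_parity_difference"] < 0.1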
I'm actually working on a paper at the moment with Zack Conard, a law student at Stanford University, on exactly this issue: this challenge between what you want to do from a machine learning perspective and what you are required to do from a legal perspective, which is based on humans, how humans behave, and hundreds of years of law in that realm. It's really challenging, and there is this complicated trade-off there that we really need to be thinking about.

Sam Charrington: [00:39:48] It does make me wonder if techniques like, or analogous to, differential privacy could be used to provide a regulatorily acceptable way to access protected attributes, so that they can be incorporated into algorithms like this.

Hanna Wallach: [00:40:07] Yeah, so there was some work on exactly this kind of topic at the FAT ML workshop co-located with ICML last year. This work proposed using encryption and the like to collect and make available such information, but in a way that users would feel their privacy was being respected, and so that people who wanted to use that information would be able to use it for purposes such as auditing. I think that's a really promising approach, although there are obviously a bunch of non-trivial challenges involved in thinking about how you might make that a reality. It's a really complicated landscape, but definitely one that's worth thinking about.

Sam Charrington: [00:40:54] Was there a third area that you were about to mention?

Hanna Wallach: [00:40:58] Yeah, so one of the main themes that we found in our work studying industry practitioners is a real mismatch in the focus on different points in the machine learning life cycle. The machine learning literature typically assumes no agency over data collection. This makes sense, right? If you're a machine learning academic, you typically work with standard datasets that have been collected and made available for years. You don't typically think about having agency over that data collection process. But of course, in industry, that's exactly where practitioners often do have the most control. They are in charge of that data collection or data curation process, and in contrast, they often have much less control over the methods or models themselves, which are often embedded within much bigger systems. So it's much harder to intervene from a perspective of fairness with the models than it is with the data. We found that really interesting, this difference in emphasis between models versus data in these different groups of people. Of course, many practitioners voiced needs for support in figuring out how to leverage that agency over data collection to create fairer datasets for use in developing their systems.

Sam Charrington: [00:42:20] So you mentioned the FAT ML workshop. I'm wondering, as we come to a close, if there are any resources, events, or pointers; I'm sure there are tons of things that you'd love to point people at. But what are your top three or four things that you would suggest people take a look at as they're trying to wrap their heads around this area, and how to either have an impact as a researcher, or how to make good use of it as a practitioner?

Hanna Wallach: [00:42:55] Yeah, absolutely. So there are a number of different places with resources to learn more about this kind of stuff.
So first, I've mentioned a couple of times this tutorial that I put together with Jenn Wortman Vaughan, which will be available publicly online very soon. It is in fact being broadcast next week, so it should be up by the time this podcast goes live. So I would definitely recommend that people check that out to really get a sense of how we, at Microsoft, are thinking about fairness in machine learning. Then moving beyond that, and thinking specifically about the academic literature, the FAT ML workshop maintains a list of resources on the workshop website. That's another really, really great place to look for things to read about this topic. The FAT* conference is a relatively newly created conference on fairness, accountability, and transparency, not just in machine learning, but across all of computer science and computational systems. Again, there, I recommend checking out the website to see the publications that were there last year, and also the publications that will be there this year. There are a number of really interesting papers that I haven't read yet, but I'm super excited to read, being presented at this year's conference. That conference also has tutorials on a range of different subjects, so it's also worth looking at the various different tutorials there. At last year's conference, Arvind Narayanan presented this amazing tutorial on quantitative fairness metrics, and why they're not a one-size-fits-all solution, why there are trade-offs between them, why you can't just take one of these definitions, optimize for it, and call it quits. So I definitely recommend checking that out. Some other places that are worth looking for resources on this: the AI Now Institute, which was co-founded by Kate Crawford, who is also here at Microsoft Research, and Meredith Whittaker, who is at Google, has some incredibly awesome resources. They've put out a number of white papers and reports over the past couple of years that really get at the crux of why these are complicated socio-technical issues. So I strongly recommend reading pretty much everything that they put out. I would also recommend checking out some of the material put out by Data & Society, which is another organization here in New York, led by danah boyd; they too have a number of really interesting things that you can read about these different topics. Then the final thing I want to emphasize is the Partnership on AI, which was formed a couple of years ago by Microsoft and a bunch of other companies working in the AI space to really foster cross-company collaboration when thinking about these complicated societal issues that relate to AI and machine learning. The partnership has been really ramping up over the past couple of years, and they also have some good resources that are worth checking out.

Sam Charrington: [00:46:22] Oh, that's great. That is a great list that will keep us busy for a while. Hanna, thank you so much for taking the time to chat with us. It was really a great conversation, and I appreciate it.

Hanna Wallach: [00:46:34] No problem. Thank you for having me. This has been really great.

Sam Charrington: [00:46:38] Awesome, thank you.
Bits & Bytes

Google introduces Feast, an open source feature store for ML. GO-JEK and Google announced the release of Feast, which allows teams to manage, store, and discover features to use for ML projects.

Amazon CEO Jeff Bezos is launching a new conference dedicated to AI. The new AI-specific conference, re:MARS, will be held in Las Vegas between June 4th and 7th this year. Should be an interesting event.

Mayo Clinic research uses AI for early detection of silent heart disease. A Mayo Clinic study finds that applying AI to electrocardiogram (EKG) test results offers a simple, affordable early indicator of asymptomatic left ventricular dysfunction, a precursor to heart failure.

Microsoft announces ML.NET 0.9. Microsoft's open-source and cross-platform ML framework, ML.NET, was updated to version 0.9. New and updated features focus on expanded model interpretability capabilities, GPU support for ONNX models, new Visual Studio project templates in preview, and more.

Intel and Alibaba team up on new AI-powered 3D athlete tracking technology. At CES 2019, Intel and Alibaba announced a new collaboration to develop AI-powered 3D athlete tracking technology to be deployed at the 2020 Olympic Games.

Baidu unveils open source edge computing platform and AI boards. OpenEdge, an open source edge computing platform, enables developers to build edge applications with more flexibility. The company also announced new AI hardware development platforms: BIE-AI-Box, with Intel, for in-car video analysis, and BIE-AI-Board, co-developed with NXP, for object classification.

Qualcomm shows off an AI-equipped car cockpit at CES 2019. At CES, Qualcomm introduced the third generation of its Snapdragon Automotive Cockpit Platforms. The upgraded version covers various aspects of the in-car experience, from voice-activated interfaces to traditional navigation systems. Their keynote featured a nice demo of "pedestrian intent prediction" based on various computer vision techniques, including object detection and pose estimation.

Dollars & Sense

Fractal Analytics, an AI firm based in India which, among other things, owns Qure.ai (see my interview with CEO Prashant Warier), raised $200M from private equity investor Apax

Standard has acquired Explorer.ai, a mapping and computer vision start-up

Israeli AI-based object recognition company AnyVision has raised $15M from Lightspeed Venture Partners

Spell, an NYC-based AI and ML platform startup, raised $15M

HyperScience, an edge ML company, has raised $30M

WeRide.ai, a Chinese autonomous driving technology specialist, raised series A funding from SenseTime Technology and ABC International

UK-based Exscientia, an AI-driven drug discovery company, has raised $26 million

CrowdAnalytix raises $40 million in strategic investment for crowdsourced AI algorithms

To receive Bits & Bytes in your inbox, subscribe to our Newsletter.