Since September 2016, I have been a University Lecturer (equivalent to US Assistant Professor) in Machine Learning at the Department of Engineering at the University of Cambridge, UK. Previously, I was a postdoctoral fellow in the Harvard Intelligent Probabilistic Systems group at the School of Engineering and Applied Sciences of Harvard University, working with the group leader, Prof. Ryan Adams. This position was funded through a postdoctoral fellowship from the Rafael del Pino Foundation. Before that, I was a postdoctoral research associate in the Machine Learning Group at the Department of Engineering of the University of Cambridge (UK) from June 2011 to August 2014, working with Prof. Zoubin Ghahramani. During my first two years in Cambridge I worked on a collaboration project with the Indian multinational company Infosys Technologies. I also spent two weeks giving lectures on Bayesian Machine Learning at Charles University in Prague (Czech Republic). From December 2010 to May 2011, I was a teaching assistant at the Computer Science Department of Universidad Autónoma de Madrid (Spain), where I completed my Ph.D. and M.Phil. in Computer Science in December 2010 and June 2007, respectively. I also obtained a B.Sc. in Computer Science from the same institution in June 2004, with a special prize for the best academic record at graduation. My research revolves around model-based machine learning, with a focus on probabilistic learning techniques and a particular interest in Bayesian optimization, matrix factorization methods, copulas, Gaussian processes and sparse linear models. A general feature of my work is an emphasis on fast methods for approximate Bayesian inference that scale to large datasets. The results of my research have been published in top machine learning journals (Journal of Machine Learning Research) and at top conferences (NIPS and ICML).
Dr. Sameer Singh is an Associate Professor of Computer Science at the University of California, Irvine (UCI). He works primarily on the robustness and interpretability of machine learning algorithms, along with models that reason with text and structure for natural language processing. Sameer was a postdoctoral researcher at the University of Washington and received his PhD from the University of Massachusetts, Amherst, during which he interned at Microsoft Research, Google Research, and Yahoo! Labs. He has received the NSF CAREER award, been selected as a DARPA Riser, and received the UCI ICS Mid-Career Excellence in Research Award as well as the Hellman and Noyce Faculty Fellowships. His group has received funding from the Allen Institute for AI, Amazon, NSF, DARPA, Adobe Research, the Hasso Plattner Institute, NEC, Base 11, and FICO. Sameer has published extensively at machine learning and natural language processing venues, including paper awards at KDD 2016, ACL 2018, EMNLP 2019, AKBC 2020, and ACL 2020.
Dr. Priestley is a Professor of Statistics and Data Science. Since 2004, she has served as the Associate Dean of the Graduate College and as the Executive Director of the Analytics and Data Science Institute at Kennesaw State University. In 2012, the SAS Institute recognized Dr. Priestley as the 2012 Distinguished Statistics Professor of the Year. She served as the 2012 and 2015 Co-Chair of the National Analytics Conference. Datanami recognized Dr. Priestley as one of the top 12 "Data Scientists to Watch in 2016." She architected the first Ph.D. Program in Data Science, which launched in February 2015. Dr. Priestley has been a featured international speaker at The World Statistical Congress, The South African Statistical Association, SAS Global Forum, Big Data Week, Technology Association of Georgia, Data Science ATL, The Atlanta CEO Council, Predictive Analytics World, INFORMS and dozens of academic and corporate conferences addressing issues related to the evolution of data science. She has authored dozens of articles on Binary Classification, Risk Modeling, Sampling, Statistical Methodologies for Problem Solving and Applications of Big Data Analytics. Prior to receiving a Ph.D. in Statistics, Dr. Priestley worked in the Financial Services industry for 11 years. Her positions included Vice President of Business Development for VISA EU in London, as well as for MasterCard US and an analytical consultant with Accenture's strategic services group. Dr. Priestley received a Ph.D. from Georgia State University, an MBA from The Pennsylvania State University, and a BS from Georgia Institute of Technology.
There are few things I love more than cuddling up with an exciting new book. There are always more things I want to learn than time I have in the day, and I think books are such a fun, long-form way of engaging (one where I won’t be tempted to check Twitter partway through). This book roundup is a selection from the last few years of TWIML guests, counting only the ones related to ML/AI published in the past 10 years. We hope that some of their insights are useful to you! If you liked their book or want to hear more about them before taking the leap into longform writing, check out the accompanying podcast episode (linked on the guest’s name). (Note: These links are affiliate links, which means that ordering through them helps support our show!)

Adversarial ML
Generative Adversarial Learning: Architectures and Applications (2022), Jürgen Schmidhuber

AI Ethics
Sex, Race, and Robots: How to Be Human in the Age of AI (2019), Ayanna Howard
Ethics and Data Science (2018), Hilary Mason

AI Sci-Fi
AI 2041: Ten Visions for Our Future (2021), Kai-Fu Lee

AI Analysis
AI Superpowers: China, Silicon Valley, And The New World Order (2018), Kai-Fu Lee
Rebooting AI: Building Artificial Intelligence We Can Trust (2019), Gary Marcus
Artificial Unintelligence: How Computers Misunderstand the World (The MIT Press) (2019), Meredith Broussard
Complexity: A Guided Tour (2011), Melanie Mitchell
Artificial Intelligence: A Guide for Thinking Humans (2019), Melanie Mitchell

Career Insights
My Journey into AI (2018), Kai-Fu Lee
Build a Career in Data Science (2020), Jacqueline Nolis

Computational Neuroscience
The Computational Brain (2016), Terrence Sejnowski

Computer Vision
Large-Scale Visual Geo-Localization (Advances in Computer Vision and Pattern Recognition) (2016), Amir Zamir
Image Understanding using Sparse Representations (2014), Pavan Turaga
Visual Attributes (Advances in Computer Vision and Pattern Recognition) (2017), Devi Parikh
Crowdsourcing in Computer Vision (Foundations and Trends® in Computer Graphics and Vision) (2016), Adriana Kovashka
Riemannian Computing in Computer Vision (2015), Pavan Turaga

Databases
Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases (2021), Xin Luna Dong
Big Data Integration (Synthesis Lectures on Data Management) (2015), Xin Luna Dong

Deep Learning
The Deep Learning Revolution (2016), Terrence Sejnowski
Dive into Deep Learning (2021), Zachary Lipton

Introduction to Machine Learning
A Course in Machine Learning (2020), Hal Daume III
Approaching (Almost) Any Machine Learning Problem (2020), Abhishek Thakur
Building Machine Learning Powered Applications: Going from Idea to Product (2020), Emmanuel Ameisen

ML Organization
Data Driven (2015), Hilary Mason
The AI Organization: Learn from Real Companies and Microsoft’s Journey How to Redefine Your Organization with AI (2019), David Carmona

MLOps
Effective Data Science Infrastructure: How to make data scientists productive (2022), Ville Tuulos

Model Specifics
An Introduction to Variational Autoencoders (Foundations and Trends® in Machine Learning) (2019), Max Welling

NLP
Linguistic Fundamentals for Natural Language Processing II: 100 Essentials from Semantics and Pragmatics (2013), Emily M. Bender

Robotics
What to Expect When You’re Expecting Robots (2021), Julie Shah
The New Breed: What Our History with Animals Reveals about Our Future with Robots (2021), Kate Darling

Software How To
Kernel-based Approximation Methods Using Matlab (2015), Michael McCourt
Sam Charrington: Hey, what’s up everyone! We are just a week away from kicking off TWIMLfest, and I’m super excited to share a rundown of what we’ve got in store for week 1. On deck are the Codenames Bot Competition kickoff, an Accessibility and Computer Vision panel, the first of our Wellness Wednesdays sessions featuring meditation and yoga, as well as the first block of our Unconference Sessions proposed and delivered by folks like you. The leaderboard currently includes sessions on Sampling vs Profiling for Data Logging, Deep Learning for Time Series in Industry, and Machine Learning for Sustainable Agriculture. You can check out and vote on the current proposals or submit your own by visiting https://twimlai.com/twimlfest/vote/. And of course, we’ll have a couple of amazing keynote interviews that we’ll be unveiling shortly! As if great content isn’t reason enough to get registered for TWIMLcon, by popular demand we are extending our TWIMLfest SWAG BAG giveaway by just a few more days! Everyone who registers for TWIMLfest between now and Wednesday October 7th, will be automatically entered into a drawing for one of five TWIMLfest SWAG BAGs, including a mug, t-shirt, and stickers. Registration and all the action takes place at twimlfest.com, so if you have not registered yet, be sure to jump over and do it now! We’ll wait here for you. Before we jump into the interview, I’d like to take a moment to thank Microsoft for their support for the show, and their sponsorship of this series of episodes highlighting just a few of the fundamental innovations behind Azure Cognitive Services. Cognitive Services is a portfolio of domain-specific capabilities that brings AI within the reach of every developer—without requiring machine-learning expertise. All it takes is an API call to embed the ability to see, hear, speak, search, understand, and accelerate decision-making into your apps. Visit aka.ms/cognitive to learn how customers like Volkswagen, Uber, and the BBC have used Azure Cognitive Services to embed services like real-time translation, facial recognition, and natural language understanding to create robust and intelligent user experiences in their apps. While you’re there, you can take advantage of the $200 credit to start building your own intelligent applications when you open an Azure Free Account. That link again is aka.ms/cognitive. And now, on to the show! Sam Charrington: [00:03:14] All right, everyone. I am here with Cha Zhang. Cha is a partner Engineering Manager with Microsoft Cloud and AI. Cha, welcome to the TWIML AI podcast. Cha Zhang: [00:03:25] Thank you, Sam. Nice to meet you. Sam Charrington: [00:03:27] Great to meet you as well. Before we dive in, I’d love to learn a little bit about your background. Tell us how you came to work in computer vision. Cha Zhang: [00:03:38] Sure. Sure. I actually have been at Microsoft for 16 years. I joined Microsoft originally as a researcher at Microsoft Research. I was there for 12 years. My research was primarily applying machine learning to image, audio, video; all of these different applications. I started 2016. I joined the product side, and currently I’m working as an Engineering Manager, and my primary focus is on document understanding. Sam Charrington: [00:04:11] Awesome. Awesome. So, we will be focusing quite a bit on OCR and some of your work in that space, and, you know, I think people often think of OCR as a, you know, a solve problem, right? 
It’s, you know, we’ve been scanning documents and extracting text out of those documents for a long time. Obviously the advent of deep learning, you know, changes things, but I’d love to get the conversation started by having you share a little bit about, you know, what’s new and interesting in the space. How has it changed over the past few years? Cha Zhang: [00:04:50] Sure. Actually, it wasn’t very long ago that when people talked about OCR, what came to mind was mostly scanned documents. In many people’s eyes, OCR for scanned documents is sort of a solved problem. More recently, I think there have been two major developments. One is that with a mobile-first kind of world, where everybody now has a mobile phone and takes pictures everywhere, there’s a lot of demand to do text recognition on images in the wild, and that certainly is a much more challenging problem than scanned documents. And then technically, because of the advances in deep learning, we have realized that with deep learning we can do OCR at a different level. We can make it a lot more accurate than before, and we can solve the OCR problem in that kind of images-in-the-wild scenario. So I think it started around early 2010 or so. There have been a lot of big advances in this area, and now we’re seeing basically OCR become something that really works. You know, people don’t need to worry about quality, etcetera; it just mostly works. Sam Charrington: [00:06:08] Can you talk a little bit more about the challenges that arise when you’re trying to do OCR in the wild? Cha Zhang: [00:06:16] Of course. I think for documents, usually it’s a white background and black text, but for images in the wild, essentially it’s a photo. So in the photo, there’s a lot of variation in the text. First there’s a huge scale variation: if you capture a picture of a street, there might be some store names that are super big, and then there’s some tiny text that’s hard to see. So there’s a big variation in the scale of the text, and the aspect ratio of the text can be really long, because a text string can be very long compared to regular objects, like a cat or a dog. Because of the mobile capture scenario, it’s usually difficult to enclose these texts with axis-aligned rectangles. For example, there might be perspective distortions of the text when the camera sees it. The background in an image in the wild is much more complicated than the typical white background you see in scanned documents, and some of these backgrounds, such as fences, bricks, and stripes — even though they appear quite simple to human beings — think of how a fence can be effectively a bunch of ones, you know, sitting there on the street, and they look very similar to text characters. So those create additional challenges, and I think one of the biggest ones, technically, that’s challenging for OCR is localization accuracy. So, typically in object detection, localization accuracy is measured by intersection over union, and if that criterion is bigger than 0.5, people think this is good enough. But for OCR, if the intersection is only half of the union, a lot of the characters will be missing. So usually OCR will need a 0.9, 0.95 level kind of accuracy in order to recognize all the characters properly. So… Sam Charrington: [00:08:31] Can you explain that in more detail? What is intersection over union and how is that used in object detection?
Cha Zhang: [00:08:39] So, in order to measure the accuracy of a particular detection algorithm, you need to ground-truth label the data. Typically what people do is they create a bounding box of the object to be detected, and then you use an automatic algorithm to figure out where the object is, and that will also create a bounding box. Now you have two bounding boxes, and the question is how do you measure how well these two boxes align. A common measure is to take the intersection of these two bounding boxes and the union of these two bounding boxes, so you get two areas. You can imagine, if the two bounding boxes are very close to each other, overlapping a lot, then that intersection over union would be very high, but if they’re offset by quite a bit, then, you know, the number is low. So that’s kind of the academic standard for how people measure detection accuracy with this criterion. Sam Charrington: [00:09:46] Got it. And so, you were saying that the threshold that you need in the case of text is higher because of what? Cha Zhang: [00:09:58] Because of… Let’s just think about it: you have a ground-truth text, let’s say, “Hello world,” and it’s an elongated rectangle, and say I have a text detection algorithm that also creates a bounding box, but with an intersection over union of, let’s say, roughly 0.5. What that means is that the intersection area divided by the union of the two bounding boxes is 50%. So very likely the detected bounding box will miss a few characters because, you know, the overlap is not there. So you might be missing characters, or read a D as an N, and all this will cause the OCR to produce wrong results. And so that’s the main challenge here. Sam Charrington: [00:10:48] So in the case of a traditional object detection scenario, you may miss half of the face, but you can tell that there’s a face there. In the case of OCR, you’re just missing letters, and it makes it a lot more difficult for the algorithm to guess what was there. Cha Zhang: [00:11:07] Yes, exactly. Sam Charrington: [00:11:08] Got it, and maybe taking a step back just to the problem as a whole, granted mobile is driving, you know, this transition to these in-the-wild pictures and people trying to OCR them, but what are the high-value use cases there? I’m thinking of some interesting ones, like when it’s in conjunction with translation — you know, maybe I’m in another country, and I’ve done this, you’re taking pictures of words in another character set to try to read the menu or something like that. I’ve also done things like scan documents on a phone, and you want to OCR those, but that’s kind of back to the traditional OCR problem in a lot of ways. What are some of the other use cases that are common? Cha Zhang: [00:11:58] If you look at the business opportunities, I still think the traditional scanned document is a big one — some traditional kinds of OCR problems like, for example, receipts, which people would scan in the old days, but nowadays people mostly do reimbursements by snapping a photo. So in terms of the market, the revenue, I think that’s still quite a big one. There are a few others. There’s the one that you mentioned: you have a phone, you go to a foreign country, you snap a photo and you want to translate it — that’s one. There’s also a lot of applications in digital asset management.
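To make the intersection-over-union discussion above concrete, here is a minimal sketch in Python. The box coordinates are hypothetical; the point is simply that an IoU around 0.5, which is usually acceptable for generic object detection, can still clip characters off a long text line.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Hypothetical ground-truth line for "Hello world" and a detection that
# clips the right-hand end of the line.
gt   = (0, 0, 200, 20)     # full text line
pred = (0, 0, 120, 20)     # misses the last few characters
print(iou(gt, pred))       # 0.6 -- "good enough" for a cat or a dog,
                           # but several characters are gone for OCR
```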
Digital asset management is when, whether you are a big company or a person, you have some big store of photos that you want to organize. We have shown that with OCR capability, you can increase the accuracy with which you process and retrieve these photos. As a matter of fact, you know, for the big search engines like Google and Bing, when they search images, OCR is an integral part of that as well, because the OCR’d content can help a lot in getting the best images. Sam Charrington: [00:13:22] Okay. And so, you were mentioning some of the technical challenges, and localization of the text in these images is one of those challenges. How do you go about it? Is it the case that, you know, deep learning is so powerful that off-the-shelf deep learning techniques just solve it for you, or do you, you know, re-engineer the whole pipeline? How do you approach that? Cha Zhang: [00:13:53] So in text detection, usually the detection pipeline is different from traditional object detection. What’s been most popular for OCR of images in the wild today is something called anchor-free detection. So the idea… Anchor-free. In typical object detection, the most well-known approaches use anchors — like Fast R-CNN and Faster R-CNN, etcetera. They basically create these anchors and then they regress the actual bounding box of the objects. The challenge of using that kind of approach is that these anchors need to be preset, so typically for normal object detection you set anchors at a certain density, and then you set a certain set of aspect ratios — like your anchor boxes are one to two, one to three, one to one. Typically you go about that far, but with text, some of the text can go like 20 to one, so really you cannot — it would be a huge computational cost to go with an anchor-based approach. So in modern days, for OCR we go anchor-free, and the high-level concept is essentially that, using convolutional neural networks, you do almost a per-pixel-level decision or classification saying, well, this region near this particular pixel looks like part of some text. So there is a text/non-text classification at almost a per-pixel level. Then you rely on a few algorithms to group these into text lines, by looking at how similar, for example, two text regions are to each other, and you can decide, well, these two look like the same texture and color, and maybe they should be connected. In this regard, there are quite a few well-known algorithms to do this connection. In earlier days, people used a relatively rule-based approach, like stable link, where they link based on some features, but it’s kind of rule-based. More recently, people have started looking into neural networks like relation networks, which kind of estimate the relation of two regions’ features, and based on that decide whether these two should be connected or not. So that way you start kind of bottom-up; you start with per-pixel classification, and then you do grouping, and you come out with these text lines. It’s a very powerful approach. It can not only detect straight lines, but even curved lines; you can handle them pretty well with those approaches. Sam Charrington: [00:16:44] So it sounds like you’re describing a pipeline. That’s not like an end-to-end trained single neural network where you give it images, train it on labeled data, and it tells you what the text is, but rather a bunch of independent steps. Cha Zhang: [00:17:04] Yes, that’s a very good observation.
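As a rough illustration of the anchor-free, bottom-up detection Cha describes above, the sketch below shows a tiny per-pixel text/non-text head on top of a small convolutional backbone, followed by a naive grouping of positive pixels into candidate regions. The architecture, shapes, and connected-components grouping are simplified stand-ins assumed for illustration, not Microsoft's actual pipeline, which uses learned linking/relation scores rather than plain connectivity.

```python
import torch
import torch.nn as nn
from scipy import ndimage

class TextSegHead(nn.Module):
    """Tiny stand-in for an anchor-free detector: backbone features ->
    per-pixel text/non-text probability map (no anchor boxes anywhere)."""
    def __init__(self, in_ch=3, feat_ch=16):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
        )
        self.score = nn.Conv2d(feat_ch, 1, 1)   # one logit per pixel

    def forward(self, img):
        return torch.sigmoid(self.score(self.backbone(img)))

model = TextSegHead()
img = torch.rand(1, 3, 64, 128)                # hypothetical input image
prob = model(img)[0, 0].detach().numpy()       # H x W text probability map

# Naive grouping step: threshold the map and merge neighbouring positive
# pixels into candidate text regions.
mask = prob > 0.5
regions, num_regions = ndimage.label(mask)
print(f"{num_regions} candidate text regions")
```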
Actually, so for OCR, detection is only the first step, and after detection we typically run a character model, where you take the detected text lines, you normalize them into a straight line with a fixed height, and then you run a character model to actually decode the image into a list of characters. A lot of the approaches are actually similar to speech, where, you know, speech is going from an acoustic signal to text. Here we’re going from image to text, but a lot of the approaches that we use, like LSTMs, language modeling, these are very similar. Now your question is certainly valid, because in speech today, you know, people do end-to-end training. They start from audio so they can directly go to text. For OCR, we are not there yet. I think the main challenge, well, first is how much data you have. For speech, you can collect a lot more data compared with OCR. OCR data are usually very expensive to collect and label, and so going stage by stage at this point is more economically doable than, you know, doing end-to-end training. Sam Charrington: [00:18:25] Why is that? It seems that we have tons of pictures with words in them. Is it just the in-the-wild examples where we don’t have the labeled data, or is it also the document use cases? Because I’m imagining Microsoft has probably labeled a ton of receipts and business cards and that kind of thing. Cha Zhang: [00:18:50] Yeah. I think certainly labeling is very, very expensive. For Microsoft, we are a company paying a lot of attention to privacy, you know, those kinds of issues, and collecting OCR data has been a major, I would say, blocking issue for going to this kind of end-to-end approach, because if you think about it, a lot of the documents that we actually care about — say, invoices, receipts, business cards — they all contain personal information. Those data are extremely difficult to obtain, and we follow very strict guidelines for how we can collect them, how we can label them. So in some ways we are limited by these privacy restrictions, but we do respect those a lot. So as a result, you know, we are not going end to end at this point. Sam Charrington: [00:19:48] Got it, got it. It makes me think a little bit about some of the issues with neural networks remembering data. So for example, there are cases where you train a CNN and there are some attacks you can do that will reproduce, to some degree or another, some of the images that the model was trained on. Likewise, with these very large language models, you can start to see some of the text that the models were trained on come out in the output. I would imagine if you were training end to end, that becomes an issue as well, and maybe more so than in the case of images. What’s your intuition there? Would it be worse or better than images? Cha Zhang: [00:20:39] I would imagine it will be similar, I would say. So after all, you know, OCR, you come from image to text, but during the learning of this OCR process, a language model is actually very helpful to improve the OCR accuracy. So, for example, during decoding of these text lines into text, we use, like, LSTMs, or, you know, basically these very popular language modeling schemes. Certainly it remembers the contextual information of the language in order to help the OCR recognize these texts properly.
So, I think when you go to end to end, the amount of data that you use for training is humongous. It’s difficult for me to imagine, you know, that we’ll have a similar level of data for training as, like, BERT models or GPT models. Those use a huge, huge amount of data. But still, you will learn something from the text, and it might leak into the model as well. Sam Charrington: [00:21:51] Along those lines, what enabled BERT and many of the recent innovations around language models is a shift from supervised to a semi-supervised way of framing the task. Is there a semi-supervised framing for the OCR task that makes sense? Cha Zhang: [00:22:13] Actually, for OCR today, we are not doing that, although I think it’s definitely a very interesting research problem. I think BERT is a super nice framework for transfer learning. You know, you go from a pre-trained model and then, you know, with supervision, you can… In the image world, I think transfer learning probably existed earlier for images than for language. So in earlier days, when we had ImageNet, we trained, like, a ResNet; those are already being used for transfer learning. Unsupervised kind of image learning is also, I think, still ongoing — there are a lot of interesting projects going on. I think for OCR, right now we’re not there yet. Like, one of the main issues for building a product like OCR using some of these pre-trained models is the computational cost. I think this happens in language as well: BERT models, the GPT-3 model — you know, multiple billions of parameters — it’s very difficult to turn them into a product. For OCR, you know, we have the same problem. Computational cost is very sensitive. We need to make it fast, and so we’re using relatively small models, and normally we train from scratch. Transfer learning does show some benefit, but when the data reaches a certain amount, we found training from scratch is perfectly fine. Sam Charrington: [00:23:49] When you have a certain amount of data to train from? Cha Zhang: [00:23:53] Yeah. In the very early days, when we started doing deep learning OCR, we actually relied a lot on distillation – that’s teacher-student learning, where we first train a big model, and then we gradually use teacher-student learning to create a small model so that it can run efficiently. Nowadays, we have figured out that you can train these models from scratch. The amount of data that we have, on the order of, you know, hundreds of thousands to millions of images, is sufficient to train a smaller model from scratch and reach about the same accuracy. Sam Charrington: [00:24:31] Can you elaborate a little bit on that? Are you saying that you need more data to train smaller models? Cha Zhang: [00:24:37] No, I’m saying that… Take BERT as an example. BERT is super beneficial for transfer learning because it has seen so many documents. So given any new language task, presumably there’s not much data that you have to train this new task, and therefore leveraging BERT, which has seen so many documents, will help, through transfer learning, to transfer some of the knowledge that BERT has learned from this huge set of documents to the small task, so that it reduces the amount of documents required to train the smaller task. The same thing happens in ImageNet transfer learning, where, you know, if it’s a ResNet trained on ImageNet, you learn a lot of visual information from the ImageNet dataset.
Then if you have a tiny detection task, like detecting a helmet, let’s say, you can do transfer learning and use a very small dataset to actually train a very good helmet detector. What I was saying just now was that for the problem of OCR — which, you know, is certainly a very important computer vision problem — every company that invests in OCR tends to collect quite a bit of data, not to the level of, you know, billions, but to the level of hundreds of thousands or millions, and that amount of data is sufficient that you do not need to go with transfer learning. You can train the model from scratch and you get very good results. Sam Charrington: [00:26:19] Got it. Got it. So when you were using transfer learning, were you using models based on ImageNet, you know, along the lines of ResNet and others, or… Okay. Let’s see… so the smaller models that you’re training, are they, you know, some of the traditional architectures that we’ve already brought up, or are you building out new architectures for the models themselves for this specific problem? Cha Zhang: [00:26:53] Right now we’re using some of the traditional models. There is some active research going on regarding searching for the best, most effective architecture for OCR. We haven’t seen convincing results yet, but I think that’s a very active research area that we’re still kind of looking into, particularly when we try to make it smaller and smaller, you know, faster and faster. Sam Charrington: [00:27:20] When you say searching for the best architecture for OCR, are you using the word searching generally, like you have researchers looking at different models and trying to find the best one for OCR, or are you suggesting a domain-specific neural architecture search kind of…? Cha Zhang: [00:27:38] I mean neural architecture search. So that certainly can be applied to OCR, and we are still exploring it, but I think that’s a very promising direction. Sam Charrington: [00:27:49] Okay. Interesting. Interesting. Earlier in the conversation you talked about how one of the big use cases is some of this semi-structured data that we want to extract information out of – an invoice is one example. There was a recent demonstration — or I guess that’s actually a product now — of the mobile version of Excel or something, where you can take a picture of grid-like data, and it will, you know, both extract the text and organize it into a spreadsheet. Talk a little bit about the product that you’re working on, Form Recognizer, which is doing something similar. Cha Zhang: [00:28:35] Yeah, of course. So OCR certainly is pretty low level. Other than some of the applications I mentioned earlier, like digital asset management and photo management, you know, translation, where you can directly use OCR, for many customers what they want is not just OCR. They want to extract information from documents. Think about, you know, I need to process millions of invoices; I want to extract the vendor name and the date, the total amount. Or if it’s an MS expense system where you want to process all the receipts, it can be for a verification purpose — for example, like, okay, how do I make sure employees are not putting in random numbers that don’t match the receipts that are actually filed? It sounds kind of silly, but you know, today a lot of companies do this verification manually. Because of the huge amount of manual effort needed, they often can only do sampling.
So you sample like 5% of these receipts to validate, but you kind of miss a huge chunk that you never even look at. So we are looking at this space, and we’re trying to build essentially two categories of product – one is a pre-built set of products, and these are solutions that work out of the box. For example, it can be a pre-built receipt, pre-built business card, pre-built invoice. So basically you send in an image or PDF file, and it will extract all the fields that you’ll be interested in. Another big category that we think is super important is customization, because, you know, the pre-built may never fit every need. So we have a solution called custom form, where we allow the customer to basically send us a few sample images. You can either label them or even, you know, do no labeling at all, and we will be able to extract key-value pairs out of these documents. Again, we see this as much closer to what customers need, and that’s how Form Recognizer is positioned. Sam Charrington: [00:30:54] So we’ve talked about a bunch of the interesting technical challenges at the lower level of OCR. At the form level, you know, is that kind of a packaging of OCR? Does it have its own technical challenges to overcome…? Cha Zhang: [00:31:13] Actually, it has a lot of very interesting challenges. So, one of the works recently coming out from Microsoft Research is, you know, targeting exactly this problem. And so, just think about it. I mean, parsing these invoices and receipts is essentially sort of a language problem, because you have this text there. The challenge here is that these are images, so you run OCR on them, but unlike a typical language dataset that you’ve scraped from the internet — you know, Wikipedia — which basically has the ordering of these words already, if the data is coming from an image, you can essentially detect these text lines, but it’s actually very difficult to define the read order of these text lines, and ordering these text lines by itself is a very challenging problem. When you have images in the wild, the paper can be curved, you know, can be crumpled, can be rotated, there’s perspective, you know, all kinds of issues. There can be background text, you know, all of these. So the particular approach that MSRA came out with is called LayoutLM. It’s actually a modified BERT model. It’s also a language model, but in addition to the language, we also embed 2D information, like what is the X, Y position of the bounding box of the text. And with that information — actually, this can all be trained without supervision; it’s unsupervised pre-training — we are able to learn this kind of spatial relationship in these invoices without coming up with an explicit read order. With that, we actually can do a lot of this key-value extraction really well. There’s also quite a lot of advanced research looking into, say, relation networks, where, when you see two text lines near each other, you can predict the relationship. Again, this is similar to the OCR case, where you have this bottom-up, per-pixel classification and you want to group them; here you want to group key and value pairs. There’s also a lot of advanced research on graph convolution networks, where you do convolution over a graph, and the graph is defined by connecting nearby text lines. Again, this is an approach that doesn’t require reading order, but just looks at the spatial relationships.
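A minimal sketch of the 2D-embedding idea behind LayoutLM as described above: each OCR'd token gets, in addition to its word embedding, embeddings for the coordinates of its bounding box, so spatial relationships can be learned without an explicit reading order. The hidden size, vocabulary, and the 0–999 coordinate grid below are illustrative assumptions, not the published configuration.

```python
import torch
import torch.nn as nn

class LayoutEmbedding(nn.Module):
    """Token embedding + 2D position embeddings for OCR'd words.
    Box coordinates are assumed to be normalised to a 0..999 grid."""
    def __init__(self, vocab_size=30522, hidden=256, max_coord=1000):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, hidden)
        self.x_emb = nn.Embedding(max_coord, hidden)   # x0 and x1 share one table
        self.y_emb = nn.Embedding(max_coord, hidden)   # y0 and y1 share one table

    def forward(self, token_ids, boxes):
        # boxes: (batch, seq, 4) with columns x0, y0, x1, y1 on the 0..999 grid
        x0, y0, x1, y1 = boxes.unbind(-1)
        return (self.tok(token_ids)
                + self.x_emb(x0) + self.y_emb(y0)
                + self.x_emb(x1) + self.y_emb(y1))

emb = LayoutEmbedding()
token_ids = torch.tensor([[101, 2054, 2003, 102]])   # hypothetical word-piece ids
boxes = torch.tensor([[[10, 10, 80, 30],             # one bounding box per token
                       [90, 10, 150, 30],
                       [160, 10, 220, 30],
                       [230, 10, 300, 30]]])
print(emb(token_ids, boxes).shape)   # torch.Size([1, 4, 256])
```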
So these are all actually very exciting kinds of extensions of language models, but also using visual information to help parse this vertical data more accurately. Sam Charrington: [00:34:09] Interesting. Yeah, I think it’s… At a quick thought, I would’ve imagined that, you know, maybe the top part of the stack is more rule-based and the bottom part of the stack is, you know, more machine learning based, but it sounds like they’re even, I don’t know, relatively, but there are a bunch of really interesting… Cha Zhang: [00:34:33] We are doing a lot of machine learning stuff on the top as well. Sam Charrington: [00:34:37] I’m imagining, you know, when you talk about relation nets, for example, on an invoice you could have “date,” and then the date, you know, horizontally next to it, or you can have “date” and then the date beneath it. Cha Zhang: [00:34:50] Yes. Sam Charrington: [00:34:50] You may have an address box and then a bunch of text that comes beneath it. It would be nice to know that, you know, we’re talking about the address here. That’s part of the idea of the structured text extraction. So in that, you mentioned relation nets and graph CNNs. Are those two approaches to solving the same problem, or are they solving different aspects of the problem? Cha Zhang: [00:35:13] They solve different aspects of the problem, and they can also be used to solve the same one. I mean, right now the main focus for us is using them for extracting key-value pairs. This is for both kind of the pre-built and the customization. Think about, if it’s an invoice and you want a vendor name — so it’s a name. Certainly, you know, there’s the text information, because you see it looks like a vendor name, so this probably is a vendor name. And some invoices don’t even have the key in the invoice. Sam Charrington: [00:35:48] Right. Cha Zhang: [00:35:49] You don’t even have the words “vendor name” there, so how do you figure out this thing is still the vendor name? So there, you rely on information that’s language, and also kind of how the document is laid out. Like, okay, the font size may matter. You know, the position of the name may matter. So we are looking into combining all this information to come out with a better decision on those fields. Sam Charrington: [00:36:21] So, how does a graphical representation or way of thinking about the document get you to a solution to these kinds of problems? You know, for example, the unlabeled vendor name? Cha Zhang: [00:36:33] The graphical kind of approach is basically… so you’ve got a bunch of text lines detected by the OCR, and you connect these text lines with their neighbors. You define basically how strong these connections are — actually, it’s not defined by hand; you learn these relationships by looking at the text, looking at their relative positions, looking at their font similarity. Like one issue that you actually just mentioned was the address: how do you connect it, ’cause you have multiple lines of an address. How do you know they actually belong to the same address? Right? So all this kind of side information can be very helpful in determining that they should be grouped together. In the convolutional kind of graphical model, you learn a convolutional network by computing from all the neighboring nodes — where each node is a text line — to aggregate basically at the center node.
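To make the neighbour-aggregation idea above concrete, here is a small, hypothetical one-step graph convolution over text-line nodes: each detected line carries a feature vector (pooled text plus visual features), and its representation is updated from its spatial neighbours so that downstream decisions — same address block, vendor name, key vs. value — can use surrounding context. The adjacency and weights are made up for illustration; this is a generic sketch, not the production model.

```python
import numpy as np

# Hypothetical features for 4 detected text lines (one row per line),
# e.g. pooled visual + text embeddings of each line.
node_feats = np.random.rand(4, 8).astype(np.float32)

# Adjacency: connect each text line to its spatial neighbours
# (hard-coded here; a real system would build this from line positions).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=np.float32)

# One graph-convolution step: average self + neighbour features, then
# apply a learned linear transform (random weights stand in for training).
adj_self = adj + np.eye(4, dtype=np.float32)
norm = adj_self / adj_self.sum(axis=1, keepdims=True)
weights = np.random.rand(8, 8).astype(np.float32)
updated = np.maximum(norm @ node_feats @ weights, 0)   # ReLU

# updated[i] now mixes line i with its neighbours, which is what lets a
# classifier decide e.g. "these lines belong to the same address block".
print(updated.shape)   # (4, 8)
```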
So basically, the model learns by not only looking at the current text line that’s in focus, but also look at all the nearby text lines and decided, well, given all these contextual information, it does look like this is a vendor name. I guess that’s a very high level conceptual description of why it would work, but it’s the data driven machine learning so that the model [inaudible]. Sam Charrington: [00:38:06] As you’re solving problems like this, are you often needing to re-label your dataset? For example, imagining early on in developing an algorithm like this, you have a bunch of invoices, and you draw a bounding box around the addresses and you say, this is the address. Then you say, ‘Oh, well the font information is a whole new dataset,’ you have to label, well, this is… Are you going in and having people label Helvetica versus Arial? That seems a bit fine grain and hard to actually get an experts to label, or is it more abstract than that? Cha Zhang: [00:38:48] We usually only label the end goal, which is the field that you’re going to extract. So, for example, you want to extract a vendor name, vendor address, total text, you basically draw a bounding box in those regions and use that as a ground use data. Sam Charrington: [00:39:06] Got it. I think we’re going to the same place. When you say font… Cha Zhang: [00:39:11] When I say font, actually it’s in some way, implicit in the sense that we’re taking these bounding boxes, we’re extracting image information. Right? So think of it as let’s say, run a convolution network to extract a feature of that part of the text region, the text line. So, this feature is essentially all the visual information that can be helpful in deciding or determining the relationship between text lines. So if features are similar, it probably mean they are similar font, they are similar size, you know, so those kinds of… So, yeah, I think that seems to be sufficient. Sam Charrington: [00:39:55] So you’re not trying to kind of featurize your underlying images into these distinct things because what I inferred, when you said font. Do you look at the, you know, is there an analogy to kind of looking at the layers of the network, and when we do this with CNN, GC, like textures and things like that, is there some analogy that you’ve seen in looking at the layers of the network that says, ‘Oh, this layer is like identifying fonts.’ Cha Zhang: [00:40:32] No, we haven’t been going there yet. Well, I guess it’s certainly interesting to look at it. My take is most likely, font is just one attribute. I believe there are many other things. Yeah, I think it’ll be interesting to look at these features visually. Yeah. Sam Charrington: [00:40:54] We’ve talked throughout the discussion about kind of the ways that OCR and this form recognition problem kind of blends the vision domain and NLP domain and language models has come up quite a bit. Is there a little bit more kind of depth we can go into there? Some of the ways that, that you see, NLP, and particularly the advances in NLP over the past few years kind of influencing the problem and the way you solve it? Cha Zhang: [00:41:32] Yeah. We set up, I see NLP plays a very important role in these verticals. After all, these invoice receipt, business card, these are all human artifacts. They’re kind of language artifacts in some way. Right? 
So, all of the kind of latest state of the art in language modeling, we definitely want to leverage. The thing I mentioned earlier, like LayoutLM, is one way to leverage them — by using the language model, but also embedding additional visual information, and hopefully solving these problems effectively, because the input is really different, right? You know, previously you take text as input; here we’re taking a bunch of text lines and their locations and bounding boxes as inputs, and the algorithm can naturally kind of solve these problems. Sam Charrington: [00:42:30] And is it also trying to do the traditional language model thing, predicting the next character or word or set of text? Cha Zhang: [00:42:38] Yeah, the way we train them is very similar — basically masked text: you mask some words and try to predict them. Certainly you can use a lot of others. I think, you know, I know recently people use translation targets. You can use autoencoder kinds of targets. This is a really active research area at this point. I think we’re still just scratching the surface, although we’re already seeing very, very promising results. So we definitely want to look deeper into this and see how well this really can push the state of the art. Sam Charrington: [00:43:21] Kind of continuing on that thread of the active research areas and what the future holds in this area, what are you most excited about in this domain of OCR and, in general, extracting text from documents, vertical applications and the like? Cha Zhang: [00:43:42] Yeah, I think we have been working on this problem for quite a while, but I think there are still a lot of interesting problems. Only when we started to work with customers did we realize, you know, there are problems we haven’t been able to solve. I can just name one: for example, table extraction sounds trivial, but when you actually look at all the existing tables in the world, the simplest ones are those with explicit cell borders, where you have straight lines, but in reality these tables can have no cell boundaries at all. There can be things mixed on top — you know, all these things that kind of make the problem extremely hard. So that’s just one that is extremely challenging, but that we want to solve. Another thing that I sort of briefly mentioned earlier was the customization part of these verticals. How do you customize to a customer’s own data instead of having these pre-built models, ’cause inevitably you will have data that doesn’t work with these pre-built models. How do you allow the customer to have a way to build their own models that still work? That by itself is a very challenging problem, because asking customers to label a lot of data is painful. They don’t want to go there. So either we go unsupervised, or we go with a very, very limited amount of supervision data. In such a case, how do we adapt our model so that it can work on the documents where the customer has realized the pre-built model has failed? That’s also a very interesting kind of research problem that we are looking into. In vision and in language this is called low-shot learning; it’s definitely applicable to the problem here as well. Sam Charrington: [00:45:50] In the case of some of the productized vision offerings, Azure does this as well. The user is able to upload their own set of labeled data, and the results for object detection are kind of fine-tuned against the user’s data set. Cha Zhang: [00:46:13] Yeah.
Sam Charrington: [00:46:14] Do the OCR and form recognition offerings provide something similar? Like, can I upload my own invoices? Are you doing some kind of transfer learning, or — well, if you are, what are you doing to take advantage of what the user’s providing? Cha Zhang: [00:46:33] So we do have a product called custom form, which allows the customer to upload a few samples. We usually say a minimum of five samples. So, say you have an invoice that doesn’t work with existing models and you want to solve that problem; you upload five similar invoices — these are from the same vendor, or similar in structure — and we can figure out these key-value pairs and extract them, either unsupervised or supervised. Right? Unsupervised means the customer doesn’t need to label anything. So you upload the five documents. The information we’re gaining by looking at these five documents is, well, these documents are supposed to be similar, and therefore there are going to be a bunch of words that are actually common across these documents. This commonality helps us tell, well, this is probably part of the empty form, or the template of the form, while the things that vary across forms — these must be information the customer has filled in, different from sample to sample. So with that information, we can actually extract key-value pairs without any supervision. All you need to do is upload five similar documents. Of course that works to a certain degree, but if you’re still not happy with the accuracy, we provide a way for you to label your key-value pairs. So here we have a UX where you can go and label the fields you care about, by essentially highlighting the OCR text lines where you think, this is the value I want to extract. Then we actually learn a model out of the five samples and produce a model that can be used by the customer to extract these values. The accuracy is normally pretty high — in the 90 to 95 percent range, actually. Sam Charrington: [00:48:38] So when the customer does this, is this process entirely learned, or is there a human-in-the-loop kind of exception handling element to it? Cha Zhang: [00:48:50] I guess, to answer this, let me take a step back. Across all the products, OCR today has made significant advances, but if you actually care about the numbers — think about the invoice, right? If your total is wrong, it’s really bad. So what we definitely recommend is that people have an agent as backup. For all of the products we offer, we give people a confidence, right? So, how confident we are about the extraction of a particular value — and different customers can choose their own threshold and have an agent look at them. But with today’s accuracy, we don’t recommend going straight through, unless you are handling certain specific applications. I can give you an example. For example, if you’re verifying a receipt image against employee-entered data, there you can go automatic, right? ’Cause if the OCR produces a different number than the employee, well, you will need somebody to look at them anyway, but if they actually match, well, that probably means it’s okay. Sam Charrington: [00:50:08] Right. Cha Zhang: [00:50:08] So for that application, you can automate it more. Sam Charrington: [00:50:13] Got it.
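As a toy illustration of the unsupervised signal described above — words that repeat across the uploaded samples are probably part of the form template, while words that vary are probably filled-in values — here is a hypothetical sketch; the actual custom form logic is of course far more involved.

```python
from collections import Counter

# Hypothetical OCR output for three uploads of the "same" form template.
docs = [
    ["Invoice", "Date", "2020-09-01", "Total", "$120.00", "Contoso"],
    ["Invoice", "Date", "2020-09-14", "Total", "$89.50", "Contoso"],
    ["Invoice", "Date", "2020-10-02", "Total", "$412.75", "Contoso"],
]

# Words that appear in every sample are treated as template text;
# the rest are candidate filled-in values.
counts = Counter(word for doc in docs for word in set(doc))
template = {w for w, c in counts.items() if c == len(docs)}
values = [[w for w in doc if w not in template] for doc in docs]

print("template:", sorted(template))   # ['Contoso', 'Date', 'Invoice', 'Total']
print("values:  ", values)             # the dates and amounts
```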
So, the question that I was asking is slightly different though. You know, say you’ve got someone using automated form recognition, and they have their five examples that they haven’t been happy with, and they submit them through some website or API. Is someone at Microsoft taking those and manually going through some process to try to figure out why they’re not working, or are they thrown into some training job and then the customer’s result gets better? Cha Zhang: [00:50:48] Oh, no — no, we don’t look at the customer’s data. So this is a fully automated product, meaning, you know, the customer basically labels these files, they call an API to train a model, and the whole process is automated. Sam Charrington: [00:51:04] So under the covers, are they kind of forking off their own model? The last few layers are getting cut off and it’s fine-tuning, or is it more elaborate than that, or…? Cha Zhang: [00:51:17] It’s more elaborate than that. Underneath the hood, there are multiple steps. We leverage a lot of information in these sample documents. For example, as I mentioned earlier, there will be words common across these samples. Those are very strong indicators that this might be part of the empty part of the form, which you probably think is not so interesting to the customer. Transfer learning is certainly one way of doing that. Right now we actually train these models without transfer learning. So the model is trained from scratch with very few samples. We’re able to do this because of some very interesting work that we have done to basically augment this data, to make sure that you have sufficient data to still be able to train a model out of five samples only. This can be a feedback loop as well. So, if the customer’s not happy with a model trained on five samples, they can upload more, and we just train a new model for them. Every time you train, you just get a new model; that way it’s a feedback loop where the customer can keep improving their model until it gets to a stage where it’s really performing for them. Sam Charrington: [00:52:53] So when you say augmenting the five that they’re providing, are we talking about data augmentation in the sense of a transformation pipeline that kind of changes, adds noise, rotates, that kind of thing? Or are we talking about, you’ve got some other data set that you’re adding to their five and training it on that aggregate data set, and that’s how you’re producing a better model? Cha Zhang: [00:53:21] Both, although I think the latter one more, because actually, when customers label this data, we ask them to provide some additional information. For example, they label, this is a date. We know it’s a date. So in this way we can artificially create more data to fill the form, so that we can produce more data to train the model. Also, we use very robust machine learning algorithms that are robust to very few examples. So that way we can learn within this limitation. Yeah. Normally, if you look at many of the other offerings that people provide, you have to train with hundreds of examples. Here, we’re pushing it really down to five, and we hope to push it even lower in the future. Sam Charrington: [00:54:11] So I’m assuming that this is a stacked problem, and you’ve got some low-level OCR models, for example, that are trained with many, many documents. What you’re doing with this Form Recognizer custom data is more at the top end of that stack.
Is the off-the-shelf model that I’m using without the five-example customization also trained on relatively few examples? Cha Zhang: [00:54:44] What do you mean? Sam Charrington: [00:54:45] I guess maybe I’ll jump ahead to the conclusion that I’m drawing. What’s confusing me is, how are you getting better results with few examples if you’re not using any kind of transfer? I guess I heard in your explanation that you’re not doing any kind of transfer. Cha Zhang: [00:55:03] So right now, custom forms support training models, and these models are usually… each model is geared towards one particular form type. So in some way you can think of this problem as actually restricted. It’s actually an easier problem. It’s not like a pre-built invoice model, where essentially you want to handle all invoices. Here we’re handling one particular invoice coming from, I would say, one particular vendor, and they usually use this template. Sam Charrington: [00:55:37] Got it. So the customer then, do they call a unique API to resolve invoices of this type? Or is that then ensembled, and then there’s something that decides whether it’s of the type that you’ve built the new model for? Cha Zhang: [00:55:55] Yeah. So here’s kind of the recommendation that we give to customers, right? You maybe start with the pre-built model, and the pre-built model may work, and then your job is done. If you’re happy, go. Then say you have a lot of invoices, and out of a thousand, 10 of them don’t work. So what we offer the customer is: take those invoices, and you can train specific models for these 10 different invoices — you might need to train more than one special model, because these invoices may look very different. So imagine you train like 10 different custom models for this. We actually also offer kind of automatic invoice classification — an API called model compose, where we can compose these 10 small models into one. So all you need is to call into that one. By calling into that one, we also provide you a confidence, because at test time, when the customer sends an invoice in, we don’t really know whether it’s one that doesn’t work with the pre-built model or one that works well with the pre-built. So you send this invoice first to the customized version of the model, and we will tell you, ‘Hey, it doesn’t look like any of the 10 you have trained.’ In that case, you revert back and say, okay, now I’m calling the pre-built invoice model, ’cause you sort of know that the pre-built actually works well for that. So that’s what we recommend customers do. Sam Charrington: [00:57:34] Okay. I dug into a little bit of detail there, but it’s interesting to see kind of how the end-to-end problem is put together. In a case like this, the ends of that problem are on the customer side, not just the service that you’re offering, and so seeing how the pieces are put together is kind of interesting. Awesome! Well, Cha, thanks so much for taking the time and walking us through some of the interesting things that are happening in these domains. Cha Zhang: [00:58:12] Thank you for having me. Sam Charrington: [00:58:14] Great! Thank you.
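To close out the interview, here is a small, hypothetical sketch of the routing pattern Cha recommends near the end of the conversation: send a document to the composed custom models first, and fall back to the pre-built model when no custom model is confident. The stand-in models, field names, and the 0.8 threshold are made up for illustration; this is not the actual Form Recognizer API.

```python
def route_document(document, custom_models, prebuilt_model, threshold=0.8):
    """Send the document to the composed custom models first; fall back to
    the pre-built model when no custom model is confident enough."""
    results = [model(document) for model in custom_models]
    best = max(results, key=lambda r: r["confidence"])
    if best["confidence"] >= threshold:
        return best                      # one of the trained templates matched
    return prebuilt_model(document)      # general-purpose fallback

# Toy stand-ins for trained models (real calls would hit the service API).
custom_models = [lambda doc: {"source": "custom-A", "confidence": 0.35},
                 lambda doc: {"source": "custom-B", "confidence": 0.15}]
prebuilt_model = lambda doc: {"source": "prebuilt-invoice", "confidence": 0.9}

print(route_document("invoice.pdf", custom_models, prebuilt_model))
# -> {'source': 'prebuilt-invoice', 'confidence': 0.9}
```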
In this month's community segment we chatted about explainability, Carlos Guestrin’s LIME paper, Europe’s attempt to ban “untrustworthy” AI systems, and finally, community member Nicolas Teague shared a blog post he wrote entitled “A Sight for Obscured Eye, Adversary, Optics, and Illusions,” which explores the parallels between computer vision adversarial examples and human vision optical illusions. In our presentation segment, Philosophie Group Inc. Director of AI, Chris Butler, joins us to discuss Trust in AI. Chris gives us an overview of a number of papers on the topic, including:

Humans and Automation: Use, Misuse, Disuse, Abuse
Trust in Automation: Integrating Empirical Evidence on Factors That Influence Trust
Some Observations on Mental Models
Overtrust of Robots in Emergency Evacuation Scenarios

For links to the papers mentioned above and more information on this and previous meetups, or to get registered for upcoming meetups, visit twimlai.com/meetup!
As you may have heard on the podcast, I'm trying the newsletter thing again. I'm not sure what it'll evolve into, but my goals are to make it personal, informative and brief/skimmable. I hope you'll come along for the ride. As always, please let me know what you think!

O'Really?
On Monday we dropped five shows in our O'Reilly AI series for your binge-listening pleasure. I'd name my favorite but they're all my favorite! Really, the series offers something for everyone. I cut straight to the chase with Intel's AI czar Naveen Rao, wax creative with Google's Project Magenta lead Doug Eck, go full Nerd Alert with Ben Vigoda on Bayesian program synthesis, chat about scaling video object detection with Reza Zadeh, and learn how Rana el Kaliouby's company uses emotional AI to help brands craft the customer experience. Check it out!

Over the river & through the woods
I just got back from a great trip to Europe. The bulk of my time was spent in Berlin, where I got to explore the city and tech scene, deliver an intro-to-AI workshop, and meet with TWIML listeners. Before heading to Germany though, I ventured into the Swiss hinterland to interview an impressive (and in some circles controversial) figure in modern AI: Jürgen Schmidhuber, co-creator of the LSTM neural network architecture. We had a great time and a great discussion, which will be posted soon on the podcast!

Reading is fundamental
In a recent show, I thought out loud about starting a paper-reading group for TWIML listeners. The idea seems to have resonated with folks. If you'd like to join in, jump over to the meetup page to express your interest and help plan the details.

Join me at the next AI Conference
Apparently, the O'Reilly AI conference is being renamed "The AI Conference." (Hubris anyone?) As usual, we've got a free ticket to give away, and we want to give it to YOU! Just comment or share your favorite quote from any of the shows in our O'Reilly AI series to enter. Commenting/sharing for each show gets you five entries! More details at the series page.

Sign up for our Newsletter to receive this weekly in your inbox.
On the heels of last week's $200 million acquisition by Apple of Turi, Intel announced on Tuesday yet another acquisition in the machine learning and AI space, this time with the $400 million acquisition of deep learning cloud startup Nervana Systems. This is another exciting acquisition; let's take a minute to unpack it. First of all, for those not familiar with the company, Nervana, spelled N-E-R-vana, is a two-year-old company developing software, hardware and cloud services for deep learning. The company was originally founded to build hardware for speeding up deep learning, and it's this focus that made it so attractive to Intel. The company's first hardware product, due next year, is a custom deep learning chip called the Nervana Engine. The ASIC chip is similar in focus to the Google Tensor Processing Unit, or TPU, which we highlighted in the very first episode of This Week in Machine Learning & AI back in May. The company has also released a software product called Neon, and operates the Nervana Cloud. Neon is an open source deep learning framework like TensorFlow, Caffe or Theano. Relative to those others, which you hear about here on the show pretty much every week, Neon is known for being particularly fast, especially on NVIDIA GPUs. This is due to some clever optimization work the team did with the GPU firmware. Neon doesn't have quite the popularity of some of these other frameworks, in part because it was initially a proprietary product, only recently open sourced back in May. The company's cloud offering is tuned for running deep learning, and will eventually incorporate the company's own chips. This is a great deal for the company's founders and investors. With $24.4 million in funding to date, and a price reported to be as high as $408 million, Nervana returned nearly 17x to investors, which is home run territory for most VCs. At the same time, if you'll allow me to Monday Morning Quarterback, I'm a little surprised that they've decided to sell so early in the game. The company is extremely well positioned in two really hot spaces, deep learning and cloud, and the team has only been at it for a couple of years. Projecting out a couple of years, it's easy to see Nervana with a billion dollar valuation, assuming they continued to execute. This makes me wonder what the team saw in the market that suggested now was the time to sell. Of course, it's certainly the case that Intel brings a lot more to the table here than cash. The company obviously has vast resources and expertise in the chip-making arena and they could certainly help accelerate Nervana's plans. It's also the case, though, that the company faces stiff and growing competition. Google, for example, offers everything Nervana does. Google's TensorFlow, released about 8 months ago, is by most measures the most popular deep learning framework. (You'll recall we discussed Francois Chollet's analysis of the landscape back on the July 15 show.) Google also sees TensorFlow as becoming an on-ramp to the Google Cloud Platform. And GCP has TPUs, which I just mentioned and which the company announced back in May. So perhaps the Nervana team and investors looked at the long slog ahead and decided to take the money off the table. I do wonder if the lack of an upside in terms of options makes hiring top talent more difficult for the company. So that's the Nervana side of things; what about Intel's side? Well, while this is a pretty small acquisition for Intel, I think it's a smart move on their part.
That's because, despite numerous investments in the space, as recently as their investment in Nervana competitor CognitiveScale last week, Intel has been struggling to tell a story around machine and deep learning. The problem they're facing is that NVIDIA is eating their lunch when it comes to chips for deep learning applications. In fact, NVIDIA also made news this week when they announced record revenues and a more aggressive sales outlook. The reason for the improved outlook? Quoting CEO Jen-Hsun Huang: "One particular dynamic sticks out, and it's a very significant growth driver of where we have an extraordinary position in and it's deep learning," Huang told analysts in a conference call that lasted almost 80 minutes. "The last five years, we've quietly invested in deep learning because we believe that the future of deep learning is so impactful to the entire software industry, the entire computer industry that we, if you will, pushed it all in." NVIDIA's lead in deep learning has been a sore spot for Intel of late, to the point that several articles commented on interviews with company data center chief Diane Bryant in which she became ruffled at the mention of Intel's lack of presence in the machine learning market. Now, Intel and Diane are quick to shrug this off, since machine learning is a relatively nascent market. According to the MIT Technology Review, market research firm Tractica pegs the market for AI-related chips at under $1 billion, growing to $2.4 billion in 2024, a small figure compared to Intel's 2015 revenue of $56 billion. But Intel missed the boat on mobile, PC chip sales are declining, and there's weakness in data center and IoT revenue growth as well. So while machine learning and AI are an emerging market just at the beginning of the growth cycle, Intel can't afford to sit this one out. This deal gives them a much-needed story around deep learning and, if the companies are able to execute, a foot in the door of this nascent market. Moving forward, this poses some of the same challenges I mentioned in the context of Apple/Turi, namely executive focus, but I also think this plays to several of Intel's strengths. In particular, while I've seen the company struggle trying to independently build and sell enterprise software, the company does a good job of building and selling through reference architectures. If Nervana ultimately becomes a reference for how to build out a deep learning cloud using new and traditional Intel hardware combined with open source software, this could drive significant future adoption for them and begin to turn the tide. There are also a good number of possible tie-ins to take advantage of here. One is with Intel's open source project, the Trusted Analytics Platform. Also, Intel has a significant stake in big data company Cloudera and cloud builder Mirantis. This is getting a bit ahead of ourselves, sure, but there could be some pretty interesting collaborations between these projects and companies over time. Subscribe: iTunes / Youtube / Spotify / RSS
Autonomous driving startup Comma.ai released a small dataset that lets you try your hand at building your own models for controlling a self-driving vehicle. The dataset consists of 10 video clips recorded at 20 Hz from a camera mounted on the windshield of a 2016 Acura ILX. There are about 7 hours of video total, captured mostly during highway driving. Alongside the video files are a set of sensor logs where measurements such as velocity, acceleration, steering angle, GPS location and gyroscope angles are recorded. The dataset is a 45 GB compressed zip file that explodes to 80 GB when uncompressed. That is, if you can get it to uncompress. When I tried it, after a fairly long download, unzip complained about the file being corrupt. The project's GitHub repo includes a script to download the data from archive.org, as well as some simple models built in Keras and TensorFlow for predicting steering angle and creating simulated road images using generative AI. They've also included a paper on the latter topic. The idea is that since it's pretty expensive to train a self-driving car on real roads, you typically want to train your algorithms in a simulator. To do that, you can either hand-code a simulator or use a generative AI to create one. The paper describes the use of variational autoencoders, generative adversarial networks and an RNN to create simulated road images. You can start by running their existing models, but if you manage to do amazing things with the data, let Comma know—they're hiring and want to meet you. Subscribe: iTunes / Youtube / Spotify / RSS
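If you want a feel for what a first pass at the steering-angle task might look like, here's a minimal Keras sketch. To be clear, this is not Comma's model from the repo, just an illustration of the basic recipe: a small convolutional network regressing a single steering angle from a camera frame with an MSE loss, run here on random stand-in data.

```python
# A minimal sketch, not Comma.ai's actual model: regress steering angle
# from a single camera frame with a small CNN trained on MSE.
import numpy as np
from tensorflow.keras import layers, models

def build_steering_model(input_shape=(160, 320, 3)):
    return models.Sequential([
        layers.Conv2D(16, 8, strides=4, activation="relu", input_shape=input_shape),
        layers.Conv2D(32, 5, strides=2, activation="relu"),
        layers.Conv2D(64, 5, strides=2, activation="relu"),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(1),  # predicted steering angle
    ])

model = build_steering_model()
model.compile(optimizer="adam", loss="mse")

# Random arrays stand in for normalized camera frames and logged steering angles.
frames = np.random.rand(32, 160, 320, 3).astype("float32")
angles = np.random.rand(32, 1).astype("float32")
model.fit(frames, angles, epochs=1, batch_size=8, verbose=0)
```

In practice you'd feed it frames decoded from the video files, paired with the steering-angle entries from the sensor logs at matching timestamps.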
This week we discuss Intel's latest deep learning acquisition, AI in the Olympics, image completion with deep learning in TensorFlow, and how you can win a free ticket to the O'Reilly AI Conference in New York City, plus a bunch more. Here are the notes for this week's podcast:

O'Reilly AI Conference Giveaway
I'm excited to be partnered with the O'Reilly Artificial Intelligence Conference to give away a free ticket to the event, which will be held September 26–27, 2016 in New York City. There are three ways to enter the giveaway:
1. (Preferred) Follow @twimlai on Twitter and retweet this tweet: Win a FREE ticket to the @OReillyAI Conference. To enter, follow @twimlai + RT. https://t.co/ReYqwqp538 for details. pic.twitter.com/9pLrzHIX9d — TWIML (@twimlai) August 15, 2016
2. Sign up for the TWIML&AI Newsletter and add a note "please enter me" in the comments field.
3. Use this site's contact form to send me a message and use "AI contest" as the subject.
A winner will be chosen at random and announced on the 9/2 podcast. Ticket is non-transferrable. Good luck, and hope to see you in New York! If you'd like to buy a ticket, register using the code PCTWIML for 20% off! And don't forget to get your free early access ebook: Mastering Feature Engineering

Intel Buys Deep Learning Startup Nervana
- Intel Buys a Startup to Catch Up in Deep Learning
- Deep Learning Chip Upstart Takes GPUs to Task
- Nvidia's bet on deep learning and autonomous cars drives stock to record highs – MarketWatch

AI Bot Joins Team Washington Post at the Rio Olympics
- The Washington Post experiments with automated storytelling to help power 2016 Rio Olympics coverage – The Washington Post

Technology
- Fujitsu Software to Accelerate Deep Learning Workloads
- DetectNet: Deep Neural Network for Object Detection in DIGITS | Parallel Forall
- Google Research Blog: Meet Parsey's Cousins: Syntax for 40 languages, plus new SyntaxNet capabilities

Image Completion with Deep Learning
- Image Completion with Deep Learning in TensorFlow
- bamos/dcgan-completion.tensorflow: Image Completion with Deep Learning in TensorFlow
- [1607.07539] Semantic Image Inpainting with Perceptual and Contextual Losses
- [1511.06434] Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
We've talked fairly extensively about the use of deep learning in medicine in previous shows. Breast cancer and eye disease were a couple of the use cases we discussed, with both of these sharing the common feature that they're based on image analysis. Well, this week a team of researchers from Princeton University published a paper outlining their work applying machine learning to the challenge of identifying genetic causes of autism. The genetic causes of autism, or autism spectrum disorder, have been difficult for researchers to track down. The autism research community has identified 65 genes associated with autism risk so far, mostly through sequencing, but it's believed that those are but a fraction of the 400-1,000 genes likely to be involved in the disease. To try to identify the additional genetic actors in autism susceptibility, the Princeton team used what they call a brain-specific functional interaction network, which was developed in previous research. This brain-specific network is a functional map of the brain, expressed as a probabilistic graph of how genes function together in pathways in the brain. They then used machine learning to train a classifier based on the connectivity patterns of the known ASD genes in the brain-specific network, and then used this classifier to predict the level of potential ASD association for every gene in the genome. Specifically, they used an SVM classifier, with the connectivity of the known ASD genes to the other genes in the brain-specific network as its features. I'm somewhat trivializing the ideas around the brain-specific network and how it translates into features, mostly because I don't really understand it. But this is a great example and reminder that most of the magic in ML is in the feature engineering. Based on their method, the team was able to identify a number of candidate genes with no prior genetic evidence of ASD association, and has since gone on to validate many of these candidate genes through sequencing. Their results can thus be used as the basis for further analysis into the genetic causes of autism. Super interesting stuff. Check it out if you've got a background or interest in the medical applications of ML. A couple of other interesting research papers caught my eye this week: Researchers from security research firm ZeroFOX published a paper "Weaponizing data science for social engineering: Automated E2E spear phishing on Twitter." Spear phishing, if you haven't heard the term, is like phishing, but is targeted at a particular user. You're typically trying to get a user to click a link that will trick them into giving up some credentials. What the ZeroFOX team did was create a tool called SNAP_R that first rates a list of Twitter users based on their likely susceptibility to a spear phishing attack, and then uses a neural network to produce effective spear phishing tweets. If you heard that and immediately thought, "oh, it's probably an LSTM RNN," then woo hoo, you're catching on! At least that's how I felt when I read that that's exactly what they did. This next paper I love. It's basically a Twitter sarcasm detector created by researchers at the University of Lisbon in Portugal and UT Austin. It works based on embeddings, a type of word vector that comes up all the time and that I'd like to learn more about. These embeddings are fed into a CNN model and trained on tweets that self-identify as sarcastic through their use of the #sarcasm hashtag.
The researchers use embeddings in a unique way in this paper, tying them to individual social media users, and as a result are able to outperform another recently published state-of-the-art model for sarcasm detection by over 2%. Subscribe: iTunes / Youtube / Spotify / RSS
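For a concrete sense of the Princeton setup described above, here's a toy scikit-learn sketch of the idea: represent each gene by its connectivity to the known ASD genes, train an SVM on the known positives versus a sample of other genes, then score the whole genome. The random matrix below stands in for the real brain-specific network, so treat this as purely illustrative.

```python
# Toy illustration of "SVM on network-connectivity features" (not the paper's code).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_genes, n_known_asd = 2000, 65

# Feature vector per gene: connectivity weights to the 65 known ASD genes.
connectivity = rng.random((n_genes, n_known_asd))

# Positives: the known ASD genes themselves; negatives: a sample of other genes.
pos = rng.choice(n_genes, n_known_asd, replace=False)
neg = rng.choice(np.setdiff1d(np.arange(n_genes), pos), 500, replace=False)
X = np.vstack([connectivity[pos], connectivity[neg]])
y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])

clf = SVC(kernel="linear", probability=True).fit(X, y)

# Score every gene in the genome by its predicted probability of ASD association.
scores = clf.predict_proba(connectivity)[:, 1]
print(np.argsort(scores)[::-1][:10])  # top candidate genes
```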
I recently reported on the launch of the new NVIDIA TITAN X. At the time it wasn't in the hands of any users, so any thoughts on relative performance were either vendor-provided or speculative. Well, a couple of researchers on the MXNet team were among the lucky folks who have their hands on the GPU at this point, and they published an initial benchmark this week following the deepmark deep learning benchmarking protocol. In a nutshell, they confirmed the speculation. The Pascal Titan X is about 30% faster than the GTX 1080, and its larger memory supports larger batch sizes for models like VGG and ResNet. Relative to the older Maxwell-based Titan X, the new GPU is 40-60% faster. If a single GPU isn't enough for you, you might be interested in the new prototype announced by Orange Silicon Valley and CocoLink Corp, which they're calling the "world's highest density Deep Learning Supercomputer in a box." The machine loads 20 overclocked GPUs into a single 4U rack unit, offering 57,600 cores and delivering 100 teraflops. The team at Orange reports that an ImageNet training job that used to take one and a half days with a single NVIDIA K40 GPU can now be done in 3.5 hours using 8 GTX 1080s. The largest they've been able to scale a training job to is 16 GPUs, and they're continuing to work on scaling this to the full 20 GPUs. Also in GPU news, Microsoft announced yesterday that Azure N-Series virtual machines are now available in preview. These VMs use Tesla K80 GPUs, and the company claims these offer the fastest computational GPU performance in the public cloud. Moreover, unlike other cloud providers, these VMs expose the GPUs via Discrete Device Assignment (DDA), resulting in near bare-metal performance. 6-, 12- and 24-core flavors are available in the NC series of VMs, which is optimized for computational workloads. An NV series that focuses more on visualization is also available, based on the Tesla M60 GPUs. Subscribe: iTunes / Youtube / Spotify / RSS
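If you'd like to run a rough benchmark of your own GPU along the same lines, a simple timing harness is all it takes. The sketch below is in Keras on random data, not the actual deepmark scripts, so the absolute numbers are only meaningful for comparing your own hardware configurations.

```python
# A rough, illustrative timing harness (not deepmark itself): measure average
# seconds per training step for ResNet-50 on random data.
import time
import numpy as np
import tensorflow as tf

def seconds_per_step(model, batch_size, steps=10, image_size=224, n_classes=1000):
    x = np.random.rand(batch_size, image_size, image_size, 3).astype("float32")
    y = np.random.randint(0, n_classes, size=(batch_size,))
    model.train_on_batch(x, y)            # warm-up step (graph build, memory alloc)
    start = time.time()
    for _ in range(steps):
        model.train_on_batch(x, y)
    return (time.time() - start) / steps

model = tf.keras.applications.ResNet50(weights=None)
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")
print(f"{seconds_per_step(model, batch_size=16):.3f} s per training step")
```

Varying the batch size in a harness like this is also a quick way to see where a card's memory becomes the limiting factor, which is exactly the advantage the MXNet team highlighted for the new TITAN X.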
Each year, computer security conferences host a high-tech version of the kids' game "capture the flag," so that teams of hackers and security researchers can demonstrate their hacking prowess. The game requires teams to secure a computer system by identifying intentional and unintentional vulnerabilities in various software modules while launching and defending against threats from competitive teams. This week, DARPA, the Defense Advanced Research Projects Agency, hosted a version of a capture the flag contest where the teams were autonomous bots. The event, held Thursday in Las Vegas as part of the Defcon security conference, was the final competition of the agency's Cyber Grand Challenge, a $55 million hacking contest designed to spur innovation in the area of autonomous cyber warfare. Seven teams of researchers from across the country fielded bot systems that competed with one another to autonomously identify and patch software vulnerabilities that were planted in their systems by DARPA, while deflecting attacks from competing bots and launching their own attacks against the computer systems those bots were protecting. Teams' bots were scored on their ability to secure their own software and services, ensure their continued availability, and take advantage of vulnerabilities in competing teams' systems. From the looks of it, DARPA constructed a pretty elaborate physical environment for the contest, complete with an "air gap" to ensure that each system was acting totally on its own. Announcers followed along with the 96 rounds of action and provided a live play-by-play for onlookers, while referees ensured that each team played by the rules. With each round, DARPA deployed a new set of software for the bots to both defend and attack. I watched segments of the 4+ hour video from the final competition and found it pretty fascinating, but I failed in my brief attempt to find any details on how the various bot systems work. Cade Metz's coverage of the competition for Wired painted an interesting picture of the different strategies each bot pursued in the contest. One bot, Rubeus, built by federal contractor Raytheon, took an aggressive tack, going after vulnerabilities in the other systems from the get-go. Another bot, Mech.Phish, didn't perform as well overall, but it did have a knack for finding and exploiting complex and subtle bugs in the challenge code. Mayhem, a bot fielded by a team from Carnegie Mellon spin-out ForAllSecure, and the eventual winner of the $2M first prize, seemed rather focused on patching its own systems and keeping them up and running. The bot reportedly used statistical analyses throughout the game to weigh the costs and benefits of patching vulnerabilities (which has inherent risks and demands service downtime), and would only decide to patch those holes that made sense based on this analysis. Cybersecurity is an important and rapidly evolving use case for ML & AI, and there's been quite a bit of commercial activity in the area in addition to innovation and research activities like the CGC. This week startup Distil Networks closed a $21 million series C funding round to help enterprise customers separate good bots from bad ones, and keep the latter off their websites. Note that we're not talking about chatbots here, but rather the kind of web bots that abuse APIs, scrape websites, and probe them for vulnerabilities.
The company uses machine learning techniques to detect when a bot is trying to cloak its activity by spoofing multiple user accounts, browsers, and locations. And last month, another cybersecurity startup, Darktrace Ltd., raised a $64 million series C to help enterprises identify and defend against a variety of networked threats. Subscribe: iTunes / Youtube / Spotify / RSS
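As a back-of-the-envelope illustration of the cost/benefit reasoning attributed to Mayhem above (this is not its actual algorithm, just the flavor of the trade-off), you might weigh the expected points lost to an exploit against the availability points a patch's downtime would cost:

```python
# Hypothetical illustration of a patch/no-patch decision under the CGC scoring trade-off.
def should_patch(p_exploit, points_lost_if_exploited, downtime_cost):
    expected_loss_unpatched = p_exploit * points_lost_if_exploited
    return expected_loss_unpatched > downtime_cost

# A likely-to-be-exploited bug is worth patching; a long-shot one is not.
print(should_patch(p_exploit=0.6, points_lost_if_exploited=100, downtime_cost=20))   # True
print(should_patch(p_exploit=0.05, points_lost_if_exploited=100, downtime_cost=20))  # False
```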
News broke late last week of Apple's acquisition of Seattle-based machine learning startup Turi, for a reported $200 million. Actually, I hadn't seen any definitive confirmation of the acquisition at the time of my initial research, but neither have there been any denials. You'll recall we spoke about Turi just a few weeks ago, in the context of the Data Science Summit the company hosted in San Francisco, shortly after changing its name from Dato due to a legal dispute. The company, which was originally called GraphLab, was one of the first companies I started following in the machine learning platform space, and I'm pretty excited for founders Carlos Guestrin and Danny Bickson. At face value this is a great deal for both companies. As we've discussed, Apple needs all the help it can get in machine learning and AI, and the company has over $230 billion-with-a-B sitting around in cash, so they can definitely afford it. And from Turi's perspective, the purchase price is about 4x invested capital, so it's a solid exit for a team of first-time founders from academia in a space in which many of their contemporaries have struggled. But the question remains as to what happens next. This acquisition doesn't really make sense if Turi is to remain an independent company—Apple needs the help internally fighting the "AI culture war," and the company hasn't had much success as an enterprise software player. On the other hand, in Turi CEO Carlos Guestrin, Apple could have a great ML standard bearer. Carlos is not only a business leader and a respected machine learning researcher but also a great teacher, with a popular machine learning course series on Coursera. So it's likely that, as Techcrunch suggests, Turi discontinues offering its existing products and is reborn as Apple's new machine learning and AI development center. As a result, in addition to Apple and the Turi team, winners in this deal include Seattle, which has been gaining notoriety as a cloud computing and machine learning hotspot and will also see a new influx of wealth as a result of this deal. Also, Turi's competitors in the machine learning platform space, folks like H2O, upstart DataRobot, and the French firm Dataiku, have one less competitor to worry about and a solid exit to point to as a comparable. Dataiku, for its part, announced an update to its product, Dataiku Data Science Studio (DSS) 3.1, earlier in the week. The update adds new support for HPE Vertica, H2O Sparkling Water, Spark MLlib, scikit-learn and XGBoost from within the DSS visual analysis tool, as well as integration with IBM Netezza, SAP HANA and Google BigQuery on the backend. It will be interesting to see how this one plays out, and I'll keep you posted. Subscribe: iTunes / Youtube / Spotify / RSS
Good morning! First off, thanks everyone for your interest in the podcast. If you haven't listened to the latest show, it's a bit different from the previous ones. It's the first in a series of interviews with folks doing interesting things in the machine learning and AI arena. I hope you find it interesting! This week the interview took the place of the regular news show, mostly because I didn't have time to put the latter together. The news show is a ton of work, with each show taking about 24 hours to produce (down from 30+ when I started), and they can't, by definition, be done in advance. All that said, I really believe in the format (creating it was scratching my own itch), so I'm working on ways to ensure it can continue uninterrupted, even when I'm traveling late in the week (as was the case this week), have other projects to attend to, or my wife gets tired of me dedicating the weekends to it (I'm starting to get that look). A couple of things I'm working on to this end are to (a) find some regular sponsors for the show and (b) find/hire someone or a small team of someones who can help me produce the show. Of course, (a) makes (b) possible, but I'm pursuing both in parallel as of now. You can help by continuing to share the podcast with your friends, reviewing it on iTunes, posting it, tweeting it, etc. Ok, enough of the "inside baseball." Here's a quick rundown of the interesting ML and AI news for the week.

Business
We saw a few interesting business and product announcements this week:

Shopping and travel bot startup Mezi raised $9 million in a series A financing that closed this week. Investors in this round include previous investor Nexus Venture Partners and new investors Saama Capital and American Express Ventures. They've also brought on new individual investors Amit Singhal, former SVP and Head of Google Search, and Gokul Rajaram, Product Engineering Lead at Square.

B12--like the vitamin I suppose--raised a $12.4 million series A. These guys are not the first to talk about applying AI to web site development... see The Grid for an earlier example. Like Mezi, they're also highlighting their use of hybrid AI in delivering their solution. We'll see a lot more of this type of business in the near future: startups taking traditional service-oriented businesses and sprinkling on some AI in the form of tools or automation under the covers--perhaps even just a bit to get started with.

Prospera has raised $7 million to commercialize just one of many applications that will apply AI to agricultural data. Prospera is developing a system based on computer vision and deep learning technologies that will determine when, where and how much water to deliver to crops to improve yields while conserving resources.

Google introduces ML-based bid automation tools with AdWords Smart Bidding. Smart Bidding takes millions of signals into account to help users determine the best bid for a given ad unit, and it automatically refines conversion performance models to optimize deployment of customers' advertising budgets.

Office 365 adds Researcher and Editor, new intelligent services to aid users writing reports and other documents in Word. Researcher is a sidebar that pulls up related articles from encyclopedias and the web based on what the user has written, and Editor is a smarter evolution of Word's spelling and grammar checkers. We've seen research sidebars in Word before and they've never proven useful, so it will be interesting to see how this one performs.
Editor, on the other hand, I'd expect to be really useful, and to eventually replace the standard editorial tools in Word. Last, but certainly not least, Prisma, the app we talked about last time, which brings neural-network-based artistic style transfer research to the iPhone, is now available on Android. I've played with it and it's pretty cool.

Research
OpenAI is hiring. Elon Musk-founded OpenAI is hiring researchers to work on a few "special projects". The specific research areas are:
1. Detecting if someone is using a covert breakthrough AI system in the world.
2. Building an agent to win online programming competitions.
3. Cyber-security defense.
4. Creating a complex simulation with many long-lived agents.
Call me crazy, but as much as Musk says he fears AI, the research areas here seem to be right out of an apocalyptic AI movie.

Neural Network from Matroid leads in Princeton Competition. This is an interesting post describing Matroid's entry into the Princeton ModelNet competition for classifying 3D CAD models. Their application of Convolutional Neural Networks (CNNs) to this problem is interesting, and they've published a paper on their approach on arXiv.

If you haven't seen the sample images from the DeepWarp Project around this week, you should check them out. A team of researchers from the Skolkovo Institute of Science and Technology in Russia developed a deep learning model for creating photorealistic images from a base image where the eyes are looking in an arbitrary direction. I'd like to dig deeper into this paper at some point.

Projects
The Charades dataset is an interesting collection of nearly 10,000 videos of daily indoor activities collected by the Allen Institute for AI using Amazon Mechanical Turk. The dataset contains 66,500 temporal annotations for 157 action classes, 41,104 labels for 46 object classes, and 27,847 textual descriptions of the videos.

Language modeling a billion words. An interesting project to create a generative natural language AI using LSTM RNNs trained on the Google Billion Words dataset. The post includes an interesting discussion of the techniques used to achieve scale, including the use of multiple GPUs.

Bonus: Yann LeCun on Quora
Yann LeCun, director of AI research at Facebook and NYU professor, did an AMA over on Quora the other day. Here are some of the responses I found interesting:
- What are the likely AI advancements in the next 5 to 10 years? - Quora
- Who is leading in AI research among big players like Google, Facebook, Apple and Microsoft? - Quora
- What is a plausible path (if any) towards AI becoming a threat to humanity? - Quora
- What are some recent and potentially upcoming breakthroughs in deep learning? - Quora

Sign up for our Newsletter to receive this weekly in your inbox.
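For readers who want a hands-on feel for the LSTM language modeling idea behind the billion-words project mentioned above, here's a deliberately tiny character-level sketch in Keras. The real project trains far larger models on a huge vocabulary across multiple GPUs; this is just the core pattern of predicting the next token from a sequence.

```python
# Tiny character-level LSTM language model (illustrative, not the billion-words code).
import numpy as np
from tensorflow.keras import layers, models

text = "the quick brown fox jumps over the lazy dog " * 50
chars = sorted(set(text))
char_to_id = {c: i for i, c in enumerate(chars)}
seq_len = 20

# Build (input sequence, next character) training pairs.
ids = np.array([char_to_id[c] for c in text])
X = np.stack([ids[i:i + seq_len] for i in range(len(ids) - seq_len)])
y = ids[seq_len:]

model = models.Sequential([
    layers.Embedding(len(chars), 32),           # character embeddings
    layers.LSTM(128),                           # sequence model
    layers.Dense(len(chars), activation="softmax"),  # next-character distribution
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, epochs=3, batch_size=64, verbose=0)
```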
A potentially interesting survey crossed the wires this week, and while I'm bringing it up here, I do so with caveats, because the numbers seem a bit wonky. The survey, titled "Outlook on Artificial Intelligence in the Enterprise 2016," was published by Narrative Science, a "data storytelling" company that uses natural language generation to turn data into narratives. Narrative Science had help from the National Business Research Institute, a survey company that did the data collection for them. The headline of the survey announcement seems to be that 38% of those surveyed are already using AI technologies, while 56% of those that aren't expect to do so by 2018. But if that's the case, then my math says that 73% of respondents' organizations expect to have AI deployed by 2018, yet the official report cites this number as 62%. Also, an infographic published by the same group says that only 24% of organizations surveyed are currently using AI, instead of the 38% quoted in their news release. This discrepancy could be due to the fact that a large percentage of organizations represented by the survey had more than one respondent, but it's very confusing and I'd certainly expect more from a "data storytelling" company. Unless of course their press release and infographic were totally created by a generative AI, in which case I'm very impressed but also a bit horrified. Of course, the articles reporting on the survey don't do anything to clear this up, with one of them reporting that 74% of organizations have already adopted AI. In any case, I feel we do need more data about enterprise adoption of AI, so some credible numbers here would be great, but for now this ends up being just a cautionary tale about questioning your data. I have tweeted at the company for clarification, and I'll share whatever I find out. Subscribe: iTunes / Youtube / Spotify / RSS
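For the record, here's the quick arithmetic behind that 73% figure:

```python
# If 38% already use AI and 56% of the remaining 62% expect to by 2018,
# the implied 2018 adoption is roughly 73%, not the 62% the report cites.
already_using = 0.38
expect_by_2018 = 0.56 * (1 - already_using)
print(round(already_using + expect_by_2018, 3))  # 0.727
```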
Last week, at a Machine Learning meetup at Stanford University, NVIDIA CEO Jen-Hsun Huang unveiled the company's new flagship GPU, the NVIDIA TITAN X, and gifted the first device off the assembly line to famed ML researcher Andrew Ng. The new TITAN X, which holds the same name as the previous version of the device, is based on the company's new Pascal graphics architecture, which was unveiled back in May. The company is so excited about the card that its blog post introducing it threw around a ton of superlatives and adjectives like Biggest, Ultimate, Irresponsible, Crazy, and Reckless. It also threw a bunch of numbers around, including these:
- 11 trillion 32-bit floating point ops per second
- 44 trillion INT8 ops per second
- 12B transistors
- 3,584 CUDA cores running at 1.53 GHz
- 12 GB of GDDR5X memory with 480 GB/s of bandwidth
The other number it tossed out there was 1,200, which is the price of the card in US dollars. Now, not everyone is as excited about this card as NVIDIA. Indeed, for gamers, what NVIDIA's offering with the TITAN X is a GPU that's about 25% faster than the company's standby offering, the GTX 1080, but at double the cost. But that may be because the company is targeting deep learning researchers instead of gamers with the TITAN X. (In fact, CEO Jen-Hsun Huang said as much at the product launch.) For people working on deep learning, the specs of the TITAN X should allow it to increase model training performance by 30-60%, which can save a researcher weeks of time and computing costs. The best technical preview I've found of the new card, which comes out on August 2nd, is over on AnandTech. Of course I'll be dropping a link to this article and all the other ones I mention on the show into the show notes, available at twimlai.com.
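Those headline numbers hang together, by the way: peak FP32 throughput is roughly the CUDA core count times the boost clock times two floating point operations per fused multiply-add.

```python
# 3,584 CUDA cores x 1.53 GHz x 2 FLOPs per FMA ~= 11 TFLOPS FP32
cuda_cores = 3584
boost_clock_hz = 1.53e9
print(f"{cuda_cores * boost_clock_hz * 2 / 1e12:.1f} TFLOPS")  # ~11.0
```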
This week Prisma Labs released their new app, Prisma, which seeks to bring generative AI to the masses. This is an app that allows you to apply filters to your pictures that render them in the style of famous artists like Van Gogh and Picasso. If this sounds familiar, that's because it should… We discussed the paper this is based on, "A Neural Algorithm of Artistic Style," back at the beginning of June, in the same show in which we discussed Google's Magenta project and some other developments in the field of generative artistic AI. The examples I've seen of the Prisma app look pretty cool, and I'd love to play with the app personally, but they've only got iOS and I've only got Android, so we've reached an impasse of sorts. There was an interesting Twitter debate about the broader significance of the app, suggesting that it's perhaps the biggest stand-alone consumer success of deep learning to date. That may be the case, but I think the deep learning-based features in Google Photos, for example, are more impactful to more people. At the end of the day, though, the average consumer has no idea that what Prisma is doing is any more complicated than applying a typical Instagram filter, and I think that's the beauty of what they've built. I expect to see more of that, a lot more, in the future. This week saw several financial transactions in the machine learning and AI space. eBay announced the acquisition of Israel- and San Francisco-based SalesPredict, a company that applied machine learning to challenges faced by sales, marketing and support organizations. These guys were a client of mine and I'm super excited for them. According to the company's former CEO, Yaron Zakai-Or, the SalesPredict team will shift gears a bit to focus on helping give eBay sellers more information about the value of items, to allow them to sell more. eBay has been on a bit of an AI buying spree, acquiring another startup, Expert Maker, back in May. Another machine learning company, Seattle-based Amplero, announced an $8 million raise this week. These guys are focused on a similar space to SalesPredict, actually, but with a B2C angle as opposed to B2B. All of the press I've read about this company, as well as the company's own web site, left me intrigued, but also sad and confused. The VentureBeat article, for example, says the company's product "uses machine learning to test thousands of marketing permutations constantly, ensuring that the right message is delivered to the right users at the right time, and via the right channel. The solution then chooses the winning permutations, working in concert with your existing marketing technology stack to make the relevant changes on the fly." What the heck is a marketing permutation? It sounds like they're talking about testing marketing messaging, which we've been doing via A/B or multivariate testing for a long time. Perhaps these guys should win the AI-Washing of the Month Award. (Ok, side note… should that be a thing? I think it'd be kinda fun. But actually there's no way these guys would beat the beer company we talked about last time.) That said, the CEO, a guy named Olly Downs, has 29 patents in areas like Bayesian prediction and statistical estimation, so chances are this is more an issue of confusing marketing than the lack of a technically sophisticated product. Also, UK-based machine learning startup FiveAI raised a $2.7 million seed round to develop an autonomous vehicle platform.
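If you're curious what's under the hood of that paper, the core is surprisingly compact: a content loss that compares CNN feature maps directly, and a style loss that compares their Gram matrices. Here's a minimal TensorFlow sketch of the two losses (Prisma's own implementation isn't public, so treat this as purely illustrative of the Gatys et al. formulation):

```python
# Minimal sketch of the content and style losses from "A Neural Algorithm of
# Artistic Style"; features are assumed to come from layers of a pretrained CNN.
import tensorflow as tf

def gram_matrix(features):
    # features: [height, width, channels] feature map from a CNN layer.
    h, w, c = features.shape
    flat = tf.reshape(features, (h * w, c))
    return tf.matmul(flat, flat, transpose_a=True) / tf.cast(h * w, tf.float32)

def content_loss(content_features, generated_features):
    return tf.reduce_mean(tf.square(content_features - generated_features))

def style_loss(style_features, generated_features):
    return tf.reduce_mean(tf.square(gram_matrix(style_features) -
                                    gram_matrix(generated_features)))

# Total objective (optimized over the generated image's pixels):
#   content_weight * content_loss(...) + style_weight * sum of style_loss(...)
```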
Final note in Business: an update on the AI Culture Wars in Silicon Valley, which have apparently turned into the AI Publicity Wars in Silicon Valley. Yes, all the powers that be in the Valley are now in an all-out race to get their AI story out there. Last week I talked about the exclusive that Microsoft gave to The Verge. Well, the big story this week was about Facebook's Big Sur, which was covered by The Verge, MIT Technology Review, Mashable, CNET and more. If you wondered what the news was, I can assure you it was absolutely nothing. Facebook invited some journos up for a press tour of their Prineville, OR data center, and the result was a bunch of stories about the company's massive AI brain. It's hard to tell if it was Facebook or the press doing the AI washing here, but the former did make it a point to talk about Big Sur, the server they designed with Quanta and NVIDIA, the design of which was open sourced and contributed to the Open Compute Project back in December of last year.
This week's show covers the White House's AI Now workshop, tuning your AI BS meter, research on predatory robots, an AI that writes Python code, plus acquisitions, financing, technology updates and a bunch more.

The Big Picture
- Home :: AI Now
- Jason Furman's speech
- I need an AI BS-Meter — Gab41
- Smerity.com: It's ML, not magic: simple questions you should ask to help reduce AI hype
- You Can Now Drink Beer Brewed By Artificial Intelligence – Forbes
- On the importance of democratizing Artificial Intelligence

Business
- Google buys machine learning startup Moodstocks to help your phone's camera identify objects | VentureBeat | Business | by Chris O'Brien
- News discovery app SmartNews nabs another $38M, now valued at $500M-$600M | TechCrunch
- General Catalyst's Phil Libin invests in 2 more chatbot startups, Growbot and Butter.ai | VentureBeat | Bots | by Ken Yeung
- Exclusive: Why Microsoft is betting its future on AI | The Verge

Research
- Google's DeepMind AI to use 1 million NHS eye scans to spot diseases earlier | Ars Technica
- Artificial Intelligence May Aid in Alzheimer's Diagnosis – Neuroscience News
- Application of Machine Learning to Arterial Spin Labeling in Mild Cognitive Impairment and Alzheimer Disease
- Steering a Predator Robot using a Mixed Frame/Event-Driven Convolutional Neural Network
- Super-intelligent predator robot is taught to hunt down prey in chilling experiment | Daily Mail Online

Technology
- Release of IPython 5.0
- Skype chatbots now work in group chats | VentureBeat | Bots | by Khari Johnson
- Microsoft's Project Malmo AI platform goes open source | ZDNet

Projects
- Teaching an AI to write Python code with Python code
- Deep Learning for Chatbots, Part 2 – Implementing a Retrieval-Based Model in Tensorflow – WildML

Specials
- Data Science Summit – JULY 12-13 in SAN FRANCISCO / Use code TWIML20 for 20% off registration
- FREE O'Reilly Early Access Ebook: Mastering Feature Engineering
This week's show covers the first fatal Tesla autopilot crash, a new EU law that could prohibit machine learning, the AI that shot down a human fighter pilot, the 2016 CVPR conference, 10 hot AI startups, the business implications of machine learning, cool chatbot projects and, if you can believe it, even more. Here are the notes for this week's podcast:

Tesla Autopilot Crash
- A Tragic Loss | Tesla Motors
- Ex-Navy SEAL becomes first to die in self-driving car after Tesla crash | Daily Mail Online
- Tesla's 'Autopilot' Flew Under Regulators' Oversight – WSJ
- The technology behind the Tesla crash, explained – The Washington Post

EU Legislation Impacts Machine Learning Use
- EU regulations on algorithmic decision-making and a "right to explanation"
- Artificial Intelligence Has a 'Sea of Dudes' Problem – Bloomberg
- Why We Should Expect Algorithms to Be Biased
- To study possibly racist algorithms, professors have to sue the US | Ars Technica

Business
- The Most Well-Funded Startups Developing Core Artificial Intelligence Tech
- Doodle acquires chatbot Meekan to integrate its A.I. scheduling assistant | VentureBeat | Bots | by Chris O'Brien
- Meet Articoolo, the robot writer with content for brains | TechCrunch
- The Business Implications of Machine Learning — Medium
- How Amazon Triggered a Robot Arms Race – Bloomberg

IEEE Computer Vision & Pattern Recognition Conference
- CVPR 2016
- CVPR 2016 Open Access Repository
- Zeeshan Zia's answer to What are the most interesting CVPR 2016 papers and why? – Quora
- All Your Questions Answered — CVPR Day 1 — Gab41
- Jordi Pont-Tuset's site – CVPR 2016: Deep learning takes over again?

AI Fighter Pilot Beats Human Expert
- AI bests Air Force combat tactics experts in simulated dogfights | Ars Technica
- Genetic Fuzzy based Artificial Intelligence for Unmanned Combat Aerial Vehicle Control in Simulated Air Combat Missions

Projects & Hands-On
- IBM Watson A.I. XPRIZE
- Changelog – Messenger Platform
- A Natural Language User Interface is just a User Interface — The Startup — Medium
- Build a Chatbot w/ an API – ML for Hackers #9 – YouTube
- Is that a Time Machine? Some Design Patterns for Real World Machine L…

Data Science Summit
- Data Science Summit
- Use code TWIML20 for a 20% discount on registration!

Image: Tesla Motors
This week's show covers the International Conference on Machine Learning (ICML 2016), "dueling architectures" for reinforcement learning, AI safety goals for robots, plus top AI business deals, tech announcements, projects and more.

ICML 2016
- Accepted Papers | ICML New York City
- Which companies had accepted papers at #icml2016?
- Best Paper Awards:
  - [1511.06581] Dueling Network Architectures for Deep Reinforcement Learning
  - [1601.06759] Pixel Recurrent Neural Networks
  - [1602.07415] Ensuring Rapid Mixing and Low Bias for Asynchronous Gibbs Sampling
- My winner in the best name category: Extended and Unscented Kitchen Sinks
- Demystifying Deep Reinforcement Learning

Research
- Google Research Blog: Bringing Precision to the AI Safety Discussion
- OpenAI Blog: Concrete AI safety problems
- Paper: 1606.06565.pdf
- OpenAI technical goals
- Artificial intelligence achieves near-human performance in diagnosing breast cancer — ScienceDaily
- Paper: 1606.05718.pdf

Business
- Twitter pays up to $150M for Magic Pony Technology, which uses neural networks to improve images | TechCrunch
- Increasing our Investment in Machine Learning | Twitter Blogs
- Artificial Intelligence Explodes: New Deal Activity Record For AI
- DARPA is looking to make huge strides in machine learning | PCWorld
- Data-Driven Discovery of Models (D3M) – Federal Business Opportunities: Opportunities

AI Culture Wars in Silicon Valley
- How Siri Started — and Lost — the Assistant Race
- How Google is Remaking Itself as a "Machine Learning First" Company — Backchannel
- AI, Apple and Google

Technology
- Lighting the way to deep machine learning | Engineering Blog | Facebook Code
- Intel Launches 'Knights Landing' Phi Family for HPC, Machine Learning
- The Toronto Raptors Are Using IBM's Watson to Draft A Winning Team | Motherboard

Projects
- Hello, TensorFlow!
- How to read: Character level deep learning
- GitXiv: Collaborative Open Computer Science
- Machine Learning Yearning
- Mastering Feature Engineering – O'Reilly Media

Bonus I didn't have time to cover:
- The Stanford Question Answering Dataset
This week's show looks at Facebook's new DeepText engine, creating art with deep learning and Google Magenta, how to build artificial assistants and bots, and applying economics to machine learning models. Here are the notes for this week's show:

DeepText: Facebook's Text Understanding Engine
- Introducing DeepText: Facebook's Text Understanding Engine
- FBLearner Flow
- Research: Text Understanding from Scratch
- Natural Language Processing (almost) from Scratch

Machine Learning and Art
- Google Magenta
- Neural Art
- A Neural Algorithm of Artistic Style
- Neural Art in TensorFlow
- Autoencoding Blade Runner
- Courses: NYU's Machine Learning for Artists; Goldsmiths, University of London

The Latest TensorFlow Paper
- TensorFlow: A system for large-scale machine learning

Business of ML & AI
- Microsoft Confirms Microsoft Ventures VC Arm
- Intel Acquires Computer Vision for IOT, Automotive
- Lumiata Closes $10 Million Series B Financing with Intel Capital
- Findo raises $3M to help you find files and documents through natural language queries

More Bots, and How to Build Artificial Assistants
- Motion AI lets anyone easily build a bot
- Sequel lets you create a 'Me' bot, beats Google to the punch
- Hybrid Intelligence: How Artificial Assistants Work

The Economics of Machine Learning Models
- The preoccupation with test error in applied machine learning
- Towards Cost-Optimized Artificial Intelligence

More Cool Deep Learning Posts
- Deep Reinforcement Learning: Pong from Pixels
- A Survey of Deep Learning Techniques Applied to Trading

Just for Fun
- Building an IoT Magic Mirror
- Magic Mirror on GitHub

Image Credit: Microsoft