One of the papers I’ve been meaning to look into is the Wide and Deep Learning paper published by Google Research a couple of weeks ago. It turns out that the paper is both short and very much on the applied side of the spectrum, so it’s relatively easy reading. There’s also a lot of supporting material, between the Google Research blog, the TensorFlow docs and the video they created, though I found that reading the paper helped me understand the video, as opposed to the other way around!
The background here is that a team from Google Research developed a recommender model that combines the best aspects of logistic regression and neural nets, and found that it outperformed either approach individually by a small but significant margin.
The basic idea is that linear models are easy to use, easy to scale and easy to understand. They’re also pretty good at “memorizing” the relationships between individual features when you use some simple feature engineering, like cross-product transformations, to capture those relationships explicitly. This kind of feature engineering is very commonly used and results in a lot of derived features, which is why a linear model built this way is called “wide” learning in this paper.
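To make the “wide” side concrete, here’s a minimal sketch of a cross-product feature using TensorFlow’s feature-column API (TensorFlow being the framework the paper’s implementation ships in); the column names, vocabularies and bucket size are made up for illustration.

```python
import tensorflow as tf

# Two hypothetical sparse features from an app-recommendation setting
gender = tf.feature_column.categorical_column_with_vocabulary_list(
    "gender", ["male", "female"])
category = tf.feature_column.categorical_column_with_vocabulary_list(
    "app_category", ["games", "social", "productivity"])

# The cross gives the linear model one weight per specific combination,
# e.g. (gender=female AND app_category=games) -- exactly the kind of
# "memorization" the paper describes
gender_x_category = tf.feature_column.crossed_column(
    [gender, category], hash_bucket_size=10000)
```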
What linear models aren’t really good at is “generalizing” across different features, because they can’t see those relationships unless you feed in a set of higher-order derived features that capture them, and producing those is labor intensive.
This is where neural nets, or so-called “deep” models, come into play. They’re better at generalizing and rooting out unexpected feature combinations that have predictive value. But they’re also prone to over-generalizing, and they don’t do a good job of “memorizing” specific feature combinations that appear infrequently in the training data.
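The way the deep side gets that generalization, per the paper, is by learning low-dimensional dense embeddings for the sparse features, so feature combinations never seen together in training can still score similarly if their embeddings land close to each other. A minimal sketch, again with made-up names and sizes:

```python
import tensorflow as tf

# A sparse feature mapped to a learned 8-dimensional dense embedding;
# the DNN consumes the dense vector rather than the raw sparse ID
category = tf.feature_column.categorical_column_with_vocabulary_list(
    "app_category", ["games", "social", "productivity"])
category_embedding = tf.feature_column.embedding_column(
    category, dimension=8)
```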
So this paper proposes a jointly trained model that combines both wide and deep learning. By jointly trained we mean that this isn’t an ensemble model, where we’d train a linear model and a neural net separately and then combine their predictions. That doesn’t help us here because for an ensemble to work, both models need to be independently accurate, which would mean doing all the feature engineering we’re trying to avoid for the linear model. Rather, by training the wide and deep parts together, each can do what it’s best at while keeping the overall model complexity low.
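To make the joint training idea concrete: in the paper, the wide model’s logit and the deep model’s logit are summed and fed into a single sigmoid, and the same log loss is back-propagated into both sides at once. Here’s a toy numpy sketch of that forward pass; the variable names and shapes are mine, not the paper’s.

```python
import numpy as np

def wide_and_deep_predict(x_wide, x_deep, w_wide, deep_layers, w_deep, b):
    """Forward pass of a jointly trained wide & deep model: one sigmoid
    over the *sum* of the wide (linear) logit and the deep (DNN) logit."""
    wide_logit = x_wide @ w_wide        # linear model over raw + crossed features
    a = x_deep                          # dense/embedded inputs to the DNN
    for W, bias in deep_layers:         # ReLU hidden layers
        a = np.maximum(0.0, a @ W + bias)
    deep_logit = a @ w_deep             # final hidden activations -> scalar
    return 1.0 / (1.0 + np.exp(-(wide_logit + deep_logit + b)))
```

Because both logits feed one loss, the wide part only has to patch the cases the deep part gets wrong, which is why it can get away with a much smaller set of crossed features than a standalone linear model would need.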
It’s actually pretty surprising how much system-level implementation detail this paper packs into 4 pages. I was left feeling like I have a pretty good understanding of how the recommendation system for the Google Play store was designed: it makes recommendations against a 1-million-item app catalog, is trained on over 500 billion examples, and serves each request in about 10 ms under a peak load of 10 million app scoring requests per second.
In addition to publishing the paper, Google also open sourced their TensorFlow implementation of the model, with a high-level API for Wide & Deep models called the DNNLinearCombinedClassifier.
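For a sense of what that API looks like, here’s a minimal sketch of wiring up the estimator in TensorFlow 1.x; the feature columns, vocabulary, embedding size and hidden-layer widths are placeholders I made up, not values from Google’s system.

```python
import tensorflow as tf

# Illustrative feature columns (names and sizes are invented)
category = tf.feature_column.categorical_column_with_vocabulary_list(
    "app_category", ["games", "social", "productivity"])
user_age = tf.feature_column.numeric_column("user_age")

model = tf.estimator.DNNLinearCombinedClassifier(
    # Wide side: the sparse/crossed columns the linear model memorizes
    linear_feature_columns=[
        category,
        tf.feature_column.crossed_column(["gender", "app_category"],
                                         hash_bucket_size=10000),
    ],
    # Deep side: dense embeddings plus continuous features, for generalization
    dnn_feature_columns=[
        tf.feature_column.embedding_column(category, dimension=8),
        user_age,
    ],
    dnn_hidden_units=[100, 50])

# model.train(input_fn=...) then optimizes both parts against one loss.
```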
Alright, I hope you enjoyed learning about this paper as much as I enjoyed reading it.
Before we jump over to Projects, a few quick notes:
- In recent weeks we’ve talked about the ICML and CVPR conferences. This week Leo Tam posted a blog post calling out his impressions from both and his top 10 posts from each. Check it out for a concise look at what you missed at these conferences.
- Next, this week was the IJCAI conference, the International Joint Conference on AI. I haven’t seen much by way of summaries or highlight posts, so I don’t have much to say about it, but if you see anything good, send it my way and I’ll share it.
- Finally, if you’re looking for a contextualized view into a bunch of interesting and important research papers and how they all fit together, you’ll like Xavier Amatriain’s presentation from last week’s Data Science Summit. The focus of the talk is reminding the audience of all the problems for which traditional ML is still state of the art relative to the new hotness, deep learning, and he cites the relevant papers for each area. The slides are up on SlideShare and are highly recommended.