In this episode I’m joined by John Bohannan, Director of Science at AI startup Primer.
As you may know, a few weeks ago we released my interview with Google legend Jeff Dean, which you should definitely check out if you haven’t already. In that interview, Jeff mentions the recent explosion of machine learning papers on arXiv, and I jokingly asked whether Google had already developed an AI system to help summarize and track all of them. While Jeff didn’t have anything specific to offer, a listener reached out to let me know that John was in fact already working on this problem. In our conversation, John and I discuss his work on Primer Science, a tool that harvests content uploaded to arXiv, sorts it into natural topics using unsupervised learning, and then summarizes the activity happening in different innovation areas. We spend a good amount of time on the inner workings of Primer Science, including its data pipeline and some of the tools the team uses, how they determine “ground truth” for training their models, and how they use heuristics to supplement NLP in their processing.
Tomorrow, 5/7, I’m keynoting at the Prepare AI event here in Saint Louis and then making my way out to San Francisco for Figure Eight’s Train AI conference. The event agenda looks great, and I’ll be on-site all day podcasting, so if you’re in the Bay Area you should definitely plan to stop by. Of course, if you do, use the discount code TWIMLAI for 30% off registration. Be sure to give me a shout if you’re planning to be around!
Mentioned in the Interview
- LexisNexis Newsdesk
- Paul Ginsparg
- Systems and Software for Machine Learning at Scale with Jeff Dean
- Reproducibility and the Philosophy of Data with Clare Gollnick
- Check out @ShirinGlander’s Great TWiML Sketches!
- TWiML Presents: Series page
- TWiML Events Page
- TWiML Meetup
- TWiML Newsletter
“More On That Later” by Lee Rosevere, licensed under CC BY 4.0