Machine Learning Identifies Autism Genes, Weaponized AI for Phishing, and a Twitter Sarcasm Detector

This Week in Machine Learning & AI

This post is an excerpt from the August 5, 2016 edition of the This Week in Machine Learning & AI podcast. You can listen or subscribe to the podcast below.

We’ve talked fairly extensively about the use of Deep Learning in medicine in previous shows. Breast cancer and eye disease were a couple of the use cases we discussed, and both share the common feature that they’re based on image analysis. Well, this week a team of researchers from Princeton University published a paper outlining their work applying machine learning to the challenge of identifying genetic causes of autism.

The genetic causes for autism, or autism spectrum disorder, have been difficult for researchers to track down. The autism research community has identified 65 genes associated with autism risk so far, mostly through sequencing, but it’s believed that those are but a fraction of the 400-1,000 genes likely to be involved in the disease.

To try to identify the additional genetic actors in autism susceptibility, the Princeton team used what they call a brain-specific functional interaction network, which was developed in previous research. This brain-specific network is a functional map of the brain, expressed as a probabilistic graph of how genes function together in pathways in the brain.

They then used machine learning to train a classifier based on the connectivity patterns of the known ASD genes in the brain-specific network, and then used this classifier to predict the level of potential ASD association for every gene in the genome. Specifically, they used an SVM classifier, with the connectivity of the known ASD genes to the other genes in the brain-specific network as its features. I’m somewhat trivializing the ideas around the brain-specific network and how it translates into features, mostly because I don’t really understand it. But this is a great example and reminder that most of the magic in ML is in the feature engineering.
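To make the idea a bit more concrete, here’s a minimal sketch in Python. It is not the Princeton team’s code. The toy network, the gene names (`A1`, `N1`, `C1` and so on), the edge weights, and the set of "assumed non-ASD" negative examples are all hypothetical, and the tiny linear SVM is trained with plain sub-gradient descent so the example has no dependencies. In practice you’d more likely reach for a library implementation such as scikit-learn’s `SVC`.

```python
def connectivity_features(network, gene, all_genes):
    """One feature per gene in the network: the edge weight linking it to `gene`."""
    return [network.get(frozenset((gene, other)), 0.0) for other in all_genes]


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=100):
    """Sub-gradient descent on the L2-regularized hinge loss (labels are +1 / -1)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # The regularization term shrinks the weights on every step.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:  # inside the margin or misclassified
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


def score(w, b, x):
    """Decision value; higher means more ASD-like connectivity."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Hypothetical toy network: weights stand in for functional-interaction probabilities.
edges = {
    ("A1", "A2"): 0.9, ("A1", "A3"): 0.8, ("A2", "A3"): 0.7,
    ("N1", "N2"): 0.9, ("N1", "N3"): 0.8, ("N2", "N3"): 0.7,
    ("C1", "A1"): 0.8, ("C1", "A2"): 0.6,
    ("C2", "N1"): 0.8, ("C2", "N2"): 0.6,
}
network = {frozenset(pair): weight for pair, weight in edges.items()}
all_genes = ["A1", "A2", "A3", "N1", "N2", "N3", "C1", "C2", "C3"]

known_asd = ["A1", "A2", "A3"]        # positive examples
assumed_non_asd = ["N1", "N2", "N3"]  # negative examples (an assumption of this sketch)
X = [connectivity_features(network, g, all_genes) for g in known_asd + assumed_non_asd]
y = [1] * len(known_asd) + [-1] * len(assumed_non_asd)
w, b = train_linear_svm(X, y)

# Rank unlabeled candidate genes by their predicted ASD association.
candidates = ["C1", "C2", "C3"]
ranked = sorted(
    candidates,
    key=lambda g: score(w, b, connectivity_features(network, g, all_genes)),
    reverse=True,
)
print(ranked)  # ['C1', 'C3', 'C2']
```

The candidate wired into the known ASD genes (`C1`) ranks first, the one wired into the negative set (`C2`) ranks last, and the unconnected gene (`C3`) lands in between. That’s the whole trick: the classifier never sees anything about a gene except who it’s connected to.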

Based on their method, the team was able to identify a number of candidate genes with no prior genetic evidence of ASD association, and has since gone on to validate many of these candidate genes through sequencing. Their results can thus be used as the basis for further analysis into the genetic causes of autism.

Super interesting stuff. Check it out if you’ve got a background or interest in the medical applications of ML.

A couple of other interesting research papers caught my eye this week:

  • Researchers from security research firm ZeroFOX published a paper, “Weaponizing data science for social engineering: Automated E2E spear phishing on Twitter.” Spear phishing, if you haven’t heard the term, is like phishing, but is targeted at a particular user. You’re typically trying to get a user to click a link that will trick them into giving up some credentials. What the ZeroFOX team did was create a tool called SNAP_R that first rates a list of Twitter users based on their likely susceptibility to a spear phishing attack, and then uses a neural network to produce effective spear phishing tweets. If you heard that and immediately thought, oh, it’s probably an LSTM RNN, then woo hoo, you’re catching on! At least that’s how I felt when I read that that’s exactly what they did.
  • This next paper I love. It’s basically a Twitter sarcasm detector created by researchers at the University of Lisbon in Portugal and UT Austin. It works based on embeddings, a type of word vector, which come up all the time and that I’d like to learn more about. These embeddings are fed into a CNN model and trained on tweets that are self-identified as sarcastic by their use of the #sarcasm hashtag. The researchers use embeddings in a unique way in this paper, tying them to the individual social media users, and as a result are able to outperform another recently published state-of-the-art model for sarcasm detection by over 2%.
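For the sarcasm detector, here’s a rough sketch of what "embeddings fed into a CNN, tied to users" could look like as a forward pass. Treat it as my guess at the general shape, not the paper’s architecture: the dimensions, the vocabulary, the user handles, the choice to concatenate the user embedding with the pooled convolution features, and the random (untrained) weights are all assumptions made for illustration.

```python
import math
import random

rng = random.Random(0)
WORD_DIM, USER_DIM, WIDTH, N_FILTERS = 4, 3, 2, 5  # hypothetical sizes


def random_vector(n):
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]


# Hypothetical lookup tables; a real model would learn these from data.
vocab = ["i", "love", "another", "monday", "great", "<unk>"]
word_emb = {word: random_vector(WORD_DIM) for word in vocab}
user_emb = {"@alice": random_vector(USER_DIM), "@bob": random_vector(USER_DIM)}
filters = [random_vector(WIDTH * WORD_DIM) for _ in range(N_FILTERS)]
out_w = random_vector(N_FILTERS + USER_DIM)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def tweet_features(tokens):
    """1-D convolution over word embeddings, ReLU, then max-over-time pooling."""
    vecs = [word_emb.get(t, word_emb["<unk>"]) for t in tokens]
    while len(vecs) < WIDTH:  # pad very short tweets
        vecs.append(word_emb["<unk>"])
    # Each window is WIDTH consecutive word vectors flattened into one list.
    windows = [sum(vecs[i:i + WIDTH], []) for i in range(len(vecs) - WIDTH + 1)]
    return [max(max(0.0, dot(f, win)) for win in windows) for f in filters]


def sarcasm_probability(tokens, user):
    """Concatenate tweet features with the author's embedding, then apply a sigmoid."""
    features = tweet_features(tokens) + user_emb[user]
    return 1.0 / (1.0 + math.exp(-dot(out_w, features)))


p = sarcasm_probability("i love another monday".split(), "@alice")
print(round(p, 3))
```

With random weights the number that comes out is meaningless. The point is the plumbing: the same tweet can get a different score depending on who wrote it, because the author’s vector is part of the input.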
