Kamran Khan, founder of BlueDot, recently found his company the subject of attention for being among the first to publicly warn about the coronavirus (COVID-19) that initially appeared in the Chinese city of Wuhan. How did the company's system of data gathering techniques and algorithms help flag the potential dangers of the disease? In this interview, Kamran shares how they use a variety of machine learning techniques to track, analyze and predict infectious disease outbreaks.
As a practicing physician based in Toronto, Kamran was directly impacted by the SARS outbreak in 2003. "We saw our hospitals completely overwhelmed. They went into lockdown. All elective procedures were canceled...even the city took on a different feel...there were billions of financial losses...and Toronto was just one of dozens." In the wake of that crisis, governments have been slow to act. Efforts like the International Health Regulations Treaty (2005), which aims to standardize communication about diseases, help but are not well enforced. It doesn't help that nations are often unaware of the severity of an outbreak, or are hesitant to report a threat because of potential economic consequences.
His experience with the SARS crisis led Kamran to explore the role technology might play in anticipating outbreaks and predicting how they might spread. That insight ultimately led to the creation of BlueDot, which applies machine learning to four main challenges in infectious disease tracking: Surveillance, Dispersion, Impact, and Communication.
The BlueDot engine gathers data on over 150 diseases and syndromes around the world, looking at over 100,000 online articles each day spanning 65 languages, searching every 15 minutes, 24 hours a day. This includes official data from organizations like the Centers for Disease Control and Prevention or the World Health Organization, as well as less structured, local information from journalists and healthcare workers.
BlueDot's epidemiologists and physicians manually classified the data and developed a taxonomy so relevant keywords could be scanned efficiently. They later applied ML and NLP to train the system. Kamran points out that the algorithms in place perform "relatively low-complexity tasks, but they're incredibly high volume and there's an enormous amount of them, so we can simply train a machine to replicate our judgment [for classifying]".
As a result of their system's algorithms, only a handful of cases are flagged for human experts to analyze. In the case of COVID-19, the system highlighted articles in Chinese that reported 27 pneumonia cases associated with a market that had seafood and live animals in Wuhan.
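To make the surveillance step concrete, here is a minimal sketch of taxonomy-based keyword scanning. It is not BlueDot's implementation: the taxonomy entries, the outbreak-signal terms, and the function name `flag_article` are all hypothetical, and a production system would use trained NLP classifiers rather than substring matching.

```python
# Hypothetical taxonomy: disease or syndrome label -> keywords to scan for.
TAXONOMY = {
    "pneumonia": {"pneumonia", "respiratory illness"},
    "cholera": {"cholera"},
}

# Hypothetical terms suggesting an article describes an active event.
OUTBREAK_TERMS = {"cases", "outbreak", "cluster"}


def flag_article(text):
    """Return the taxonomy labels matched by an article, or [] if it
    does not also contain an outbreak-signal term."""
    lowered = text.lower()
    matched = sorted(
        label
        for label, keywords in TAXONOMY.items()
        if any(keyword in lowered for keyword in keywords)
    )
    has_signal = any(term in lowered for term in OUTBREAK_TERMS)
    return matched if has_signal else []


# Only articles that return a non-empty list would go on to human review.
flagged = flag_article("Officials report 27 pneumonia cases linked to a seafood market")
ignored = flag_article("Local football team wins the championship")
```

The point of the design is the one Kamran describes: each decision is simple, but filtering at high volume leaves only a small set of articles for experts to read.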
Recognizing the role that travel plays in disease dispersion, especially in the age of air travel, BlueDot uses geographic information system (GIS) data and flight ticket sales to create a dispersion graph for each disease based on the airports connected to a city and where passengers are likely to fly. Not everyone travels by air, so they also use anonymized location data from 400 million mobile devices to track flows from outbreak epicenters to other parts of the region or world. The locations receiving the highest volume of travelers are then identified and evaluated for the disease's potential impact there.
For COVID-19, BlueDot applied this methodology to identify many of the cities among the first to receive the coronavirus, including Tokyo, Bangkok, Hong Kong, Seoul, and Taipei.
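The idea of ranking destinations by traveler volume can be sketched as follows. This is an illustration under stated assumptions, not BlueDot's model: the `(origin, destination, passengers)` record shape, the function name `rank_destinations`, and the sample numbers are invented for the example.

```python
from collections import Counter


def rank_destinations(itineraries, origin, top_n=5):
    """Rank destinations by their share of passengers leaving `origin`.

    `itineraries` is an iterable of (origin, destination, passengers)
    tuples, a hypothetical stand-in for ticket-sales data.
    """
    volume = Counter()
    for src, dst, passengers in itineraries:
        if src == origin:
            volume[dst] += passengers

    total = sum(volume.values())
    if total == 0:
        return []
    # most_common sorts by passenger count, highest first.
    return [(dst, count / total) for dst, count in volume.most_common(top_n)]


# Made-up volumes between placeholder cities X, Y, A, B, C.
sample = [("X", "A", 600), ("X", "B", 300), ("Y", "A", 1000), ("X", "C", 100)]
top_two = rank_destinations(sample, "X", top_n=2)
```

Expressing each destination as a share of outbound volume, rather than a raw count, makes rankings comparable across origins of different sizes.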
Once a virus leaves its region of origin, a wide variety of factors determine whether it will ultimately die out or grow into a full-fledged outbreak: a region may have better or worse public health infrastructure, hospitable or inhospitable climates, or varying economic resources. BlueDot's systems consider factors such as these to predict the potential impact on an identified area.
For example, if a virus is being spread by ticks, and Vancouver is in the middle of winter snow, the likelihood of an outbreak is very low because ticks would not survive that climate. However, the same virus might thrive in a humid environment like Florida, making the region at-risk for an outbreak.
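The tick example amounts to checking local conditions against what a disease vector can tolerate. The sketch below shows that check in its simplest form. The class names, fields, and every threshold value are placeholders chosen for illustration, not real entomological or climate figures, and a real impact model would weigh many more factors.

```python
from dataclasses import dataclass


@dataclass
class RegionConditions:
    name: str
    temperature_c: float
    relative_humidity: float  # fraction between 0 and 1


@dataclass
class VectorTolerance:
    min_temperature_c: float
    min_relative_humidity: float


def vector_can_survive(region, tolerance):
    """Return True if the region meets both minimum thresholds."""
    return (
        region.temperature_c >= tolerance.min_temperature_c
        and region.relative_humidity >= tolerance.min_relative_humidity
    )


# Placeholder values only; not measured data.
tick_tolerance = VectorTolerance(min_temperature_c=5.0, min_relative_humidity=0.5)
snowy_city = RegionConditions("snowy-city", temperature_c=-3.0, relative_humidity=0.8)
humid_region = RegionConditions("humid-region", temperature_c=28.0, relative_humidity=0.85)

snowy_at_risk = vector_can_survive(snowy_city, tick_tolerance)
humid_at_risk = vector_can_survive(humid_region, tick_tolerance)
```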
If an area is determined to be at-risk, the focus shifts to providing early warnings to health officials, hospitals, airlines, and government agencies in public health, national defense, national security and even agriculture. Kamran reiterates the importance of providing only the most relevant information to those who need it, referencing the ideas of Clay Shirky and his 2008 talk, "It's Not Information Overload. It's Filter Failure."
BlueDot first became aware of the pneumonia cases in Wuhan on December 31st, and in addition to notifying their clients and government stakeholders directly, they publicly released their findings in the Journal of Travel Medicine on January 14th.
Criticism and Limitations
These are incredibly difficult predictions to make, and the science behind the transmission of infectious diseases is complex and evolving every day. So, what is the proper role of technology? Kamran asserts that "by no means would [they] claim that AI has got this problem solved. It's just one of the tools in the toolbox."
In some cases, Kamran and his team may lack sufficient observations to develop a machine learning model for a particular disease. For this and other reasons, the company relies on a combination of approaches and a diverse team of specialists in their work.
With coronavirus already in full swing, BlueDot is looking more heavily at analyzing location data from mobile devices to provide a real-time understanding of how people are moving around. However, Kamran compares this to predicting the weather—the further ahead you're looking, the less accurate your prediction.
Despite the limitations, Kamran reinforces the value of the work by acknowledging that "Manually, it would take a hundred people around the clock [to process the data], and we have four people and a machine."