Since its inception, artificial intelligence (AI) has gone from science fiction movie inspiration to real-life, real-world technology. Over the last decade, popular examples of AI have rapidly shifted from entertaining feats, like computers beating human experts at Jeopardy!, to recent horror stories about facial recognition being used to reveal unsuspecting people’s contact information. In the latter case, some of the very companies that develop AI tools are trying to limit their use, laying bare the gap that can open between promise and practice. Bill Gates stated he believes AI is “both promising and dangerous,” likening AI technology to nuclear energy and weapons. While acknowledging huge risks, these and other tech titans see sufficient promise to invest heavily in the development of AI solutions. Even US federal budgets have been changed by executive order to accommodate AI development, suggesting the benefits are not just technological but societal. In healthcare alone, AI investment is in the billions as of last year. AI development seems as inevitable as death and taxes. What is being produced from this massive investment? Is the return on that investment worthwhile, and does it outweigh the risks?

Let’s go over the terminology.

Artificial intelligence (AI) is the term used for a type of product, usually computers capable of making inferences. Inference is the key distinguishing trait of AI, and it covers a broad purview: from inferring the best move in chess to inferring which treatment a cancer patient will likely respond to, or even how much time they may have. Predicting unknown trends in this way is orders of magnitude more difficult than describing past trends, in the same way that saying you should have invested in Amazon in 1997 is simpler than saying where you should invest today. AI aims to predict accurately.

Machine learning (ML) describes a set of tools that hone machines so that they can achieve artificial intelligence. ML is to AI what the chisel is to a marble statue. Belaboring the sculpting analogy, the practice of machine learning is similarly part art and part science; instead of marble, the raw material for AI products is data. Data, like marble, can be opaque until it is exposed by working on it. Skilled practitioners sense which tools are appropriate to extract the best product as they work. You may have already heard of some of the tools in the ML tool chest, such as neural networks, random forests, or maybe support vector machines. ML tools all broadly achieve similar results: they identify trends in previously observed examples to make predictions on as-yet-unseen ones.

It’s pronounced data (dayta) not data (dahta).

Machine learning tools have very different data requirements despite their similar aims. As such, people who employ ML often talk about the “shape” of a problem: what the “dimensions” of the data are, whether it is a “big” data problem, and whether the data is “sparse.” These are physical descriptions of data that determine which tools you might use.

For most machine learning tasks, there are two kinds of data: things you want to be able to predict, and things you know that will help you do that. We call these labels and features, respectively. Taking facial recognition as an example, labels could be people’s names, and features might be their hair and eye color, whether they have facial hair, etc. It’s essentially like the game Guess Who?. Optimal strategies for Guess Who? have been suggested that require 22 features to quickly distinguish between the 24 labeled faces. Now imagine there are ~2.4M people on the Guess Who? board (imagine 500 rows and 4,800 columns of those little flip-up pictures!). This is the hypothetical task of screening everyone who passes through US airport checkpoints in a day. In this screening example, the shape of the data is very different: the number of facial features needed to distinguish between people increases well beyond 22, as too many people will share basic things like hair color. How many distinct facial features can be generated, and how many training examples are available, define the shape of the data and, ultimately, the accuracy of predictions. For some AI tasks, data is available in alarming quantities with few restrictions on its use: every time you post a picture, use an offer code, or even click a link, you generate features and labels that are potentially used to train AI products without your knowledge.
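To make the vocabulary concrete, here is a minimal sketch of features, labels, and the “shape” of a dataset in plain Python. The feature names, the three people, and the helper functions are all hypothetical and exist only for illustration.

```python
# Hypothetical toy data: each row describes one person (the features),
# and each entry in `labels` is the name we would like to predict.
feature_names = ["dark_hair", "brown_eyes", "facial_hair", "glasses"]
features = [
    [1, 1, 0, 0],  # Alex
    [0, 1, 1, 0],  # Bernard
    [1, 0, 0, 1],  # Claire
]
labels = ["Alex", "Bernard", "Claire"]


def data_shape(rows):
    """Return (number of examples, number of features)."""
    n_features = len(rows[0]) if rows else 0
    return (len(rows), n_features)


def sparsity(rows):
    """Fraction of feature values that are zero."""
    total = sum(len(row) for row in rows)
    zeros = sum(value == 0 for row in rows for value in row)
    return zeros / total if total else 0.0


def all_distinguishable(rows):
    """True if no two examples share an identical feature vector."""
    return len({tuple(row) for row in rows}) == len(rows)


print("shape:", data_shape(features))
print("sparsity:", sparsity(features))
print("everyone distinguishable:", all_distinguishable(features))
```

With only a handful of faces, a few features are enough to tell everyone apart. As the number of people grows, identical feature vectors become more likely, which is why more (and more specific) features are needed.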

Conversely, the healthcare sector is uniquely protective of data. Many online businesses are currently struggling to comply with the 2018 General Data Protection Regulation (GDPR) rules that require you to explicitly consent to your data being tracked on websites. Yet those annoying popups pale in comparison to the consent requirements that have existed in healthcare for decades. In 1996, the Health Insurance Portability and Accountability Act (HIPAA) was introduced to ensure the privacy of protected health information (PHI), with heavy fines and even jail time for breaches, whether through malice or mere negligence. Under this framework, we can grant access to our PHI, but typically an informed consent agreement is required that is provably understandable and puts specific limits on use. Any data that could potentially identify patients falls under HIPAA rules, including dates and places of diagnoses, or genomic sequence information. In the healthcare industry, there is very little data sharing because we fear a breach. This (correct) dedication to patient privacy has dictated the shape of healthcare problems in ways that make traditional ML approaches fail.

Your AI tools can’t fail upwards too

The most common type of AI failure is called overfitting, which results in predictors that look like they’re working great until they are put into practice, where they perform no better than random guesses. This happens because algorithms can learn to “cheat” by memorizing answers rather than learning real rules. To avoid this, the traditional approach is to ensure your data has an agreeable shape. A rule of thumb is to have at least ten times as many labeled examples as you have features. However, in healthcare, this is an unrealistic expectation. Each additional patient included in a clinical trial costs >$40K on average. Hence, trials rarely include more than a few dozen patients. Yet the quest to explain health conditions leads to hundreds of informative features being collected, including personal and family health histories, current prescriptions, and vitals, and in the age of personalized medicine, genomic variants can add thousands more features. Huge amounts of information about few people results in deep-but-narrow datasets, which are a recipe for overfitting, known ominously in the machine learning world as the “Curse of Dimensionality.”
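The memorizing behavior described above can be sketched with a toy experiment. This is an illustration under stated assumptions, not a clinical workflow: the data is pure random noise, the “model” is a simple nearest-neighbor lookup chosen because it memorizes by design, and the sizes (20 examples, 100 features) are hypothetical values picked to violate the ten-to-one rule of thumb.

```python
import random


def make_random_dataset(n_examples, n_features, rng):
    """Features and labels are pure noise, so there is no real rule to learn."""
    X = [[rng.random() for _ in range(n_features)] for _ in range(n_examples)]
    y = [rng.randint(0, 1) for _ in range(n_examples)]
    return X, y


def predict_nearest(x, train_X, train_y):
    """1-nearest-neighbor: return the label of the closest memorized example."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(x, train_X[i])),
    )
    return train_y[best]


def accuracy(X, y, train_X, train_y):
    """Fraction of examples in (X, y) whose label is predicted correctly."""
    hits = sum(predict_nearest(x, train_X, train_y) == label for x, label in zip(X, y))
    return hits / len(y)


rng = random.Random(0)
train_X, train_y = make_random_dataset(20, 100, rng)    # few "patients", many features
test_X, test_y = make_random_dataset(1000, 100, rng)    # unseen "patients"

train_acc = accuracy(train_X, train_y, train_X, train_y)
test_acc = accuracy(test_X, test_y, train_X, train_y)

# Training accuracy is perfect because every example is its own nearest neighbor;
# held-out accuracy should land near chance because the labels are random.
print(f"training accuracy: {train_acc:.2f}")
print(f"held-out accuracy: {test_acc:.2f}")
```

The gap between the two numbers is the signature of overfitting: the predictor looks flawless on data it has seen and guesses on data it has not.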

High-dimensional data is a particular curse for deep learning tools. Modeling frameworks like recurrent neural networks (RNNs) and convolutional neural networks (CNNs) have made amazing contributions to natural language processing (NLP) and machine vision, respectively, such that deep learning has almost become synonymous with AI. However, deep learning can exacerbate the overfitting problem by relying on training millions of “parameters,” which can behave like adding millions of features without adding any labeled examples. Deep learning must be very thoughtfully applied to healthcare datasets to succeed.
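A rough sense of how quickly parameters outnumber patients comes from counting them. The sketch below counts weights and biases in a small fully connected network; the layer widths and the trial size of 50 patients are hypothetical numbers chosen for illustration, and real RNNs and CNNs are counted differently.

```python
def count_dense_parameters(layer_sizes):
    """Weights plus biases for a fully connected network with the given layer widths."""
    return sum(
        (n_in + 1) * n_out  # n_in weights and 1 bias per output unit
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


layer_sizes = [1000, 512, 512, 2]  # hypothetical: 1,000 input features, 2 output classes
n_patients = 50                    # hypothetical trial size

n_parameters = count_dense_parameters(layer_sizes)
print(f"trainable parameters: {n_parameters:,}")
print(f"parameters per patient: {n_parameters / n_patients:,.0f}")
```

Even this modest network has hundreds of thousands of adjustable values to fit from a few dozen labeled examples, which is the imbalance the paragraph above warns about.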

Healthcare datasets are fraught with many other challenges to traditional machine learning approaches. For example, information entered into health databases is often mislabeled due to human error, which algorithms will twist themselves into knots to make sense of. Also, in the rare cases where large numbers of patients are aggregated, any potentially identifying information is completely expunged, which is like trying to use facial recognition on people wearing facemasks (surely no one will try that in 2020, right?). Such datasets are so valuable as to be prohibitively expensive to gather, as evidenced by the $1.9B sale of the electronic health record aggregator Flatiron Health, the existence of which was only made possible with multi-million dollar backing from Google Ventures.

It may not be easy, but it will be worth it

AI in healthcare is a fantastically exciting area of exploration because of these challenges, not despite them. Successfully building AI products while maintaining people’s privacy is a frontier that the healthcare industry has already established, despite being a relatively new challenge to other sectors. Examples include methods that distribute ML so that algorithms can learn from private data without it being shared, and weakly supervised algorithms that may convert deep-but-narrow datasets from cursed to cures. Investment in healthcare AI is for delivering better care without sacrificing personal privacy, which sets a standard for the rest of the AI industry to follow.
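One way to picture distributed learning from private data is a single round of parameter averaging, in the spirit of federated learning. The article does not name a specific method, so this is an assumed illustration: the hospitals, their data, and the one-parameter model y = w * x are all hypothetical, and real systems run many rounds and add further privacy protections.

```python
def fit_local_slope(xs, ys):
    """Least-squares slope for y = w * x, computed inside one site."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def federated_average(local_updates):
    """Combine (slope, n_examples) pairs, weighting each site by its size."""
    total = sum(n for _, n in local_updates)
    return sum(w * n for w, n in local_updates) / total


# Hypothetical private datasets held by two hospitals; the raw values never leave a site.
site_data = {
    "hospital_a": ([1, 2, 3], [2, 4, 6]),
    "hospital_b": ([4, 5], [8, 10]),
}

# Each site shares only its fitted parameter and its example count.
local_updates = [
    (fit_local_slope(xs, ys), len(xs)) for xs, ys in site_data.values()
]
global_slope = federated_average(local_updates)
print(f"shared model slope: {global_slope}")
```

The coordinator sees two numbers per hospital rather than any patient record. Averaging local parameters is not always identical to fitting on pooled data, which is one reason practical approaches iterate.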