Will it replace radiologists? The answer is no, but I do think that radiologists who use AI will replace radiologists who don’t.
Located in Silicon Valley, Stanford Radiology has an enviable vantage point from which to advance deep learning in radiology. Curtis Langlotz, MD, PhD, director of Stanford’s new Artificial Intelligence in Medicine & Imaging (AIMI) laboratory, was one of three academic radiologists to share their institutions’ approaches to AI during a standing-room-only How I Do It session at the November 2017 meeting of the Radiological Society of North America (RSNA).
Langlotz—who shared the podium with Luciano Prevedello, MD, MPH, and Bradley Erickson, MD, PhD—discussed why Stanford Radiology is investing in artificial intelligence, the tools it is using to build its program, and where it expects to apply them.
“Diagnostic errors play a role in up to 10% of patient deaths in the U.S.,” Langlotz said, citing a report from the National Academy of Medicine. “A recent survey from the Institute for Healthcare Improvement shows that 21% of adults report that they have personally experienced a medical error. A lot of research shows that the rate of clinically significant error in radiology interpretation is about 4%. Multiply that by the number of studies that we do in the U.S., and you can see that there is room for improvement.”
Fei-Fei Li, PhD, creator of ImageNet—the image recognition challenge cited by Luciano Prevedello, MD—and a professor of computer science at Stanford, is a faculty member of the AIMI lab. Machine learning algorithms developed for ImageNet were pivotal in enabling computers to exceed human performance on the image-recognition challenge.
As it takes steps to apply artificial intelligence resources to medicine, Stanford is aggregating all of its data in a school-wide initiative called Data Science Enterprise Resource to bring together imaging, omics, bio-bank, tissue bank, and EHR data. “We also have developed a pipeline using an honest broker to bring in all of our clinical images, de-identify them, and unify them with other data on those same patients, to be deployed in the cloud and on the premises,” Langlotz said. “Then we will have a system of applications that can find cohorts of data for bench science researchers and their students to work with.”
Stanford, in fact, was the institution that provided the dataset that the RSNA made available for its recent Bone Age Machine Learning Challenge.
Tools, Challenges
For storage, Stanford has a business associate agreement with Google Cloud enabling the lab to put patient data in the cloud. For computing, the AIMI lab has dedicated nodes on Stanford’s High Performance Computing Cluster.
“There is another model that some use, which is one dataset, one developer, one machine,” Langlotz said. “We frown on that because we think it makes it more difficult to do the last mile of clinical deployment once the algorithms are built.”
In addition to the same open source software libraries described by Prevedello, AIMI often uses the Stanford Natural Language Processing toolkit from the lab of Christopher Manning, PhD, at Stanford to extract information from text. Manning is Stanford’s Thomas Siebel Professor in Machine Learning and a linguistics and computer science professor.
The challenges are not small, and proper labeling of images produced at the time of clinical care is a major issue, Langlotz said: “We spend more time thinking about that than what kind of deep learning neural networks we apply to these problems.”
Langlotz noted that many of the neural networks developed for ImageNet were designed for two-dimensional color photographs, not the three- and four-dimensional, multi-channel, multi-modality images produced in radiology.
Efficient convolutional and other neural networks that can train on image data of this complexity will require either human partners or human-augmented expertise. “Those humans will need to understand how the model came to its conclusion, so the model will have to explain or illustrate how it came to its conclusion,” Langlotz said. “Saliency maps that show the pixels that actually led to its conclusion will be very important.”
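For readers unfamiliar with the technique, a saliency map can be as simple as the gradient of the model’s top prediction with respect to the input pixels. The following is a minimal, illustrative PyTorch sketch, not Stanford’s code; the model and image it assumes are placeholders.

```python
import torch

def saliency_map(model, image):
    """Return per-pixel |gradient| of the top class score for one (C, H, W) image."""
    model.eval()
    image = image.detach().clone().requires_grad_(True)  # track gradients w.r.t. pixels
    scores = model(image.unsqueeze(0))                   # add a batch dimension
    scores.max().backward()                              # backpropagate the top class score
    # Collapse the channel dimension; large gradients mark the influential pixels.
    return image.grad.abs().max(dim=0).values
```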
New Methods
Langlotz sees the combined clinical and research resources as a strength, enabling the lab to evaluate research in place by showing the direct effect the neural networks have on clinical care.
Rather than take the labor-intensive approach of using radiologists to label a dataset that can be used to train a neural network for a specific task, Stanford is devising ways to automate report classification so that larger datasets can be used to train the algorithms.
This gets to the notion of weak supervision, Langlotz said: “When you automate this notion of labeling and extracting content from the report, it’s not perfect, it’s not the same as having a human expert radiologist look at the image and label it. We look at it as a pipeline.”
For example, Langlotz said it is possible to use highly trained humans (radiologists) to create high-quality labels for 1,000 brain CTs that show brain hemorrhage and then feed them into a neural network, training it to recognize a brain hemorrhage with 85% accuracy.
The other approach is to take the same dataset of 1,000 brain CTs and label it with a “weak,” automated labeling algorithm. “Feeding those weakly labeled cases into the same neural network, accuracy is not going to be as high because the labels are not as good,” Langlotz explained. “But, because the labeling algorithm is automated, we don’t have to be limited to 1,000 cases. We can try 100,000 cases and get more clinical variety in the dataset that we use for training, which can often offset the deficit we see because of the weak labeling. We think that is a powerful approach.”
Stanford uses Nuance mPower software to enable a Google-type search of a radiology report database—for instance, finding all reports that mention tension pneumothorax in the impression—and an algorithm that can identify concepts in a report, including anatomy (chest, abdomen, pelvis), observations (infiltration, effusions), and uncertainty, enabling the extraction of content from the radiology report.
Report Classification Using Deep Learning. Using the open-source software Snorkel, Roger Goldman, a radiology resident with AIMI, built a rules-based algorithm to identify patients with brain hemorrhage. One such query might identify all reports that contain the word “hemorrhage” within five words of “brain.”
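As a rough illustration of how such a rule can be expressed, the sketch below implements the “hemorrhage within five words of brain” idea in plain Python rather than Snorkel’s own labeling-function interface; the label values and sample reports are placeholders, not the AIMI lab’s code.

```python
import re

HEMORRHAGE, ABSTAIN = 1, 0   # a rule that does not fire simply abstains

def lf_hemorrhage_near_brain(report_text, window=5):
    """Weak labeling rule: 'hemorrhage' occurs within `window` words of 'brain'."""
    words = re.findall(r"[a-z]+", report_text.lower())
    hem = [i for i, w in enumerate(words) if w.startswith("hemorrhag")]
    brain = [i for i, w in enumerate(words) if w == "brain"]
    if any(abs(h - b) <= window for h in hem for b in brain):
        return HEMORRHAGE
    return ABSTAIN

# Run automatically over a large report archive to produce weak labels.
# Note the noise: "no hemorrhage in the brain" would still fire the rule.
reports = ["Acute hemorrhage within the left brain parenchyma.",
           "No evidence of hemorrhage."]
weak_labels = [lf_hemorrhage_near_brain(r) for r in reports]   # -> [1, 0]
```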
Report Classification and Validation Using Vector Technology. A different open-source tool, a vector technology called GloVe, allows the creation of word vectors, numeric representations of free text, to which deep learning algorithms can be applied. “We can actually use a convolutional neural network to classify free text reports,” he said. “Again, not perfect, but it’s automated, so you have the ability to rapidly annotate large numbers of cases, and it’s very powerful.”
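One common way to wire pre-trained GloVe vectors into a convolutional text classifier looks roughly like the following PyTorch sketch; the vocabulary size, embedding dimension, and classes are illustrative, not the AIMI lab’s actual model.

```python
import torch
import torch.nn as nn

class ReportCNN(nn.Module):
    """1-D convolutional classifier over pre-trained word vectors."""
    def __init__(self, word_vectors, num_classes=2):
        super().__init__()
        # word_vectors: a (vocab_size, dim) tensor of GloVe embeddings, one row per word
        self.embed = nn.Embedding.from_pretrained(word_vectors, freeze=True)
        self.conv = nn.Conv1d(word_vectors.size(1), 128, kernel_size=5, padding=2)
        self.fc = nn.Linear(128, num_classes)

    def forward(self, token_ids):                  # token_ids: (batch, report_length)
        x = self.embed(token_ids).transpose(1, 2)  # -> (batch, dim, report_length)
        x = torch.relu(self.conv(x))               # convolve across word positions
        x = x.max(dim=2).values                    # max-pool over the whole report
        return self.fc(x)                          # e.g. hemorrhage vs. no hemorrhage

# Toy usage with random stand-in vectors; real GloVe vectors would be loaded from disk.
model = ReportCNN(torch.randn(5000, 300))
logits = model(torch.randint(0, 5000, (8, 200)))   # 8 reports, 200 tokens each
```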
Langlotz shared a document produced by the algorithm that he compared to a saliency map. It highlighted the words in the report that led to the conclusion that a report may include a pulmonary embolism: “In this case, it was very accurate.”
Visual Genome Using Semantic Network. Li, developer of ImageNet, is working on Visual Genome, an attempt to build a semantic network behind an image. Langlotz shared an image of a girl feeding an elephant, with a man behind her taking a picture; the objects in the scene were automatically linked to concepts via bounding boxes.
“Can we apply this same technology to medical images?” Langlotz asked. “Our semantic networks might be simpler, particularly if we are focused on observations. Where you have anatomy, you have an observation, you have some uncertainty, you have some modifiers, and you can begin to build semantic networks that can be useful for summarization, for labeling, and a whole host of things.”
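To make the idea concrete, such a network can be pictured as linked records of anatomy, observation, uncertainty, and modifiers. The Python sketch below is only one way to represent a single node; the field names are illustrative, not Stanford’s schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Observation:
    """One node of a simple report semantic network."""
    finding: str                     # the observation itself, e.g. "effusion"
    anatomy: str                     # where it sits, e.g. "left pleural space"
    uncertainty: str = "definite"    # e.g. "definite", "probable", "cannot exclude"
    modifiers: List[str] = field(default_factory=list)   # e.g. ["small", "unchanged"]

# "Probable small left pleural effusion" might become:
node = Observation(finding="effusion",
                   anatomy="left pleural space",
                   uncertainty="probable",
                   modifiers=["small"])
```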
Deep Learning Models
Langlotz also shared two deep learning models that have been built by members of the Stanford lab, one for detecting bone age from hand x-rays and another for detecting pneumonia on a chest x-ray.
Bone Age Model. The dataset for RSNA’s Bone Age Challenge was contributed by Stanford, built for a model described in an article published in Radiology by a team led by David Larson, MD, PhD. A hand x-ray is used by pediatricians to estimate the physiologic age of a child compared to the chronologic age and identify those who may have developmental delay. The state of the art in estimating bone age is a book, or set of books, so Stanford built an algorithm.
“The algorithm was a little bit better than two of our reviewers and about the same as two others,” Langlotz said. The winners of the RSNA challenge actually exceeded Stanford’s results. “We think this is tremendous,” Langlotz said. “That’s another model: These challenges could be a great way to build models that extract the last bit of information out of the data in a training set.”
CheXNet Pneumonia Model. Working from roughly 100,000 chest x-rays and a pneumonia-detection model recently shared by the National Institutes of Health (NIH), a Stanford team led by Matt Lungren, MD, MPH, developed a neural network with even higher accuracy than the one published by the NIH. “What we learned is that, again, the labels are key,” he said. “There is a lot of inter-observer variability in how these labels get created.”
The model attracted coverage in the Wall Street Journal and has been the topic of robust commentary online.
Image Quality Improvement Using Deep Learning. Under the direction of neuroradiologist Greg Zaharchuk, MD, PhD, researchers used a four-level encoder/decoder convolutional neural network with bypass layers, trained on a dataset that combined noisy images with T2-weighted and proton-density images, to produce lower-noise images of clinical quality. “It comes up with a synthetic image, which is a lower-noise image, and you can imagine the applications here,” he said. “Shorter MR sequences, shorter CTs to lower radiation perhaps, less need for anesthesia sedation in pediatric applications.”
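The general shape of such a network is an encoder/decoder whose input channels stack the noisy image with the other contrasts and whose bypass (skip) connection carries high-resolution detail to the output. The PyTorch sketch below is illustrative only; its two levels and channel counts do not match the published four-level model.

```python
import torch
import torch.nn as nn

class DenoiseNet(nn.Module):
    """Two-level encoder/decoder with a bypass (skip) connection,
    a simplified stand-in for the four-level network described above."""
    def __init__(self, in_channels=3, base=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_channels, base, 3, padding=1), nn.ReLU())
        self.down = nn.Sequential(nn.Conv2d(base, base * 2, 3, stride=2, padding=1), nn.ReLU())
        self.up = nn.Sequential(nn.ConvTranspose2d(base * 2, base, 2, stride=2), nn.ReLU())
        self.out = nn.Conv2d(base * 2, 1, 3, padding=1)   # one low-noise output image

    def forward(self, x):              # x: noisy image plus extra contrasts as channels
        e = self.enc(x)                # full-resolution features
        d = self.up(self.down(e))      # encode to half resolution, then decode back
        d = torch.cat([d, e], dim=1)   # bypass layer: reuse the high-resolution detail
        return self.out(d)             # synthetic, lower-noise image

# Toy usage: noisy image, T2-weighted, and proton-density stacked as three channels.
denoised = DenoiseNet()(torch.randn(1, 3, 64, 64))
```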
Other opportunities Langlotz envisions include:
Image quality control. Algorithms could also be embedded in the scanner to alert the technologist if images are suboptimal.
Triage. “We are working on this,” Langlotz said. “The chest radiologists at Stanford get 100 studies in their queue every morning and some of them have tubes out of place, pneumothoraces. It would be nice to have those that are most likely to have studies with abnormalities at the top of the list.”
Computer-aided detection. “We have been using ‘old fashioned’ AI technology for many years,” Langlotz said. “I think we can develop those much more quickly with machine learning, and certainly there will be screening applications in other areas.”
Computer-aided Classification. For those times when a radiologist is out of his or her comfort zone, Langlotz envisions having the ability to circle a finding and get a differential diagnosis or a similar study that was dictated recently by a clinical expert for reference.
“If you could build detectors and classifiers for the common abnormalities on a given modality like a chest x-ray, and have some text generation associated with that, now you have built a kind of e-resident,” Langlotz said, referring to the radiologist trainees who enhance productivity at academic institutions. “It would be nice for everyone to have some of that efficiency in their practice.”
Acknowledging that artificial intelligence is at the peak of inflated expectations on the Gartner Hype Cycle, he said: “We will end up down here in the trough of disillusionment in a little while.”
Langlotz shared a vintage photograph of William Morton, MD, the neurologist who popularized the use of x-rays in the U.S. shortly after Roentgen’s discovery, sitting next to engineer Edwin Hammer, and reminded the audience that radiology always has been a high-tech specialty.
When MR was introduced, the images were so crisp and clear that many believed radiologists would be supplanted. In practice, radiologists learned the physics of MR and how to recognize artifacts, distinguishing the real from the artifactual. Langlotz predicted the same pathway for deep learning: Radiologists will learn how to employ it clinically, when it is worth using, and when it should not be used.
“Will it replace radiologists?” he asked. “The answer is no, but I do think that radiologists who use AI will replace radiologists who don’t.”
—Cheryl Proval