Cancer Radiologist Shares On-the-Job Machine Learning Lessons Learned

For clinical transformation to occur, we need robust science to really convince the community that these techniques will work for our patients. It is a patient life at the end of the algorithm, so we need to be sure.

Andrea G. Rockall, MRCP, FRCR
Radiologist Consultant, Imperial College Healthcare, NHS
December 19, 2018

Andrea G. Rockall, MRCP, FRCR, had never heard of machine learning when she arrived at Imperial College London in 2012. A working radiologist and researcher with interests in genitourinary cancer, image-based trials, and functional imaging and response assessment, she was asked to develop several machine learning trials.

In her highly practical segment of the 1.5-hour refresher course “Deep Learning—An Imaging Roadmap,” presented at RSNA 2018 in Chicago on November 27, Rockall shared lessons learned in the crucible of her experience developing clinical radiology studies with the machine learning community.

Before she began, Rockall identified some pressing global problems facing radiology that machine learning has the potential to help address. For instance, half the advertised posts for radiologists in the United Kingdom have gone unfilled for over a year. In under-resourced countries, the radiologist shortage is acute: Ghana has approximately 20 radiologists for 27 million people and Zambia has approximately six radiologists for 16 million people.

At the same time, scan complexity is increasing—Rockall pointed to the thousands of images contained in one whole-body MRI, a study of great relevance to her area of interest. “There are too few radiologists to really analyze and offer our patients the benefit of this increased complexity,” she said. “The increased complexity means we are looking at more things—we have a lot to offer—but it is difficult to translate that into clinical practice with so few of us out there to do all this complex work.”

The potential for machine learning to optimize patient care by assisting radiologists to perform laborious or repetitive tasks—volume size, lesion size, RPV, RECIST measurements—and then integrate that information with diverse patient data to inform treatment decisions is what tantalizes Rockall, but she is realistic about the challenges. “For clinical transformation to occur, we need robust science to really convince the community that these techniques will work for our patients,” she said. “It is a patient life at the end of the algorithm, so we need to be sure.”

The MALIBO Lesson

Good science starts with a good question and a hypothesis, and Rockall had both when she launched Machine Learning in Body Oncology (MALIBO), her first machine learning study, in 2013, funded by the National Institute for Health Research in the UK. “We identified this unmet need—that whole-body MRI is a good diagnostic tool but there is limited clinical translation because it is difficult for us radiologists to read the thousands and thousands of images,” Rockall said, alluding to the time burden and numerous measurement requirements.

The question was whether the application of machine learning could improve the diagnostic test accuracy of whole-body MRI in patients with colorectal cancer by reducing the reading time—and radiologist concern about missing something. The hypothesis was that machine learning would improve specificity by “x” amount.

In Phase I, the researchers trained a machine learning algorithm on normal images to recognize normal organs. In Phase II, they taught the computer to identify malignant lesions using machine learning and developed an algorithm to recognize malignant tissue. By Phase III, the researchers had settled on an algorithm and set out to validate the algorithm with expert radiologists, who read, in random order, with or without machine-learning support.

“That was our brilliant idea, before I knew anything about machine learning, thinking it all was going to work so beautifully,” Rockall recalled. The team believed it had done everything correctly: a flowchart, stop-go points, and the support of a statistician who calculated that 217 cases, 141 of them without metastatic disease, were needed to test the hypothesis.
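Case counts like these come from standard diagnostic-accuracy power calculations. As a rough illustration only (the target specificity and precision margin below are assumed for the example, not the MALIBO study's actual parameters), the number of disease-free cases needed to estimate specificity within a chosen margin can be sketched as:

```python
import math

def cases_for_specificity(expected_spec, margin, z=1.96):
    """Disease-free cases needed to estimate specificity within
    +/- margin at ~95% confidence (z = 1.96)."""
    p = expected_spec
    return math.ceil(z**2 * p * (1 - p) / margin**2)

# Assumed illustrative values, not the study's actual parameters:
print(cases_for_specificity(expected_spec=0.80, margin=0.07))  # 126
```

Tightening the margin or raising the confidence level drives the required case count up quickly, which is one reason assembling enough eligible patients becomes the bottleneck.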

Problems arose when the team realized it had not anticipated how many patient images would be needed to train the algorithm without over-fitting: assembling a large dataset of whole-body MRIs from patients with colorectal cancer proved far harder than expected.

Another Unmet Need, More Lessons

Rockall flashed a CT image of a patient with multiple myeloma on the screen to introduce another unmet clinical need that resulted in a machine learning study. “This is a patient with multiple myeloma who can’t start treatment because on this CT, the standard of care, we can’t see the disease so there is no indication to start treatment,” she shared. “But if we look at the whole-body MRI, we can see that the patient has a very heavy burden of disease.”

Guidance in the UK now recommends whole-body MRI as the best way to image multiple myeloma, but it has not been widely adopted in clinical practice because the study is difficult to acquire and read. “I was working at the time with Dr. Christina Messiou at the Royal Marsden Hospital where there were hundreds of myeloma WB-MRI cases,” Rockall said. “This was a perfect machine learning study.”

Again, the researchers planned carefully, creating a flowchart and working out how many patients were needed in each phase of the trial. “We really need to look at image data and acquisition,” Rockall advised. “Ideally we want artifact-free data, with excellent image contrast and resolution, reproducible for the same diagnostic finding regardless of the vendor.”

The image data from the healthy-volunteer dataset, over which the researchers had complete control, was artifact-free with excellent image contrast and resolution, and all of the patients were similarly positioned. The algorithm was trained to segment all of the various organs and bones. “We can do this really rather well in healthy volunteers,” she said. “The difficulty comes when we have real, live patient data.”

When using patient data, the machine learning algorithm had to contend with images from different imaging centers and from patients with disease; the data were riddled with artifacts and acquired with different signal intensities, and the algorithm struggled with the segmentation task it had been trained to perform.

“We need to look at other ways to handle diverse data,” Rockall said. “Work in progress is looking at transfer learning (using knowledge that is gained when solving one task and applying it to another) and intensity normalization in order to manage diverse data that has come to us already acquired. This is mostly what we are using to train algorithms at the moment, we don’t have prospectively collected datasets.”
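Intensity normalization matters for MRI in particular because, unlike CT Hounsfield units, MRI signal values have no fixed physical scale and drift across scanners and protocols. A minimal per-volume z-score normalization is one common approach; the sketch below is illustrative, and the crude foreground threshold is an assumption, not part of Rockall's pipeline:

```python
import numpy as np

def zscore_normalize(volume, mask=None):
    """Rescale an MRI volume to zero mean, unit variance.

    Computing the statistics inside a foreground mask (e.g. the
    body, excluding background air) keeps empty space from
    dominating the mean and standard deviation.
    """
    voxels = volume[mask] if mask is not None else volume
    mu, sigma = voxels.mean(), voxels.std()
    return (volume - mu) / max(sigma, 1e-8)

# Toy example: a synthetic "volume" with a crude foreground mask.
vol = np.random.default_rng(0).normal(300.0, 50.0, size=(8, 64, 64))
body = vol > 200.0  # purely illustrative threshold
normed = zscore_normalize(vol, mask=body)
```

After this step, the same tissue type lands in roughly the same intensity range regardless of which scanner produced the study, which is the precondition for an algorithm trained at one center to behave sensibly on data from another.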

Another strategy for harmonizing data is rapid MRI acquisition using under-sampling and convolutional neural network reconstruction, not just in the chest but in the upper abdomen and pelvis. “Dynamic, biometric analysis or fast imaging may help give us real-time adjustments on a case-by-case basis,” she said. “They may be able to account for patient movement during acquisition and may improve the ability of the algorithm to work in clinical practice. We need harmonization either before or after acquisition.”

The Segmentation Strategy

Rockall’s strategy for teaching the computer to detect the difference between healthy and diseased tissue is to train the algorithm to perform segmentation tasks, an approach used by many other machine learning researchers. Whether to train the algorithm on clean data acquired in the lab from healthy volunteers or on messy clinical data is a question that remains unanswered for Rockall.

The detection and segmentation tool must work in a wide variety of patients for it to be clinically deployable. For Phase II of the MALIBO study of patients with colorectal cancer, the algorithm was first trained on healthy-volunteer studies acquired in the lab, and Rockall was delighted with the results.

“We had the T sequences all stacked, and we could scroll up or down through the T2s and diffusion images, together with the overlay of the machine learning output,” she shared. “We had a probability map with red indicating high probability of cancer, and we were delighted. We even deployed this in our reading platform, where we could scroll up or down and see the probability map popping in and out like a PET scan, with a red dot where cancer had been identified. We were thrilled.”

When they started adding the extra patient datasets, problems arose due to over-fitting with the colon cancer cases, and the team faced difficulty with random chest masses, lymph nodes, variability in the MRI sequences from different departments, and corrupt data that couldn’t be used. “The performance of the algorithm drops when input data changes, and this is really important,” she said. “Should we be training first on similar data or should we be training first on messy data? I don’t know that yet, and it is one of the things we are investigating.”

Image Quantification, Body Composition, Radiomics

As a cancer radiologist, Rockall has a long list of machine-learning targets: “Image quantification is obviously very important to cancer radiologists. We’ve got lots of measuring to do, difficult, time-consuming, repetitive tasks—measuring lesion size, RECIST measurements, volume size, radiotherapy planning, frailty assessments, burden of disease—all of which can benefit from AI tools.”

Another task Rockall is working on is body composition. “We know that body composition can tell us about overall survival, it’s important to us, but it’s an unmet need,” she said. “Nobody is going to sit at their console and draw lines around the body muscle. In my field (ovarian cancer), patients don’t have the ideal abdominal shape for automatic muscle segmentation, they’ve got ascites, they have peritoneal disease, so now we need a lot more training data to get this segmentation tool right.”

In the process of doing that, Rockall encountered further hurdles. “We also had to decide how to find the L3 level so that segmentation happened at the right level, and then we found that the thousands of images we were working on didn’t all cover the same body part: some covered from the diaphragm down, some from the neck down, some included the head. So how does the machine learning tool recognize L3 in all of these different scenarios? Other factors that need to be normalized are slice thickness and the phenomenon of extra vertebrae.”
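Slice thickness, one of the factors Rockall names, is typically normalized by resampling every study to a common spacing along the craniocaudal axis before any level-finding model sees it. A minimal nearest-neighbour sketch in plain NumPy (real pipelines usually use trilinear or spline interpolation via a library such as SciPy or SimpleITK):

```python
import numpy as np

def resample_z(volume, src_spacing_mm, dst_spacing_mm):
    """Resample a (z, y, x) volume to a new slice spacing along z
    using nearest-neighbour lookup."""
    n_src = volume.shape[0]
    length_mm = n_src * src_spacing_mm
    n_dst = int(round(length_mm / dst_spacing_mm))
    # Centre of each destination slice, mapped back to a source index.
    z_dst = (np.arange(n_dst) + 0.5) * dst_spacing_mm
    src_idx = np.clip((z_dst / src_spacing_mm).astype(int), 0, n_src - 1)
    return volume[src_idx]

vol = np.zeros((40, 16, 16))   # 40 slices at 5 mm = 200 mm of coverage
out = resample_z(vol, src_spacing_mm=5.0, dst_spacing_mm=2.0)
print(out.shape[0])            # 100 slices at 2 mm
```

Resampling fixes spacing but not coverage: whether the scan starts at the diaphragm, the neck, or the head still has to be handled by the level-finding model itself.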

“We need to be able to train algorithms to manage this variable data,” she added. “Can we develop confidence maps for the machine learning output, so we can offer choices that will comfort rather than worry radiologists? Because we know that there is variation. If the tool can tell us there is variation, that is helpful.”
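One simple way to build such a confidence map is to report the per-voxel entropy of the model's predicted probabilities: low entropy where the model is decisive, high entropy where it is torn. The two-class sketch below is purely illustrative and not drawn from Rockall's work:

```python
import numpy as np

def confidence_map(prob):
    """Per-voxel confidence from a predicted probability map.

    prob holds P(lesion) per voxel. Binary entropy is 1 bit when
    the model is maximally uncertain (p = 0.5) and 0 bits when it
    is certain, so confidence = 1 - entropy.
    """
    p = np.clip(prob, 1e-8, 1 - 1e-8)
    entropy = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return 1.0 - entropy

probs = np.array([0.01, 0.5, 0.99])
print(confidence_map(probs).round(3))  # high, zero, high
```

Displayed alongside the probability overlay, such a map lets the radiologist see not just where the algorithm flags disease but how sure it is, mirroring the hedged language radiologists already use in their own reports.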

Radiomics—the mathematical characterization of tissue features that can distinguish different types of tumors—is high on Rockall’s target list. In manually segmenting hundreds of cases, Rockall’s team used mathematical tools to derive a radiomic prognostic vector that assigns ovarian cases to prognostic groups. However strongly the vector performed as a predictor in their first paper (soon to be published), the team knows it must be validated widely, first in a study with a large retrospective dataset and then prospectively, and then integrated into the decision tree.
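Radiomic pipelines generally begin with first-order statistics computed over the segmented region before moving on to texture and shape features (dedicated libraries such as PyRadiomics standardize these definitions). A minimal hand-rolled sketch of a few first-order features, with a toy synthetic "tumour" region:

```python
import numpy as np

def first_order_features(volume, mask, n_bins=32):
    """A few first-order radiomic features over a segmented region."""
    voxels = volume[mask.astype(bool)]
    hist, _ = np.histogram(voxels, bins=n_bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return {
        "mean": float(voxels.mean()),
        "variance": float(voxels.var()),
        "skewness": float(((voxels - voxels.mean()) ** 3).mean()
                          / voxels.std() ** 3),
        "entropy": float(-(p * np.log2(p)).sum()),  # histogram entropy
    }

rng = np.random.default_rng(1)
vol = rng.normal(100.0, 10.0, size=(16, 16, 16))
roi = np.zeros(vol.shape, dtype=bool)
roi[4:12, 4:12, 4:12] = True  # toy "tumour" segmentation
feats = first_order_features(vol, roi)
```

Hundreds of such features, reduced by statistical modelling to the handful that actually carry prognostic signal, is the general shape of how a radiomic prognostic vector is built; the quality of the segmentation mask feeding the calculation is exactly why Rockall wants segmentation automated.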

“We looked at how we wanted to gather the data, what we want to do with the data, what we want to quantify, and how we wanted to test it,” she said. “Ultimately, what we really want to end up with is some kind of risk stratification, a prognostic chain.”

Rockall’s Wish List

In conclusion, Rockall returned to what she really wants to accomplish, which has nothing to do with GPUs and the number of layers in a neural network.

The right image for the right patient. “As a cancer radiologist, I’d like to start off with the image acquisition that would really benefit patients. At the moment in ovarian cancer we use CT, but MRI is probably much better. The problem is we can’t get a reliable image on MR because patients move, patients breathe, and there are artifacts. So what I would like to see is fast MR, which has been beautifully developed in the chest for cardiac imaging. I’d like to see this applied to the upper abdomen and the pelvis, so we can actually improve what we are doing for patient care by deploying AI to help us with acquisition.”

Detection and segmentation. “I would like difficult segmentation tasks to happen automatically, because I want to be able to understand the radiomic prognostic vector in a moment—nobody in the room is going to sit and draw lines around the ovary to get the radiomic prognostic vector that will tell us if the patient should have surgery. We just can’t do that, so we need a segmentation tool to allow the radiomic analysis. We also want automatic image quantification of patient frailty as we open the case, because we know that frailty conveys a different prognosis for different patients.”

Automatic volume measurement. “I don’t want to do volume measurements. I want to do other things. If we could have the RPV come up, the disease volume, the frailty assessments, then I could pool all these things together and have a lot more information about the status of the patient and how they need to be treated.”

“In order to get adoption and translation of these important clinical tools, we need physician buy-in,” she acknowledged. “Clinician skepticism could slow down translation if we don’t meet the high bar of robust science in getting these AI tools on the market.”

Standards and regulations will play an important role in supporting this translation by providing robust assessment of evidence and giving clinicians the confidence they will demand. Rockall noted that the British Standards Institute and the Association for the Advancement of Medical Instrumentation are now working together on new IT standards.

For those who have been inspired to engage in building tools to improve the diagnostic capabilities of radiologists, Rockall offered the following thoughts on planning a good clinical deep-learning study.

  • Start by identifying an unmet clinical need that will really make a difference to patient care.
  • Make sure there is enough appropriate data from different manufacturers so that your algorithm can be rolled out across different settings.
  • Consider the data needed for training, testing, and external validation as you plan the study.
  • Plan very carefully. Will the tool work for a wide variety of patients with anatomic variation, with crumbly spines, with osteoporosis, with ascites? Does the tool work if variations are present?
  • Plan to provide a level of confidence that tells the clinician how certain or uncertain the tool is of its finding (which radiologists often do in their reports).
  • When the tool is ready for deployment, make sure it will integrate with the PACS and require minimal or no clicks.
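The advice on multi-manufacturer data and external validation implies splitting by site or scanner rather than by random case, so the held-out set genuinely tests generalization to an unseen centre. A minimal sketch of such a split (the case IDs and site labels are hypothetical):

```python
def split_by_site(cases, holdout_sites):
    """Partition (case_id, site) pairs so that entire sites are held
    out for external validation, rather than mixing each site's
    cases into both the training and test sets."""
    train, external = [], []
    for case_id, site in cases:
        (external if site in holdout_sites else train).append(case_id)
    return train, external

cases = [("c1", "A"), ("c2", "A"), ("c3", "B"), ("c4", "C")]
train, external = split_by_site(cases, holdout_sites={"C"})
print(train, external)  # ['c1', 'c2', 'c3'] ['c4']
```

A random per-case split would leak each site's scanner characteristics into both sets and overstate how well the algorithm will perform when rolled out to a new department.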

Finally, Rockall urged would-be developers to fully understand clinician needs, and to that end, have a well-defined task to answer an unmet clinical need, the right data, and the right methods, including the right team.

“This is convergence science,” she said, acknowledging the contributions of her non-physician partner in machine learning, Ben Glocker.  “We are working together.”
