In July, the National Institutes of Health (NIH) Clinical Center released DeepLesion, a massive dataset containing 32,735 lesions identified on 32,120 CT slices from 10,594 studies of 4,427 unique patients, to the scientific community. Its stated purpose is to help researchers improve the accuracy of lesion detection.
The dataset differs from most lesion datasets currently available in several ways: it is large enough to train a deep neural network, and it contains diverse types of lesions (e.g., lung, liver, kidney, bone, and enlarged lymph nodes) rather than a single variety. All images were annotated by NIH radiologists, who used electronic annotation tools to place arrows, lines, diameters, and text bookmarking the precise location and size of each lesion, measured according to the response evaluation criteria in solid tumors (RECIST); the bookmarks were then harvested from the institute’s PACS.
“This dataset is a huge boost to the field of machine learning,” said James Whitfill, MD, Strategic Radiology CIO. “While the algorithms might appear to be the most challenging part of this field, in reality, the training and validation data is the proverbial ‘last mile’ in AI. This dataset should enhance the ability of researchers to validate and accelerate adoption of other algorithms.”
Because they worked with images that had already been annotated, minimal preprocessing was required to prepare the dataset for use. The researchers culled the CT images in the institute’s PACS to remove those with “noisy” bookmarks, such as nonaxial RECIST diameters, converted the annotation vertices to image coordinates, and converted the DICOM slices to PNG files. To anonymize the data, real patient identifiers were replaced with indices identifying patient, study, and series. Metadata such as pixel spacing, slice interval, intensity window, and patient gender and age were also recorded. Slice intervals range from 0.25 mm to 22.5 mm, although 48.3% are 1 mm and 48.9% are 5 mm.
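The DICOM-to-PNG step is straightforward to reproduce. Below is a minimal Python sketch, assuming the pydicom and Pillow libraries; the 32,768 offset reflects the convention documented for the released DeepLesion PNGs (stored intensity = Hounsfield units + 32768), and the file paths are placeholders.

```python
import pydicom
import numpy as np
from PIL import Image

def dicom_slice_to_png(dicom_path: str, png_path: str) -> None:
    """Convert one CT DICOM slice to a 16-bit grayscale PNG."""
    ds = pydicom.dcmread(dicom_path)
    # Apply the DICOM rescale to recover Hounsfield units (HU)
    hu = ds.pixel_array.astype(np.float32) * float(ds.RescaleSlope) \
         + float(ds.RescaleIntercept)
    # Shift by 32768 so negative HU values fit in an unsigned 16-bit image,
    # matching the offset documented for the released DeepLesion PNGs
    Image.fromarray((hu + 32768).astype(np.uint16)).save(png_path)
```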
In releasing the dataset, the researchers hope to encourage further development of lesion-detection algorithms:
“It could enable the scientific community to create a large-scale universal lesion detector with one unified framework,” noted the NIH press release announcing the dataset. DeepLesion was compiled by a group of NIH researchers led by Ronald M. Summers, MD, PhD, senior investigator, Clinical Image Processing Service, Imaging Biomarkers and Computer-Aided Diagnosis Laboratory, Radiology and Imaging Sciences Department, NIH Clinical Center.
In fact, those researchers have already used the dataset to build a first iteration of a universal lesion detector. Yan et al. used DeepLesion to train an automated universal lesion detector based on a regional convolutional neural network; designed to find all types of lesions, it achieved a sensitivity of 77.31% at three false positives per image and 81.1% at five. They published their results in a recent issue of the Journal of Medical Imaging.
To date, old-school computer-aided detection algorithms based on statistical learning approaches have focused on one particular type of lesion, but the emergence of deep learning algorithms based on convolutional neural networks means that the interpretive skills of a computer algorithm can more closely approximate radiologists’ practice patterns, making them potentially more helpful.
In explaining the significance of the large, diverse dataset, Yan et al. wrote: “In practice, multiple findings can be observed and are often correlated….However, it remains challenging to develop a universal or multicategory computer-aided detection framework, capable of detecting multiple lesion types in a seamless fashion, partially due to the lack of a multicategory lesion dataset. Such a framework is crucial to building an automatic radiological diagnosis and reasoning system.”
Preprocessing and Network Architecture
To prepare the images for use, the researchers rescaled the 12-bit CT intensity range to floating-point numbers, using a single window that covers the intensity ranges of lung, soft tissue, and bone. Image slices were resized to 512 × 512 pixels, and three axial slices were used to encode 3-D information: the center slice containing the bookmark plus the neighboring slices, interpolated to 2-mm slice intervals.
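A minimal sketch of that rescaling step might look like the following. The window bounds used here (−1024 to 3071 HU, wide enough to span lung, soft tissue, and bone) are an illustrative assumption, not the exact values the authors report.

```python
import numpy as np

def rescale_slice(hu_slice, window=(-1024.0, 3071.0)):
    """Clip a CT slice to a single wide window and rescale to [0, 1] floats."""
    lo, hi = window
    clipped = np.clip(hu_slice.astype(np.float32), lo, hi)
    return (clipped - lo) / (hi - lo)

def make_input(prev_slice, center_slice, next_slice):
    """Stack the bookmarked slice with its two neighbors as a 3-channel image."""
    return np.stack(
        [rescale_slice(s) for s in (prev_slice, center_slice, next_slice)],
        axis=-1,
    )
```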
Images were input into a convolutional neural network based on the VGG-16 model to produce a feature map, with the last two pooling layers removed to enhance the map’s resolution and increase the sampling ratio of positive samples. A region proposal network then parsed the feature map, proposed candidate lesion regions, and estimated the probability of lesion versus nonlesion for a fixed set of anchors at each position of the feature map. Anchor locations and sizes were fine-tuned via bounding-box regression. Five anchor scales and three anchor aspect ratios were adopted after analyzing the sizes of the bounding boxes in DeepLesion.
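To make the anchor scheme concrete, here is a hedged sketch of how such anchors are typically generated. The specific scales and ratios below are placeholders, not the values the authors settled on; the point is that five scales times three ratios yields 15 anchors per feature-map position.

```python
import numpy as np

def generate_anchors(scales, ratios):
    """Return len(scales) * len(ratios) anchors as (width, height) pairs in pixels."""
    anchors = []
    for s in scales:
        for r in ratios:  # r = height / width
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            anchors.append((w, h))
    return np.array(anchors)

# Illustrative placeholder values: 5 scales x 3 aspect ratios = 15 anchors
anchors = generate_anchors(scales=[16, 32, 64, 128, 256], ratios=[0.5, 1.0, 2.0])
```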
The lesion proposals and feature maps were then sent to a region-of-interest (RoI) pooling layer, which resampled the feature maps to a fixed size. The pooled maps were fed into two convolutional layers with rectified linear units, and the feature vector produced by the final layer passed through two more layers to predict lesion confidence scores.
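RoI pooling is available off the shelf. The sketch below uses torchvision’s implementation to resample each proposal to a fixed 7 × 7 grid; the spatial_scale of 1/8 assumes an 8× downsampling backbone (consistent with VGG-16 minus its last two pooling layers), and the tensor shapes and proposal boxes are illustrative.

```python
import torch
from torchvision.ops import roi_pool

feature_map = torch.randn(1, 512, 64, 64)  # backbone output for one 512x512 image
# Each proposal is (batch_index, x1, y1, x2, y2) in input-image coordinates
proposals = torch.tensor([[0., 100., 120., 180., 200.],
                          [0.,  40.,  60., 110., 150.]])
# Resample every proposal's features to a fixed 7x7 grid; spatial_scale maps
# image coordinates onto the 8x-downsampled feature map
pooled = roi_pool(feature_map, proposals, output_size=(7, 7), spatial_scale=1.0 / 8)
print(pooled.shape)  # torch.Size([2, 512, 7, 7])
```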
Evaluation and Results
The investigators evaluated the proposed algorithm by dividing DeepLesion at the patient level into training (70%), validation (15%), and test (15%) sets. As mentioned earlier, sensitivity was 81.1% at five false positives per image and 77.31% at three. Performance steadily improved as more training samples were used, the researchers emphasized. “As a result, the accuracy is expected to be better as we harvest more data in the future,” they wrote.
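Splitting at the patient level, rather than the slice level, keeps all images from a given patient in the same partition, so no patient leaks between training and test sets. A minimal sketch, using DeepLesion’s anonymized patient indices purely as an example:

```python
import random

def patient_level_split(patient_ids, train=0.70, val=0.15, seed=0):
    """Shuffle unique patient IDs and cut them into train/val/test partitions."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    n_train = int(train * len(ids))
    n_val = int(val * len(ids))
    return (set(ids[:n_train]),                  # 70% of patients
            set(ids[n_train:n_train + n_val]),   # 15%
            set(ids[n_train + n_val:]))          # remaining 15%

# e.g., DeepLesion's 4,427 anonymized patient indices
train_ids, val_ids, test_ids = patient_level_split(range(1, 4428))
```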
What the algorithm does not do is predict the type of each lesion detected, which is why it was not directly compared with results in the literature. (In future work, the researchers plan to extract lesion types from the radiology reports via natural language processing.) They were able to identify the lesion types that were easiest to detect: liver, lung, kidney, and mediastinum. Abdominal and pelvic lesions were more difficult to detect, and soft-tissue and bone lesions had the lowest sensitivity, owing to both fewer training samples and low contrast with normal structures.
The authors identified the following use cases for DeepLesion: lesion detection (as demonstrated) to assist radiologists; lesion classification; lesion segmentation, for automated delineation and measurement of lesions; content- or text-based lesion retrieval, finding relevant lesions given a query image or text; and, perhaps the most clinically valuable use case, lesion growth analysis. The NIH also has a large number of MR and nuclear medicine images in its PACS that could form the basis of future datasets.
“It is increasingly clear that AI is not going to replace radiologists, but it will change the way radiologists work, just like PACS and VR did,” Whitfill said. “SR members are leading this transformation. Through its partnership with IBM Watson, SR has been a valuable part of the work to develop better machine learning networks to support radiologist diagnosis.”