Article | July 26, 2018

Machine Learning And Deep Learning In Medical Imaging (ML & DL In MI)

By Victoria Yaskevich, Project Coordinator, Healthcare & Finance IT Consultant


In 1895, the German physicist Wilhelm Röntgen showed his wife Anna an X-ray of her hand. “I have seen my death,” said the woman, awe-stricken by the unusual and impressive invention. It was an amazing step forward in the history of medicine: for the first time ever, the inside of the body could be made visible without cutting into the flesh.

Medical imaging broke paradigms when it first appeared more than 120 years ago, and after years of evolution, it seems poised to take us beyond our current reality once again. It aims to offer solutions that analyze structured and unstructured medical data kept in disconnected storage and present it in a contextually relevant, probability-driven manner. Sounds intriguing, doesn’t it?

It does, especially if we take into account that the average hospital creates around 50 petabytes of data per year. For scale, a single petabyte equals 20 million four-drawer filing cabinets full of text, or 13.3 years of HD video. Needless to say, it’s a lot of data!
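A quick back-of-the-envelope check of those per-petabyte equivalences (the ~50 MB-per-cabinet and ~19 Mbit/s HD-TV figures are my own assumptions, not from the source):

```python
# Sanity-check the "one petabyte" equivalences with rough assumptions.
PETABYTE = 10**15  # bytes

# Assumption: a four-drawer filing cabinet holds ~50 MB of plain text.
CABINET_BYTES = 50 * 10**6
print(PETABYTE / CABINET_BYTES / 1e6, "million cabinets")  # ~20.0

# Assumption: broadcast HD-TV video streams at ~19 Mbit/s.
HD_BYTES_PER_SECOND = 19e6 / 8
SECONDS_PER_YEAR = 365.25 * 24 * 3600
print(PETABYTE / HD_BYTES_PER_SECOND / SECONDS_PER_YEAR, "years of HD video")  # ~13.3
```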

To make your eyes go even rounder, I’d add that medical images account for as much as 90% of all medical data today, an overwhelming amount on a human scale. Thus, new methods are required to extract and process data from those images more efficiently. No, AI will not diagnose patients and replace doctors. It will simply augment their ability to find the key, relevant data needed to care for patients, and present this data in a concise, easily digestible format.

For example, when a radiologist examines a chest X-ray, AI reviews the image and immediately identifies potential findings. Let’s say a patient has lung cancer; the AI combs through the picture archiving and communication system (PACS), electronic medical records (EMR) and departmental reporting systems to bring in:

  • Prior chest imaging studies;
  • Cardiology report information;
  • Medications the patient is currently taking;
  • Patient history relevant to COPD (chronic obstructive pulmonary disease) and a history of smoking that might relate to the current exam;
  • Recent lab reports;
  • Oncology notes, including chemotherapy;
  • Radiation therapy treatments.

The data collected by the AI is then displayed concisely (with links to the full sources), which greatly enhances the picture of the patient’s health.
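To make the idea concrete, here is a minimal sketch of what such aggregation could look like in code. The `pacs`, `emr` and `reporting` clients and their method names are hypothetical placeholders, not a real vendor API (real integrations would speak DICOMweb, HL7 or FHIR):

```python
from dataclasses import dataclass, field

@dataclass
class PatientSummary:
    """Concise, single-view summary assembled from disparate clinical systems."""
    patient_id: str
    prior_chest_studies: list = field(default_factory=list)
    cardiology_reports: list = field(default_factory=list)
    current_medications: list = field(default_factory=list)
    relevant_history: list = field(default_factory=list)   # e.g., COPD, smoking
    recent_labs: list = field(default_factory=list)
    oncology_notes: list = field(default_factory=list)     # incl. chemotherapy
    radiation_treatments: list = field(default_factory=list)

def build_summary(patient_id, pacs, emr, reporting):
    # All client objects and method names below are hypothetical stand-ins.
    return PatientSummary(
        patient_id=patient_id,
        prior_chest_studies=pacs.find_studies(patient_id, body_part="CHEST"),
        cardiology_reports=reporting.reports(patient_id, dept="cardiology"),
        current_medications=emr.medications(patient_id, active=True),
        relevant_history=emr.history(patient_id, conditions=["COPD", "smoking"]),
        recent_labs=emr.labs(patient_id, days=90),
        oncology_notes=reporting.reports(patient_id, dept="oncology"),
        radiation_treatments=reporting.reports(patient_id, dept="radiation_oncology"),
    )
```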

This is exactly how IBM Watson, the king of medical AI, works. But how is it made possible? First and foremost, tons and tons of data.

To build a body of knowledge, researchers put together myriads of Word documents, PDFs and web pages, both structured (databases) and unstructured (Wikipedia, newswires, etc.), including dictionaries and encyclopedias. When asked a question, Watson first analyzes it using more than 100 algorithms, identifying names, dates, geographic locations and other entities. It also examines the phrase structure and grammar of the question to better estimate what is being asked. To answer it, Watson searches millions of documents to find thousands of possible answers. Along the way, it collects additional evidence and uses a scoring algorithm to rate each item's quality. Based on that scoring, it ranks all possible answers and offers the best one. In cases where the AI doesn’t accurately determine the disease state, or finds incorrect or irrelevant data, software developers go back and refine the algorithm, iteration after iteration, until the software gets it right in the majority of cases. All in all, a lot of effort goes into keeping the whole machine moving.
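A toy sketch of that generate-candidates / score-evidence / rank loop (everything here is illustrative; Watson’s real 100+ analyzers are proprietary, and `extract_entities` is a crude stand-in I made up):

```python
import re

def extract_entities(doc):
    # Crude stand-in for real entity extraction: capitalized tokens.
    return set(re.findall(r"\b[A-Z][a-z]+\b", doc))

def answer(question, corpus, scorers):
    """Toy generate / score / rank pipeline over a list of text documents."""
    # 1. Generate candidate answers from documents that match the question.
    candidates = set()
    terms = question.lower().split()
    for doc in corpus:
        if any(term in doc.lower() for term in terms):
            candidates.update(extract_entities(doc))
    # 2. Score each candidate: every scorer contributes a piece of evidence.
    # 3. Rank by total score and return the best-supported answer.
    ranked = sorted(
        ((sum(s(question, c, corpus) for s in scorers), c) for c in candidates),
        reverse=True,
    )
    return ranked[0][1] if ranked else None
```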

To get this legion of data and train the AI, IBM has purchased whole companies, like Merge Healthcare and Phytel in 2015, and Truven Health Analytics in 2016. Moreover, IBM licenses Watson through third-party agreements with other health IT vendors: each vendor needs to add value to Watson with its own programming, not just act as a reseller. More importantly, vendors are also required to share access to all the patient data and imaging studies they have access to. This allows Watson to keep amplifying its clinical intelligence with millions of new patient records.

Examples of ML in medical imaging

  • Detection of melanoma, a type of skin cancer. Diagnosing this disease can be extremely difficult, as there are so many variations in the way it appears in individual cases. By feeding a computer many images of melanoma, it is possible to teach the system to recognize very subtle but important features associated with the disease. The ML technology can then compare a new patient’s image with many others in a database and rapidly give the doctor important information (based on the images and text-based records) about the diagnosis and potential treatments.
  • Finding cancer in lung CT scans. “You have to scroll through hundreds and hundreds of slices looking for a few little glowing pixels that appear and disappear, and that takes a long time, and it is very easy to make a mistake,” says Jeremy Howard, CEO of Enlitic, a four-year-old startup that is using deep learning for medical image processing. Howard says they have already created an algorithm capable of identifying relevant characteristics of lung tumors more accurately than radiologists can.
  • Diagnosis and prediction of future disease onset or progression. Models are trained on data from continuous studies of disease status, collected over the years after acquisition. For example, hippocampal shape classification in healthy elderly people can predict the onset of dementia symptoms up to ten years later.
  • Detecting abnormalities of heart function in EKGs. Cardiologs, as well as its competitor iRhythm, is working hard to recognize any kind of aberration in an EKG, e.g., arrhythmia (improper beating of the heart), which could signal long-term complications.
  • Diagnosis of Alzheimer’s disease or other forms of dementia, and prediction of the conversion from mild cognitive impairment (MCI) to dementia, based on brain MR images. This is likely driven, at least in part, by the availability of large datasets with diagnostic labels, such as the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and the Open Access Series of Imaging Studies (OASIS).
  • Detection of diabetic retinopathy (retinal degeneration) in retinal fundus photographs. Many papers focus on optimizing the detection and segmentation of retinal vessels, for which several smaller public databases are available. However, a recent Kaggle competition on diabetic retinopathy detection changed the situation by providing 35,000 images with expert visual scores for training.

But we have to keep in mind that this last example is a specific task performed on 2D images. Differential diagnosis or quantification based on full 3D or 4D, possibly multi-modal, imaging data would require even larger training sets to describe all biological variation adequately.
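For the 2D cases above, the basic training setup is fairly uniform across tasks. Here is a minimal Keras-style sketch (the data path, image size and tiny architecture are illustrative assumptions; a clinical-grade model needs far more data, validation and care):

```python
import tensorflow as tf

# Hypothetical layout: data/train/<class_name>/*.png, one folder per class.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=(224, 224), batch_size=32)

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255),               # pixel values to [0, 1]
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),     # disease vs. healthy
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=10)
```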

There are around 40 startups enhancing medical imaging with AI, yet all of them encounter the same problems on the way to success. Let’s have a closer look at them.

Main obstacles to ML in medical imaging

1. Lack of training data

Wait, you said there are 50 petabytes of data generated each year! Correct. However, this protected health information (PHI) can’t simply be used for training: it has to be de-identified first and safeguarded by the medical staff in accordance with HIPAA, PIPEDA, GDPR and the HITECH Act.
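Before scans can be pooled into a training set, the identifying fields have to go. A minimal de-identification sketch with pydicom (the tag list below is illustrative only, far from a complete HIPAA profile; real pipelines follow the DICOM PS3.15 confidentiality profiles):

```python
import pydicom

# Illustrative subset of direct identifiers; a real profile removes many more.
IDENTIFYING_TAGS = [
    "PatientName", "PatientID", "PatientBirthDate",
    "PatientAddress", "ReferringPhysicianName", "InstitutionName",
]

def deidentify(path_in, path_out):
    ds = pydicom.dcmread(path_in)
    for tag in IDENTIFYING_TAGS:
        if tag in ds:
            setattr(ds, tag, "")  # blank the value (could also delete the element)
    ds.remove_private_tags()      # drop vendor-specific private elements
    ds.save_as(path_out)
```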

2. Variety of imaging protocols

Apart from a few standardized imaging protocols, training data can be acquired under different conditions, with different protocols and different scanner models, which definitely hampers ML results.

[Figure: MRI of the carotid artery obtained at two different sites in a multi-center study to improve diagnosis of high-risk carotid plaques. The imaging protocols in this study were carefully aligned, but due to different scanning equipment and different practices at different centers, some differences are unavoidable. Lumen, plaque, calcium spots (*) and intraplaque hemorrhage (black dot) can clearly be distinguished in both protocols, but the visual appearance differs.]
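One common mitigation is to normalize intensities per scan before training, so the model sees comparable value ranges regardless of scanner. A minimal per-volume z-score sketch (the background-exclusion heuristic is an assumption that suits many, but not all, modalities):

```python
import numpy as np

def zscore_normalize(volume: np.ndarray) -> np.ndarray:
    """Rescale one scan to zero mean, unit variance over its foreground.

    MRI intensities are not calibrated across scanners or protocols, so a
    per-volume z-score is a cheap way to make multi-center data comparable.
    """
    foreground = volume[volume > 0]          # crude background exclusion
    mean, std = foreground.mean(), foreground.std()
    return (volume - mean) / (std + 1e-8)    # epsilon avoids division by zero
```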

3. Poor annotation of training data

Related to the lack of training data, there is a lack of annotated data that can be used for training. Since annotation is done manually, it requires that humans not only visually assess the images but also indicate boundaries reliably, e.g., for diffuse abnormalities. There is also a need for qualified people to perform such segmentation.
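A standard way to quantify how reliably two readers indicate the same boundary is the overlap between their masks, for example the Dice coefficient:

```python
import numpy as np

def dice(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice overlap between two binary segmentation masks (1.0 = identical).

    Useful both for measuring inter-annotator agreement on training labels
    and for evaluating a model's segmentations against a reference standard.
    """
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    total = a.sum() + b.sum()
    return 2.0 * intersection / total if total else 1.0
```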

4. Wrong interpretation and evaluation of data

This correlates with the previous point. Because of poorly annotated data, we underestimate the complexity of the learning algorithms to be applied, and thus get poor training results.

Conclusion

Whether you are a skeptic or a romantic, you can’t deny that ML is confidently stepping into our lives. It drives our cars, advises on the best investment plans, and recognizes people’s moods from their facial expressions. From drug discovery to diagnosis prediction, Artificially Intelligent Healthcare (AIH) has a future too. So next time your radiologist warns you about a predisposition to lung disease, take them seriously, and think about the pile of your old exams that were studied and the genomic tests that were run so you could be forewarned and forearmed.