Identifying the Best Image Classification Algorithm for COVID-19
Diagnosis with a Small, Imbalanced Chest X-Ray Dataset
Abstract
In this project, I study families of deep learning neural networks that
are trained on publicly available chest X-ray datasets to identify the
best image classification algorithm for automating the diagnosis of
respiratory illnesses. Specifically, the learned networks will be used
to classify anonymized chest X-ray images into three classes: healthy,
COVID-19 and non-COVID pneumonia. As in most real-world applications,
publicly available chest X-ray image datasets are not abundant, and
ground truth data of COVID-19 diagnosis is especially hard to come by.
The first variable studied to improve the predictive power of the
neural networks is pretraining on a domain-relevant dataset that is much
larger than the transfer learning dataset. To address the imbalance
within the training data, the second variable is a customized data
sampling configuration that uses either the equal-weight-per-epoch
method or the fixed-fraction-per-batch method. As a control for each
neural network, pretrained weights learned from the
classic ImageNet dataset are used, and no customized training data
sampling method is applied. During transfer learning, two scikit-learn
metrics, average precision and F1 score, are computed. Precision and
Recall are then calculated manually from the confusion matrix for each
neural network and its hyperparameter configuration. The most
significant observation is that the Recall
metric for the control group is consistently less than 0.6, which is a
clear indicator of underperformance on COVID-19 prediction. The
family with significantly higher performance is DenseNet; surprisingly,
DenseNet169 achieves some of the highest Precision and Recall values,
0.870 and 0.837 respectively. With more than 82 million COVID-19 cases
worldwide, the need for
efficient, accurate and mass diagnosis of patients is apparent and
growing. The use of chest X-ray images in medical diagnosis is a
cost-effective and widespread technique for early screening of
respiratory illnesses.
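
The manual Precision and Recall calculation mentioned above can be
illustrated with a minimal sketch. It assumes the scikit-learn
convention that rows of the confusion matrix are true classes and
columns are predicted classes. The function name
per_class_precision_recall and the counts in example_confusion are
hypothetical and chosen only for illustration; they are not results
from this project.

```python
# Illustrative sketch only: the counts below are made up and are NOT
# results from this project.
CLASSES = ["healthy", "COVID-19", "non-COVID pneumonia"]


def per_class_precision_recall(confusion, class_names):
    """Return {class_name: (precision, recall)} from a square confusion matrix.

    Assumes confusion[i][j] is the number of images whose true class is i
    and whose predicted class is j (the scikit-learn convention).
    """
    results = {}
    for k, name in enumerate(class_names):
        true_positives = confusion[k][k]
        predicted_as_k = sum(row[k] for row in confusion)  # column sum
        actually_k = sum(confusion[k])                      # row sum
        precision = true_positives / predicted_as_k if predicted_as_k else 0.0
        recall = true_positives / actually_k if actually_k else 0.0
        results[name] = (precision, recall)
    return results


# Hypothetical 3x3 confusion matrix (rows: true class, columns: predicted).
example_confusion = [
    [90, 2, 8],
    [3, 40, 7],
    [10, 8, 82],
]

metrics = per_class_precision_recall(example_confusion, CLASSES)
for name, (precision, recall) in metrics.items():
    print(f"{name}: precision={precision:.3f}, recall={recall:.3f}")
```

Under this convention, Recall for the COVID-19 class is the fraction of
true COVID-19 images that the network labels as COVID-19, which is why
a Recall below 0.6 signals many missed COVID-19 cases.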