Pneumonia is caused by viruses, bacteria, or fungi that infect the lungs, which, if not diagnosed, can be fatal and lead to respiratory failure. More than 250,000 individuals in the United States, mainly adults, are diagnosed with pneumonia each year, and 50,000 die from the disease. Chest Radiography (X-ray) is widely used by radiologists to detect pneumonia. It is not uncommon to overlook pneumonia detection for a well-trained radiologist, which triggers the need for improvement in the diagnosis's accuracy. In this work, we propose using transfer learning, which can reduce the neural network's training time and minimize the generalization error. We trained, fine-tuned the state-of-the-art deep learning models such as InceptionResNet, MobileNetV2, Xception, DenseNet201, and ResNet152V2 to classify pneumonia accurately. Later, we created a weighted average ensemble of these models and achieved a test accuracy of 98.46%, precision of 98.38%, recall of 99.53%, and f1 score of 98.96%. These performance metrics of accuracy, precision, and f1 score are at their highest levels ever reported in the literature, which can be considered a benchmark for the accurate pneumonia classification.
Medical image datasets are usually imbalanced, due to the high costs of obtaining the data and time-consuming annotations. Training deep neural network models on such datasets to accurately classify the medical condition does not yield desired results and often over-fits the data on majority class samples. In order to address this issue, data augmentation is often performed on training data by position augmentation techniques such as scaling, cropping, flipping, padding, rotation, translation, affine transformation, and color augmentation techniques such as brightness, contrast, saturation, and hue to increase the dataset sizes. These augmentation techniques are not guaranteed to be advantageous in domains with limited data, especially medical image data, and could lead to further overfitting. In this work, we performed data augmentation on the Chest X-rays dataset through generative modeling (deep convolutional generative adversarial network) which creates artificial instances retaining similar characteristics to the original data and evaluation of the model resulted in Fr\'echet Distance of Inception (FID) score of 1.289.