Computed tomography (CT) imaging could be very practical for diagnosing various diseases. However, the nature of the CT images is even more diverse since the resolution and number of the slices of a CT scan are determined by the machine and its settings. Conventional deep learning models are hard to tickle such diverse data since the essential requirement of the deep neural network is the consistent shape of the input data. In this paper, we propose a novel, effective, two-step-wise approach to tickle this issue for COVID-19 symptom classification thoroughly. First, the semantic feature embedding of each slice for a CT scan is extracted by conventional backbone networks. Then, we proposed a long short-term memory (LSTM) and Transformer-based sub-network to deal with temporal feature learning, leading to spatiotemporal feature representation learning. In this fashion, the proposed two-step LSTM model could prevent overfitting, as well as increase performance. Comprehensive experiments reveal that the proposed two-step method not only shows excellent performance but also could be compensated for each other. More specifically, the two-step LSTM model has a lower false-negative rate, while the 2-step Swin model has a lower false-positive rate. In summary, it is suggested that the model ensemble could be adopted for more stable and promising performance in real-world applications.
With the massive damage in the world caused by Coronavirus Disease 2019 SARS-CoV-2 (COVID-19), many related research topics have been proposed in the past two years. The Chest Computed Tomography (CT) scans are the most valuable materials to diagnose the COVID-19 symptoms. However, most schemes for COVID-19 classification of Chest CT scan is based on a single-slice level, implying that the most critical CT slice should be selected from the original CT scan volume manually. We simultaneously propose 2-D and 3-D models to predict the COVID-19 of CT scan to tickle this issue. In our 2-D model, we introduce the Deep Wilcoxon signed-rank test (DWCC) to determine the importance of each slice of a CT scan to overcome the issue mentioned previously. Furthermore, a Convolutional CT scan-Aware Transformer (CCAT) is proposed to discover the context of the slices fully. The frame-level feature is extracted from each CT slice based on any backbone network and followed by feeding the features to our within-slice-Transformer (WST) to discover the context information in the pixel dimension. The proposed Between-Slice-Transformer (BST) is used to aggregate the extracted spatial-context features of every CT slice. A simple classifier is then used to judge whether the Spatio-temporal features are COVID-19 or non-COVID-19. The extensive experiments demonstrated that the proposed CCAT and DWCC significantly outperform the state-of-the-art methods.