Each medical segmentation task should be considered with a specific AI algorithm based on its scenario so that the most accurate prediction model can be obtained. The most popular algorithms in medical segmentation, 3D U-Net and its variants, can directly implement the task of lung trachea segmentation, but its failure to consider the special tree-like structure of the trachea suggests that there is much room for improvement in its segmentation accuracy. Therefore, a research gap exists because a great amount of state-of-the-art DL algorithms are vanilla 3D U-Net structures, which do not introduce the various performance-enhancing modules that come with special natural image modality in lung airway segmentation. In this paper, we proposed two different network structures Branch-Level U-Net (B-UNet) and Branch-Level CE-UNet (B-CE-UNet) which are based on U-Net structure and compared the prediction results with the same dataset. Specially, both of the two networks add branch loss and central line loss to learn the feature of fine branch endings of the airways. Uncertainty estimation algorithms are also included to attain confident predictions and thereby, increase the overall trustworthiness of our whole model. In addition, predictions of the lung trachea based on the maximum connectivity rate were calculated and extracted during post-processing for segmentation refinement and pruning.
Airway-related quantitative imaging biomarkers are crucial for examination, diagnosis, and prognosis in pulmonary diseases. However, the manual delineation of airway trees remains prohibitively time-consuming. While significant efforts have been made towards enhancing airway modelling, current public-available datasets concentrate on lung diseases with moderate morphological variations. The intricate honeycombing patterns present in the lung tissues of fibrotic lung disease patients exacerbate the challenges, often leading to various prediction errors. To address this issue, the 'Airway-Informed Quantitative CT Imaging Biomarker for Fibrotic Lung Disease 2023' (AIIB23) competition was organized in conjunction with the official 2023 International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI). The airway structures were meticulously annotated by three experienced radiologists. Competitors were encouraged to develop automatic airway segmentation models with high robustness and generalization abilities, followed by exploring the most correlated QIB of mortality prediction. A training set of 120 high-resolution computerised tomography (HRCT) scans were publicly released with expert annotations and mortality status. The online validation set incorporated 52 HRCT scans from patients with fibrotic lung disease and the offline test set included 140 cases from fibrosis and COVID-19 patients. The results have shown that the capacity of extracting airway trees from patients with fibrotic lung disease could be enhanced by introducing voxel-wise weighted general union loss and continuity loss. In addition to the competitive image biomarkers for prognosis, a strong airway-derived biomarker (Hazard ratio>1.5, p<0.0001) was revealed for survival prognostication compared with existing clinical measurements, clinician assessment and AI-based biomarkers.
We propose a novel Deep Active Learning (DeepAL) model-3D Wasserstein Discriminative UNet (WD-UNet) for reducing the annotation effort of medical 3D Computed Tomography (CT) segmentation. The proposed WD-UNet learns in a semi-supervised way and accelerates learning convergence to meet or exceed the prediction metrics of supervised learning models. Our method can be embedded with different Active Learning (AL) strategies and different network structures. The model is evaluated on 3D lung airway CT scans for medical segmentation and show that the use of uncertainty metric, which is parametrized as an input of query strategy, leads to more accurate prediction results than some state-of-the-art Deep Learning (DL) supervised models, e.g.,3DUNet and 3D CEUNet. Compared to the above supervised DL methods, our WD-UNet not only saves the cost of annotation for radiologists but also saves computational resources. WD-UNet uses a limited amount of annotated data (35% of the total) to achieve better predictive metrics with a more efficient deep learning model algorithm.
Medical image classification and segmentation based on deep learning (DL) are emergency research topics for diagnosing variant viruses of the current COVID-19 situation. In COVID-19 computed tomography (CT) images of the lungs, ground glass turbidity is the most common finding that requires specialist diagnosis. Based on this situation, some researchers propose the relevant DL models which can replace professional diagnostic specialists in clinics when lacking expertise. However, although DL methods have a stunning performance in medical image processing, the limited datasets can be a challenge in developing the accuracy of diagnosis at the human level. In addition, deep learning algorithms face the challenge of classifying and segmenting medical images in three or even multiple dimensions and maintaining high accuracy rates. Consequently, with a guaranteed high level of accuracy, our model can classify the patients' CT images into three types: Normal, Pneumonia and COVID. Subsequently, two datasets are used for segmentation, one of the datasets even has only a limited amount of data (20 cases). Our system combined the classification model and the segmentation model together, a fully integrated diagnostic model was built on the basis of ResNet50 and 3D U-Net algorithm. By feeding with different datasets, the COVID image segmentation of the infected area will be carried out according to classification results. Our model achieves 94.52% accuracy in the classification of lung lesions by 3 types: COVID, Pneumonia and Normal. For future medical use, embedding the model into the medical facilities might be an efficient way of assisting or substituting doctors with diagnoses, therefore, a broader range of the problem of variant viruses in the COVID-19 situation may also be successfully solved.
Human-Object Interaction (HOI) detection lies at the core of action understanding. Besides 2D information such as human/object appearance and locations, 3D pose is also usually utilized in HOI learning since its view-independence. However, rough 3D body joints just carry sparse body information and are not sufficient to understand complex interactions. Thus, we need detailed 3D body shape to go further. Meanwhile, the interacted object in 3D is also not fully studied in HOI learning. In light of these, we propose a detailed 2D-3D joint representation learning method. First, we utilize the single-view human body capture method to obtain detailed 3D body, face and hand shapes. Next, we estimate the 3D object location and size with reference to the 2D human-object spatial configuration and object category priors. Finally, a joint learning framework and cross-modal consistency tasks are proposed to learn the joint HOI representation. To better evaluate the 2D ambiguity processing capacity of models, we propose a new benchmark named Ambiguous-HOI consisting of hard ambiguous images. Extensive experiments in large-scale HOI benchmark and Ambiguous-HOI show impressive effectiveness of our method. Code and data are available at https://github.com/DirtyHarryLYL/DJ-RN.
Existing image-based activity understanding methods mainly adopt direct mapping, i.e. from image to activity concepts, which may encounter performance bottleneck since the huge gap. In light of this, we propose a new path: infer human part states first and then reason out the activities based on part-level semantics. Human Body Part States (PaSta) are fine-grained action semantic tokens, e.g. <hand, hold, something>, which can compose the activities and help us step toward human activity knowledge engine. To fully utilize the power of PaSta, we build a large-scale knowledge base PaStaNet, which contains 7M+ PaSta annotations. And two corresponding models are proposed: first, we design a model named Activity2Vec to extract PaSta features, which aim to be general representations for various activities. Second, we use a PaSta-based Reasoning method to infer activities. Promoted by PaStaNet, our method achieves significant improvements, e.g. 6.4 and 13.9 mAP on full and one-shot sets of HICO in supervised learning, and 3.2 and 4.2 mAP on V-COCO and images-based AVA in transfer learning. Code and data are available at http://hake-mvig.cn/.
Human activity understanding is crucial for building automatic intelligent system. With the help of deep learning, activity understanding has made huge progress recently. But some challenges such as imbalanced data distribution, action ambiguity, complex visual patterns still remain. To address these and promote the activity understanding, we build a large-scale Human Activity Knowledge Engine (HAKE) based on the human body part states. Upon existing activity datasets, we annotate the part states of all the active persons in all images, thus establish the relationship between instance activity and body part states. Furthermore, we propose a HAKE based part state recognition model with a knowledge extractor named Activity2Vec and a corresponding part state based reasoning network. With HAKE, our method can alleviate the learning difficulty brought by the long-tail data distribution, and bring in interpretability. Now our HAKE has more than 7 M+ part state annotations and is still under construction. We first validate our approach on a part of HAKE in this preliminary paper, where we show 7.2 mAP performance improvement on Human-Object Interaction recognition, and 12.38 mAP improvement on the one-shot subsets.