In this paper, we conduct a comprehensive study on the co-salient object detection (CoSOD) problem for images. CoSOD is an emerging and rapidly growing extension of salient object detection (SOD) that aims to detect the co-occurring salient objects in a group of images. However, existing CoSOD datasets often have a serious data bias: they assume that each group of images contains salient objects of similar visual appearance. Because of this idealized setting, models trained on existing datasets can lose much of their effectiveness in real-life situations, where similarities are usually semantic or conceptual. To tackle this issue, we first introduce a new benchmark, called CoSOD3k in the wild, which requires a large amount of semantic context, making it more challenging than existing CoSOD datasets. Our CoSOD3k consists of 3,316 high-quality, elaborately selected images divided into 160 groups with hierarchical annotations. The images span a wide range of categories, shapes, object sizes, and backgrounds. Second, we integrate existing SOD techniques to build a unified, trainable CoSOD framework, which is long overdue in this field. Specifically, we propose a novel CoEG-Net that augments our prior model EGNet with a co-attention projection strategy to enable fast common information learning. CoEG-Net fully leverages previous large-scale SOD datasets and significantly improves model scalability and stability. Third, we comprehensively summarize 34 cutting-edge algorithms, benchmark 16 of them over three challenging CoSOD datasets (iCoSeg, CoSal2015, and our CoSOD3k), and report a more detailed (i.e., group-level) performance analysis. Finally, we discuss the challenges and future work of CoSOD. We hope that our study will give a strong boost to growth in the CoSOD community.
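The co-attention projection in CoEG-Net is described here only at a high level. As a rough illustration of the underlying idea, scoring each pixel by how well it matches information shared across the whole group, the sketch below uses a deliberately simplified stand-in: it averages all pixel features in the group into one prototype vector and scores every pixel by its rectified cosine similarity to that prototype. The names `co_attention_maps` and `_cosine`, the flattened feature-map layout, and the mean-prototype scoring are all assumptions for illustration, not the projection actually used by CoEG-Net.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[Vector]  # one feature vector per pixel (spatial grid flattened)


def _cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def co_attention_maps(group: List[FeatureMap]) -> List[List[float]]:
    """Hypothetical co-attention: similarity of each pixel to the group-wide mean feature."""
    all_pixels = [v for fmap in group for v in fmap]
    dim = len(all_pixels[0])
    # The group prototype is dominated by features that recur across images.
    prototype = [sum(v[d] for v in all_pixels) / len(all_pixels) for d in range(dim)]
    # Clamping negative similarities to 0 keeps every score in [0, 1].
    return [[max(0.0, _cosine(v, prototype)) for v in fmap] for fmap in group]
```

With two images that share an object feature `[1.0, 0.0]` but have different background features (`[0.0, 1.0]` and `[0.0, -1.0]`), the shared pixels score 1 and the background pixels score 0. A learned projection would replace the fixed mean prototype in a real model.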
Optical Coherence Tomography Angiography (OCT-A) is a non-invasive imaging technique that has been increasingly used to image the retinal vasculature at capillary-level resolution. However, despite its significance in understanding many eye-related diseases, automated segmentation of retinal vessels in OCT-A has been under-studied due to challenges such as low capillary visibility and high vessel complexity. In addition, there is no publicly available OCT-A dataset with manually graded vessels for training and validation. To address these issues, we construct, for the first time in the field of retinal image analysis, a dedicated Retinal OCT-A SEgmentation dataset (ROSE), which consists of 229 OCT-A images with vessel annotations at either the centerline level or the pixel level. This dataset has been released publicly to assist researchers in the community in undertaking research on related topics. Second, we propose a novel Split-based Coarse-to-Fine vessel segmentation network (SCF-Net), which can detect thick and thin vessels separately. In SCF-Net, a split-based coarse segmentation (SCS) module is first introduced to produce a preliminary confidence map of vessels, and a split-based refinement (SRN) module is then used to optimize the shape/contour of the retinal microvasculature. Third, we perform a thorough evaluation of state-of-the-art vessel segmentation models and our SCF-Net on the proposed ROSE dataset. The experimental results demonstrate that our SCF-Net yields better vessel segmentation performance in OCT-A than both traditional methods and other deep learning methods.
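The abstract does not say how thick and thin vessels are separated. One common, simple way to split a binary vessel mask into the two classes, for example to build separate training targets, is morphological opening: opening with a 3×3 square removes structures that cannot contain a full 3×3 block, so the opened mask approximates the thick vessels and the remainder approximates the thin ones. The sketch below assumes this opening-based split, treats pixels outside the image as background, and uses hypothetical function names; it is not necessarily how SCF-Net defines its split.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary vessel mask, 1 = vessel pixel


def _morph3x3(mask: Mask, erode: bool) -> Mask:
    """Binary erosion (min) or dilation (max) with a 3x3 square structuring element."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Pixels outside the image are treated as background (0).
            window = [
                mask[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            out[y][x] = min(window) if erode else max(window)
    return out


def split_thick_thin(mask: Mask) -> Tuple[Mask, Mask]:
    """Opening keeps only regions that can hold a full 3x3 square; the residual is thin vessels."""
    thick = _morph3x3(_morph3x3(mask, erode=True), erode=False)
    # Opening never adds pixels, so this difference is always 0 or 1.
    thin = [[m - t for m, t in zip(mrow, trow)] for mrow, trow in zip(mask, thick)]
    return thick, thin
```

On a mask containing a 3×3 block and a separate one-pixel-wide line, the block ends up entirely in `thick` and the line entirely in `thin`. A larger structuring element would raise the width threshold between the two classes.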
Colonoscopy is an effective technique for detecting colorectal polyps, which are highly related to colorectal cancer. In clinical practice, segmenting polyps from colonoscopy images is of great importance since it provides valuable information for diagnosis and surgery. However, accurate polyp segmentation is a challenging task for two major reasons: (i) polyps of the same type vary in size, color, and texture; and (ii) the boundary between a polyp and its surrounding mucosa is not sharp. To address these challenges, we propose a parallel reverse attention network (PraNet) for accurate polyp segmentation in colonoscopy images. Specifically, we first aggregate the features in high-level layers using a parallel partial decoder (PPD). Based on the combined feature, we then generate a global map as the initial guidance area for the following components. In addition, we mine the boundary cues using a reverse attention (RA) module, which is able to establish the relationship between areas and boundary cues. Thanks to the recurrent cooperation mechanism between areas and boundaries, our PraNet is capable of calibrating misaligned predictions, improving the segmentation accuracy. Quantitative and qualitative evaluations on five challenging datasets across six metrics show that our PraNet improves the segmentation accuracy significantly and offers a number of advantages in terms of generalizability and real-time segmentation efficiency.
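In PraNet, the reverse attention weight is obtained by applying a sigmoid to the (upsampled) prediction map from a deeper layer and subtracting it from one, so that regions already predicted as polyp are suppressed and attention shifts to the remaining regions and boundaries. The sketch below shows only this weighting step on plain nested lists; it assumes the coarse prediction has already been resized to the feature resolution, and `reverse_attention` is a hypothetical name, not part of any released code.

```python
import math
from typing import List

Grid = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def reverse_attention(features: List[Grid], coarse_logits: Grid) -> List[Grid]:
    """Multiply every feature channel by (1 - sigmoid(coarse prediction))."""
    # Confident foreground gets a weight near 0; uncertain or missed areas keep more signal.
    weight = [[1.0 - sigmoid(v) for v in row] for row in coarse_logits]
    return [
        [[f * w for f, w in zip(frow, wrow)] for frow, wrow in zip(channel, weight)]
        for channel in features
    ]
```

A logit of 0 (maximally uncertain) halves the feature, while a logit of 10 almost erases it. The same weight map is shared by all channels, which is why it is computed once outside the channel loop.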
Anterior chamber angle (ACA) classification is a key step in the diagnosis of angle-closure glaucoma using Anterior Segment Optical Coherence Tomography (AS-OCT). Existing automated analysis methods focus on a binary classification system (i.e., open angle or angle-closure) in a 2D AS-OCT slice. However, clinical diagnosis requires a more discriminating three-class ACA system (i.e., open, narrow, or synechiae angles) to help clinicians better understand the progression of the spectrum of angle-closure glaucoma types. To address this, we propose a novel sequence multi-scale aggregation deep network (SMA-Net) for open-narrow-synechiae ACA classification based on an AS-OCT sequence. In our method, a Multi-Scale Discriminative Aggregation (MSDA) block is utilized to learn multi-scale representations at the slice level, while a ConvLSTM is introduced to model the temporal dynamics of these representations at the sequence level. Finally, a multi-level loss function is used to combine the slice-based and sequence-based losses. The proposed method is evaluated on two AS-OCT datasets. The experimental results show that the proposed method outperforms existing state-of-the-art methods in applicability, effectiveness, and accuracy. We believe this work to be the first attempt to classify ACAs into open, narrow, or synechiae types using AS-OCT sequences.
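The abstract states only that the multi-level loss combines slice-based and sequence-based losses. A minimal sketch, assuming softmax cross-entropy at both levels, that every slice inherits the sequence label, and a convex combination controlled by a hypothetical weight `alpha`, could look like the following; the actual loss form and weighting in SMA-Net may differ.

```python
import math
from typing import List, Sequence

CLASSES = ("open", "narrow", "synechiae")  # class index order assumed for illustration


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy computed with the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def multi_level_loss(
    slice_logits: List[Sequence[float]],
    seq_logits: Sequence[float],
    target: int,
    alpha: float = 0.5,
) -> float:
    """Hypothetical weighted sum of the mean slice-level loss and the sequence-level loss."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Assumption: every slice in the sequence carries the sequence's ACA label.
    slice_loss = sum(cross_entropy(l, target) for l in slice_logits) / len(slice_logits)
    seq_loss = cross_entropy(seq_logits, target)
    return alpha * slice_loss + (1.0 - alpha) * seq_loss
```

With uniform logits over the three classes, both terms equal ln 3, so the combined loss is ln 3 for any `alpha`. Setting `alpha` to 0 or 1 recovers a purely sequence-level or purely slice-level objective.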
Precise characterization and analysis of iris shape from Anterior Segment OCT (AS-OCT) are of great importance in facilitating the diagnosis of angle-closure-related diseases. Existing methods focus solely on analyzing structural properties identified from 2D slices, whereas accurately characterizing morphological changes of the iris shape in 3D AS-OCT may additionally reveal the risk of disease progression. In this paper, we propose a novel framework for the reconstruction and quantification of the 3D iris surface from AS-OCT imagery. We believe this is the first work to detect angle-closure glaucoma by means of a 3D representation. An iris segmentation network with a wavelet refinement block (WRB) is first proposed to generate the initial shape of the iris from a single AS-OCT slice. The 3D iris surface is then reconstructed using a guided optimization method with Poisson-disk sampling. Finally, a set of surface-based features is extracted and used to detect angle-closure glaucoma. Experimental results demonstrate that our method is highly effective in iris segmentation and surface reconstruction. Moreover, we show that the 3D-based representation achieves better performance in angle-closure glaucoma detection than 2D-based features do.
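Poisson-disk sampling produces points that are no closer than a minimum radius to one another, giving evenly spread samples without a regular grid pattern. The reconstruction details are not given here, so the sketch below illustrates only the sampling idea on a 2D rectangle using the simplest method, dart throwing: propose uniform random points and reject any that fall within `radius` of an already accepted point. The function name and parameters are hypothetical, and the paper's exact sampling variant is not stated; faster schemes such as Bridson's algorithm are common in practice.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def poisson_disk_dart_throwing(
    width: float, height: float, radius: float, attempts: int = 2000, seed: int = 0
) -> List[Point]:
    """Accept uniformly drawn points only if they keep at least `radius` from all accepted ones."""
    rng = random.Random(seed)  # fixed seed makes the sample set reproducible
    samples: List[Point] = []
    for _ in range(attempts):
        p = (rng.uniform(0.0, width), rng.uniform(0.0, height))
        if all(math.dist(p, q) >= radius for q in samples):
            samples.append(p)
    return samples
```

Dart throwing is quadratic in the number of samples and slows down as the domain fills, which is why grid-accelerated methods are preferred for dense point sets such as those needed on a reconstructed surface.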
Early and accurate prediction of overall survival (OS) time can help improve treatment planning for brain tumor patients. Although many OS time prediction methods have been developed and have achieved promising results, several issues remain. First, conventional prediction methods rely on radiomic features at the local lesion area of a magnetic resonance (MR) volume, which may not represent the full image or model complex tumor patterns. Second, different types of MR scans (i.e., multi-modal data) are sensitive to different brain regions, which makes it challenging to effectively exploit the complementary information across modalities while also preserving modality-specific properties. Third, existing methods focus on prediction models while ignoring complex data-to-label relationships. To address the above issues, we propose an end-to-end OS time prediction model, namely the Multi-modal Multi-channel Network (M2Net). Specifically, we first project the 3D MR volume onto 2D images in different directions, which reduces computational costs while preserving important information and enabling pre-trained models to be transferred from other tasks. Then, we use a modality-specific network to extract implicit and high-level features from different MR scans. A multi-modal shared network is built to fuse these features using a bilinear pooling model, exploiting their correlations to provide complementary information. Finally, we integrate the outputs from each modality-specific network and the multi-modal shared network to generate the final prediction result. Experimental results demonstrate the superiority of our M2Net model over other methods.
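Two steps in this pipeline can be illustrated on toy data. The abstract does not specify the projection operator, so the sketch assumes maximum-intensity projection along each of the three volume axes. For fusion, it shows basic bilinear pooling of two feature vectors (the flattened outer product), followed by the signed square-root and L2 normalization commonly used with bilinear CNNs; whether M2Net applies these normalizations is an assumption, and all function names are hypothetical.

```python
import math
from typing import Dict, List

Volume = List[List[List[float]]]  # intensities indexed as vol[z][y][x]
Image = List[List[float]]


def max_projections(vol: Volume) -> Dict[str, Image]:
    """Maximum-intensity projection of a 3D volume along each axis (assumed operator)."""
    d, h, w = len(vol), len(vol[0]), len(vol[0][0])
    return {
        "along_z": [[max(vol[z][y][x] for z in range(d)) for x in range(w)] for y in range(h)],
        "along_y": [[max(vol[z][y][x] for y in range(h)) for x in range(w)] for z in range(d)],
        "along_x": [[max(vol[z][y][x] for x in range(w)) for y in range(h)] for z in range(d)],
    }


def bilinear_pool(a: List[float], b: List[float]) -> List[float]:
    """Flattened outer product of two modality features, then signed-sqrt and L2 normalization."""
    outer = [ai * bj for ai in a for bj in b]  # every pairwise feature interaction
    signed = [math.copysign(math.sqrt(abs(v)), v) for v in outer]
    norm = math.sqrt(sum(v * v for v in signed))
    return [v / norm for v in signed] if norm > 0.0 else signed
```

The outer product is what lets the fused representation capture correlations between the two modalities: its length is the product of the two input lengths, which is also why compact approximations of bilinear pooling are often used for high-dimensional features.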
Coronavirus Disease 2019 (COVID-19) spread globally in early 2020, causing the world to face an existential health crisis. Automated detection of lung infections from computed tomography (CT) images offers great potential to augment the traditional healthcare strategy for tackling COVID-19. However, segmenting infected regions from CT slices faces several challenges, including high variation in infection characteristics and low intensity contrast between infections and normal tissues. Further, collecting a large amount of data is impractical within a short time period, which inhibits the training of a deep model. To address these challenges, we propose a novel COVID-19 Lung Infection Segmentation Deep Network (Inf-Net) to automatically identify infected regions from chest CT slices. In our Inf-Net, a parallel partial decoder is used to aggregate the high-level features and generate a global map. Then, implicit reverse attention and explicit edge attention are used to model the boundaries and enhance the representations. Moreover, to alleviate the shortage of labeled data, we present a semi-supervised segmentation framework based on a randomly selected propagation strategy, which requires only a few labeled images and primarily leverages unlabeled data. Our semi-supervised framework can improve the learning ability and achieve higher performance. Extensive experiments on our COVID-SemiSeg dataset and real CT volumes demonstrate that the proposed Inf-Net outperforms most cutting-edge segmentation models and advances the state-of-the-art performance.
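The randomly selected propagation strategy can be outlined as a loop: train on the labeled images, pseudo-label a randomly chosen batch of unlabeled images with the current model, add them to the training set, retrain, and repeat until the unlabeled pool is exhausted. The sketch below captures only this control flow. `train` and `predict` are hypothetical callbacks standing in for Inf-Net training and inference, and the batch size `k` and the policy of retraining on the full accumulated set are assumptions rather than settings confirmed here.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

X = TypeVar("X")  # image type
Y = TypeVar("Y")  # label / segmentation-mask type
M = TypeVar("M")  # model type


def random_propagation(
    labeled: Sequence[Tuple[X, Y]],
    unlabeled: Sequence[X],
    train: Callable[[List[Tuple[X, Y]]], M],
    predict: Callable[[M, X], Y],
    k: int,
    seed: int = 0,
) -> Tuple[M, List[Tuple[X, Y]]]:
    """Grow the training set with pseudo-labels for k randomly chosen unlabeled images per round."""
    if k <= 0:
        raise ValueError("k must be positive")
    rng = random.Random(seed)
    train_set: List[Tuple[X, Y]] = list(labeled)
    model = train(list(train_set))  # initial model from the labeled images only
    pool = list(unlabeled)
    rng.shuffle(pool)  # consecutive chunks of a shuffled pool = random selection without replacement
    for start in range(0, len(pool), k):
        batch = pool[start:start + k]
        train_set.extend((x, predict(model, x)) for x in batch)  # pseudo-label with the current model
        model = train(list(train_set))  # retrain on labeled + pseudo-labeled data
    return model, train_set
```

Because each round's pseudo-labels come from the model trained in the previous round, later batches benefit from progressively more training data; the cost is that errors in early pseudo-labels can propagate, which is one reason to keep `k` small.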