Abstract:Pathological myopia (PM) is the leading ocular disease for impaired vision and blindness worldwide. The key to detecting PM as early as possible is to detect informative features in global and local lesion regions, such as fundus tessellation, atrophy and maculopathy. However, applying classical convolutional neural networks (CNNs) to efficiently highlight global and local lesion context information in feature maps is quite challenging. To tackle this issue, we aim to fully leverage the potential of global and local lesion information with attention module design. Based on this, we propose an efficient pyramid channel attention (EPCA) module, which dynamically explores the relative importance of global and local lesion context information in feature maps. Then we combine the EPCA module with the backbone network to construct EPCA-Net for automatic PM detection based on fundus images. In addition, we construct a PM dataset termed PM-fundus by collecting fundus images of PM from publicly available datasets (e.g., the PALM dataset and ODIR dataset). The comprehensive experiments are conducted on three datasets, demonstrating that our EPCA-Net outperforms state-of-the-art methods in detecting PM. Furthermore, motivated by the recent pretraining-and-finetuning paradigm, we attempt to adapt pre-trained natural image models for PM detection by freezing them and treating the EPCA module and other attention modules as the adapters. The results show that our method with the pretraining-and-finetuning paradigm achieves competitive performance through comparisons to part of methods with traditional fine-tuning methods with fewer tunable parameters.
Abstract:Recently, utilizing deep neural networks to build the opendomain dialogue models has become a hot topic. However, the responses generated by these models suffer from many problems such as responses not being contextualized and tend to generate generic responses that lack information content, damaging the user's experience seriously. Therefore, many studies try introducing more information into the dialogue models to make the generated responses more vivid and informative. Unlike them, this paper improves the quality of generated responses by learning the implicit pattern information between contexts and responses in the training samples. In this paper, we first build an open-domain dialogue model based on the pre-trained language model (i.e., GPT-2). And then, an improved scheduled sampling method is proposed for pre-trained models, by which the responses can be used to guide the response generation in the training phase while avoiding the exposure bias problem. More importantly, we design a response-aware mechanism for mining the implicit pattern information between contexts and responses so that the generated replies are more diverse and approximate to human replies. Finally, we evaluate the proposed model (RAD) on the Persona-Chat and DailyDialog datasets; and the experimental results show that our model outperforms the baselines on most automatic and manual metrics.
Abstract:Fundus photography is prone to suffer from image quality degradation that impacts clinical examination performed by ophthalmologists or intelligent systems. Though enhancement algorithms have been developed to promote fundus observation on degraded images, high data demands and limited applicability hinder their clinical deployment. To circumvent this bottleneck, a generic fundus image enhancement network (GFE-Net) is developed in this study to robustly correct unknown fundus images without supervised or extra data. Levering image frequency information, self-supervised representation learning is conducted to learn robust structure-aware representations from degraded images. Then with a seamless architecture that couples representation learning and image enhancement, GFE-Net can accurately correct fundus images and meanwhile preserve retinal structures. Comprehensive experiments are implemented to demonstrate the effectiveness and advantages of GFE-Net. Compared with state-of-the-art algorithms, GFE-Net achieves superior performance in data dependency, enhancement performance, deployment efficiency, and scale generalizability. Follow-up fundus image analysis is also facilitated by GFE-Net, whose modules are respectively verified to be effective for image enhancement.
Abstract:Given the prevalence of rolling bearing fault diagnosis as a practical issue across various working conditions, the limited availability of samples compounds the challenge. Additionally, the complexity of the external environment and the structure of rolling bearings often manifests faults characterized by randomness and fuzziness, hindering the effective extraction of fault characteristics and restricting the accuracy of fault diagnosis. To overcome these problems, this paper presents a novel approach termed constructive Incremental learning-based ensemble domain adaptation (CIL-EDA) approach. Specifically, it is implemented on stochastic configuration networks (SCN) to constructively improve its adaptive performance in multi-domains. Concretely, a cloud feature extraction method is employed in conjunction with wavelet packet decomposition (WPD) to capture the uncertainty of fault information from multiple resolution aspects. Subsequently, constructive Incremental learning-based domain adaptation (CIL-DA) is firstly developed to enhance the cross-domain learning capability of each hidden node through domain matching and construct a robust fault classifier by leveraging limited labeled data from both target and source domains. Finally, fault diagnosis results are obtained by a majority voting of CIL-EDA which integrates CIL-DA and parallel ensemble learning. Experimental results demonstrate that our CIL-DA outperforms several domain adaptation methods and CIL-EDA consistently outperforms state-of-art fault diagnosis methods in few-shot scenarios.
Abstract:The annotation scarcity of medical image segmentation poses challenges in collecting sufficient training data for deep learning models. Specifically, models trained on limited data may not generalize well to other unseen data domains, resulting in a domain shift issue. Consequently, domain generalization (DG) is developed to boost the performance of segmentation models on unseen domains. However, the DG setup requires multiple source domains, which impedes the efficient deployment of segmentation algorithms in clinical scenarios. To address this challenge and improve the segmentation model's generalizability, we propose a novel approach called the Frequency-mixed Single-source Domain Generalization method (FreeSDG). By analyzing the frequency's effect on domain discrepancy, FreeSDG leverages a mixed frequency spectrum to augment the single-source domain. Additionally, self-supervision is constructed in the domain augmentation to learn robust context-aware representations for the segmentation task. Experimental results on five datasets of three modalities demonstrate the effectiveness of the proposed algorithm. FreeSDG outperforms state-of-the-art methods and significantly improves the segmentation model's generalizability. Therefore, FreeSDG provides a promising solution for enhancing the generalization of medical image segmentation models, especially when annotated data is scarce. The code is available at https://github.com/liamheng/Non-IID_Medical_Image_Segmentation.
Abstract:Fault diagnosis of rolling bearings is of great significance for post-maintenance in rotating machinery, but it is a challenging work to diagnose faults efficiently with a few samples. Additionally, faults commonly occur with randomness and fuzziness due to the complexity of the external environment and the structure of rolling bearings, hindering effective mining of fault characteristics and eventually restricting accuracy of fault diagnosis. To overcome these problems, stochastic configuration network (SCN) based cloud ensemble learning, called SCN-CEL, is developed in this work. Concretely, a cloud feature extraction method is first developed by using a backward cloud generator of normal cloud model to mine the uncertainty of fault information. Then, a cloud sampling method, which generates enough cloud droplets using bidirectional cloud generator, is proposed to extend the cloud feature samples. Finally, an ensemble model with SCNs is developed to comprehensively characterize the uncertainty of fault information and advance the generalization performance of fault diagnosis machine. Experimental results demonstrate that the proposed method indeed performs favorably for distinguishing fault categories of rolling bearings in the few shot scenarios.
Abstract:Deep neural networks (DNNs) have been widely applied in medical image classification and achieve remarkable classification performance. These achievements heavily depend on large-scale accurately annotated training data. However, label noise is inevitably introduced in the medical image annotation, as the labeling process heavily relies on the expertise and experience of annotators. Meanwhile, DNNs suffer from overfitting noisy labels, degrading the performance of models. Therefore, in this work, we innovatively devise noise-robust training approach to mitigate the adverse effects of noisy labels in medical image classification. Specifically, we incorporate contrastive learning and intra-group attention mixup strategies into the vanilla supervised learning. The contrastive learning for feature extractor helps to enhance visual representation of DNNs. The intra-group attention mixup module constructs groups and assigns self-attention weights for group-wise samples, and subsequently interpolates massive noisy-suppressed samples through weighted mixup operation. We conduct comparative experiments on both synthetic and real-world noisy medical datasets under various noise levels. Rigorous experiments validate that our noise-robust method with contrastive learning and attention mixup can effectively handle with label noise, and is superior to state-of-the-art methods. An ablation study also shows that both components contribute to boost model performance. The proposed method demonstrates its capability of curb label noise and has certain potential toward real-world clinic applications.
Abstract:This paper explores the task of radiology report generation, which aims at generating free-text descriptions for a set of radiographs. One significant challenge of this task is how to correctly maintain the consistency between the images and the lengthy report. Previous research explored solving this issue through planning-based methods, which generate reports only based on high-level plans. However, these plans usually only contain the major observations from the radiographs (e.g., lung opacity), lacking much necessary information, such as the observation characteristics and preliminary clinical diagnoses. To address this problem, the system should also take the image information into account together with the textual plan and perform stronger reasoning during the generation process. In this paper, we propose an observation-guided radiology report generation framework (ORGAN). It first produces an observation plan and then feeds both the plan and radiographs for report generation, where an observation graph and a tree reasoning mechanism are adopted to precisely enrich the plan information by capturing the multi-formats of each observation. Experimental results demonstrate that our framework outperforms previous state-of-the-art methods regarding text quality and clinical efficacy
Abstract:Few-shot named entity recognition (NER) exploits limited annotated instances to identify named mentions. Effectively transferring the internal or external resources thus becomes the key to few-shot NER. While the existing prompt tuning methods have shown remarkable few-shot performances, they still fail to make full use of knowledge. In this work, we investigate the integration of rich knowledge to prompt tuning for stronger few-shot NER. We propose incorporating the deep prompt tuning framework with threefold knowledge (namely TKDP), including the internal 1) context knowledge and the external 2) label knowledge & 3) sememe knowledge. TKDP encodes the three feature sources and incorporates them into the soft prompt embeddings, which are further injected into an existing pre-trained language model to facilitate predictions. On five benchmark datasets, our knowledge-enriched model boosts by at most 11.53% F1 over the raw deep prompt method, and significantly outperforms 8 strong-performing baseline systems in 5-/10-/20-shot settings, showing great potential in few-shot NER. Our TKDP can be broadly adapted to other few-shot tasks without effort.
Abstract:Robust and accurate segmentation for elongated physiological structures is challenging, especially in the ambiguous region, such as the corneal endothelium microscope image with uneven illumination or the fundus image with disease interference. In this paper, we present a spatial and scale uncertainty-aware network (SSU-Net) that fully uses both spatial and scale uncertainty to highlight ambiguous regions and integrate hierarchical structure contexts. First, we estimate epistemic and aleatoric spatial uncertainty maps using Monte Carlo dropout to approximate Bayesian networks. Based on these spatial uncertainty maps, we propose the gated soft uncertainty-aware (GSUA) module to guide the model to focus on ambiguous regions. Second, we extract the uncertainty under different scales and propose the multi-scale uncertainty-aware (MSUA) fusion module to integrate structure contexts from hierarchical predictions, strengthening the final prediction. Finally, we visualize the uncertainty map of final prediction, providing interpretability for segmentation results. Experiment results show that the SSU-Net performs best on cornea endothelial cell and retinal vessel segmentation tasks. Moreover, compared with counterpart uncertainty-based methods, SSU-Net is more accurate and robust.