Fundus photography is prone to suffer from image quality degradation that impacts clinical examination performed by ophthalmologists or intelligent systems. Though enhancement algorithms have been developed to promote fundus observation on degraded images, high data demands and limited applicability hinder their clinical deployment. To circumvent this bottleneck, a generic fundus image enhancement network (GFE-Net) is developed in this study to robustly correct unknown fundus images without supervised or extra data. Levering image frequency information, self-supervised representation learning is conducted to learn robust structure-aware representations from degraded images. Then with a seamless architecture that couples representation learning and image enhancement, GFE-Net can accurately correct fundus images and meanwhile preserve retinal structures. Comprehensive experiments are implemented to demonstrate the effectiveness and advantages of GFE-Net. Compared with state-of-the-art algorithms, GFE-Net achieves superior performance in data dependency, enhancement performance, deployment efficiency, and scale generalizability. Follow-up fundus image analysis is also facilitated by GFE-Net, whose modules are respectively verified to be effective for image enhancement.
Given the prevalence of rolling bearing fault diagnosis as a practical issue across various working conditions, the limited availability of samples compounds the challenge. Additionally, the complexity of the external environment and the structure of rolling bearings often manifests faults characterized by randomness and fuzziness, hindering the effective extraction of fault characteristics and restricting the accuracy of fault diagnosis. To overcome these problems, this paper presents a novel approach termed constructive Incremental learning-based ensemble domain adaptation (CIL-EDA) approach. Specifically, it is implemented on stochastic configuration networks (SCN) to constructively improve its adaptive performance in multi-domains. Concretely, a cloud feature extraction method is employed in conjunction with wavelet packet decomposition (WPD) to capture the uncertainty of fault information from multiple resolution aspects. Subsequently, constructive Incremental learning-based domain adaptation (CIL-DA) is firstly developed to enhance the cross-domain learning capability of each hidden node through domain matching and construct a robust fault classifier by leveraging limited labeled data from both target and source domains. Finally, fault diagnosis results are obtained by a majority voting of CIL-EDA which integrates CIL-DA and parallel ensemble learning. Experimental results demonstrate that our CIL-DA outperforms several domain adaptation methods and CIL-EDA consistently outperforms state-of-art fault diagnosis methods in few-shot scenarios.
The annotation scarcity of medical image segmentation poses challenges in collecting sufficient training data for deep learning models. Specifically, models trained on limited data may not generalize well to other unseen data domains, resulting in a domain shift issue. Consequently, domain generalization (DG) is developed to boost the performance of segmentation models on unseen domains. However, the DG setup requires multiple source domains, which impedes the efficient deployment of segmentation algorithms in clinical scenarios. To address this challenge and improve the segmentation model's generalizability, we propose a novel approach called the Frequency-mixed Single-source Domain Generalization method (FreeSDG). By analyzing the frequency's effect on domain discrepancy, FreeSDG leverages a mixed frequency spectrum to augment the single-source domain. Additionally, self-supervision is constructed in the domain augmentation to learn robust context-aware representations for the segmentation task. Experimental results on five datasets of three modalities demonstrate the effectiveness of the proposed algorithm. FreeSDG outperforms state-of-the-art methods and significantly improves the segmentation model's generalizability. Therefore, FreeSDG provides a promising solution for enhancing the generalization of medical image segmentation models, especially when annotated data is scarce. The code is available at https://github.com/liamheng/Non-IID_Medical_Image_Segmentation.
Fault diagnosis of rolling bearings is of great significance for post-maintenance in rotating machinery, but it is a challenging work to diagnose faults efficiently with a few samples. Additionally, faults commonly occur with randomness and fuzziness due to the complexity of the external environment and the structure of rolling bearings, hindering effective mining of fault characteristics and eventually restricting accuracy of fault diagnosis. To overcome these problems, stochastic configuration network (SCN) based cloud ensemble learning, called SCN-CEL, is developed in this work. Concretely, a cloud feature extraction method is first developed by using a backward cloud generator of normal cloud model to mine the uncertainty of fault information. Then, a cloud sampling method, which generates enough cloud droplets using bidirectional cloud generator, is proposed to extend the cloud feature samples. Finally, an ensemble model with SCNs is developed to comprehensively characterize the uncertainty of fault information and advance the generalization performance of fault diagnosis machine. Experimental results demonstrate that the proposed method indeed performs favorably for distinguishing fault categories of rolling bearings in the few shot scenarios.
Deep neural networks (DNNs) have been widely applied in medical image classification and achieve remarkable classification performance. These achievements heavily depend on large-scale accurately annotated training data. However, label noise is inevitably introduced in the medical image annotation, as the labeling process heavily relies on the expertise and experience of annotators. Meanwhile, DNNs suffer from overfitting noisy labels, degrading the performance of models. Therefore, in this work, we innovatively devise noise-robust training approach to mitigate the adverse effects of noisy labels in medical image classification. Specifically, we incorporate contrastive learning and intra-group attention mixup strategies into the vanilla supervised learning. The contrastive learning for feature extractor helps to enhance visual representation of DNNs. The intra-group attention mixup module constructs groups and assigns self-attention weights for group-wise samples, and subsequently interpolates massive noisy-suppressed samples through weighted mixup operation. We conduct comparative experiments on both synthetic and real-world noisy medical datasets under various noise levels. Rigorous experiments validate that our noise-robust method with contrastive learning and attention mixup can effectively handle with label noise, and is superior to state-of-the-art methods. An ablation study also shows that both components contribute to boost model performance. The proposed method demonstrates its capability of curb label noise and has certain potential toward real-world clinic applications.
This paper explores the task of radiology report generation, which aims at generating free-text descriptions for a set of radiographs. One significant challenge of this task is how to correctly maintain the consistency between the images and the lengthy report. Previous research explored solving this issue through planning-based methods, which generate reports only based on high-level plans. However, these plans usually only contain the major observations from the radiographs (e.g., lung opacity), lacking much necessary information, such as the observation characteristics and preliminary clinical diagnoses. To address this problem, the system should also take the image information into account together with the textual plan and perform stronger reasoning during the generation process. In this paper, we propose an observation-guided radiology report generation framework (ORGAN). It first produces an observation plan and then feeds both the plan and radiographs for report generation, where an observation graph and a tree reasoning mechanism are adopted to precisely enrich the plan information by capturing the multi-formats of each observation. Experimental results demonstrate that our framework outperforms previous state-of-the-art methods regarding text quality and clinical efficacy
Few-shot named entity recognition (NER) exploits limited annotated instances to identify named mentions. Effectively transferring the internal or external resources thus becomes the key to few-shot NER. While the existing prompt tuning methods have shown remarkable few-shot performances, they still fail to make full use of knowledge. In this work, we investigate the integration of rich knowledge to prompt tuning for stronger few-shot NER. We propose incorporating the deep prompt tuning framework with threefold knowledge (namely TKDP), including the internal 1) context knowledge and the external 2) label knowledge & 3) sememe knowledge. TKDP encodes the three feature sources and incorporates them into the soft prompt embeddings, which are further injected into an existing pre-trained language model to facilitate predictions. On five benchmark datasets, our knowledge-enriched model boosts by at most 11.53% F1 over the raw deep prompt method, and significantly outperforms 8 strong-performing baseline systems in 5-/10-/20-shot settings, showing great potential in few-shot NER. Our TKDP can be broadly adapted to other few-shot tasks without effort.
Robust and accurate segmentation for elongated physiological structures is challenging, especially in the ambiguous region, such as the corneal endothelium microscope image with uneven illumination or the fundus image with disease interference. In this paper, we present a spatial and scale uncertainty-aware network (SSU-Net) that fully uses both spatial and scale uncertainty to highlight ambiguous regions and integrate hierarchical structure contexts. First, we estimate epistemic and aleatoric spatial uncertainty maps using Monte Carlo dropout to approximate Bayesian networks. Based on these spatial uncertainty maps, we propose the gated soft uncertainty-aware (GSUA) module to guide the model to focus on ambiguous regions. Second, we extract the uncertainty under different scales and propose the multi-scale uncertainty-aware (MSUA) fusion module to integrate structure contexts from hierarchical predictions, strengthening the final prediction. Finally, we visualize the uncertainty map of final prediction, providing interpretability for segmentation results. Experiment results show that the SSU-Net performs best on cornea endothelial cell and retinal vessel segmentation tasks. Moreover, compared with counterpart uncertainty-based methods, SSU-Net is more accurate and robust.
Detecting 3D mask attacks to a face recognition system is challenging. Although genuine faces and 3D face masks show significantly different remote photoplethysmography (rPPG) signals, rPPG-based face anti-spoofing methods often suffer from performance degradation due to unstable face alignment in the video sequence and weak rPPG signals. To enhance the rPPG signal in a motion-robust way, a landmark-anchored face stitching method is proposed to align the faces robustly and precisely at the pixel-wise level by using both SIFT keypoints and facial landmarks. To better encode the rPPG signal, a weighted spatial-temporal representation is proposed, which emphasizes the face regions with rich blood vessels. In addition, characteristics of rPPG signals in different color spaces are jointly utilized. To improve the generalization capability, a lightweight EfficientNet with a Gated Recurrent Unit (GRU) is designed to extract both spatial and temporal features from the rPPG spatial-temporal representation for classification. The proposed method is compared with the state-of-the-art methods on five benchmark datasets under both intra-dataset and cross-dataset evaluations. The proposed method shows a significant and consistent improvement in performance over other state-of-the-art rPPG-based methods for face spoofing detection.
The increasingly pervasive facial recognition (FR) systems raise serious concerns about personal privacy, especially for billions of users who have publicly shared their photos on social media. Several attempts have been made to protect individuals from being identified by unauthorized FR systems utilizing adversarial attacks to generate encrypted face images. However, existing methods suffer from poor visual quality or low attack success rates, which limit their utility. Recently, diffusion models have achieved tremendous success in image generation. In this work, we ask: can diffusion models be used to generate adversarial examples to improve both visual quality and attack performance? We propose DiffProtect, which utilizes a diffusion autoencoder to generate semantically meaningful perturbations on FR systems. Extensive experiments demonstrate that DiffProtect produces more natural-looking encrypted images than state-of-the-art methods while achieving significantly higher attack success rates, e.g., 24.5% and 25.1% absolute improvements on the CelebA-HQ and FFHQ datasets.