Yingying Zhu

Generalized Minimum Error with Fiducial Points Criterion for Robust Learning

Sep 09, 2023
Haiquan Zhao, Yuan Gao, Yingying Zhu

The conventional Minimum Error Entropy (MEE) criterion has limitations: it is insensitive to the error mean and leaves the location of the error probability density function undetermined. To overcome this, the MEE with fiducial points criterion (MEEF) was proposed. However, the efficacy of MEEF is inconsistent because it relies on a fixed Gaussian kernel. In this paper, a generalized minimum error with fiducial points criterion (GMEEF) is presented by adopting the Generalized Gaussian Density (GGD) function as the kernel. The GGD extends the Gaussian distribution with a shape parameter that provides more control over tail behavior and peakedness. In addition, because of the high computational complexity of the GMEEF criterion, a quantization scheme is introduced to notably lower the computational load of GMEEF-type algorithms. Finally, the proposed criteria are applied to adaptive filtering, kernel recursive algorithms, and multilayer perceptrons. Several numerical simulations, covering system identification, acoustic echo cancellation, time series prediction, and supervised classification, show that the novel algorithms perform excellently.
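
For intuition, the following minimal NumPy sketch (not the paper's implementation; the mixing weight lam and the exact form of the fiducial-point term follow the standard MEEF formulation and are assumptions here) evaluates a GGD kernel and a GMEEF-style cost combining the pairwise error information potential with a term anchored at zero error. Setting alpha = 2 recovers the Gaussian case used by MEEF; smaller alpha gives heavier tails.

import numpy as np
from scipy.special import gamma

def ggd_kernel(x, alpha=2.0, beta=1.0):
    # Generalized Gaussian Density kernel; alpha is the shape parameter.
    coef = alpha / (2.0 * beta * gamma(1.0 / alpha))
    return coef * np.exp(-np.abs(x / beta) ** alpha)

def gmeef_cost(errors, alpha=2.0, beta=1.0, lam=0.5):
    # Information-potential term: average kernel value over all error pairs.
    diff = errors[:, None] - errors[None, :]
    info_potential = ggd_kernel(diff, alpha, beta).mean()
    # Fiducial-point term: kernel evaluated against a fiducial point at zero error.
    fiducial = ggd_kernel(errors, alpha, beta).mean()
    # lam (assumed) trades off the two terms; maximizing the cost concentrates
    # the errors and centers them at zero.
    return lam * fiducial + (1.0 - lam) * info_potential

errors = 0.1 * np.random.randn(100)
print(gmeef_cost(errors, alpha=1.5, beta=0.5))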

* 12 pages, 9 figures 

Expert Uncertainty and Severity Aware Chest X-Ray Classification by Multi-Relationship Graph Learning

Sep 06, 2023
Mengliang Zhang, Xinyue Hu, Lin Gu, Liangchen Liu, Kazuma Kobayashi, Tatsuya Harada, Ronald M. Summers, Yingying Zhu

Patients undergoing chest X-rays (CXR) often suffer from multiple lung diseases. When evaluating a patient's condition, radiologists may make uncertain judgments even after long-term clinical training and professional guidance, owing to complex pathologies, subtle texture changes of different lung lesions in images, and differences in patient condition; this introduces considerable noise when extracting disease labels from CXR reports. In this paper, we re-extract disease labels from CXR reports to make them more realistic by considering disease severity and uncertainty in classification. Our contributions are as follows: 1. We re-extracted disease labels with severity and uncertainty using a rule-based approach with keywords discussed with clinical experts. 2. To further improve the explainability of chest X-ray diagnosis, we designed a multi-relationship graph learning method with an expert uncertainty-aware loss function. 3. Our multi-relationship graph learning method can also interpret the disease classification results. Our experimental results show that models considering disease severity and uncertainty outperform previous state-of-the-art methods.
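
The paper's exact loss is not reproduced here; the PyTorch sketch below shows one plausible way an expert uncertainty-aware classification loss could weight per-label errors by report uncertainty and severity. The weighting coefficients and tensor names are illustrative assumptions.

import torch
import torch.nn.functional as F

def uncertainty_aware_bce(logits, targets, uncertainty, severity, base_weight=1.0):
    # logits, targets: (batch, num_labels).
    # uncertainty in [0, 1]: 1 means the report was uncertain about that label.
    # severity in {0, 1, 2, ...}: coarse severity grade extracted from the report.
    per_label = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # Down-weight uncertain mentions, up-weight more severe findings (assumed scheme).
    weights = base_weight * (1.0 - 0.5 * uncertainty) * (1.0 + 0.25 * severity)
    return (weights * per_label).mean()

logits = torch.randn(4, 14)
targets = torch.randint(0, 2, (4, 14)).float()
uncertainty = torch.rand(4, 14)
severity = torch.randint(0, 3, (4, 14)).float()
print(uncertainty_aware_bce(logits, targets, uncertainty, severity))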

Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering

Jul 22, 2023
Xinyue Hu, Lin Gu, Qiyuan An, Mengliang Zhang, Liangchen Liu, Kazuma Kobayashi, Tatsuya Harada, Ronald M. Summers, Yingying Zhu

To contribute to automating medical vision-language models, we propose a novel Chest X-ray Difference Visual Question Answering (VQA) task. Given a pair of main and reference images, this task attempts to answer several questions on both the diseases and, more importantly, the differences between them. This is consistent with radiologists' diagnostic practice of comparing the current image with a reference before concluding the report. We collect a new dataset, MIMIC-Diff-VQA, including 700,703 QA pairs from 164,324 pairs of main and reference images. Compared to existing medical VQA datasets, our questions are tailored to the Assessment-Diagnosis-Intervention-Evaluation treatment procedure used by clinical professionals. Meanwhile, we also propose a novel expert knowledge-aware graph representation learning model to address this task. The proposed baseline model leverages expert knowledge, such as anatomical structure priors and semantic and spatial knowledge, to construct a multi-relationship graph representing the image differences between the two images for the image-difference VQA task. The dataset and code can be found at https://github.com/Holipori/MIMIC-Diff-VQA. We believe this work will further advance medical vision-language models.
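
As a rough, hypothetical illustration of a multi-relationship graph over a main/reference image pair (assuming pre-extracted region features, boxes, and anatomical labels; this is not the released code, which lives at the repository above):

import numpy as np

def build_diff_graph(feat_main, feat_ref, boxes, region_labels, dist_thresh=64.0):
    # Nodes: anatomical regions of the main image followed by those of the
    # reference image (the same region ordering is assumed for both images).
    n_reg = feat_main.shape[0]
    feats = np.concatenate([feat_main, feat_ref], axis=0)
    n = 2 * n_reg
    anatomical = np.zeros((n, n))   # same anatomical region across the two images
    spatial = np.zeros((n, n))      # regions whose box centers are close in the image plane
    semantic = np.zeros((n, n))     # regions sharing the same anatomical/semantic label
    centers = (boxes[:, :2] + boxes[:, 2:]) / 2.0
    for i in range(n):
        for j in range(n):
            ri, rj = i % n_reg, j % n_reg
            if ri == rj and i != j:
                anatomical[i, j] = 1.0
            if np.linalg.norm(centers[ri] - centers[rj]) < dist_thresh:
                spatial[i, j] = 1.0
            if region_labels[ri] == region_labels[rj]:
                semantic[i, j] = 1.0
    return feats, (anatomical, spatial, semantic)

feat_main, feat_ref = np.random.randn(5, 256), np.random.randn(5, 256)
boxes = np.random.rand(5, 4) * 224.0
labels = np.arange(5)
feats, graphs = build_diff_graph(feat_main, feat_ref, boxes, labels)
print(feats.shape, [int(g.sum()) for g in graphs])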

Interpretable Medical Image Visual Question Answering via Multi-Modal Relationship Graph Learning

Feb 19, 2023
Xinyue Hu, Lin Gu, Kazuma Kobayashi, Qiyuan An, Qingyu Chen, Zhiyong Lu, Chang Su, Tatsuya Harada, Yingying Zhu

Medical visual question answering (VQA) aims to answer clinically relevant questions regarding input medical images. This technique has the potential to improve the efficiency of medical professionals while relieving the burden on the public health system, particularly in resource-poor countries. Existing medical VQA methods tend to encode medical images and learn the correspondence between visual features and questions without exploiting the spatial, semantic, or medical knowledge behind them. This is partially because of the small size of current medical VQA datasets, which often include only simple questions. Therefore, we first collected a comprehensive and large-scale medical VQA dataset focusing on chest X-ray images. The questions in our dataset involve detailed relationships, such as disease names, locations, levels, and types. Based on this dataset, we also propose a novel baseline method that constructs three different relationship graphs: spatial, semantic, and implicit relationship graphs over the image regions, questions, and semantic labels. The answer and graph reasoning paths are learned for different questions.
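
To make the graph-reasoning idea concrete, here is a minimal PyTorch sketch of one attention-based message-passing step over several relationship-specific adjacency matrices. It is an illustrative stand-in rather than the authors' architecture; layer sizes and the averaging fusion are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationAttentionLayer(nn.Module):
    # One message-passing step per relationship graph, fused by averaging.
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x, adjacencies):
        # x: (num_nodes, dim); adjacencies: list of (num_nodes, num_nodes) 0/1 masks.
        scores = self.q(x) @ self.k(x).t() / x.shape[-1] ** 0.5
        outputs = []
        for adj in adjacencies:
            masked = scores.masked_fill(adj == 0, float("-inf"))
            attn = torch.nan_to_num(F.softmax(masked, dim=-1))  # isolated nodes get zero attention
            outputs.append(attn @ self.v(x))
        return x + torch.stack(outputs).mean(dim=0)             # residual update over fused messages

x = torch.randn(6, 32)
adjs = [torch.eye(6), (torch.rand(6, 6) > 0.5).float()]
print(RelationAttentionLayer(32)(x, adjs).shape)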

Simple, Effective and General: A New Backbone for Cross-view Image Geo-localization

Feb 03, 2023
Yingying Zhu, Hongji Yang, Yuxin Lu, Qiang Huang

In this work, we address an important but under-explored problem: designing a simple yet effective backbone specifically for the cross-view geo-localization task. Existing methods for cross-view geo-localization are frequently characterized by 1) complicated methodologies, 2) GPU-consuming computations, and 3) a stringent assumption that aerial and ground images are center- or orientation-aligned. To address these three challenges in cross-view image matching, we propose a new backbone network, named the Simple Attention-based Image Geo-localization network (SAIG). The proposed SAIG effectively represents long-range interactions among patches as well as cross-view correspondence with multi-head self-attention layers. The "narrow-deep" architecture of SAIG improves feature richness without degrading performance, while its shallow and effective convolutional stem preserves locality, avoiding the loss of patch-boundary information. SAIG achieves state-of-the-art results on cross-view geo-localization while being far simpler than previous works. Furthermore, with only 15.9% of the model parameters and half the output dimension of the state of the art, SAIG adapts well across multiple cross-view datasets without employing any elaborately designed feature aggregation modules or feature alignment algorithms. In addition, SAIG attains competitive scores on image retrieval benchmarks, further demonstrating its generalizability. As a backbone network, SAIG is both easy to follow and computationally lightweight, which is meaningful in practical scenarios. Moreover, we propose a simple Spatial-Mixed feature aggregation moDule (SMD) that can mix and project spatial information into a low-dimensional space to generate feature descriptors... (The code is available at https://github.com/yanghongji2007/SAIG)
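
The exact SMD design is described in the paper and repository; purely as orientation, the hypothetical PyTorch sketch below mixes information across spatial tokens and then projects to a low-dimensional, L2-normalized descriptor, which is the general idea the abstract describes. All layer choices and sizes are assumptions.

import torch
import torch.nn as nn

class SpatialMixedDescriptor(nn.Module):
    # Mix information across spatial positions, then project to a compact descriptor.
    def __init__(self, num_tokens, dim, out_dim=256):
        super().__init__()
        self.mix = nn.Linear(num_tokens, num_tokens)   # mixes features across the token axis
        self.proj = nn.Linear(dim, out_dim)            # projects to a low-dimensional space

    def forward(self, tokens):
        # tokens: (batch, num_tokens, dim) patch features from the backbone.
        mixed = self.mix(tokens.transpose(1, 2)).transpose(1, 2)  # spatial mixing
        desc = self.proj(mixed.mean(dim=1))                       # pool and project
        return nn.functional.normalize(desc, dim=-1)              # unit-length descriptor for retrieval

x = torch.randn(2, 49, 384)
print(SpatialMixedDescriptor(49, 384)(x).shape)  # (2, 256)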

* Under Review 

Unified Multi-View Orthonormal Non-Negative Graph Based Clustering Framework

Nov 03, 2022
Liangchen Liu, Qiuhong Ke, Chaojie Li, Feiping Nie, Yingying Zhu

Spectral clustering is an effective methodology for unsupervised learning. Most traditional spectral clustering algorithms involve a separate two-step procedure and apply the transformed representations to obtain the final clustering results. Recently, much progress has been made toward utilizing the non-negative feature property of real-world data and jointly learning the representation and the clustering results. However, to our knowledge, no previous work considers a unified model that incorporates the important multi-view information alongside those properties, which severely limits the performance of existing methods. In this paper, we formulate a novel clustering model that exploits the non-negative feature property and, more importantly, incorporates the multi-view information into a unified joint learning framework: the unified multi-view orthonormal non-negative graph based clustering framework (Umv-ONGC). We then derive an effective three-stage iterative solution for the proposed model and provide analytic solutions for the three sub-problems arising from the three stages. We also explore, for the first time, a multi-model non-negative graph-based approach to clustering data based on deep features. Extensive experiments on three benchmark data sets demonstrate the effectiveness of the proposed method.
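
For context, the "separate two-step procedure" that the paper improves upon looks roughly like the following standard spectral clustering sketch (NumPy and scikit-learn): first learn a spectral embedding from a fixed affinity graph, then cluster it with k-means. This is the conventional baseline, not the proposed Umv-ONGC model.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def two_step_spectral_clustering(X, n_clusters, gamma=1.0):
    # Step 1: spectral embedding from a fixed affinity graph.
    W = rbf_kernel(X, gamma=gamma)                                  # affinity matrix
    d = W.sum(axis=1)
    L_sym = np.eye(len(X)) - (W / np.sqrt(d)[:, None]) / np.sqrt(d)[None, :]  # normalized Laplacian
    _, eigvecs = np.linalg.eigh(L_sym)
    embedding = eigvecs[:, :n_clusters]                             # smallest eigenvectors
    embedding /= np.linalg.norm(embedding, axis=1, keepdims=True) + 1e-12
    # Step 2: cluster the transformed representation separately with k-means.
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embedding)

X = np.random.randn(100, 16)
print(two_step_spectral_clustering(X, 3)[:10])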

See Blue Sky: Deep Image Dehaze Using Paired and Unpaired Training Images

Oct 14, 2022
Xiaoyan Zhang, Gaoyang Tang, Yingying Zhu, Qi Tian

The issue of image haze removal has attracted wide attention in recent years. However, most existing haze removal methods cannot restore the scene with a clear blue sky, since the color and texture information of objects in the original hazy image is insufficient. To remedy this, we propose a cycle generative adversarial network to construct a novel end-to-end image dehazing model. We train our model on outdoor image datasets, including a real-world unpaired image dataset and a paired image dataset, to ensure that the generated images are close to real scenes. Based on the cycle structure, our model adds four kinds of loss functions to constrain the result: adversarial loss, cycle consistency loss, photorealism loss, and paired L1 loss. These four constraints improve the overall quality of degraded images for better visual appeal and ensure that the reconstructed images remain free from distortion. The proposed model can remove the haze from images and also restore the sky so that it is clean and blue (as if captured in sunny weather).
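
To show how the four constraints might combine, here is a minimal PyTorch sketch of a generator objective with adversarial, cycle-consistency, photorealism (approximated here by a feature-space distance), and paired L1 terms. The loss weights and the feature extractor are placeholders, not the paper's settings.

import torch
import torch.nn.functional as F

def generator_loss(fake_logits, rec, real, dehazed_paired, gt_paired,
                   feat_fn, w_adv=1.0, w_cyc=10.0, w_photo=1.0, w_l1=10.0):
    # Adversarial term: least-squares GAN loss pushing dehazed outputs toward "real".
    adv = F.mse_loss(fake_logits, torch.ones_like(fake_logits))
    # Cycle-consistency term on the unpaired branch.
    cyc = F.l1_loss(rec, real)
    # Photorealism term: distance in a feature space (feat_fn is a placeholder extractor).
    photo = F.l1_loss(feat_fn(dehazed_paired), feat_fn(gt_paired))
    # Paired L1 term on the paired branch.
    l1 = F.l1_loss(dehazed_paired, gt_paired)
    return w_adv * adv + w_cyc * cyc + w_photo * photo + w_l1 * l1

# Example with a trivial identity "feature extractor" as a stand-in:
img = torch.rand(1, 3, 64, 64)
print(generator_loss(torch.randn(1, 1, 8, 8), img, img, img, img, feat_fn=lambda x: x))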

Few-Shot Classification with Contrastive Learning

Sep 17, 2022
Zhanyuan Yang, Jinghua Wang, Yingying Zhu

A two-stage training paradigm consisting of sequential pre-training and meta-training stages has been widely used in current few-shot learning (FSL) research. Many of these methods use self-supervised learning and contrastive learning to achieve new state-of-the-art results. However, the potential of contrastive learning in both stages of the FSL training paradigm is still not fully exploited. In this paper, we propose a novel contrastive learning-based framework that seamlessly integrates contrastive learning into both stages to improve the performance of few-shot classification. In the pre-training stage, we propose a self-supervised contrastive loss in the forms of feature vector vs. feature map and feature map vs. feature map, which uses global and local information to learn good initial representations. In the meta-training stage, we propose a cross-view episodic training mechanism that performs nearest-centroid classification on two different views of the same episode and adopts a distance-scaled contrastive loss based on them. These two strategies force the model to overcome the bias between views and promote the transferability of representations. Extensive experiments on three benchmark datasets demonstrate that our method achieves competitive results.
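
As a simplified illustration of the "feature vector vs. feature map" contrast, the PyTorch sketch below computes an InfoNCE-style loss between global vectors from one view and the spatial positions of feature maps from another view. The temperature and the positive/negative construction are assumptions, not the paper's exact loss.

import torch
import torch.nn.functional as F

def vector_vs_map_contrastive(global_feat, feat_maps, temperature=0.1):
    # global_feat: (B, D) global vectors from one augmented view.
    # feat_maps:   (B, D, H, W) feature maps from another view of the same images.
    B, D, H, W = feat_maps.shape
    local = F.normalize(feat_maps.flatten(2), dim=1)        # (B, D, HW), unit-norm per location
    g = F.normalize(global_feat, dim=-1)                    # (B, D)
    # Similarity of every global vector to every local position of every image: (B, B, HW).
    sims = torch.einsum("bd,cdl->bcl", g, local) / temperature
    logits = sims.flatten(1)                                # (B, B*HW) candidates per anchor
    loss = 0.0
    for i in range(B):
        pos = sims[i, i]                                    # locations of the matching image are positives
        denom = torch.logsumexp(logits[i], dim=0)
        loss += (denom - pos).mean()                        # InfoNCE averaged over positive locations
    return loss / B

print(vector_vs_map_contrastive(torch.randn(4, 64), torch.randn(4, 64, 5, 5)))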

* To appear in ECCV 2022 

Memory Efficient Temporal & Visual Graph Model for Unsupervised Video Domain Adaptation

Aug 13, 2022
Xinyue Hu, Lin Gu, Liangchen Liu, Ruijiang Li, Chang Su, Tatsuya Harada, Yingying Zhu

Existing video domain adaptation (DA) methods need to store all temporal combinations of video frames or pair the source and target videos, which is memory-expensive and cannot scale to long videos. To address these limitations, we propose a memory-efficient graph-based video DA approach. First, our method models each source or target video as a graph: nodes represent video frames and edges represent temporal or visual similarity relationships between frames. We use a graph attention network to learn the weight of individual frames and simultaneously align the source and target videos in a domain-invariant graph feature space. Instead of storing a large number of sub-videos, our method constructs only one graph with a graph attention mechanism per video, reducing the memory cost substantially. Extensive experiments show that, compared with state-of-the-art methods, our approach achieves superior performance while reducing the memory cost significantly.
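
The PyTorch sketch below illustrates the kind of per-video graph the method describes, with temporal edges between neighboring frames and visual edges between frames whose features are similar; the window size and similarity threshold are assumed values, not the paper's.

import torch

def build_video_graph(frame_feats, window=1, sim_thresh=0.8):
    # frame_feats: (T, D) one feature vector per frame; the whole video needs only one graph.
    T = frame_feats.shape[0]
    f = torch.nn.functional.normalize(frame_feats, dim=-1)
    visual_sim = f @ f.t()                                   # (T, T) cosine similarities
    temporal = torch.zeros(T, T)
    for i in range(T):
        lo, hi = max(0, i - window), min(T, i + window + 1)
        temporal[i, lo:hi] = 1.0                             # connect temporally adjacent frames
    visual = (visual_sim > sim_thresh).float()               # connect visually similar frames
    adj = ((temporal + visual) > 0).float()
    adj.fill_diagonal_(0)                                    # drop self-loops (a modeling choice here)
    return adj

adj = build_video_graph(torch.randn(16, 128))
print(adj.shape, int(adj.sum()))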

Exploring Resolution and Degradation Clues as Self-supervised Signal for Low Quality Object Detection

Aug 05, 2022
Ziteng Cui, Yingying Zhu, Lin Gu, Guo-Jun Qi, Xiaoxiao Li, Renrui Zhang, Zenghui Zhang, Tatsuya Harada

Image restoration algorithms such as super resolution (SR) are indispensable pre-processing modules for object detection in low-quality images. Most of these algorithms assume that the degradation is fixed and known a priori. However, in practice, either the real degradation or the optimal up-sampling ratio is unknown or differs from the assumption, degrading the performance of both the pre-processing module and the consequent high-level task such as object detection. Here, we propose a novel self-supervised framework to detect objects in degraded low-resolution images. We utilize downsampling degradation as a transformation for self-supervised signals to explore equivariant representations across various resolutions and other degradation conditions. The Auto Encoding Resolution in Self-supervision (AERIS) framework can further take advantage of advanced SR architectures with an arbitrary-resolution restoring decoder to reconstruct the original correspondence from the degraded input image. Both the representation learning and object detection are optimized jointly in an end-to-end training fashion. The generic AERIS framework can be implemented on various mainstream object detection architectures with different backbones. Extensive experiments show that our method achieves superior performance compared with existing methods under various degradation situations. Code will be released at https://github.com/cuiziteng/ECCV_AERIS.
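
As a rough illustration of using down-sampling degradation as a self-supervised signal, the hypothetical PyTorch sketch below generates (degraded, original) training pairs at a randomly chosen scale; the scale set and interpolation mode are assumptions, not the AERIS configuration.

import random
import torch
import torch.nn.functional as F

def degrade_for_self_supervision(images, scales=(1.0, 2.0, 4.0)):
    # images: (B, 3, H, W) clean inputs. Randomly down-sample the batch to simulate
    # a low-resolution degradation; the chosen scale acts as a free self-supervised label.
    s = random.choice(scales)
    degraded = F.interpolate(images, scale_factor=1.0 / s, mode="bicubic", align_corners=False)
    # Training target: restore the original resolution from the degraded input,
    # encouraging representations that are (approximately) equivariant to resolution changes.
    return degraded, images, s

imgs = torch.rand(2, 3, 128, 128)
low, target, scale = degrade_for_self_supervision(imgs)
print(low.shape, target.shape, scale)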

* Accepted by ECCV 2022. arXiv admin note: substantial text overlap with arXiv:2201.02314 