
Huali Xu


Deep Learning for Cross-Domain Few-Shot Visual Recognition: A Survey

Mar 15, 2023
Huali Xu, Shuaifeng Zhi, Shuzhou Sun, Vishal M. Patel, Li Liu


Deep learning has been highly successful in computer vision when large amounts of labeled data are available, but it struggles when labeled training data are scarce. Few-shot learning (FSL) was proposed to address this, but it assumes that all samples (both source and target task data, where target tasks are solved using prior knowledge from source tasks) come from the same domain, which is a stringent assumption in the real world. To relax this limitation, Cross-domain few-shot learning (CDFSL) has gained attention, as it allows the source and target data to come from different domains and label spaces. This paper provides the first comprehensive review of CDFSL, which has received far less attention than FSL due to its unique setup and difficulties. We expect this paper to serve as both a position paper and a tutorial for researchers working on CDFSL. The review first introduces the definition of CDFSL and the issues involved, followed by the core scientific question and challenge. It then presents a comprehensive review of validated CDFSL approaches from the existing literature, with detailed descriptions organized by a rigorous taxonomy. Finally, the paper outlines and discusses several promising directions for CDFSL that deserve further scientific investigation, covering problem setups, applications, and theories.
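For concreteness, the sketch below illustrates the N-way K-shot episode protocol that CDFSL inherits from FSL: support and query sets are sampled from a target dataset whose domain and label space differ from the source data used for pre-training. The dataset structure and function names are illustrative, not taken from the survey.

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=1, q_queries=15):
    """Sample an N-way K-shot episode from a target dataset.

    `dataset` is assumed to be a list of (image, label) pairs; in the
    CDFSL setting it comes from a different domain and label space than
    the source data the model was pre-trained on.
    """
    by_class = defaultdict(list)
    for image, label in dataset:
        by_class[label].append(image)

    # Draw N novel classes, then split K support / Q query images per class.
    classes = random.sample(sorted(by_class), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        images = random.sample(by_class[cls], k_shot + q_queries)
        support += [(img, episode_label) for img in images[:k_shot]]
        query += [(img, episode_label) for img in images[k_shot:]]
    return support, query
```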

* 35 pages, 14 figures, 8 tables 

Cross-Domain Few-Shot Classification via Inter-Source Stylization

Aug 17, 2022
Huali Xu, Li Liu


Cross-Domain Few-Shot Classification (CDFSC) leverages prior knowledge learned from a supervised auxiliary dataset to solve a target task with limited supervision, where the auxiliary and target datasets come from different domains. The task is challenging due to the domain shift between these datasets. Inspired by Multisource Domain Adaptation (MDA), recent works introduce multiple source domains to improve performance. However, they evaluate only on benchmarks of natural images, and the large number of annotations they require, even in the source domains, can be costly. To address these issues, this paper explores a new Multisource CDFSC setting (MCDFSC) in which only one source domain is fully labeled while the remaining source domains are unlabeled. These sources come from different fields, meaning they are not limited to natural images. Considering the inductive bias of CNNs, this paper proposes the Inter-Source Stylization Network (ISSNet) for this new MCDFSC setting. ISSNet transfers the styles of the unlabeled sources to the labeled source, which expands the distribution of the labeled source and further improves the model's generalization ability. Experiments on 8 target datasets demonstrate that ISSNet effectively suppresses the performance degradation caused by domain shift.
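The abstract does not specify ISSNet's internals, so the following is a minimal sketch of one standard way to transfer style between feature maps, AdaIN-style channel statistics matching. It illustrates the idea of re-styling labeled-source features with unlabeled-source statistics, and should not be read as the paper's actual network.

```python
import torch

def adain_stylize(content_feat, style_feat, eps=1e-5):
    """Transfer channel-wise feature statistics (AdaIN-style) from an
    unlabeled-source feature map onto a labeled-source feature map.

    Both inputs are (B, C, H, W) tensors. Re-normalizing the labeled
    source's features with the unlabeled source's mean/std broadens the
    labeled source's style distribution without any extra labels.
    """
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content_feat - c_mean) / c_std + s_mean
```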

* 10 pages 

An Edge Information and Mask Shrinking Based Image Inpainting Approach

Jun 11, 2020
Huali Xu, Xiangdong Su, Meng Wang, Xiang Hao, Guanglai Gao


In the image inpainting task, the ability to repair both high-frequency and low-frequency information in the missing regions has a substantial influence on the quality of the restored image. However, existing inpainting methods usually fail to consider both kinds of information simultaneously. To solve this problem, this paper proposes an edge information and mask shrinking based image inpainting approach consisting of two models. The first is an edge generation model that generates complete edge information from the damaged image; the second is an image completion model that fills the missing regions using the generated edge information and the valid contents of the damaged image. A mask shrinking strategy is employed in the image completion model to track the areas still to be repaired. The proposed approach is evaluated qualitatively and quantitatively on the Places2 dataset, and the results show that it outperforms state-of-the-art methods.
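The abstract does not detail the mask shrinking mechanism. A common way to track the remaining holes is the partial-convolution mask update, sketched below under that assumption: a hole pixel becomes valid once any valid pixel enters its receptive field, so the mask shrinks step by step as completion proceeds.

```python
import torch
import torch.nn.functional as F

def shrink_mask(mask, kernel_size=3, stride=1):
    """One mask-shrinking step in the spirit of partial convolutions.

    `mask` is a (B, 1, H, W) tensor with 1 for valid pixels and 0 for
    holes. Convolving the mask with an all-ones kernel counts how many
    valid pixels fall in each receptive field; any nonzero count marks
    the output pixel as newly valid.
    """
    padding = kernel_size // 2
    weight = torch.ones(1, 1, kernel_size, kernel_size)
    coverage = F.conv2d(mask, weight, stride=stride, padding=padding)
    return (coverage > 0).float()
```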

* Accepted by ICME2020 

SNR-based teachers-student technique for speech enhancement

May 29, 2020
Xiang Hao, Xiangdong Su, Zhiyu Wang, Qiang Zhang, Huali Xu, Guanglai Gao


It is very challenging for speech enhancement methods to achieve robust performance at both high and low signal-to-noise ratios (SNRs) simultaneously. In this paper, we propose a method that integrates an SNR-based teachers-student technique with a time-domain U-Net to deal with this problem. Specifically, the method consists of multiple teacher models and a student model. We first train the teacher models on multiple small, non-overlapping SNR ranges so that each performs speech enhancement well within its specific range. Then, for each training sample, we choose the teacher model matching the sample's SNR to supervise the training of the student model. Eventually, the student model can perform speech enhancement at both high and low SNRs. To evaluate the proposed method, we constructed a dataset with SNRs ranging from -20 dB to 20 dB based on a public dataset. We experimentally analyzed the effectiveness of the SNR-based teachers-student technique and compared the proposed method with several state-of-the-art methods.
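As a rough sketch of the supervision scheme described above (the SNR bins, loss, and model interfaces are assumptions for illustration, not taken from the paper), each teacher owns one non-overlapping SNR range and supervises the student only on samples falling in that range:

```python
import torch
import torch.nn.functional as F

# Illustrative non-overlapping SNR bins (dB), one teacher per bin.
SNR_BINS = [(-20, -10), (-10, 0), (0, 10), (10, 20)]

def select_teacher(snr_db, teachers):
    """Pick the teacher whose training SNR range contains this sample."""
    for (low, high), teacher in zip(SNR_BINS, teachers):
        if low <= snr_db < high:
            return teacher
    return teachers[-1]  # clamp out-of-range SNRs to the nearest bin

def student_loss(noisy, snr_db, student, teachers):
    """Distill the matching teacher's enhanced waveform into the student."""
    teacher = select_teacher(snr_db, teachers)
    with torch.no_grad():  # teachers are frozen during distillation
        target = teacher(noisy)
    return F.l1_loss(student(noisy), target)
```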

* Accepted to 2020 IEEE International Conference on Multimedia and Expo (ICME 2020) 