Xiaoyu Du

Delving into Multimodal Prompting for Fine-grained Visual Classification

Sep 16, 2023
Xin Jiang, Hao Tang, Junyao Gao, Xiaoyu Du, Shengfeng He, Zechao Li

Fine-grained visual classification (FGVC) involves categorizing fine subdivisions within a broader category, which poses challenges due to subtle inter-class discrepancies and large intra-class variations. However, prevailing approaches focus primarily on uni-modal visual concepts. Recent advances in pre-trained vision-language models have demonstrated remarkable performance on various high-level vision tasks, yet the applicability of such models to FGVC tasks remains uncertain. In this paper, we aim to fully exploit the capabilities of cross-modal description to tackle FGVC tasks and propose a novel multimodal prompting solution, denoted MP-FGVC, based on the contrastive language-image pre-training (CLIP) model. MP-FGVC comprises a multimodal prompt scheme and a multimodal adaptation scheme. The former includes a Subcategory-specific Vision Prompt (SsVP) and a Discrepancy-aware Text Prompt (DaTP), which explicitly highlight subcategory-specific discrepancies from the perspectives of both vision and language. The latter aligns the vision and text prompting elements in a common semantic space, facilitating cross-modal collaborative reasoning through a Vision-Language Fusion Module (VLFM) for further improvement on FGVC. Moreover, we tailor a two-stage optimization strategy for MP-FGVC to fully leverage the pre-trained CLIP model and enable efficient adaptation to FGVC. Extensive experiments conducted on four FGVC datasets demonstrate the effectiveness of MP-FGVC.

* The first two authors contributed equally to this work 
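
Below is a minimal PyTorch sketch of the prompting idea described in the abstract, not the authors' implementation: learnable vision prompt tokens stand in for SsVP, per-class learnable context vectors stand in for DaTP, and a single cross-attention layer plays the role of the VLFM. The stand-in encoder, all shapes, and the fusion design are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MPFGVCSketch(nn.Module):
    def __init__(self, dim=512, n_vis_prompts=8, n_ctx=4, n_classes=200):
        super().__init__()
        # Stand-in for the (frozen) CLIP image tower; MP-FGVC uses pre-trained CLIP.
        self.visual = nn.TransformerEncoderLayer(dim, 8, batch_first=True)
        self.vis_prompts = nn.Parameter(0.02 * torch.randn(n_vis_prompts, dim))     # ~SsVP
        self.txt_context = nn.Parameter(0.02 * torch.randn(n_classes, n_ctx, dim))  # ~DaTP
        self.fusion = nn.MultiheadAttention(dim, 8, batch_first=True)               # ~VLFM
        self.logit_scale = nn.Parameter(torch.tensor(4.6))  # CLIP-style temperature

    def forward(self, patch_tokens, class_embeds):
        # patch_tokens: (B, N, D) image patch embeddings; class_embeds: (C, D)
        B = patch_tokens.size(0)
        p = self.vis_prompts.expand(B, -1, -1)                     # prepend prompt tokens
        v = self.visual(torch.cat([p, patch_tokens], 1)).mean(1)   # (B, D) image feature
        t = self.txt_context.mean(1) + class_embeds                # (C, D) prompted text
        # Cross-modal reasoning: the image feature attends over all class texts.
        f, _ = self.fusion(v.unsqueeze(1), t.expand(B, -1, -1), t.expand(B, -1, -1))
        v = F.normalize(v + f.squeeze(1), dim=-1)
        return self.logit_scale.exp() * v @ F.normalize(t, dim=-1).T  # (B, C) logits
```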

Triplet Contrastive Learning for Unsupervised Vehicle Re-identification

Jan 23, 2023
Fei Shen, Xiaoyu Du, Liyan Zhang, Jinhui Tang

Part feature learning is a critical technique for fine-grained semantic understanding in vehicle re-identification. However, recent unsupervised re-identification works suffer from serious gradient collapse when directly modeling part features alongside global features. To address this problem, we propose a novel Triplet Contrastive Learning framework (TCL) that leverages cluster features to bridge the part features and the global features. Specifically, TCL devises three memory banks that store features according to their attributes, and proposes a proxy contrastive loss (PCL) that performs contrastive learning between adjacent memory banks, casting the association between part and global features as a transition through part-cluster and cluster-global associations. Since the cluster memory bank aggregates all instance features, it can summarize them into a discriminative feature representation. To exploit the instance information more deeply, TCL proposes two additional loss functions. For inter-class instances, a hybrid contrastive loss (HCL) re-defines sample correlations by pulling features toward the positive cluster features and pushing them away from all negative instance features. For intra-class instances, a weighted regularization cluster contrastive loss (WRCCL) refines the pseudo labels by penalizing mislabeled images according to instance similarity. Extensive experiments show that TCL outperforms many state-of-the-art unsupervised vehicle re-identification approaches. The code will be available at https://github.com/muzishen/TCL.

* Code: https://github.com/muzishen/TCL 
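
A hedged sketch of the proxy contrastive ingredient, not the released code: each instance feature is pulled toward its own centroid in a cluster memory bank and pushed away from the other centroids, InfoNCE-style, with a momentum refresh keeping the bank current. The temperature and momentum values are illustrative.

```python
import torch
import torch.nn.functional as F

def proxy_contrastive_loss(feats, labels, cluster_bank, tau=0.05):
    """feats: (B, D) L2-normalized instance features; labels: (B,) cluster ids;
    cluster_bank: (K, D) L2-normalized cluster centroids held in a memory bank."""
    logits = feats @ cluster_bank.t() / tau  # similarity to every cluster centroid
    return F.cross_entropy(logits, labels)   # pull own centroid, push all others

@torch.no_grad()
def update_bank(cluster_bank, feats, labels, momentum=0.2):
    # Momentum refresh of the centroids touched by this batch.
    for f, y in zip(feats, labels):
        cluster_bank[y] = F.normalize(momentum * f + (1 - momentum) * cluster_bank[y], dim=0)
```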

BiSTNet: Semantic Image Prior Guided Bidirectional Temporal Feature Fusion for Deep Exemplar-based Video Colorization

Dec 05, 2022
Yixin Yang, Zhongzheng Peng, Xiaoyu Du, Zhulin Tao, Jinhui Tang, Jinshan Pan

Effectively exploring the colors of reference exemplars and propagating them to colorize each frame is vital for exemplar-based video colorization. In this paper, we present an effective BiSTNet that explores the colors of reference exemplars and exploits them for video colorization through bidirectional temporal feature fusion guided by a semantic image prior. We first establish semantic correspondences between each frame and the reference exemplars in a deep feature space to extract color information from the exemplars. Then, to better propagate the exemplar colors into each frame and avoid inaccurate color matches, we develop a simple yet effective bidirectional temporal feature fusion module. We note that color-bleeding artifacts usually appear around the boundaries of important objects in videos. To overcome this problem, we further develop a mixed expert block that extracts semantic information for modeling the object boundaries of frames, so that the semantic image prior can better guide the colorization process. In addition, we develop a multi-scale recurrent block to progressively colorize frames in a coarse-to-fine manner. Extensive experimental results demonstrate that the proposed BiSTNet performs favorably against state-of-the-art methods on the benchmark datasets. Our code will be made available at https://yyang181.github.io/BiSTNet/

* Project website: https://yyang181.github.io/BiSTNet/ 
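
As a rough illustration of bidirectional temporal feature fusion (the paper's module is considerably more elaborate; the recurrent 3x3-conv fusion here is an assumption), per-frame features can be propagated forward and backward in time and the two directional states merged per frame:

```python
import torch
import torch.nn as nn

class BiTemporalFusion(nn.Module):
    def __init__(self, c=64):
        super().__init__()
        self.fwd = nn.Conv2d(2 * c, c, 3, padding=1)  # fuse previous state + frame
        self.bwd = nn.Conv2d(2 * c, c, 3, padding=1)  # fuse next state + frame
        self.out = nn.Conv2d(2 * c, c, 1)             # merge both directions

    def forward(self, frames):
        # frames: list of T feature maps, each (B, C, H, W)
        T = len(frames)
        f_states, b_states = [None] * T, [None] * T
        state = torch.zeros_like(frames[0])
        for t in range(T):                            # forward pass in time
            state = self.fwd(torch.cat([state, frames[t]], 1))
            f_states[t] = state
        state = torch.zeros_like(frames[0])
        for t in reversed(range(T)):                  # backward pass in time
            state = self.bwd(torch.cat([state, frames[t]], 1))
            b_states[t] = state
        return [self.out(torch.cat([f, b], 1)) for f, b in zip(f_states, b_states)]
```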

A Competitive Method for Dog Nose-print Re-identification

Jun 01, 2022
Fei Shen, Zhe Wang, Zijun Wang, Xiaode Fu, Jiayi Chen, Xiaoyu Du, Jinhui Tang

Vision-based pattern identification (e.g., face, fingerprint, and iris) has a long history of successful application in human biometrics. Dog nose-print authentication, however, remains a challenging problem due to the lack of large amounts of labeled data. To this end, this paper presents our methods for the dog nose-print authentication (re-ID) task in the CVPR 2022 Pet Biometric Challenge. First, since each class in the training set has only a few samples, we propose an automatic offline data augmentation strategy. Then, to handle the difference in sample styles between the training and test sets, we optimize the network with a joint objective of cross-entropy, triplet, and pairwise circle losses. Finally, by ensembling multiple models, our method achieves 86.67% AUC on the test set. Code is available at https://github.com/muzishen/Pet-ReID-IMAG.

* 3rd place solution to the 2022 Pet Biometric Challenge (CVPRW). The source code and trained models can be obtained at https://github.com/muzishen/Pet-ReID-IMAG 
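
A minimal sketch of the joint objective described above, assuming a standard re-ID setup; the pairwise circle loss term is omitted for brevity, and the margin and weighting are illustrative values, not the paper's.

```python
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()
triplet = nn.TripletMarginLoss(margin=0.3)

def joint_loss(logits, labels, anchor, positive, negative, w_tri=1.0):
    # logits: (B, n_identities) classifier outputs; anchor/positive/negative: (B, D)
    # Cross-entropy supervises identity classification; the triplet term shapes
    # the embedding space so same-identity samples sit closer than different ones.
    return ce(logits, labels) + w_tri * triplet(anchor, positive, negative)
```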

Automated Artefact Relevancy Determination from Artefact Metadata and Associated Timeline Events

Dec 02, 2020
Xiaoyu Du, Quan Le, Mark Scanlon

Case-hindering, multi-year digital forensic evidence backlogs have become commonplace in law enforcement agencies throughout the world. This is due to an ever-growing number of cases requiring digital forensic investigation coupled with the growing volume of data to be processed per case. Leveraging previously processed digital forensic cases and their component artefact relevancy classifications offers an opportunity to train automated, artificial intelligence-based evidence processing systems, which can significantly aid investigators in the discovery and prioritisation of evidence. This paper presents an approach to file artefact relevancy determination that builds on the growing trend towards a centralised, Digital Forensics as a Service (DFaaS) paradigm. This approach enables the use of previously encountered pertinent files to classify newly discovered files in an investigation. Trained models can aid in the detection of these files during the acquisition stage, i.e., during their upload to a DFaaS system. The technique generates a relevancy score for file similarity using each artefact's filesystem metadata and associated timeline events. The approach is validated against three experimental usage scenarios.

* The 6th IEEE International Conference on Cyber Security and Protection of Digital Services (Cyber Security), Dublin, Ireland, June 2020  
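
A toy sketch of the relevancy-scoring idea, with invented feature columns (the paper derives its features from filesystem metadata and associated timeline events, not these exact ones): a new artefact is scored by its similarity to feature vectors of previously encountered pertinent files.

```python
import numpy as np

def relevancy_score(new_vec, pertinent_vecs):
    """Cosine similarity to the nearest known-pertinent artefact vector."""
    a = new_vec / np.linalg.norm(new_vec)
    B = pertinent_vecs / np.linalg.norm(pertinent_vecs, axis=1, keepdims=True)
    return float((B @ a).max())

# Hypothetical columns: [log file size, path depth, extension code, timeline-event count]
pertinent = np.array([[12.1, 4.0, 3.0, 17.0], [11.8, 5.0, 3.0, 9.0]])
print(relevancy_score(np.array([11.9, 4.0, 3.0, 12.0]), pertinent))
```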

SoK: Exploring the State of the Art and the Future Potential of Artificial Intelligence in Digital Forensic Investigation

Dec 02, 2020
Xiaoyu Du, Chris Hargreaves, John Sheppard, Felix Anda, Asanka Sayakkara, Nhien-An Le-Khac, Mark Scanlon

Multi-year digital forensic backlogs have become commonplace in law enforcement agencies throughout the globe. Digital forensic investigators are overloaded by the volume of cases requiring their expertise, compounded by the volume of data to be processed per case. Artificial intelligence is often seen as the solution to many big data problems. This paper summarises existing artificial intelligence-based tools and approaches in digital forensics. Automated evidence processing leveraging artificial intelligence shows great promise in expediting the digital forensic analysis process while increasing case processing capacity. For each highlighted application of artificial intelligence, current challenges and potential future impact are discussed.

* The 15th International ARES Conference on Availability, Reliability and Security, August 25--28, 2020  

Methodology for the Automated Metadata-Based Classification of Incriminating Digital Forensic Artefacts

Jul 02, 2019
Xiaoyu Du, Mark Scanlon

The ever-increasing volume of data in digital forensic investigations is one of the most discussed challenges in the field. Usually, most of the file artefacts on seized devices are not pertinent to the investigation, and manually retrieving suspicious files relevant to the investigation is akin to finding a needle in a haystack. In this paper, a methodology for the automatic prioritisation of suspicious file artefacts (i.e., file artefacts pertinent to the investigation) is proposed to reduce the required manual analysis effort. The methodology is designed to work in a human-in-the-loop fashion; in other words, it predicts/recommends that an artefact is likely suspicious rather than delivering a final analysis result. A supervised machine learning approach is employed, which leverages the recorded results of previously processed cases. The processes of feature extraction, dataset generation, training, and evaluation are presented in this paper. In addition, a toolkit for data extraction from disk images is outlined, which enables this method to be integrated with the conventional investigation process and to work in an automated fashion.

* 14th International Conference on Availability, Reliability and Security (ARES 2019), Canterbury, UK, August 2019  
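
A minimal scikit-learn sketch of the supervised, human-in-the-loop pipeline described above; the feature columns, model choice, and toy data are assumptions for illustration, not the paper's exact setup.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({                      # metadata rows extracted from disk images
    "extension": ["jpg", "exe", "doc", "jpg"],
    "size_bytes": [120_400, 88_000, 5_300, 990_000],
    "path_depth": [5, 2, 3, 6],
    "suspicious": [1, 0, 0, 1],          # labels from previously processed cases
})
X, y = df.drop(columns="suspicious"), df["suspicious"]
pipe = Pipeline([
    ("enc", ColumnTransformer(
        [("ext", OneHotEncoder(handle_unknown="ignore"), ["extension"])],
        remainder="passthrough")),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
])
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, stratify=y, random_state=0)
pipe.fit(Xtr, ytr)
print(pipe.predict_proba(Xte)[:, 1])     # recommendation scores, not final verdicts
```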

Modeling Embedding Dimension Correlations via Convolutional Neural Collaborative Filtering

Jun 26, 2019
Xiaoyu Du, Xiangnan He, Fajie Yuan, Jinhui Tang, Zhiguang Qin, Tat-Seng Chua

As the core of recommender systems, collaborative filtering (CF) models the affinity between a user and an item from historical user-item interactions, such as clicks and purchases. Benefiting from their strong representation power, neural networks have recently revolutionized recommendation research, setting a new standard for CF. However, existing neural recommender models do not explicitly consider the correlations among embedding dimensions, making them less effective at modeling the interaction function between users and items. In this work, we emphasize modeling the correlations among embedding dimensions in neural networks to pursue higher effectiveness for CF. We propose a novel and general neural collaborative filtering framework, named ConvNCF, which features two designs: 1) applying an outer product to the user embedding and item embedding to explicitly model the pairwise correlations between embedding dimensions, and 2) employing a convolutional neural network above the outer product to learn the high-order correlations among embedding dimensions. To justify our proposal, we present three instantiations of ConvNCF using different inputs to represent a user, and conduct experiments on two real-world datasets. Extensive results verify the utility of modeling embedding dimension correlations with ConvNCF, which outperforms several competitive CF methods.

* TOIS Minor 
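
A compact PyTorch sketch of the two designs: the outer product of user and item embeddings forms a 2-D interaction map, and a small CNN tower halves the map until it is 1x1. Channel counts and tower depth are illustrative, and the paper's three instantiations differ in how the user is represented.

```python
import torch
import torch.nn as nn

class ConvNCF(nn.Module):
    def __init__(self, n_users, n_items, dim=64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        convs, c, size = [], 1, dim
        while size > 1:                       # halve the interaction map until 1x1
            convs += [nn.Conv2d(c, 32, 2, stride=2), nn.ReLU()]
            c, size = 32, size // 2
        self.cnn = nn.Sequential(*convs)
        self.out = nn.Linear(32, 1)

    def forward(self, users, items):
        p = self.user_emb(users)                        # (B, dim)
        q = self.item_emb(items)                        # (B, dim)
        interaction = torch.einsum("bi,bj->bij", p, q)  # (B, dim, dim) outer product
        h = self.cnn(interaction.unsqueeze(1))          # (B, 32, 1, 1)
        return self.out(h.flatten(1)).squeeze(1)        # predicted preference score
```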

Fast Matrix Factorization with Non-Uniform Weights on Missing Data

Nov 11, 2018
Xiangnan He, Jinhui Tang, Xiaoyu Du, Richang Hong, Tongwei Ren, Tat-Seng Chua

Matrix factorization (MF) has been widely used to discover the low-rank structure of a data matrix and to predict its missing entries. In many real-world learning systems, the data matrix can be very high-dimensional but sparse. This poses an imbalanced learning problem, since the number of missing entries is usually much larger than that of observed entries, yet they cannot be ignored because they carry valuable negative signal. For efficiency, existing work typically applies a uniform weight to the missing entries to permit a fast learning algorithm. However, this simplification decreases modeling fidelity, resulting in suboptimal performance for downstream applications. In this work, we weight the missing data non-uniformly and, more generically, allow any weighting strategy on the missing data. To address the resulting efficiency challenge, we propose a fast learning method whose time complexity is determined by the number of observed entries in the data matrix rather than the matrix size. The key idea is two-fold: 1) we apply a truncated SVD to the weight matrix to obtain a more compact representation of the weights, and 2) we learn the MF parameters with element-wise alternating least squares (eALS), memoizing key intermediate variables to avoid unnecessary repeated computation. We conduct extensive experiments on two recommendation benchmarks, demonstrating the correctness, efficiency, and effectiveness of our fast eALS method.

* IEEE Transactions on Neural Networks and Learning Systems (TNNLS) 
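
A toy numeric check of the first key idea: once the missing-data weight matrix is replaced by a truncated SVD, the otherwise O(mn) missing-data term of the weighted loss can be assembled from a few rank-1 components and small Gram-style matrices. This illustrates only the decomposition, not the full eALS updates; all sizes are toy values.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 100, 80, 3
# A non-uniform weight matrix over the missing entries (toy, nearly low-rank).
W = rng.random((m, 1)) @ rng.random((1, n)) + 0.05 * rng.random((m, n))

U, s, Vt = np.linalg.svd(W, full_matrices=False)
Wr = (U[:, :r] * s[:r]) @ Vt[:r]                 # compact rank-r stand-in for W

P, Q = rng.random((m, 8)), rng.random((n, 8))    # MF factors
# Missing-data loss term sum_{u,i} Wr[u,i] * (p_u . q_i)^2, computed two ways:
direct = np.sum(Wr * (P @ Q.T) ** 2)             # O(m*n) -- what we want to avoid
fast = sum(                                      # r small Gram-style products instead
    np.trace((P.T * (s[k] * U[:, k])) @ P @ (Q.T * Vt[k]) @ Q) for k in range(r)
)
print(np.allclose(direct, fast))                 # True: the decomposition matches
```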

Outer Product-based Neural Collaborative Filtering

Aug 12, 2018
Xiangnan He, Xiaoyu Du, Xiang Wang, Feng Tian, Jinhui Tang, Tat-Seng Chua

In this work, we contribute a new multi-layer neural network architecture, named ONCF, for collaborative filtering. The idea is to use an outer product to explicitly model the pairwise correlations between the dimensions of the embedding space. In contrast to existing neural recommender models that combine the user embedding and item embedding via simple concatenation or an element-wise product, applying an outer product above the embedding layer yields a two-dimensional interaction map that is more expressive and semantically plausible. Above this interaction map, we employ a convolutional neural network to learn high-order correlations among embedding dimensions. Extensive experiments on two public implicit feedback datasets demonstrate the effectiveness of the proposed ONCF framework, and in particular the positive effect of using the outer product to model correlations between embedding dimensions at the lower layers of a multi-layer neural recommender model. The experiment code is available at: https://github.com/duxy-me/ConvNCF

* IJCAI 2018 
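
A tiny illustration of why the outer product is the more expressive merge operator: concatenation keeps no cross-dimension terms, the element-wise product keeps only the diagonal, while the outer product exposes all pairwise terms for the CNN above it.

```python
import torch

p, q = torch.randn(4, 64), torch.randn(4, 64)  # a batch of user / item embeddings
concat = torch.cat([p, q], dim=1)              # (4, 128): no cross-dimension terms
elementwise = p * q                            # (4, 64): diagonal correlations only
outer = torch.einsum("bi,bj->bij", p, q)       # (4, 64, 64): all pairwise correlations
```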