
Zhigang Chang


Shanghai Jiao Tong University

Hierarchical Semantic Perceptual Listener Head Video Generation: A High-performance Pipeline

Jul 19, 2023
Zhigang Chang, Weitai Hu, Qing Yang, Shibao Zheng


In dyadic speaker-listener interactions, the listener's head reactions, together with the speaker's head movements, constitute an important non-verbal semantic expression. The listener head generation task aims to synthesize responsive listener head videos from the speaker's audio and reference images of the listener. Compared with talking-head generation, it is more challenging to capture the correlation cues from the speaker's audio and visual information. Following the ViCo baseline scheme, we propose a high-performance solution by enhancing the hierarchical semantic extraction capability of the audio encoder module and improving the decoder, renderer, and post-processing modules. Our solution achieves first place on the official leaderboard for the listening head generation track. This paper is a technical report for the ViCo@2023 Conversational Head Generation Challenge at the ACM Multimedia 2023 conference.

* ACM MM 2023 
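
The abstract does not spell out the audio encoder design, but the idea of hierarchical semantic extraction can be sketched as a multi-scale encoder over the speaker's audio features. The module below is a minimal, hypothetical PyTorch sketch: the class name, dimensions, and pooling/fusion choices are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a hierarchical semantic audio encoder for a
# ViCo-style listener-head pipeline. All names and sizes are assumptions.
import torch
import torch.nn as nn

class HierarchicalAudioEncoder(nn.Module):
    def __init__(self, in_dim=80, hidden=256, levels=3):
        super().__init__()
        # Each level halves the temporal resolution, capturing progressively
        # longer-range (more "semantic") context from the speaker's audio.
        self.levels = nn.ModuleList([
            nn.Conv1d(in_dim if i == 0 else hidden, hidden,
                      kernel_size=3, stride=2, padding=1)
            for i in range(levels)
        ])
        self.fuse = nn.Linear(hidden * levels, hidden)

    def forward(self, mel):                 # mel: (B, T, in_dim)
        x = mel.transpose(1, 2)             # (B, in_dim, T)
        pooled = []
        for conv in self.levels:
            x = torch.relu(conv(x))
            pooled.append(x.mean(dim=-1))   # global summary of this level
        return self.fuse(torch.cat(pooled, dim=-1))  # (B, hidden)

enc = HierarchicalAudioEncoder()
print(enc(torch.randn(2, 100, 80)).shape)   # torch.Size([2, 256])
```

Each level summarizes progressively longer-range context, and the per-level summaries are fused into a single conditioning vector that a listener-head decoder could consume.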

Seq-Masks: Bridging the gap between appearance and gait modeling for video-based person re-identification

Dec 10, 2021
Zhigang Chang, Zhao Yang, Yongbiao Chen, Qin Zhou, Shibao Zheng


Video-based person re-identification (Re-ID) aims to match person images in video sequences captured by disjoint surveillance cameras. Traditional video-based person Re-ID methods focus on exploiting appearance information and are thus vulnerable to illumination changes, scene noise, camera parameters, and especially clothing/carrying variations. Gait recognition provides an implicit biometric solution to alleviate these issues; nonetheless, it suffers severe performance degradation as the camera view varies. To address these problems, we propose a framework that utilizes the sequence masks (SeqMasks) in the video to tightly integrate appearance information and gait modeling. To sufficiently validate the effectiveness of our method, we build a novel dataset named MaskMARS based on MARS. Comprehensive experiments on our proposed large-scale wild video Re-ID dataset MaskMARS demonstrate the strong performance and generalization capability of our method. Validation on the gait recognition benchmark CASIA-B further demonstrates the capability of our hybrid model.

* ICASSP 2021 Submission 
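
As a rough illustration of how sequence masks can couple appearance and gait cues, the sketch below runs an appearance branch on mask-filtered RGB frames and a gait branch on the binary silhouettes, then fuses the two sequence embeddings. The backbones, dimensions, and fusion are placeholder assumptions, not the paper's architecture.

```python
# Illustrative sketch of combining appearance and gait cues via sequence
# masks; module structure is an assumption, not the SeqMasks implementation.
import torch
import torch.nn as nn

class SeqMaskReID(nn.Module):
    def __init__(self, embed_dim=256):
        super().__init__()
        # Appearance branch: RGB frames with the background masked out.
        self.app = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(32, embed_dim))
        # Gait branch: the binary silhouette masks themselves.
        self.gait = nn.Sequential(nn.Conv2d(1, 32, 3, 2, 1), nn.ReLU(),
                                  nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(32, embed_dim))
        self.fuse = nn.Linear(2 * embed_dim, embed_dim)

    def forward(self, frames, masks):        # (B, T, 3, H, W), (B, T, 1, H, W)
        B, T = frames.shape[:2]
        # Temporal average pooling over per-frame embeddings in each branch.
        app = self.app((frames * masks).flatten(0, 1)).view(B, T, -1).mean(1)
        gait = self.gait(masks.flatten(0, 1)).view(B, T, -1).mean(1)
        return self.fuse(torch.cat([app, gait], dim=-1))

model = SeqMaskReID()
frames = torch.randn(2, 8, 3, 64, 32)
masks = torch.rand(2, 8, 1, 64, 32)
print(model(frames, masks).shape)            # torch.Size([2, 256])
```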

TransHash: Transformer-based Hamming Hashing for Efficient Image Retrieval

May 05, 2021
Yongbiao Chen, Sheng Zhang, Fangxin Liu, Zhigang Chang, Mang Ye, Zhengwei Qi


Deep Hamming hashing has gained growing popularity in approximate nearest neighbor search for large-scale image retrieval. Until now, deep hashing for image retrieval has been dominated by convolutional neural network architectures, e.g., ResNet (He et al., 2016). In this paper, inspired by recent advances in vision transformers, we present TransHash, a pure transformer-based framework for deep hashing learning. Concretely, our framework is composed of two major modules: (1) based on the Vision Transformer (ViT), we design a siamese vision transformer backbone for image feature extraction, with a dual-stream feature learning scheme on top of the transformer to learn discriminative global and local features; (2) we adopt a Bayesian learning scheme with a dynamically constructed similarity matrix to learn compact binary hash codes. The entire framework is jointly trained in an end-to-end manner. To the best of our knowledge, this is the first work to tackle deep hashing learning without convolutional neural networks (CNNs). We perform comprehensive experiments on three widely studied datasets: CIFAR-10, NUS-WIDE, and ImageNet. The experiments demonstrate our superiority over existing state-of-the-art deep hashing methods; specifically, we achieve 8.2%, 2.6%, and 12.7% gains in average mAP across different hash bit lengths on the three datasets, respectively.
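
The Bayesian learning scheme with a dynamically constructed similarity matrix can be illustrated with a standard pairwise log-likelihood hashing objective, sketched below under assumed hyperparameters; the authors' exact formulation may differ.

```python
# Hedged sketch of a pairwise Bayesian hashing loss: a label-derived
# similarity matrix built on the fly, a pairwise log-likelihood term, and a
# quantization term pushing codes toward {-1, +1}. Weights are assumptions.
import torch
import torch.nn.functional as F

def pairwise_hash_loss(codes, labels, quant_weight=0.1):
    """codes: (B, K) real-valued outputs in [-1, 1]; labels: (B,) class ids."""
    # Dynamically constructed similarity matrix: 1 if same class, else 0.
    sim = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    inner = 0.5 * codes @ codes.t()
    # Negative log-likelihood of the observed pairwise similarities.
    likelihood = F.softplus(inner) - sim * inner
    # Quantization loss keeps the relaxed codes close to binary values.
    quant = (codes - codes.sign()).pow(2).mean()
    return likelihood.mean() + quant_weight * quant

codes = torch.tanh(torch.randn(8, 48, requires_grad=True))   # 48-bit codes
labels = torch.randint(0, 3, (8,))
loss = pairwise_hash_loss(codes, labels)
loss.backward()
print(loss.item())
```

In practice the `codes` would come from the siamese transformer backbone described in the abstract; here they are random tensors purely to show the objective.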


Distribution Context Aware Loss for Person Re-identification

Nov 17, 2019
Zhigang Chang, Qin Zhou, Mingyang Yu, Shibao Zheng, Hua Yang, Tai-Pang Wu


To learn the optimal similarity function between probe and gallery images in person re-identification, effective deep metric learning methods have been extensively explored to obtain discriminative feature embeddings. However, existing metric losses such as the triplet loss and its variants emphasize pair-wise relations but ignore the distribution context in feature space, leading to inconsistency and sub-optimal results. In fact, the similarity of one pair not only decides the match of that pair, but also has potential impacts on other sample pairs. In this paper, we propose a novel Distribution Context Aware (DCA) loss based on the triplet loss to combine both numerical similarity and relation similarity in feature space for better clustering. Extensive experiments on three benchmarks, including Market-1501, DukeMTMC-reID, and MSMT17, demonstrate the favorable performance of our method against the corresponding baseline and other state-of-the-art methods.

* IEEE VCIP 
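
To make the idea of combining numerical and relation similarity concrete, the sketch below augments a batch-hard triplet term with a context term that encourages positive pairs to have similar "relation" distributions over the rest of the batch. The specific form of the context term is an assumption for illustration, not the paper's definition of the DCA loss.

```python
# Illustrative triplet-plus-distribution-context objective; hyperparameters
# and the context term are assumptions, not the published DCA formulation.
import torch
import torch.nn.functional as F

def dca_style_loss(emb, labels, margin=0.3, ctx_weight=1.0):
    emb = F.normalize(emb, dim=1)
    dist = torch.cdist(emb, emb)                       # (B, B) pairwise distances
    pos = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool)

    # Batch-hard triplet term (numerical similarity between pairs).
    hardest_pos = (dist * (pos & ~eye).float()).max(1).values
    hardest_neg = (dist + 1e6 * pos.float()).min(1).values
    triplet = F.relu(hardest_pos - hardest_neg + margin).mean()

    # Relation similarity: each sample's softmax distribution over the batch.
    rel = F.softmax(-dist.masked_fill(eye, float('inf')), dim=1)
    pairs = (pos & ~eye).nonzero(as_tuple=False)
    if len(pairs):
        # Positive pairs should relate to the rest of the batch similarly.
        ctx = (rel[pairs[:, 0]] - rel[pairs[:, 1]]).pow(2).sum(1).mean()
    else:
        ctx = emb.new_zeros(())
    return triplet + ctx_weight * ctx

emb = torch.randn(8, 128)
labels = torch.randint(0, 4, (8,))
print(dca_style_loss(emb, labels).item())
```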