Haijun Zhang

Cut-and-Paste: Subject-Driven Video Editing with Attention Control

Nov 20, 2023
Zhichao Zuo, Zhao Zhang, Yan Luo, Yang Zhao, Haijun Zhang, Yi Yang, Meng Wang

This paper presents a novel framework, termed Cut-and-Paste, for real-world semantic video editing guided by a text prompt and an additional reference image. Although text-driven video editing has demonstrated a remarkable ability to generate highly diverse videos that follow given text prompts, fine-grained semantic edits are hard to control with a plain text prompt alone, both in terms of object details and the edited region, and cumbersome long text descriptions are usually needed for the task. We therefore investigate subject-driven video editing for more precise control of the edited region, better background preservation, and fine-grained semantic generation. We achieve this by introducing a reference image as a supplementary input to text-driven video editing, which spares the user from composing a lengthy text prompt that describes the detailed appearance of the object. To limit the editing area, we adopt a cross-attention control method from image editing and successfully extend it to video editing by fusing the attention maps of adjacent frames, which strikes a balance between preserving the video background and maintaining spatio-temporal consistency. Compared with current methods, the whole process of our method resembles "cutting" the source object to be edited and then "pasting" the target object provided by the reference image. We demonstrate that our method performs favorably against prior art in video editing guided by a text prompt and an extra reference image, as measured by both quantitative and subjective evaluations.
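
The adjacent-frame attention fusion described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each frame's cross-attention map for the edited token is given as an H x W grid, fuses it by plain averaging over a window of `radius` frames on each side, and binarizes the result with a fixed threshold. The function names, the averaging rule, and the 0.5 threshold are all hypothetical choices.

```python
from typing import List

Map = List[List[float]]  # one H x W cross-attention map per frame (hypothetical layout)


def fuse_adjacent(maps: List[Map], radius: int = 1) -> List[Map]:
    """Average each frame's map with its neighbours within `radius` frames."""
    fused = []
    for t in range(len(maps)):
        lo, hi = max(0, t - radius), min(len(maps), t + radius + 1)
        window = maps[lo:hi]  # frames t-radius .. t+radius, clipped at the ends
        h, w = len(maps[t]), len(maps[t][0])
        fused.append([
            [sum(m[i][j] for m in window) / len(window) for j in range(w)]
            for i in range(h)
        ])
    return fused


def to_mask(attn: Map, threshold: float = 0.5) -> List[List[int]]:
    """Binarize a fused map: 1 = region to edit ("cut"), 0 = background to keep."""
    return [[1 if v >= threshold else 0 for v in row] for row in attn]


def blend(source: Map, edited: Map, mask: List[List[int]]) -> Map:
    """Take edited values inside the mask and source values outside it."""
    return [[e if k else s for s, e, k in zip(sr, er, kr)]
            for sr, er, kr in zip(source, edited, mask)]
```

In this sketch, averaging over neighbouring frames suppresses flicker in the mask at the cost of slightly blurring the boundaries of fast-moving objects; a larger `radius` trades spatial precision for temporal consistency.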

IDVT: Interest-aware Denoising and View-guided Tuning for Social Recommendation

Aug 30, 2023
Dezhao Yang, Jianghong Ma, Shanshan Feng, Haijun Zhang, Zhao Zhang

In the information age, recommendation systems are vital for efficiently filtering information and identifying user preferences. Online social platforms have enriched these systems by providing valuable auxiliary information. Socially connected users are assumed to share similar preferences, which enhances recommendation accuracy and addresses cold-start issues. However, empirical findings challenge this assumption, revealing that certain social connections can actually harm system performance. Our statistical analysis indicates a significant amount of noise in the social network, where many socially connected users do not share common interests. To address this issue, we propose an innovative Interest-aware Denoising and View-guided Tuning (IDVT) method for social recommendation. The first part, ID, effectively denoises social connections. Specifically, the denoising process considers both social network structure and user interaction interests in a global view. Moreover, in this global view, we also integrate denoised social information (social domain) into the propagation of user-item interactions (collaborative domain) and aggregate user representations from the two domains using a gating mechanism. To tackle potential user interest loss and enhance model robustness within the global view, the second part, VT, introduces two additional views (a local view and a dropout-enhanced view) for fine-tuning user representations in the global view through contrastive learning. Extensive evaluations on real-world datasets with varying noise ratios demonstrate the superiority of IDVT over state-of-the-art social recommendation methods.
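
The abstract states that user representations from the social and collaborative domains are combined with a gating mechanism. A common form of such a gate is sketched below, assuming a single scalar gate computed from the concatenated embeddings; the weight vector `w`, bias `b`, and function names are hypothetical, and IDVT's actual gate may be vector-valued or parameterized differently.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(social: List[float], collab: List[float],
                 w: List[float], b: float) -> List[float]:
    """Fuse two user embeddings with a scalar gate g = sigmoid(w . [social; collab] + b).

    The output is g * social + (1 - g) * collab, so g decides how much the
    (denoised) social view contributes relative to the collaborative view.
    """
    assert len(social) == len(collab), "embeddings must have the same size"
    assert len(w) == len(social) + len(collab), "w spans the concatenation"
    z = sum(wi * xi for wi, xi in zip(w, social + collab)) + b
    g = sigmoid(z)
    return [g * s + (1.0 - g) * c for s, c in zip(social, collab)]
```

With all-zero weights the gate is 0.5 and the result is the plain average of the two views; training `w` and `b` lets the model lean on whichever domain is more informative for a given user.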

DRGame: Diversified Recommendation for Multi-category Video Games with Balanced Implicit Preferences

Aug 30, 2023
Kangzhe Liu, Jianghong Ma, Shanshan Feng, Haijun Zhang, Zhao Zhang

The growing popularity of subscription services in video game consumption has emphasized the importance of offering diversified recommendations. Providing users with a diverse range of games is essential for ensuring continued engagement and fostering long-term subscriptions. However, existing recommendation models struggle to handle the highly imbalanced implicit feedback in gaming interactions. Additionally, they struggle to account for the distinctive characteristics of multiple categories and the latent user interests associated with these categories. In response to these challenges, we propose a novel framework, named DRGame, for diversified recommendation. Centered on multi-category video games, it consists of two components: Balance-driven Implicit Preferences Learning for data pre-processing and a Clustering-based Diversified Recommendation Module for final prediction. The first module aims to achieve a balanced representation of implicit feedback in game time, thereby providing a comprehensive view of player interests across different categories. The second module adopts category-aware representation learning to cluster and select players and games based on balanced implicit preferences, and then employs asymmetric neighbor aggregation to achieve diversified recommendations. Experimental results on a real-world dataset demonstrate the superiority of our proposed method over existing approaches in terms of the diversity of game recommendations.

Accelerating Wireless Federated Learning via Nesterov's Momentum and Distributed Principle Component Analysis

Mar 31, 2023
Yanjie Dong, Luya Wang, Yuanfang Chi, Jia Wang, Haijun Zhang, Fei Richard Yu, Victor C. M. Leung, Xiping Hu

A wireless federated learning system is investigated in which a server and workers exchange uncoded information via orthogonal wireless channels. Since the workers frequently upload local gradients to the server via bandwidth-limited channels, the uplink transmission from the workers to the server becomes a communication bottleneck. Therefore, a one-shot distributed principal component analysis (PCA) is leveraged to reduce the dimension of the uploaded gradients and thereby relieve the communication bottleneck. A PCA-based wireless federated learning (PCA-WFL) algorithm and its accelerated version (PCA-AWFL) are proposed based on the low-dimensional gradients and Nesterov's momentum. For non-convex loss functions, a finite-time analysis is performed to quantify the impact of system hyper-parameters on the convergence of the PCA-WFL and PCA-AWFL algorithms. The PCA-AWFL algorithm is theoretically certified to converge faster than the PCA-WFL algorithm. In addition, the convergence rates of the PCA-WFL and PCA-AWFL algorithms quantitatively reveal a linear speedup with respect to the number of workers over the vanilla gradient descent algorithm. Numerical results demonstrate the improved convergence rates of the proposed PCA-WFL and PCA-AWFL algorithms over the benchmarks.
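
The communication-saving idea can be sketched as follows, assuming the one-shot distributed PCA has already produced k orthonormal principal directions (`basis`, with k smaller than the gradient dimension d) shared by all workers; computing that basis is not shown. Each worker uploads only k coefficients, the server averages them and maps them back to d dimensions, and the update uses the standard look-ahead form of Nesterov's momentum. The helper names, `lr`, and `beta` are hypothetical, and the exact PCA-AWFL update may differ.

```python
from typing import Callable, List, Tuple

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def compress(grad: Vec, basis: List[Vec]) -> Vec:
    """Worker side: k coefficients of the d-dim gradient on k orthonormal directions."""
    return [dot(u, grad) for u in basis]


def aggregate(coeff_list: List[Vec]) -> Vec:
    """Server side: average the workers' k-dimensional uploads."""
    n = len(coeff_list)
    return [sum(c[i] for c in coeff_list) / n for i in range(len(coeff_list[0]))]


def decompress(coeffs: Vec, basis: List[Vec]) -> Vec:
    """Server side: map k coefficients back to the d-dimensional parameter space."""
    d = len(basis[0])
    return [sum(c * u[i] for c, u in zip(coeffs, basis)) for i in range(d)]


def nesterov_step(x: Vec, v: Vec, grad_fn: Callable[[Vec], Vec],
                  lr: float, beta: float) -> Tuple[Vec, Vec]:
    """One Nesterov step: the gradient is evaluated at the look-ahead point x + beta * v."""
    lookahead = [xi + beta * vi for xi, vi in zip(x, v)]
    g = grad_fn(lookahead)
    v_new = [beta * vi - lr * gi for vi, gi in zip(v, g)]
    x_new = [xi + vi for xi, vi in zip(x, v_new)]
    return x_new, v_new
```

Because projection is linear, averaging the k-dimensional uploads and then decompressing gives the same vector as decompressing each upload first, so the server can aggregate entirely in the low-dimensional space before forming the gradient passed to `nesterov_step`.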

DeepMA: End-to-end Deep Multiple Access for Wireless Image Transmission in Semantic Communication

Mar 21, 2023
Wenyu Zhang, Kaiyuan Bai, Sherali Zeadally, Haijun Zhang, Hua Shao, Hui Ma, Victor C. M. Leung

Semantic communication is a new paradigm that exploits deep learning models to enable end-to-end communication processes, and recent studies have shown that it can achieve better noise resiliency than traditional communication schemes in the low signal-to-noise ratio (SNR) regime. To achieve multiple access in semantic communication, we propose a deep learning-based multiple access (DeepMA) method that trains semantic communication models with the abilities of joint source-channel coding (JSCC) and orthogonal signal modulation. DeepMA is realized by a DeepMA network (DMANet) comprising several independent encoder-decoder pairs (EDPs): the DeepMA encoders encode the input data as mutually orthogonal semantic symbol vectors (SSVs), so that each DeepMA decoder can recover its own target data from a received mixed SSV (MSSV) formed by superposing the SSV components transmitted by different encoders. We describe DeepMA frameworks for wireless device-to-device (D2D), downlink, and uplink channel multiplexing scenarios, along with the training algorithm. We evaluate the performance of the proposed DeepMA in wireless image transmission tasks and compare it with the attention module-based deep JSCC (ADJSCC) method and with conventional communication schemes using better portable graphics (BPG) and low-density parity-check (LDPC) codes. The results show that the proposed DeepMA achieves an effective, flexible, and privacy-preserving channel multiplexing process, and that it yields bandwidth efficiency comparable to that of conventional multiple access schemes.
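
The orthogonality principle behind DeepMA can be illustrated with fixed Walsh-Hadamard spreading codes standing in for learned encoders. This is an analogy only: DMANet learns encoders that emit orthogonal semantic symbol vectors, whereas this sketch spreads one real-valued symbol per user with a hand-built code and assumes a noiseless channel. All names are hypothetical.

```python
from typing import List


def hadamard(n: int) -> List[List[int]]:
    """Sylvester construction of an n x n Hadamard matrix (n must be a power of two)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    h = [[1]]
    while len(h) < n:
        # [[H, H], [H, -H]] doubles the size while keeping rows mutually orthogonal
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def encode(symbol: float, code: List[int]) -> List[float]:
    """Transmitter: spread one symbol over the user's code (a stand-in for an SSV)."""
    return [symbol * c for c in code]


def superpose(vectors: List[List[float]]) -> List[float]:
    """Channel: the transmitted vectors add up into one mixed vector (the MSSV analogue)."""
    return [sum(col) for col in zip(*vectors)]


def decode(mixed: List[float], code: List[int]) -> float:
    """Receiver: correlate with its own code; orthogonality cancels every other user."""
    return sum(m * c for m, c in zip(mixed, code)) / len(code)
```

For example, three users with symbols 1.5, -2.0, and 0.25 can share one length-4 channel use: each receiver recovers exactly its own symbol from the superposed vector because every cross-correlation between distinct Hadamard rows is zero.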

A Survey on Orthogonal Time Frequency Space: New Delay Doppler Communications Paradigm in 6G era

Nov 23, 2022
Weijie Yuan, Shuangyang Li, Zhiqiang Wei, Jiamo Jiang, Haijun Zhang, Pingzhi Fan

In the 6G era, space-air-ground integrated networks (SAGIN) are expected to provide global coverage and must therefore support a wide range of emerging applications in hostile, high-mobility environments. In such scenarios, conventional orthogonal frequency division multiplexing (OFDM) modulation, which has been widely deployed in cellular and Wi-Fi communication systems, will suffer from performance degradation due to high Doppler shifts. To address this challenge, a new two-dimensional (2D) modulation scheme referred to as orthogonal time frequency space (OTFS) was proposed and has been recognized as an enabling technology for future high-mobility scenarios. In particular, OTFS modulates information in the delay-Doppler (DD) domain rather than in the time-frequency (TF) domain used by OFDM, providing Doppler and delay resilience, low signaling latency, a low peak-to-average power ratio (PAPR), and low-complexity implementation. Recent research also shows that the direct interaction between information and the physical world in the DD domain makes OTFS a promising waveform for realizing integrated sensing and communications (ISAC). In this article, we present a comprehensive survey of OTFS technology in the 6G era, including its fundamentals, recent advances, and future directions. We hope that this article will serve as a valuable reference for all researchers working on OTFS.

* Survey paper on OTFS, submitted to China Communications 
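
The delay-Doppler to time-frequency mapping at the heart of OTFS is commonly written as the inverse symplectic finite Fourier transform (ISFFT), X[n, m] = (1/sqrt(NM)) Σ_k Σ_l x[k, l] exp(j2π(nk/N − ml/M)), with Doppler index k, delay index l, time index n, and subcarrier index m; the SFFT inverts it. The sketch below is a direct O((NM)^2) textbook evaluation for illustration only, not an efficient modem implementation, and it omits the Heisenberg transform and pulse shaping that follow in a full OTFS transmitter.

```python
import cmath
import math
from typing import List

Grid = List[List[complex]]


def isfft(x_dd: Grid) -> Grid:
    """Delay-Doppler grid x[k][l] -> time-frequency grid X[n][m] (N x M, unitary scaling)."""
    N, M = len(x_dd), len(x_dd[0])
    scale = 1 / math.sqrt(N * M)
    return [[scale * sum(x_dd[k][l] * cmath.exp(2j * math.pi * (n * k / N - m * l / M))
                         for k in range(N) for l in range(M))
             for m in range(M)]
            for n in range(N)]


def sfft(X_tf: Grid) -> Grid:
    """Time-frequency grid X[n][m] -> delay-Doppler grid x[k][l] (inverse of isfft)."""
    N, M = len(X_tf), len(X_tf[0])
    scale = 1 / math.sqrt(N * M)
    return [[scale * sum(X_tf[n][m] * cmath.exp(-2j * math.pi * (n * k / N - m * l / M))
                         for n in range(N) for m in range(M))
             for l in range(M)]
            for k in range(N)]
```

Because each DD symbol is spread over the whole TF grid, a single deep fade in one TF cell affects every symbol only slightly, which is one intuition for the Doppler and delay resilience described in the abstract.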

Long-Range Zero-Shot Generative Deep Network Quantization

Nov 17, 2022
Yan Luo, Yangcheng Gao, Zhao Zhang, Haijun Zhang, Mingliang Xu, Meng Wang

Quantization approximates a deep network model that uses floating-point numbers with one that uses low-bit-width numbers, in order to accelerate inference and reduce computation. Zero-shot quantization, which quantizes a model without access to the original data, can be accomplished by fitting the real data distribution through data synthesis. However, zero-shot quantization performs worse than post-training quantization with real data. We find two reasons for this: 1) a standard generator struggles to produce highly diverse synthetic data, since it lacks the long-range information needed to attend to global features; 2) the synthetic images aim to simulate the statistics of real data, which leads to weak intra-class heterogeneity and limited feature richness. To overcome these problems, we propose a novel deep network quantizer, dubbed Long-Range Zero-Shot Generative Deep Network Quantization (LRQ). Technically, we propose a long-range generator that learns long-range information instead of simple local features. To make the synthetic data contain more global features, long-range attention using large-kernel convolution is incorporated into the generator. In addition, we present an Adversarial Margin Add (AMA) module that enforces intra-class angular enlargement between each feature vector and its class center. Because AMA makes the loss function harder to converge, which is opposite to the training objective of the original loss function, it forms an adversarial process. Furthermore, to transfer knowledge from the full-precision network, we also utilize decoupled knowledge distillation. Extensive experiments demonstrate that LRQ outperforms competing methods.
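
To make the first sentence concrete, the sketch below shows a generic asymmetric uniform quantizer that maps floats to unsigned `bits`-bit integers using a scale and a zero point. It illustrates quantization in general, not LRQ: LRQ's contribution lies in generating the synthetic data used for calibration, which is not shown. The function names and the min-max range choice are assumptions.

```python
from typing import List, Tuple


def quantize(x: List[float], bits: int) -> Tuple[List[int], float, int]:
    """Asymmetric uniform quantization of floats to unsigned `bits`-bit integers.

    The range [min(x), max(x)] is mapped onto [0, 2**bits - 1]; returns the
    integer codes together with the scale and zero point needed to dequantize.
    """
    qmax = 2 ** bits - 1
    lo, hi = min(x), max(x)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)  # integer code that represents 0.0
    q = [min(qmax, max(0, round(v / scale) + zero_point)) for v in x]
    return q, scale, zero_point


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Map integer codes back to approximate floating-point values."""
    return [(v - zero_point) * scale for v in q]
```

For activations, `lo` and `hi` would normally be estimated from calibration data; in zero-shot quantization that role is played by synthetic data, which is why the quality of the generator matters.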

Network Topology Inference based on Timing Meta-Data

Oct 11, 2022
Wenbo Du, Tao Tan, Haijun Zhang, Xianbin Cao, Gang Yan, Osvaldo Simeone

Consider a processor having access only to meta-data consisting of the timings of data packets and acknowledgment (ACK) packets from all nodes in a network. The meta-data report the source node of each packet, but not the destination nodes or the contents of the packets. The goal of the processor is to infer the network topology based solely on such information. Prior work leveraged causality metrics to identify which links are active. If the data timings and ACK timings of two nodes -- say node 1 and node 2, respectively -- are causally related, this may be taken as evidence that node 1 is communicating to node 2 (which sends back ACK packets to node 1). This paper starts with the observation that packet losses can weaken the causality relationship between data and ACK timing streams. To obviate this problem, a new Expectation Maximization (EM)-based algorithm is introduced -- EM-causality discovery algorithm (EM-CDA) -- which treats packet losses as latent variables. EM-CDA iterates between the estimation of packet losses and the evaluation of causality metrics. The method is validated through extensive experiments in wireless sensor networks on the NS-3 simulation platform.

* submitted 
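
A crude timing-association score conveys why packet losses matter. The sketch below, a hypothetical proxy rather than the causality metrics or EM-CDA used in the paper, measures the fraction of node-1 data packets that are followed by a node-2 ACK within a time window. A lost ACK lowers the score even when the link is active, which is the weakening effect that EM-CDA addresses by treating losses as latent variables.

```python
from bisect import bisect_right
from typing import List


def ack_follow_ratio(data_times: List[float], ack_times: List[float],
                     window: float) -> float:
    """Fraction of node-1 data packets followed by a node-2 ACK within `window` seconds.

    Hypothetical association score: only timings and source nodes are used,
    matching the meta-data setting, but losses are not modelled at all.
    """
    if not data_times:
        return 0.0
    acks = sorted(ack_times)
    hits = 0
    for t in data_times:
        i = bisect_right(acks, t)  # index of the first ACK strictly after t
        if i < len(acks) and acks[i] - t <= window:
            hits += 1
    return hits / len(data_times)
```

For instance, with data packets at 0.0, 1.0, and 2.0 s and ACKs at 0.01, 1.02, and 5.0 s, a 0.05 s window gives a score of 2/3: the missing ACK for the third packet looks like evidence against the link even though the link exists.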

Uncertainty Minimization for Personalized Federated Semi-Supervised Learning

May 05, 2022
Yanhang Shi, Siguang Chen, Haijun Zhang

Since federated learning (FL) was introduced as a privacy-preserving decentralized learning technique, the statistical heterogeneity of distributed data has remained the main obstacle to achieving robust performance and stable convergence in FL applications. Model personalization methods have been studied to overcome this problem. However, existing approaches mainly assume fully labeled data, which is unrealistic in practice because labeling requires expertise. The primary issue caused by partially labeled data is that clients with insufficient labeled data can suffer from unfair performance gains, because they lack adequate insight into their local distribution to customize the global model. To tackle this problem, 1) we propose a novel personalized semi-supervised learning paradigm that allows partially labeled or unlabeled clients to seek labeling assistance from data-related clients (helper agents), thereby enhancing their perception of their local data; 2) based on this paradigm, we design an uncertainty-based data-relation metric to ensure that the selected helpers provide trustworthy pseudo labels instead of misleading local training; 3) to mitigate the network overload introduced by helper searching, we further develop a helper selection protocol that achieves efficient communication with negligible performance sacrifice. Experiments show that our proposed method obtains superior performance and more stable convergence than related works with partially labeled data, especially in highly heterogeneous settings.

* 11 pages 
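
One simple way to express uncertainty-based helper selection in code is sketched below, assuming each candidate helper returns class-probability vectors for the client's unlabeled samples and that uncertainty is measured by mean predictive (Shannon) entropy. Both the entropy criterion and the names are our assumptions; the paper's data-relation metric may be defined differently.

```python
import math
from typing import Dict, List


def entropy(p: List[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def select_helpers(preds: Dict[str, List[List[float]]], k: int) -> List[str]:
    """Keep the k candidate helpers whose predictions on the client's data are most certain.

    preds maps a (hypothetical) helper id to its probability vectors, one per
    unlabeled client sample; lower mean entropy is treated as a more
    trustworthy source of pseudo labels.
    """
    scores = {h: sum(entropy(p) for p in ps) / len(ps) for h, ps in preds.items()}
    return sorted(scores, key=scores.get)[:k]
```

Ranking by entropy alone ignores whether a confident helper is actually correct, which is one reason a real data-relation metric would likely combine uncertainty with some notion of distribution similarity.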

ClusterQ: Semantic Feature Distribution Alignment for Data-Free Quantization

Apr 30, 2022
Yangcheng Gao, Zhao Zhang, Richang Hong, Haijun Zhang, Jicong Fan, Shuicheng Yan, Meng Wang

Network quantization has emerged as a promising method for model compression and inference acceleration. However, traditional quantization methods (such as quantization-aware training and post-training quantization) require the original data for fine-tuning or calibrating the quantized model, which makes them inapplicable when the original data cannot be accessed due to privacy or security concerns. This has given rise to data-free quantization (DFQ) with synthetic data generation. However, current DFQ methods still suffer from severe performance degradation when quantizing a model to lower bit widths, caused by the low inter-class separability of semantic features. To this end, we propose a new and effective data-free quantization method termed ClusterQ, which utilizes semantic feature distribution alignment for synthetic data generation. To obtain high inter-class separability of semantic features, we cluster and align the feature distribution statistics to imitate the distribution of real data, which alleviates the performance degradation. Moreover, we incorporate intra-class variance to solve class-wise mode collapse. We also employ an exponential moving average to update the centroid of each cluster for further feature distribution improvement. Extensive experiments across various deep models (e.g., ResNet-18 and MobileNet-V2) on the ImageNet dataset demonstrate that ClusterQ obtains state-of-the-art performance.
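
The exponential-moving-average centroid update mentioned above can be sketched as follows, assuming per-class feature vectors from each synthetic batch and a fixed decay factor; the names, the decay value, and the initialization rule for classes seen for the first time are hypothetical.

```python
from typing import Dict, List

Vec = List[float]


def ema_update(centroids: Dict[int, Vec], feats: List[Vec], labels: List[int],
               decay: float = 0.9) -> None:
    """In-place EMA update: c_y <- decay * c_y + (1 - decay) * mean(batch features of class y)."""
    by_class: Dict[int, List[Vec]] = {}
    for f, y in zip(feats, labels):
        by_class.setdefault(y, []).append(f)
    for y, fs in by_class.items():
        mean = [sum(col) / len(fs) for col in zip(*fs)]  # per-dimension batch mean
        if y not in centroids:
            centroids[y] = mean  # first observation initializes the centroid
        else:
            centroids[y] = [decay * c + (1 - decay) * m
                            for c, m in zip(centroids[y], mean)]
```

A decay closer to 1 makes the centroids change slowly across batches, which keeps the alignment targets stable when individual synthetic batches are noisy.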
