Abstract:Diffusion models have demonstrated remarkable performance in image generation, particularly within the domain of style transfer. Prevailing style transfer approaches typically leverage pre-trained diffusion models' robust feature extraction capabilities alongside external modular control pathways to explicitly impose style guidance signals. However, these methods often fail to capture complex style references or to retain the identity of user-provided content images, and thus struggle to balance style and content. To address this style-content trade-off, we propose a training-free style transfer approach via $\textbf{h}$eterogeneous $\textbf{a}$ttention $\textbf{m}$odulation ($\textbf{HAM}$) that protects identity information during image/text-guided style transfer. Specifically, we first introduce style noise initialization to initialize the latent noise for diffusion. Then, during the diffusion process, we apply HAM to different attention mechanisms, including Global Attention Regulation (GAR) and Local Attention Transplantation (LAT), which better preserves the details of the content image while capturing complex style references. Our approach is validated through a series of qualitative and quantitative experiments, achieving state-of-the-art performance on multiple quantitative metrics.
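To make the attention-modulation idea concrete, the following is a minimal, hypothetical sketch rather than the paper's implementation: LAT is assumed to transplant keys/values from a reference (content) forward pass into the attention call, and GAR is assumed to act as a global blend between the original and transplanted attention outputs; all shapes and the `gar_weight` parameter are illustrative assumptions.

```python
# Hypothetical sketch of attention modulation in a diffusion self-attention layer.
# The exact GAR/LAT formulations are not given in the abstract; here LAT is assumed
# to transplant reference keys/values, and GAR to globally blend the two outputs.
import torch


def attention(q, k, v):
    # Standard scaled dot-product attention.
    scale = q.shape[-1] ** -0.5
    attn = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    return attn @ v


def ham_attention(q, k, v, k_ref, v_ref, gar_weight=0.5):
    """Heterogeneous attention modulation (illustrative only)."""
    out_style = attention(q, k, v)            # attention over current (style) features
    out_content = attention(q, k_ref, v_ref)  # attention over transplanted reference features
    return gar_weight * out_content + (1.0 - gar_weight) * out_style


if __name__ == "__main__":
    b, n, d = 1, 16, 64
    q, k, v = (torch.randn(b, n, d) for _ in range(3))
    k_ref, v_ref = torch.randn(b, n, d), torch.randn(b, n, d)
    print(ham_attention(q, k, v, k_ref, v_ref).shape)  # torch.Size([1, 16, 64])
```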
Abstract:Large language model (LLM)-based multi-agent systems (MAS) have demonstrated exceptional capabilities in solving complex tasks, yet their effectiveness depends heavily on the underlying communication topology that coordinates agent interactions. Within these systems, successful problem-solving often requires task-specific group structures to divide and conquer subtasks. However, most existing approaches generate communication topologies in a node-centric manner, leaving group structures to emerge implicitly from local connectivity decisions rather than modeling them explicitly, which often leads to suboptimal coordination and unnecessary communication overhead. To address this limitation, we propose GoAgent (Group-of-Agents), a communication topology generation method that explicitly treats collaborative groups as the atomic units of MAS construction. Specifically, GoAgent first enumerates task-relevant candidate groups through an LLM and then autoregressively selects and connects these groups as atomic units to construct the final communication graph, jointly capturing intra-group cohesion and inter-group coordination. To mitigate the communication redundancy and noise propagation inherent in expanding topologies, we further introduce a conditional information bottleneck (CIB) objective that compresses inter-group communication, preserving task-relevant signals while filtering out redundant historical noise. Extensive experiments on six benchmarks show that GoAgent achieves state-of-the-art performance with 93.84% average accuracy while reducing token consumption by about 17%.
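As a rough illustration of group-level topology construction, the toy sketch below autoregressively picks candidate groups as atomic units and wires each newly selected group to the previously selected ones. The candidate groups, the conditional scoring function `score_fn`, and the stopping rule are all placeholders, since the abstract does not specify GoAgent's exact selection procedure or the CIB objective.

```python
# Toy, assumption-laden sketch of group-level topology construction.
def build_group_topology(candidate_groups, score_fn, max_groups=3):
    """Autoregressively pick groups as atomic units and wire them together."""
    selected, edges = [], []
    remaining = list(candidate_groups)
    while remaining and len(selected) < max_groups:
        # Score each remaining group conditioned on what is already selected.
        best = max(remaining, key=lambda g: score_fn(g, selected))
        remaining.remove(best)
        # Inter-group edges: connect the new group to every previously chosen one.
        edges.extend((prev, best) for prev in selected)
        selected.append(best)
    return selected, edges


if __name__ == "__main__":
    groups = [("planner", "coder"), ("tester", "debugger"), ("reviewer",)]

    # Placeholder score: prefer larger groups, penalize overlap with already-chosen agents.
    def score(group, chosen):
        used = {a for g in chosen for a in g}
        return len(group) - len(set(group) & used)

    print(build_group_topology(groups, score))
```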
Abstract:Temporal interaction graphs (TIGs), defined by sequences of timestamped interaction events, have become ubiquitous in real-world applications due to their capability to model complex dynamic system behaviors. As a result, temporal interaction graph representation learning (TIGRL) has garnered significant attention in recent years. TIGRL aims to embed nodes in TIGs into low-dimensional representations that effectively preserve both structural and temporal information, thereby enhancing the performance of downstream tasks such as classification, prediction, and clustering within constantly evolving data environments. In this paper, we begin by introducing the foundational concepts of TIGs and emphasize the critical role of temporal dependencies. We then propose a comprehensive taxonomy of state-of-the-art TIGRL methods, systematically categorizing them based on the types of information utilized during the learning process to address the unique challenges inherent to TIGs. To facilitate further research and practical applications, we curate sources of datasets and benchmarks, providing valuable resources for empirical investigations. Finally, we examine key open challenges and explore promising research directions in TIGRL, laying the groundwork for future advancements that have the potential to shape the evolution of this field.




Abstract:Recent generative models demonstrate impressive performance in synthesizing photographic images, making it hard for humans to distinguish them from pristine ones, especially for realistic-looking synthetic facial images. Previous works mostly focus on mining discriminative artifacts from vast amounts of visual data. However, they usually lack the exploration of prior knowledge and rarely pay attention to the domain shift between training categories (e.g., natural and indoor objects) and testing ones (e.g., fine-grained human facial images), resulting in unsatisfactory detection performance. To address these issues, we propose a novel knowledge-guided prompt learning method for deepfake facial image detection. Specifically, we retrieve forgery-related prompts from large language models as expert knowledge to guide the optimization of learnable prompts. Besides, we design test-time prompt tuning to alleviate the domain shift, achieving significant performance improvements and facilitating application in real-world scenarios. Extensive experiments on the DeepFakeFaceForensics dataset show that our proposed approach notably outperforms state-of-the-art methods.
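The sketch below illustrates the general flavor of knowledge-guided prompt learning with test-time tuning, under heavy assumptions: a frozen image feature, learnable class-prompt embeddings pulled toward embeddings of LLM-retrieved expert prompts via a cosine term, and an entropy-minimization objective at test time. None of the encoders, losses, or weights are taken from the paper.

```python
# Illustrative sketch (not the paper's implementation) of knowledge-guided prompt
# learning with test-time tuning. The "expert" embeddings stand in for embedded
# forgery-related prompts retrieved from an LLM; the image feature stands in for
# a frozen image-encoder output.
import torch
import torch.nn.functional as F

dim, n_classes = 32, 2
learnable_prompt = torch.randn(n_classes, dim, requires_grad=True)
expert_prompt = torch.randn(n_classes, dim)   # assumed: embedded LLM-retrieved prompts
image_feat = torch.randn(1, dim)              # assumed: frozen image-encoder feature

opt = torch.optim.Adam([learnable_prompt], lr=1e-2)
for _ in range(20):
    logits = image_feat @ F.normalize(learnable_prompt, dim=-1).t()
    probs = logits.softmax(dim=-1)
    # Test-time objective: minimize prediction entropy on the unlabeled test image.
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
    # Knowledge guidance: keep learnable prompts close to the expert prompts.
    knowledge = (1 - F.cosine_similarity(learnable_prompt, expert_prompt, dim=-1)).mean()
    loss = entropy + 0.1 * knowledge
    opt.zero_grad()
    loss.backward()
    opt.step()
print(probs.detach())
```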




Abstract:Generative self-supervised learning (SSL), especially masked autoencoders (MAE), has achieved great success and garnered substantial research interest in graph machine learning. However, research on MAE for dynamic graphs is still scant. This gap arises primarily because dynamic graphs not only possess topological structure but also encapsulate temporal evolution dependencies. Applying the random masking strategy adopted by most MAE methods to dynamic graphs can remove the crucial subgraphs that guide their evolution, resulting in the loss of crucial spatio-temporal information in node representations. To bridge this gap, in this paper, we propose a novel Informative Subgraphs Aware Masked Auto-Encoder for Dynamic Graphs, namely DyGIS. Specifically, we introduce a constrained probabilistic generative model to generate informative subgraphs that guide the evolution of dynamic graphs, successfully alleviating the issue of missing dynamic evolution subgraphs. The informative subgraphs identified by DyGIS then serve as the input to the dynamic graph masked autoencoder (DGMAE), effectively ensuring the integrity of the evolutionary spatio-temporal information within dynamic graphs. Extensive experiments on eleven datasets demonstrate that DyGIS achieves state-of-the-art performance across multiple tasks.
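A simplified sketch of the subgraph-aware masking idea follows: given timestamped edges and a set of edge indices assumed to form the informative subgraph, random masking is restricted to the remaining edges so the evolution-guiding structure stays visible to the encoder. How DyGIS actually identifies informative subgraphs (its constrained probabilistic generative model) is not reproduced here.

```python
# Simplified sketch of subgraph-aware masking for a dynamic-graph MAE.
# `informative` is just a given set of edge indices standing in for the
# informative subgraph identified by the generative model.
import random


def subgraph_aware_mask(edges, informative, mask_ratio=0.5, seed=0):
    """Mask timestamped edges for reconstruction, but never the informative ones."""
    rng = random.Random(seed)
    maskable = [i for i in range(len(edges)) if i not in informative]
    n_mask = int(mask_ratio * len(edges))
    masked = set(rng.sample(maskable, min(n_mask, len(maskable))))
    visible = [e for i, e in enumerate(edges) if i not in masked]
    targets = [e for i, e in enumerate(edges) if i in masked]
    return visible, targets


if __name__ == "__main__":
    edges = [(0, 1, 0.1), (1, 2, 0.2), (2, 3, 0.3), (0, 3, 0.4)]  # (src, dst, time)
    print(subgraph_aware_mask(edges, informative={1, 2}))
```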




Abstract:Optic disc and cup segmentation plays a crucial role in automating the screening and diagnosis of glaucoma. While data-driven convolutional neural networks (CNNs) show promise in this area, the inherent ambiguity of object and background boundaries in optic disc and cup segmentation leads to noisy annotations that impair model performance. To address this, we propose an innovative label-denoising method, the Multiple Pseudo-labels Noise-aware Network (MPNN), for accurate optic disc and cup segmentation. Specifically, the Multiple Pseudo-labels Generation and Guided Denoising (MPGGD) module generates pseudo-labels with multiple differently initialized networks trained on the true labels, and the pixel-level consensus information extracted from these pseudo-labels guides the differentiation of clean pixels from noisy pixels. The MPNN training framework is built on a teacher-student architecture to learn segmentation from both clean and noisy pixels. In particular, this framework adeptly leverages (i) reliable and fundamental insights from clean pixels and (ii) the supplementary knowledge within noisy pixels via multiple perturbation-based unsupervised consistency. Compared to other label-denoising methods, comprehensive experimental results on the RIGA dataset demonstrate our method's excellent performance and significant denoising ability.
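The pixel-level consensus step can be pictured with the following minimal sketch: pseudo-labels from several differently initialized networks are compared to the (possibly noisy) annotation, and pixels with full agreement are treated as clean while the rest are flagged as noisy. The agreement threshold and array shapes are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch of pixel-level consensus between multiple pseudo-labels and the
# given annotation; threshold and shapes are illustrative assumptions.
import numpy as np


def consensus_split(pseudo_labels, annotation, agree_ratio=1.0):
    """pseudo_labels: (K, H, W) binary maps from K networks; annotation: (H, W)."""
    votes = (pseudo_labels == annotation[None]).mean(axis=0)  # per-pixel agreement rate
    clean_mask = votes >= agree_ratio
    noisy_mask = ~clean_mask
    return clean_mask, noisy_mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pls = rng.integers(0, 2, size=(3, 4, 4))   # pseudo-labels from 3 networks
    ann = rng.integers(0, 2, size=(4, 4))      # possibly noisy annotation
    clean, noisy = consensus_split(pls, ann)
    print(clean.sum(), noisy.sum())
```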




Abstract:Medical image segmentation annotations exhibit variations among experts due to the ambiguous boundaries of segmented objects and backgrounds in medical images. Although using multiple annotations per image in the fully-supervised setting has been extensively studied for training deep models, obtaining large amounts of multi-annotated data is challenging due to the substantial time and manpower costs of segmentation annotation, leaving most images without any annotations. To address this, we propose Multi-annotated Semi-supervised Ensemble Networks (MSE-Nets) for learning segmentation from limited multi-annotated and abundant unannotated data. Specifically, we introduce the Network Pairwise Consistency Enhancement (NPCE) module and the Multi-Network Pseudo Supervised (MNPS) module to enhance MSE-Nets for the segmentation task by considering two major factors: (1) to optimize the utilization of all accessible multi-annotated data, the NPCE separates (dis)agreement annotations of multi-annotated data at the pixel level and handles agreement and disagreement annotations in different ways; (2) to mitigate the introduction of imprecise pseudo-labels, the MNPS extends the training data by leveraging consistent pseudo-labels from unannotated data. Finally, we improve confidence calibration by averaging the predictions of the base networks. Experiments on the ISIC dataset show that our method reduces the demand for multi-annotated data by 97.75% and narrows the Jaccard-index gap with the best fully-supervised baseline to just 4%. Furthermore, compared to other semi-supervised methods that rely only on a single annotation or a combined fusion approach, comprehensive experimental results on the ISIC and RIGA datasets demonstrate the superior performance of our proposed method for medical image segmentation with ambiguous boundaries.
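As an illustration of the pixel-level (dis)agreement separation behind NPCE, the sketch below keeps labels where two experts' masks agree and marks disagreement pixels for separate handling; how MSE-Nets actually treats the disagreement pixels and generates pseudo-labels is not reproduced here.

```python
# Illustrative sketch of pixel-level (dis)agreement separation between two
# expert annotations; the handling of disagreement pixels is a placeholder.
import numpy as np


def split_annotations(mask_a, mask_b):
    agree = mask_a == mask_b
    agreed_labels = np.where(agree, mask_a, -1)  # -1 marks disagreement pixels
    return agreed_labels, ~agree


if __name__ == "__main__":
    a = np.array([[1, 1, 0], [0, 1, 0]])
    b = np.array([[1, 0, 0], [0, 1, 1]])
    labels, disagreement = split_annotations(a, b)
    print(labels)        # agreed labels, -1 where the experts disagree
    print(disagreement)  # boolean mask of disagreement pixels
```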