Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Linlin Zong

Mamba-VMR: Multimodal Query Augmentation via Generated Videos for Precise Temporal Grounding

Mar 23, 2026

Yunzhuo Sun, Xinyue Liu, Yanyang Li, Nanding Wu, Yifang Xu, Linlin Zong, Xianchao Zhang, Wenxin Liang

Abstract:Text-driven video moment retrieval (VMR) remains challenging due to limited capture of hidden temporal dynamics in untrimmed videos, leading to imprecise grounding in long sequences. Traditional methods rely on natural language queries (NLQs) or static image augmentations, overlooking motion sequences and suffering from high computational costs in Transformer-based architectures. Existing approaches fail to integrate subtitle contexts and generated temporal priors effectively, we therefore propose a novel two-stage framework for enhanced temporal grounding. In the first stage, LLM-guided subtitle matching identifies relevant textual cues from video subtitles, fused with the query to generate auxiliary short videos via text-to-video models, capturing implicit motion information as temporal priors. In the second stage, augmented queries are processed through a multi-modal controlled Mamba network, extending text-controlled selection with video-guided gating for efficient fusion of generated priors and long sequences while filtering noise. Our framework is agnostic to base retrieval models and widely applicable for multimodal VMR. Experimental evaluations on the TVR benchmark demonstrate significant improvements over state-of-the-art methods, including reduced computational overhead and higher recall in long-sequence grounding.

* The paper is accepted by CVPR-2026

Via

Access Paper or Ask Questions

Boosting Meta-Learning for Few-Shot Text Classification via Label-guided Distance Scaling

Feb 28, 2026

Yunlong Gao, Xinyue Liu, Yingbo Wang, Linlin Zong, Bo Xu

Abstract:Few-shot text classification aims to recognize unseen classes with limited labeled text samples. Existing approaches focus on boosting meta-learners by developing complex algorithms in the training stage. However, the labeled samples are randomly selected during the testing stage, so they may not provide effective supervision signals, leading to misclassification. To address this issue, we propose a \textbf{L}abel-guided \textbf{D}istance \textbf{S}caling (LDS) strategy. The core of our method is exploiting label semantics as supervision signals in both the training and testing stages. Specifically, in the training stage, we design a label-guided loss to inject label semantic information, pulling closer the sample representations and corresponding label representations. In the testing stage, we propose a Label-guided Scaler which scales sample representations with label semantics to provide additional supervision signals. Thus, even if labeled sample representations are far from class centers, our Label-guided Scaler pulls them closer to their class centers, thereby mitigating the misclassification. We combine two common meta-learners to verify the effectiveness of the method. Extensive experimental results demonstrate that our approach significantly outperforms state-of-the-art models. All datasets and codes are available at https://anonymous.4open.science/r/Label-guided-Text-Classification.

Via

Access Paper or Ask Questions

Triple Path Enhanced Neural Architecture Search for Multimodal Fake News Detection

Jan 24, 2025

Bo Xu, Qiujie Xie, Jiahui Zhou, Linlin Zong

Figure 1 for Triple Path Enhanced Neural Architecture Search for Multimodal Fake News Detection

Figure 2 for Triple Path Enhanced Neural Architecture Search for Multimodal Fake News Detection

Figure 3 for Triple Path Enhanced Neural Architecture Search for Multimodal Fake News Detection

Figure 4 for Triple Path Enhanced Neural Architecture Search for Multimodal Fake News Detection

Abstract:Multimodal fake news detection has become one of the most crucial issues on social media platforms. Although existing methods have achieved advanced performance, two main challenges persist: (1) Under-performed multimodal news information fusion due to model architecture solidification, and (2) weak generalization ability on partial-modality contained fake news. To meet these challenges, we propose a novel and flexible triple path enhanced neural architecture search model MUSE. MUSE includes two dynamic paths for detecting partial-modality contained fake news and a static path for exploiting potential multimodal correlations. Experimental results show that MUSE achieves stable performance improvement over the baselines.

* This paper has been accepted into the IEEE International Conference on Acoustics, Speech, and Signal Processing(ICASSP 2024)

Via

Access Paper or Ask Questions

Hawkes based Representation Learning for Reasoning over Scale-free Community-structured Temporal Knowledge Graphs

Dec 28, 2024

Yuwei Du, Xinyue Liu, Wenxin Liang, Linlin Zong, Xianchao Zhang

Abstract:Temporal knowledge graph (TKG) reasoning has become a hot topic due to its great value in many practical tasks. The key to TKG reasoning is modeling the structural information and evolutional patterns of the TKGs. While great efforts have been devoted to TKG reasoning, the structural and evolutional characteristics of real-world networks have not been considered. In the aspect of structure, real-world networks usually exhibit clear community structure and scale-free (long-tailed distribution) properties. In the aspect of evolution, the impact of an event decays with the time elapsing. In this paper, we propose a novel TKG reasoning model called Hawkes process-based Evolutional Representation Learning Network (HERLN), which learns structural information and evolutional patterns of a TKG simultaneously, considering the characteristics of real-world networks: community structure, scale-free and temporal decaying. First, we find communities in the input TKG to make the encoding get more similar intra-community embeddings. Second, we design a Hawkes process-based relational graph convolutional network to cope with the event impact-decaying phenomenon. Third, we design a conditional decoding method to alleviate biases towards frequent entities caused by long-tailed distribution. Experimental results show that HERLN achieves significant improvements over the state-of-the-art models.

Via

Access Paper or Ask Questions

Improve Meta-learning for Few-Shot Text Classification with All You Can Acquire from the Tasks

Oct 14, 2024

Xinyue Liu, Yunlong Gao, Linlin Zong, Bo Xu

Figure 1 for Improve Meta-learning for Few-Shot Text Classification with All You Can Acquire from the Tasks

Figure 2 for Improve Meta-learning for Few-Shot Text Classification with All You Can Acquire from the Tasks

Figure 3 for Improve Meta-learning for Few-Shot Text Classification with All You Can Acquire from the Tasks

Figure 4 for Improve Meta-learning for Few-Shot Text Classification with All You Can Acquire from the Tasks

Abstract:Meta-learning has emerged as a prominent technology for few-shot text classification and has achieved promising performance. However, existing methods often encounter difficulties in drawing accurate class prototypes from support set samples, primarily due to probable large intra-class differences and small inter-class differences within the task. Recent approaches attempt to incorporate external knowledge or pre-trained language models to augment data, but this requires additional resources and thus does not suit many few-shot scenarios. In this paper, we propose a novel solution to address this issue by adequately leveraging the information within the task itself. Specifically, we utilize label information to construct a task-adaptive metric space, thereby adaptively reducing the intra-class differences and magnifying the inter-class differences. We further employ the optimal transport technique to estimate class prototypes with query set samples together, mitigating the problem of inaccurate and ambiguous support set samples caused by large intra-class differences. We conduct extensive experiments on eight benchmark datasets, and our approach shows obvious advantages over state-of-the-art models across all the tasks on all the datasets. For reproducibility, all the datasets and codes are available at https://github.com/YvoGao/LAQDA.

* Accepted by EMNLP 2024 Findings

Via

Access Paper or Ask Questions

Hate Speech Detection via Dual Contrastive Learning

Jul 10, 2023

Junyu Lu, Hongfei Lin, Xiaokun Zhang, Zhaoqing Li, Tongyue Zhang, Linlin Zong, Fenglong Ma, Bo Xu

Abstract:The fast spread of hate speech on social media impacts the Internet environment and our society by increasing prejudice and hurting people. Detecting hate speech has aroused broad attention in the field of natural language processing. Although hate speech detection has been addressed in recent work, this task still faces two inherent unsolved challenges. The first challenge lies in the complex semantic information conveyed in hate speech, particularly the interference of insulting words in hate speech detection. The second challenge is the imbalanced distribution of hate speech and non-hate speech, which may significantly deteriorate the performance of models. To tackle these challenges, we propose a novel dual contrastive learning (DCL) framework for hate speech detection. Our framework jointly optimizes the self-supervised and the supervised contrastive learning loss for capturing span-level information beyond the token-level emotional semantics used in existing models, particularly detecting speech containing abusive and insulting words. Moreover, we integrate the focal loss into the dual contrastive learning framework to alleviate the problem of data imbalance. We conduct experiments on two publicly available English datasets, and experimental results show that the proposed model outperforms the state-of-the-art models and precisely detects hate speeches.

* The paper has been accepted in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

Via

Access Paper or Ask Questions

Spectral Perturbation Meets Incomplete Multi-view Data

May 31, 2019

Hao Wang, Linlin Zong, Bing Liu, Yan Yang, Wei Zhou

Figure 1 for Spectral Perturbation Meets Incomplete Multi-view Data

Figure 2 for Spectral Perturbation Meets Incomplete Multi-view Data

Figure 3 for Spectral Perturbation Meets Incomplete Multi-view Data

Figure 4 for Spectral Perturbation Meets Incomplete Multi-view Data

Abstract:Beyond existing multi-view clustering, this paper studies a more realistic clustering scenario, referred to as incomplete multi-view clustering, where a number of data instances are missing in certain views. To tackle this problem, we explore spectral perturbation theory. In this work, we show a strong link between perturbation risk bounds and incomplete multi-view clustering. That is, as the similarity matrix fed into spectral clustering is a quantity bounded in magnitude O(1), we transfer the missing problem from data to similarity and tailor a matrix completion method for incomplete similarity matrix. Moreover, we show that the minimization of perturbation risk bounds among different views maximizes the final fusion result across all views. This provides a solid fusion criteria for multi-view data. We motivate and propose a Perturbation-oriented Incomplete multi-view Clustering (PIC) method. Experimental results demonstrate the effectiveness of the proposed method.

* to appear in IJCAI 2019

Via

Access Paper or Ask Questions