Zhi Chen

StableFusion: Continual Video Retrieval via Frame Adaptation

Mar 13, 2025

Zecheng Zhao, Zhi Chen, Zi Huang, Shazia Sadiq, Tong Chen

Abstract: Text-to-Video Retrieval (TVR) aims to match videos with corresponding textual queries, yet the continual influx of new video content poses a significant challenge for maintaining system performance over time. In this work, we introduce the first benchmark for Continual Text-to-Video Retrieval (CTVR) to overcome these limitations. Our analysis reveals that current TVR methods based on pre-trained models struggle to retain plasticity when adapting to new tasks, while existing continual learning approaches experience catastrophic forgetting, resulting in semantic misalignment between historical queries and stored video features. To address these challenges, we propose StableFusion, a novel CTVR framework comprising two main components: the Frame Fusion Adapter (FFA), which captures temporal dynamics in video content while preserving model flexibility, and the Task-Aware Mixture-of-Experts (TAME), which maintains consistent semantic alignment between queries across tasks and the stored video features. Comprehensive evaluations on two benchmark datasets under various task settings demonstrate that StableFusion outperforms existing continual learning and TVR methods, achieving superior retrieval performance with minimal degradation on earlier tasks in the context of continuous video streams. Our code is available at: https://github.com/JasonCodeMaker/CTVR

THz Beam Squint Mitigation via 3D Rotatable Antennas

Mar 11, 2025

Yike Xie, Weidong Mei, Dong Wang, Boyu Ning, Zhi Chen, Jun Fang, Wei Guo

Abstract: Analog beamforming holds great potential for future terahertz (THz) communications due to its ability to generate high-gain directional beams with low-cost phase shifters. However, conventional analog beamforming may suffer substantial performance degradation in wideband systems due to beam-squint effects. Instead of relying on high-cost true time delayers, we propose in this paper an efficient three-dimensional (3D) rotatable antenna technology to mitigate the beam-squint effects, motivated by the fact that beam squint disappears along the boresight direction. In particular, we focus on a wideband wide-beam coverage problem, aiming to maximize the minimum beamforming gain within a given angle and frequency range by jointly optimizing the analog beamforming vector and the 3D rotation angles of the antenna array. However, this problem is non-convex and difficult to solve optimally due to the coupling of the spatial and frequency domains and that of the antenna weights and rotation. To tackle this issue, we first reformulate the problem into an equivalent form by merging the spatial and frequency domains into a single composite domain. Next, we combine alternating optimization (AO) and successive convex approximation (SCA) algorithms to optimize the analog beamforming and rotation angles within this composite domain. Simulation results demonstrate that the proposed scheme can significantly outperform conventional schemes without antenna rotation, thus offering a cost-effective solution for wideband transmission over THz bands.
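
The claim that beam squint disappears along the boresight direction can be checked numerically. The following is a minimal sketch, not the paper's formulation: it assumes a uniform linear array with half-wavelength spacing at the center frequency and frequency-flat phase shifters steered toward a target angle. The helper name `normalized_gain` and its parameters are hypothetical.

```python
import cmath
import math


def normalized_gain(theta0_deg, f_ratio, n_elems=64):
    """Normalized array gain toward theta0 at frequency f = f_ratio * fc.

    Assumptions (illustrative only): uniform linear array, element spacing of
    half a wavelength at the center frequency fc, and phase shifters designed
    at fc to steer the beam toward theta0 (measured from boresight).
    """
    s = math.sin(math.radians(theta0_deg))
    # Channel phase of element n at frequency f: pi * n * f_ratio * s.
    # Phase-shifter compensation (fixed, designed at fc): -pi * n * s.
    # The residual phase per element is therefore pi * n * (f_ratio - 1) * s.
    total = sum(
        cmath.exp(1j * math.pi * n * (f_ratio - 1.0) * s)
        for n in range(n_elems)
    )
    return abs(total) / n_elems


if __name__ == "__main__":
    for angle in (0.0, 30.0, 60.0):
        for ratio in (0.9, 1.0, 1.1):
            gain = normalized_gain(angle, ratio)
            print(f"theta0={angle:5.1f} deg  f/fc={ratio:.1f}  gain={gain:.3f}")
```

Under these assumptions the residual phase is proportional to the sine of the steering angle, so at boresight (0 degrees) the gain stays at its maximum for every frequency, while at large steering angles it falls off quickly as the frequency moves away from the center frequency. This is the effect that rotating the array toward the target direction is meant to exploit.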

SEKI: Self-Evolution and Knowledge Inspiration based Neural Architecture Search via Large Language Models

Feb 27, 2025

Zicheng Cai, Yaohua Tang, Yutao Lai, Hua Wang, Zhi Chen, Hao Chen

Abstract: We introduce SEKI, a novel large language model (LLM)-based neural architecture search (NAS) method. Inspired by the chain-of-thought (CoT) paradigm in modern LLMs, SEKI operates in two key stages: self-evolution and knowledge distillation. In the self-evolution stage, LLMs initially lack sufficient reference examples, so we implement an iterative refinement mechanism that enhances architectures based on performance feedback. Over time, this process accumulates a repository of high-performance architectures. In the knowledge distillation stage, LLMs analyze common patterns among these architectures to generate new, optimized designs. By combining these two stages, SEKI leverages the capacity of LLMs for NAS without requiring any domain-specific data. Experimental results show that SEKI achieves state-of-the-art (SOTA) performance across various datasets and search spaces while requiring only 0.05 GPU-days, outperforming existing methods in both efficiency and accuracy. Furthermore, SEKI demonstrates strong generalization capabilities, achieving SOTA-competitive results across multiple tasks.

Terahertz Aerospace Communications: Enabling Technologies and Future Directions

Feb 25, 2025

Weijun Gao, Chong Han, Zhi Chen, Yong Chen, Yuanzhi He, Wenjun Zhang

Abstract: To achieve ubiquitous connectivity in next-generation networks through aerospace communications while maintaining high data rates, Terahertz (THz) band communications (0.1-10 THz) with large continuous bandwidths are considered a promising candidate technology. However, key enabling techniques and practical implementations of THz communications for aerospace applications remain limited. In this paper, the wireless channel characteristics, enabling communication techniques, and networking strategies for THz aerospace communications are investigated, aiming to assess their feasibility and encourage future research efforts toward system realization. Specifically, the wireless channel characteristics across various altitudes and scenarios are first analyzed, focusing on modeling the interaction between the THz wave and the external environment, from ground to outer space. Next, key enabling communication technologies, including multiple-input multiple-output (MIMO) technique, beam alignment and tracking, integrated communication and radar sensing (ICARS), and resource allocation for networking are discussed. Finally, the existing challenges and possible future directions are summarized and discussed.

Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference

Feb 21, 2025

Yaohua Tang, Zhicheng Hu, Kun Cheng, Fan Mo, Qiheng Lv, Hua Wang, Zhi Chen

Abstract: The increasing context window size in large language models (LLMs) has improved their ability to handle complex, long-text tasks. However, as conversation rounds accumulate, a large amount of KV cache must be stored in GPU memory, which significantly affects the efficiency and even the availability of model serving systems. This paper analyzes dialogue data from real users and finds that LLM inference exhibits a watershed layer, after which the distribution of round-level attention shows notable similarity. We propose Round Attention, a novel round-level attention mechanism that only recalls and computes the KV cache of the most relevant rounds. Experiments show that our method reduces memory usage by 55% without compromising model performance.
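
The idea of recalling only the KV cache of the most relevant rounds can be illustrated with a toy selection routine. This is a sketch under stated assumptions, not the paper's algorithm: the relevance score used here (dot product between the current query vector and the mean key vector of each round) and the function name `select_rounds` are hypothetical choices made only for illustration.

```python
def select_rounds(query, round_keys, k):
    """Return the indices of the k rounds judged most relevant to the query.

    query      : list of floats, the current query vector
    round_keys : list of rounds, each a non-empty list of key vectors
    k          : number of rounds whose KV cache would be recalled

    Assumption (illustrative only): a round's relevance is the dot product
    between the query and the mean of that round's key vectors.
    """
    dim = len(query)
    scored = []
    for idx, keys in enumerate(round_keys):
        # Summarize the round by the element-wise mean of its keys.
        mean_key = [sum(key[d] for key in keys) / len(keys) for d in range(dim)]
        score = sum(q * m for q, m in zip(query, mean_key))
        scored.append((score, idx))
    # Keep the k highest-scoring rounds, then restore chronological order.
    top = sorted(scored, reverse=True)[:k]
    return sorted(idx for _, idx in top)


if __name__ == "__main__":
    history = [
        [[1.0, 0.0], [0.8, 0.2]],  # round 0
        [[0.0, 1.0]],              # round 1
        [[0.9, 0.1]],              # round 2
    ]
    print(select_rounds([1.0, 0.0], history, k=2))
```

Only the key and value tensors of the selected rounds would then be loaded for attention computation, and the remaining rounds could stay outside GPU memory.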

LESA: Learnable LLM Layer Scaling-Up

Feb 19, 2025

Yifei Yang, Zouying Cao, Xinbei Ma, Yao Yao, Libo Qin, Zhi Chen, Hai Zhao

Abstract: Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive. Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones. However, existing depth scaling-up methods rely on empirical heuristic rules for layer duplication, which result in poorer initialization and slower convergence during continual pre-training. We propose LESA, a novel learnable method for depth scaling-up. By concatenating parameters from each layer and applying Singular Value Decomposition, we uncover latent patterns between layers, suggesting that inter-layer parameters can be learned. LESA uses a neural network to predict the parameters inserted between adjacent layers, enabling better initialization and faster training. Experiments show that LESA outperforms existing baselines, achieving superior performance with less than half the computational cost during continual pre-training. Extensive analyses demonstrate its effectiveness across different model sizes and tasks.

Towards THz-based Obstacle Sensing: A Generative Radio Environment Awareness Framework

Feb 11, 2025

Tianyu Hu, Yunhang Xie, Shuai Wang, Boyu Ning, Lingxiang Li, Zhi Chen

Abstract: Obstacle sensing is essential for terahertz (THz) communication, since subsequent beam management can then prevent THz signals from being blocked by obstacles. In parallel, the radio environment, which can be manifested by channel knowledge such as the distribution of received signal strength (RSS), reveals the signal propagation situation and the corresponding obstacle information. However, awareness of the radio environment for obstacle sensing is challenging in practice, as sparsely deployed THz sensors can acquire only little a priori knowledge from their RSS measurements. Therefore, we formulate in this paper a radio environment awareness problem, which for the first time considers a probability distribution of obstacle attributes. To solve this problem, we propose a THz-based generative radio environment awareness framework, in which obstacle information is obtained directly from the aware radio environment. We also propose a novel generative model based on a conditional generative adversarial network (CGAN), where U-net and the objective function of the problem are introduced to enable accurate awareness of the RSS distribution. Simulation results show that the proposed framework can improve the awareness of the radio environment, and thus achieve superior sensing performance in terms of average precision regarding obstacles' shape and location.

Antenna Position Optimization for Movable Antenna-Empowered Near-Field Sensing

Feb 05, 2025

Yushen Wang, Weidong Mei, Xin Wei, Boyu Ning, Zhi Chen

Abstract: Movable antennas (MAs) show great promise for enhancing the sensing capabilities of future sixth-generation (6G) networks. With the growing prevalence of near-field propagation at ultra-high frequencies, this paper focuses on the application of MAs for near-field sensing to jointly estimate the angle and distance information of a target. First, to gain essential insights into MA-enhanced near-field sensing, we investigate two simplified cases with only the spatial angle-of-arrival (AoA) or distance estimation, respectively, assuming that the other information is already known. We derive the worst-case Cramér-Rao bounds (CRBs) on the mean square errors (MSEs) of the AoA estimation and the distance estimation via the multiple signal classification (MUSIC) algorithm in these two cases. Then, we jointly optimize the positions of the MAs within a linear array to minimize these CRBs and derive their closed-form solutions, which yield an identical array geometry to MA-aided far-field sensing. Furthermore, we proceed to the more challenging case with the joint AoA and distance estimation and derive the worst-case CRB under the two-dimensional (2D) MUSIC algorithm. The corresponding CRB minimization problem is efficiently solved by adopting a discrete sampling-based approach. Numerical results demonstrate that the proposed MA-enhanced near-field sensing significantly outperforms conventional sensing with fixed-position antennas (FPAs). Moreover, the joint angle and distance estimation results in a different array geometry from that in the individual estimation of angle or distance.

ECM: A Unified Electronic Circuit Model for Explaining the Emergence of In-Context Learning and Chain-of-Thought in Large Language Model

Feb 05, 2025

Qiguang Chen, Libo Qin, Jinhao Liu, Dengyun Peng, Jiaqi Wang, Mengkang Hu, Zhi Chen, Wanxiang Che, Ting Liu

Abstract: Recent advancements in large language models (LLMs) have led to significant successes across various applications, the most notable being a series of emerging capabilities, particularly In-Context Learning (ICL) and Chain-of-Thought (CoT). To better understand and control model performance, many studies have begun investigating the underlying causes of these phenomena and their impact on task outcomes. However, existing explanatory frameworks predominantly focus on isolating and explaining ICL and CoT independently, leading to an incomplete understanding of their combined influence on model performance. To address this gap, we propose the Electronic Circuit Model (ECM), which provides a foundation for developing scalable, learnable policies and improving the management of AI-generated content. Specifically, ECM conceptualizes model behavior as an electronic circuit: ICL is represented as a semantic magnetic field that provides an additional voltage following Faraday's Law, while CoT is modeled as series resistors that constrain the model output performance following Ohm's Law. Experimental results demonstrate that the ECM effectively predicts and explains LLM performance across a variety of prompting strategies. Furthermore, we apply ECM to advanced reasoning strategy optimization on a series of tasks, such as the International Olympiad in Informatics (IOI) and the International Mathematical Olympiad (IMO), achieving competitive performance that surpasses nearly 80% of top human competitors.

* Manuscript

Sundial: A Family of Highly Capable Time Series Foundation Models

Feb 02, 2025

Yong Liu, Guo Qin, Zhiyuan Shi, Zhi Chen, Caiyin Yang, Xiangdong Huang, Jianmin Wang, Mingsheng Long

Abstract: We introduce Sundial, a family of native, flexible, and scalable time series foundation models. To predict the next-patch's distribution, we propose a TimeFlow Loss based on flow-matching, which facilitates native pre-training of Transformers on time series without discrete tokenization. Conditioned on arbitrary-length time series, our model is pre-trained without specifying any prior distribution and can generate multiple probable predictions, achieving flexibility in representation learning beyond using parametric densities. Towards time series foundation models, we leverage minimal but crucial adaptations of Transformers and curate TimeBench with 1 trillion time points, comprising mostly real-world datasets and synthetic data. By mitigating mode collapse through TimeFlow Loss, we pre-train a family of Sundial models on TimeBench, which exhibit unprecedented model capacity and generalization performance on zero-shot forecasting. In addition to presenting good scaling behavior, Sundial achieves new state-of-the-art on both point forecasting and probabilistic forecasting benchmarks. We believe that Sundial's pioneering generative paradigm will facilitate a wide variety of forecasting scenarios.
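
For readers unfamiliar with flow matching, the generic objective that such a loss builds on can be sketched in a few lines. This is not the TimeFlow Loss itself, whose exact form is not given in the abstract. It is the standard linear-path flow-matching objective for a single target patch, and the names `flow_matching_loss` and `velocity_fn` are hypothetical. Conditioning on the history of the series is omitted for brevity.

```python
import random


def flow_matching_loss(velocity_fn, x1, rng):
    """One-sample flow-matching loss for a target patch x1 (list of floats).

    Assumptions (illustrative only): Gaussian source noise x0, a straight-line
    path x_t = (1 - t) * x0 + t * x1, and a regression target equal to the
    constant path velocity x1 - x0. velocity_fn(x_t, t) stands in for the
    network that predicts the velocity.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]            # source noise sample
    t = rng.random()                                   # time in [0, 1)
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]           # velocity along the path
    pred = velocity_fn(xt, t)
    # Mean squared error between predicted and target velocity.
    return sum((p - g) ** 2 for p, g in zip(pred, target)) / len(x1)


if __name__ == "__main__":
    rng = random.Random(0)
    patch = [0.5, -1.0, 2.0]
    # A placeholder "model" that always predicts zero velocity.
    print(flow_matching_loss(lambda xt, t: [0.0] * len(xt), patch, rng))
```

Because the model regresses a velocity field and samples are drawn by integrating from noise, no parametric output density has to be fixed in advance, and repeated sampling yields multiple plausible predictions.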
