Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Rui Zhang

Henry

AGMA: Adaptive Gaussian Mixture Anchors for Prior-Guided Multimodal Human Trajectory Forecasting

Feb 04, 2026

Chao Li, Rui Zhang, Siyuan Huang, Xian Zhong, Hongbo Jiang

Abstract:Human trajectory forecasting requires capturing the multimodal nature of pedestrian behavior. However, existing approaches suffer from prior misalignment. Their learned or fixed priors often fail to capture the full distribution of plausible futures, limiting both prediction accuracy and diversity. We theoretically establish that prediction error is lower-bounded by prior quality, making prior modeling a key performance bottleneck. Guided by this insight, we propose AGMA (Adaptive Gaussian Mixture Anchors), which constructs expressive priors through two stages: extracting diverse behavioral patterns from training data and distilling them into a scene-adaptive global prior for inference. Extensive experiments on ETH-UCY, Stanford Drone, and JRDB datasets demonstrate that AGMA achieves state-of-the-art performance, confirming the critical role of high-quality priors in trajectory forecasting.

* 14 pages, 3 figures

Via

Access Paper or Ask Questions

SynthVerse: A Large-Scale Diverse Synthetic Dataset for Point Tracking

Feb 04, 2026

Weiguang Zhao, Haoran Xu, Xingyu Miao, Qin Zhao, Rui Zhang, Kaizhu Huang, Ning Gao, Peizhou Cao, Mingze Sun, Mulin Yu(+4 more)

Abstract:Point tracking aims to follow visual points through complex motion, occlusion, and viewpoint changes, and has advanced rapidly with modern foundation models. Yet progress toward general point tracking remains constrained by limited high-quality data, as existing datasets often provide insufficient diversity and imperfect trajectory annotations. To this end, we introduce SynthVerse, a large-scale, diverse synthetic dataset specifically designed for point tracking. SynthVerse includes several new domains and object types missing from existing synthetic datasets, such as animated-film-style content, embodied manipulation, scene navigation, and articulated objects. SynthVerse substantially expands dataset diversity by covering a broader range of object categories and providing high-quality dynamic motions and interactions, enabling more robust training and evaluation for general point tracking. In addition, we establish a highly diverse point tracking benchmark to systematically evaluate state-of-the-art methods under broader domain shifts. Extensive experiments and analyses demonstrate that training with SynthVerse yields consistent improvements in generalization and reveal limitations of existing trackers under diverse settings.

Via

Access Paper or Ask Questions

Fedcompass: Federated Clustered and Periodic Aggregation Framework for Hybrid Classical-Quantum Models

Feb 03, 2026

Yueheng Wang, Xing He, Zinuo Cai, Rui Zhang, Ruhui Ma, Yuan Liu, Rajkumar Buyya

Abstract:Federated learning enables collaborative model training across decentralized clients under privacy constraints. Quantum computing offers potential for alleviating computational and communication burdens in federated learning, yet hybrid classical-quantum federated learning remains susceptible to performance degradation under non-IID data. To address this,we propose FEDCOMPASS, a layered aggregation framework for hybrid classical-quantum federated learning. FEDCOMPASS employs spectral clustering to group clients by class distribution similarity and performs cluster-wise aggregation for classical feature extractors. For quantum parameters, it uses circular mean aggregation combined with adaptive optimization to ensure stable global updates. Experiments on three benchmark datasets show that FEDCOMPASS improves test accuracy by up to 10.22% and enhances convergence stability under non-IID settings, outperforming six strong federated learning baselines.

* Accepted by the 2026 IEEE International Conference on Acoustics, Speech, and Signal Processing(ICASSP 2026)

Via

Access Paper or Ask Questions

Optimization and Generation in Aerodynamics Inverse Design

Feb 03, 2026

Huaguan Chen, Ning Lin, Luxi Chen, Rui Zhang, Wenbing Huang, Chongxuan Li, Hao Sun

Abstract:Inverse design with physics-based objectives is challenging because it couples high-dimensional geometry with expensive simulations, as exemplified by aerodynamic shape optimization for drag reduction. We revisit inverse design through two canonical solutions, the optimal design point and the optimal design distribution, and relate them to optimization and guided generation. Building on this view, we propose a new training loss for cost predictors and a density-gradient optimization method that improves objectives while preserving plausible shapes. We further unify existing training-free guided generation methods. To address their inability to approximate conditional covariance in high dimensions, we develop a time- and memory-efficient algorithm for approximate covariance estimation. Experiments on a controlled 2D study and high-fidelity 3D aerodynamic benchmarks (car and aircraft), validated by OpenFOAM simulations and miniature wind-tunnel tests with 3D-printed prototypes, demonstrate consistent gains in both optimization and guided generation. Additional offline RL results further support the generality of our approach.

Via

Access Paper or Ask Questions

Kimi K2.5: Visual Agentic Intelligence

Feb 02, 2026

Kimi Team, Tongtong Bai, Yifan Bai, Yiping Bao, S. H. Cai, Yuan Cao, Y. Charles, H. S. Che, Cheng Chen, Guanduo Chen(+315 more)

Abstract:We introduce Kimi K2.5, an open-source multimodal agentic model designed to advance general agentic intelligence. K2.5 emphasizes the joint optimization of text and vision so that two modalities enhance each other. This includes a series of techniques such as joint text-vision pre-training, zero-vision SFT, and joint text-vision reinforcement learning. Building on this multimodal foundation, K2.5 introduces Agent Swarm, a self-directed parallel agent orchestration framework that dynamically decomposes complex tasks into heterogeneous sub-problems and executes them concurrently. Extensive evaluations show that Kimi K2.5 achieves state-of-the-art results across various domains including coding, vision, reasoning, and agentic tasks. Agent Swarm also reduces latency by up to $4.5\times$ over single-agent baselines. We release the post-trained Kimi K2.5 model checkpoint to facilitate future research and real-world applications of agentic intelligence.

* Kimi K2.5 tech report

Via

Access Paper or Ask Questions

LocalV: Exploiting Information Locality for IP-level Verilog Generation

Jan 31, 2026

Hanqi Lyu, Di Huang, Yaoyu Zhu, Kangcheng Liu, Bohan Dou, Chongxiao Li, Pengwei Jin, Shuyao Cheng, Rui Zhang, Zidong Du(+3 more)

Abstract:The generation of Register-Transfer Level (RTL) code is a crucial yet labor-intensive step in digital hardware design, traditionally requiring engineers to manually translate complex specifications into thousands of lines of synthesizable Hardware Description Language (HDL) code. While Large Language Models (LLMs) have shown promise in automating this process, existing approaches-including fine-tuned domain-specific models and advanced agent-based systems-struggle to scale to industrial IP-level design tasks. We identify three key challenges: (1) handling long, highly detailed documents, where critical interface constraints become buried in unrelated submodule descriptions; (2) generating long RTL code, where both syntactic and semantic correctness degrade sharply with increasing output length; and (3) navigating the complex debugging cycles required for functional verification through simulation and waveform analysis. To overcome these challenges, we propose LocalV, a multi-agent framework that leverages information locality in modular hardware design. LocalV decomposes the long-document to long-code generation problem into a set of short-document, short-code tasks, enabling scalable generation and debugging. Specifically, LocalV integrates hierarchical document partitioning, task planning, localized code generation, interface-consistent merging, and AST-guided locality-aware debugging. Experiments on RealBench, an IP-level Verilog generation benchmark, demonstrate that LocalV substantially outperforms state-of-the-art (SOTA) LLMs and agents, achieving a pass rate of 45.0% compared to 21.6%.

Via

Access Paper or Ask Questions

QuantLRM: Quantization of Large Reasoning Models via Fine-Tuning Signals

Jan 31, 2026

Nan Zhang, Eugene Kwek, Yusen Zhang, Muyu Pan, Suhang Wang, Prasenjit Mitra, Rui Zhang

Abstract:Weight-only quantization is important for compressing Large Language Models (LLMs). Inspired by the spirit of classical magnitude pruning, we study whether the magnitude of weight updates during reasoning-incentivized fine-tuning can provide valuable signals for quantizing Large Reasoning Models (LRMs). We hypothesize that the smallest and largest weight updates during fine-tuning are more important than those of intermediate magnitude, a phenomenon we term "protecting both ends". Upon hypothesis validation, we introduce QuantLRM, which stands for weight quantization of LRMs via fine-tuning signals. We fit simple restricted quadratic functions on weight updates to protect both ends. By multiplying the average quadratic values with the count of zero weight updates of channels, we compute channel importance that is more effective than using activation or second-order information. We run QuantLRM to quantize various fine-tuned models (including supervised, direct preference optimization, and reinforcement learning fine-tuning) over four reasoning benchmarks (AIME-120, FOLIO, temporal sequences, and GPQA-Diamond) and empirically find that QuantLRM delivers a consistent improvement for LRMs quantization, with an average improvement of 6.55% on a reinforcement learning fine-tuned model. Also supporting non-fine-tuned LRMs, QuantLRM gathers effective signals via pseudo-fine-tuning, which greatly enhances its applicability.

Via

Access Paper or Ask Questions

SORIS: A Self-Organized Reconfigurable Intelligent Surface Architecture for Wireless Communications

Jan 30, 2026

Evangelos Koutsonas, Alexandros-Apostolos A. Boulogeorgos, Stylianos E. Trevlakis, George C. Alexandropoulos, Theodoros A. Tsiftsis, Rui Zhang

Abstract:In this paper, a new reconfigurable intelligent surface (RIS) hardware architecture, called self-organized RIS (SORIS), is proposed. The architecture incorporates a microcontroller connected to a single-antenna receiver operating at the same frequency as the RIS unit elements, operating either in transmission or reflection mode. The transmitting RIS elements enable the low latency estimation of both the incoming and outcoming channels at the microcontroller's side. In addition, a machine learning approach for estimating the incoming and outcoming channels involving the remaining RIS elements operating in reflection mode is devised. Specifically, by appropriately selecting a small number of elements in transmission mode, and based on the channel reciprocity principle, the respective channel coefficients are first estimated, which are then fed to a low-complexity neural network that, leveraging spatial channel correlation over RIS elements, returns predictions of the channel coefficients referring to the rest of elements. In this way, the SORIS microcontroller acquires channel state information, and accordingly reconfigures the panel's metamaterials to assist data communication between a transmitter and a receiver, without the need for separate connections with them. Moreover, the impact of channel estimation on the proposed solution, and a detailed complexity analysis for the used model, as well as a wiring density and control signaling analysis, is performed. The feasibility and efficacy of the proposed self-organized RIS design and operation are verified by Monte Carlo simulations, providing useful guidelines on the selection of the RIS elements for operating in transmission mode for initial channel estimation.

* IEEE Transactions on Green Communications and Networking, vol. 10, pp. 1749-1764, 2026
* 12 figures, 1 table, journal

Via

Access Paper or Ask Questions

Learning to Optimize Job Shop Scheduling Under Structural Uncertainty

Jan 29, 2026

Rui Zhang, Jianwei Niu, Xuefeng Liu, Shaojie Tang, Jing Yuan

Abstract:The Job-Shop Scheduling Problem (JSSP), under various forms of manufacturing uncertainty, has recently attracted considerable research attention. Most existing studies focus on parameter uncertainty, such as variable processing times, and typically adopt the actor-critic framework. In this paper, we explore a different but prevalent form of uncertainty in JSSP: structural uncertainty. Structural uncertainty arises when a job may follow one of several routing paths, and the selection is determined not by policy, but by situational factors (e.g., the quality of intermediate products) that cannot be known in advance. Existing methods struggle to address this challenge due to incorrect credit assignment: a high-quality action may be unfairly penalized if it is followed by a time-consuming path. To address this problem, we propose a novel method named UP-AAC. In contrast to conventional actor-critic methods, UP-AAC employs an asymmetric architecture. While its actor receives a standard stochastic state, the critic is crucially provided with a deterministic state reconstructed in hindsight. This design allows the critic to learn a more accurate value function, which in turn provides a lower-variance policy gradient to the actor, leading to more stable learning. In addition, we design an attention-based Uncertainty Perception Model (UPM) to enhance the actor's scheduling decisions. Extensive experiments demonstrate that our method outperforms existing approaches in reducing makespan on benchmark instances.

Via

Access Paper or Ask Questions

Compressive Beam-Pattern-Aware Near-field Beam Training via Total Variation Denoising

Jan 29, 2026

Zijun Wang, Maria Nivetha A, Ye Hu, Rui Zhang

Abstract:Extremely large antenna arrays envisioned for 6G incurs near-field effect, where steering vector depends on angles and range simultaneously. Polar-domain near-field codebooks can focus energy accurately but incur extra two-dimensional sweeping overhead; compressed-sensing (CS) approaches with Gaussian-masked DFT sensing offer a lower-overhead alternative. This letter revisits near-field beam training using conventional DFT codebooks. Unlike far-field responses that concentrate energy on a few isolated DFT beams, near-field responses produce contiguous, plateau-like energy segments with sharp transitions in the DFT beamspace. Pure LASSO denoising, therefore, tends to over-shrink magnitudes and fragment plateaus. We propose a beam-pattern-preserving beam training scheme for multiple-path scenarios that combines LASSO with a lightweight denoising pipeline: LASSO to suppress small-amplitude noise, followed by total variation (TV) to maintain plateau levels and edge sharpness. The two proximal steps require no near-field codebook design. Simulations with Gaussian pilots show consistent NMSE and cosine-similarity gains over least squares and LASSO at the same pilot budget.

Via

Access Paper or Ask Questions