Zhi Chen

Spring

SRSUPM: Sequential Recommender System Based on User Psychological Motivation

Feb 09, 2026

Yicheng Di, Yuan Liu, Zhi Chen, Jingcai Guo

Abstract:A sequential recommender infers users' evolving psychological motivations from historical interactions to recommend the next preferred item. Most existing methods compress recent behaviors into a single vector and optimize it toward a single observed target item, but they do not explicitly model psychological motivation shift. As a result, they struggle to uncover distributional patterns across different degrees of shift and to capture collaborative knowledge that is sensitive to such shift. We propose a general framework, the Sequential Recommender System Based on User Psychological Motivation (SRSUPM), which enhances sequential recommenders with psychological motivation shift-aware user modeling. Specifically, the Psychological Motivation Shift Assessment (PMSA) quantitatively measures psychological motivation shift; guided by PMSA, the Shift Information Construction models dynamically evolving multi-level shift states, and the Psychological Motivation Shift-driven Information Decomposition decomposes and regularizes representations across shift levels. Moreover, the Psychological Motivation Shift Information Matching strengthens collaborative patterns related to psychological motivation shift to learn more discriminative user representations. Extensive experiments on three public benchmarks show that SRSUPM consistently outperforms representative baselines across diverse sequential recommendation tasks.

* 9 pages, 8 pages

Rethinking the Value of Agent-Generated Tests for LLM-Based Software Engineering Agents

Feb 08, 2026

Zhi Chen, Zhensu Sun, Yuling Shi, Chao Peng, Xiaodong Gu, David Lo, Lingxiao Jiang

Abstract:Large Language Model (LLM) code agents increasingly resolve repository-level issues by iteratively editing code, invoking tools, and validating candidate patches. In these workflows, agents often write tests on the fly, a paradigm adopted by many high-ranking agents on the SWE-bench leaderboard. However, we observe that GPT-5.2, which writes almost no new tests, can achieve performance comparable to that of top-ranking agents. This raises a critical question: do such tests meaningfully improve issue resolution, or do they merely mimic human testing practices while consuming a substantial share of the interaction budget? To reveal the impact of agent-written tests, we present an empirical study that analyzes agent trajectories across six state-of-the-art LLMs on SWE-bench Verified. Our results show that although test writing is commonly adopted, resolved and unresolved tasks within the same model exhibit similar test-writing frequencies. Furthermore, these tests typically serve as observational feedback channels: agents use value-revealing print statements significantly more often than formal assertion-based checks. Based on these insights, we perform a controlled experiment in which we revise the prompts of four agents to either increase or reduce test writing. The results suggest that changing the volume of agent-written tests does not significantly change final outcomes. Taken together, our study reveals that current test-writing practices may provide only marginal utility in autonomous software engineering tasks.

StomataSeg: Semi-Supervised Instance Segmentation for Sorghum Stomatal Components

Jan 31, 2026

Zhongtian Huang, Zhi Chen, Zi Huang, Xin Yu, Daniel Smith, Chaitanya Purushothama, Erik Van Oosterom, Alex Wu, William Salter, Yan Li(+1 more)

Abstract:Sorghum is a globally important cereal grown widely in water-limited and stress-prone regions. Its strong drought tolerance makes it a priority crop for climate-resilient agriculture. Improving water-use efficiency in sorghum requires precise characterisation of stomatal traits, as stomatal control of gas exchange, transpiration and photosynthesis has a major influence on crop performance. Automated analysis of sorghum stomata is difficult because the stomata are small (often less than 40 μm in length in grasses such as sorghum) and vary in shape across genotypes and leaf surfaces. Automated segmentation contributes to high-throughput stomatal phenotyping, yet current methods still face challenges related to nested small structures and annotation bottlenecks. In this paper, we propose a semi-supervised instance segmentation framework tailored for the analysis of sorghum stomatal components. We collect and annotate a sorghum leaf imagery dataset containing 11,060 human-annotated patches, covering the three stomatal components (pore, guard cell and complex area) across multiple genotypes and leaf surfaces. To improve the detection of tiny structures, we split high-resolution microscopy images into overlapping small patches. We then apply a pseudo-labelling strategy to unannotated images, producing an additional 56,428 pseudo-labelled patches. Benchmarking across semantic and instance segmentation models shows substantial performance gains: for semantic models the top mIoU increases from 65.93% to 70.35%, and for instance models the top AP rises from 28.30% to 46.10%. These results demonstrate that combining patch-based preprocessing with semi-supervised learning significantly improves the segmentation of fine stomatal structures. The proposed framework supports scalable extraction of stomatal traits and facilitates broader adoption of AI-driven phenotyping in crop science.
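
The patch-splitting step can be sketched as computing overlapping tile coordinates. In the minimal sketch below, the patch size and overlap are hypothetical parameters (the abstract does not state the values used), and the last tile on each axis is shifted flush with the image border so that no pixels are skipped.

```python
from typing import List, Tuple


def patch_starts(length: int, patch: int, overlap: int) -> List[int]:
    """Start offsets along one axis so that patches of size `patch` cover [0, length)."""
    if patch <= overlap:
        raise ValueError("patch size must exceed overlap")
    if length <= patch:
        return [0]  # one patch covers the axis; it would be padded in practice
    stride = patch - overlap
    starts = list(range(0, length - patch + 1, stride))
    # Shift one extra patch flush with the border so the edge is not dropped.
    if starts[-1] + patch < length:
        starts.append(length - patch)
    return starts


def tile_boxes(height: int, width: int, patch: int, overlap: int) -> List[Tuple[int, int, int, int]]:
    """(top, left, bottom, right) boxes of overlapping square patches, row by row."""
    return [
        (y, x, y + patch, x + patch)
        for y in patch_starts(height, patch, overlap)
        for x in patch_starts(width, patch, overlap)
    ]
```

Because neighbouring tiles overlap, the same stoma can be detected in several tiles, so predictions have to be merged (for example with non-maximum suppression) when they are mapped back to the full image.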

Generative Recall, Dense Reranking: Learning Multi-View Semantic IDs for Efficient Text-to-Video Retrieval

Jan 29, 2026

Zecheng Zhao, Zhi Chen, Zi Huang, Shazia Sadiq, Tong Chen

Abstract:Text-to-Video Retrieval (TVR) is essential on video platforms. Dense retrieval with dual-modality encoders leads in accuracy, but its computation and storage scale poorly with corpus size. Thus, real-time large-scale applications adopt two-stage retrieval, where a fast recall model gathers a small candidate pool that is then reranked by an advanced dense retriever. Because the candidate pool is greatly reduced, the reranker can be any off-the-shelf dense retriever without hurting efficiency, meaning that the recall model bounds two-stage TVR performance. Recently, generative retrieval (GR) has replaced dense video embeddings with discrete semantic IDs and retrieves by decoding text queries into ID tokens. GR offers near-constant inference and storage complexity, and its semantic IDs capture high-level video features via quantization, making it well suited to quickly eliminating irrelevant candidates during recall. However, as a recall model in two-stage TVR, GR suffers from (i) semantic ambiguity, where each video satisfies diverse queries but is forced into one semantic ID; and (ii) cross-modal misalignment, as semantic IDs are derived solely from visual features without text supervision. We propose Generative Recall and Dense Reranking (GRDR), a novel GR method designed to improve the quality of recalled candidates. GRDR assigns multiple semantic IDs to each video using a query-guided multi-view tokenizer that exposes diverse semantic access paths, and jointly trains the tokenizer and the generative retriever via a shared codebook so that semantic IDs serve as the semantic bridge between texts and videos. At inference, trie-constrained decoding generates a compact candidate set, which a dense model then reranks for fine-grained matching. Experiments on TVR benchmarks show that GRDR matches strong dense retrievers in accuracy while reducing index storage by an order of magnitude and accelerating full-corpus retrieval by up to 300×.

* 10 pages
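
Trie-constrained decoding restricts generation to semantic IDs that actually exist in the index. The sketch below shows the generic technique as beam search over a prefix tree, assuming fixed-length integer IDs; `score(prefix, token)` is a hypothetical stand-in for the generative retriever's log-probability, and registering several IDs for one video only loosely mimics the multi-view assignment. It is not GRDR's implementation.

```python
import heapq
from typing import Callable, Dict, List, Set, Tuple

SemanticID = Tuple[int, ...]
Hit = Tuple[float, SemanticID, List[str]]


class Trie:
    """Prefix tree over fixed-length semantic IDs; leaves store owning videos."""

    def __init__(self) -> None:
        self.children: Dict[int, "Trie"] = {}
        self.videos: List[str] = []

    def insert(self, sid: SemanticID, video: str) -> None:
        node = self
        for token in sid:
            node = node.children.setdefault(token, Trie())
        node.videos.append(video)


def constrained_beam_search(
    trie: Trie, score: Callable[[SemanticID, int], float], beam: int
) -> List[Hit]:
    """Beam search that only ever expands prefixes present in `trie`."""
    frontier = [(0.0, (), trie)]
    finished: List[Hit] = []
    while frontier:
        expanded = []
        for total, prefix, node in frontier:
            if not node.children:  # leaf reached: a complete, valid semantic ID
                finished.append((total, prefix, node.videos))
                continue
            for token, child in node.children.items():
                expanded.append((total + score(prefix, token), prefix + (token,), child))
        # Keep the `beam` best partial IDs; the key avoids comparing Trie objects.
        frontier = heapq.nlargest(beam, expanded, key=lambda item: item[0])
    return sorted(finished, key=lambda hit: hit[0], reverse=True)


def candidate_pool(hits: List[Hit]) -> Set[str]:
    """Union of the videos behind the decoded IDs, to be handed to a dense reranker."""
    return {video for _, _, videos in hits for video in videos}
```

Since at most `beam` leaves can be reached, the candidate pool stays small regardless of corpus size, and a video reachable through several IDs gets several chances to be recalled.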

Movable Antenna-Enhanced Near-Field Flexible Beamforming: Performance Analysis and Optimization

Jan 25, 2026

Shun Yang, Xin Wei, Nianbing Su, Weidong Mei, Zhi Chen, Boyu Ning

Abstract:As an emerging wireless communication technology, movable antennas (MAs) offer the ability to adjust the spatial correlation of steering vectors, enabling more flexible beamforming compared to fixed-position antennas (FPAs). In this paper, we investigate the use of MAs for two typical near-field beamforming scenarios: beam nulling and multi-beam forming. In the first scenario, we aim to jointly optimize the positions of multiple MAs and the beamforming vector to maximize the beam gain toward a desired direction while nulling interference toward multiple undesired directions. In the second scenario, the objective is to maximize the minimum beam gain among all the above directions. However, both problems are non-convex and challenging to solve optimally. To gain insights, we first analyze several special cases and show that, with proper positioning of the MAs, directing the beam toward a specific direction can lead to nulls or full gains in other directions in the two scenarios, respectively. For the general cases, we propose a discrete sampling method and an alternating optimization algorithm to obtain high-quality suboptimal solutions to the two formulated problems. Furthermore, considering the practical limitations in antenna positioning accuracy, we analyze the impact of position errors on the performance of the optimized beamforming and MA positions, by introducing a Taylor series approximation for the near-field beam gain at each target. Numerical results validate our theoretical findings and demonstrate the effectiveness of our proposed algorithms.
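
To make the beam-gain objective concrete, the sketch below evaluates the near-field gain |a^H w|^2 of a linear MA array under a spherical-wavefront model. The geometry (antennas on the x-axis, target at range r and angle theta measured from the array axis), the unit-amplitude steering vector and the matched beamformer are simplifying assumptions chosen for illustration; the paper's exact channel model and its optimization algorithms are not reproduced here.

```python
import cmath
import math
from typing import List, Sequence


def steering_vector(positions: Sequence[float], r: float, theta: float,
                    wavelength: float) -> List[complex]:
    """Near-field (spherical-wave) steering vector of a linear MA array.

    Antenna n sits at (x_n, 0); the target is at (r cos theta, r sin theta).
    Amplitude variation across the aperture is ignored (an assumption).
    """
    k = 2 * math.pi / wavelength
    vec = []
    for x in positions:
        d = math.sqrt(r * r + x * x - 2 * r * x * math.cos(theta))  # exact distance
        vec.append(cmath.exp(-1j * k * (d - r)))  # phase relative to the origin
    return vec


def beam_gain(a: Sequence[complex], w: Sequence[complex]) -> float:
    """|a^H w|^2 for steering vector a and beamformer w."""
    inner = sum(ai.conjugate() * wi for ai, wi in zip(a, w))
    return abs(inner) ** 2


def matched_beamformer(a: Sequence[complex]) -> List[complex]:
    """Unit-norm beamformer w = a / sqrt(N); its gain toward a is exactly N."""
    n = len(a)
    return [ai / math.sqrt(n) for ai in a]
```

Every entry of the steering vector depends on the antenna positions, so changing `positions` reshapes the gain toward undesired targets while the matched gain toward the desired target stays N; that is the extra freedom the special-case analysis above exploits.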

SOP: A Scalable Online Post-Training System for Vision-Language-Action Models

Jan 06, 2026

Mingjie Pan, Siyuan Feng, Qinglin Zhang, Xinchen Li, Jianheng Song, Chendi Qu, Yi Wang, Chuankang Li, Ziyu Xiong, Zhi Chen(+2 more)

Abstract:Vision-language-action (VLA) models achieve strong generalization through large-scale pre-training, but real-world deployment requires expert-level task proficiency in addition to broad generality. Existing post-training approaches for VLA models are typically offline, single-robot, or task-specific, limiting effective on-policy adaptation and scalable learning from real-world interaction. We introduce a Scalable Online Post-training (SOP) system that enables online, distributed, multi-task post-training of generalist VLA models directly in the physical world. SOP tightly couples execution and learning through a closed-loop architecture in which a fleet of robots continuously streams on-policy experience and human intervention signals to a centralized cloud learner, and asynchronously receives updated policies. This design supports prompt on-policy correction, scales experience collection through parallel deployment, and preserves generality during adaptation. SOP is agnostic to the choice of post-training algorithm; we instantiate it with both interactive imitation learning (HG-DAgger) and reinforcement learning (RECAP). Across a range of real-world manipulation tasks including cloth folding, box assembly, and grocery restocking, we show that SOP substantially improves the performance of large pretrained VLA models while maintaining a single shared policy across tasks. Effective post-training can be achieved within hours of real-world interaction, and performance scales near-linearly with the number of robots in the fleet. These results suggest that tightly coupling online learning with fleet-scale deployment is instrumental to enabling efficient, reliable, and scalable post-training of generalist robot policies in the physical world.
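
The closed-loop architecture, in which robots stream experience to a centralized learner and asynchronously pick up updated policies, follows a familiar actor-learner pattern. The toy simulation below sketches that pattern with Python threads and a `queue.Queue`; rollouts and updates are placeholders, integer policy versions stand in for weights, HG-DAgger and RECAP are not implemented, and none of the names come from SOP itself.

```python
import queue
import threading
from dataclasses import dataclass, field
from typing import List


@dataclass
class PolicyStore:
    """Latest policy version published by the learner (stands in for weights)."""

    version: int = 0
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)

    def publish(self) -> None:
        with self._lock:
            self.version += 1

    def latest(self) -> int:
        with self._lock:
            return self.version


def robot(robot_id: int, store: PolicyStore, stream: queue.Queue, episodes: int) -> None:
    """Actor loop: fetch the newest policy, 'roll it out', stream the experience."""
    for episode in range(episodes):
        version = store.latest()  # asynchronous pull; may lag behind the learner
        # Placeholder rollout: a real robot would act here and log interventions.
        stream.put({"robot": robot_id, "episode": episode, "policy": version})


def learner(store: PolicyStore, stream: queue.Queue, total: int,
            batch_size: int, consumed: List[dict]) -> None:
    """Central learner: consume streamed experience, publish a new version per batch."""
    batch: List[dict] = []
    while len(consumed) < total:
        item = stream.get()  # blocks until some robot has sent experience
        consumed.append(item)
        batch.append(item)
        if len(batch) == batch_size:
            store.publish()  # placeholder for a gradient update on `batch`
            batch.clear()


def run_fleet(num_robots: int, episodes: int, batch_size: int):
    store, stream, consumed = PolicyStore(), queue.Queue(), []
    total = num_robots * episodes
    threads = [threading.Thread(target=learner,
                                args=(store, stream, total, batch_size, consumed))]
    threads += [threading.Thread(target=robot, args=(i, store, stream, episodes))
                for i in range(num_robots)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed, store
```

Each experience record carries the policy version that generated it, which is one simple way to track how stale the streamed data is relative to the learner.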

Movable Antenna Enhanced Multi-Region Beam Coverage: A Multi-Notch-Filter-Inspired Design

Dec 30, 2025

Dong Wang, Weidong Mei, Zhi Chen, Boyu Ning

Abstract:The movable antenna (MA) has emerged as a promising technology to enhance wireless communication performance by exploiting a new degree of freedom (DoF) via antenna position optimization. In this letter, we investigate MA-enhanced wide beam coverage over multiple subregions in the spatial domain. Specifically, we aim to maximize the minimum beam gain over the desired subregions by jointly optimizing the transmit beamforming and the antenna position vector (APV). Although this problem is non-convex, we propose an efficient algorithm to solve it by leveraging the similarity between the considered multi-region coverage and classical multi-notch filter (MNF) design. In particular, we construct a spatial MNF-based transmit beamforming vector by assuming a continuous amplitude and phase-shift profile within the antenna movement region. Based on this continuous profile, we propose a sequential update algorithm to select an optimal subset of MA positions for multi-region coverage, combined with a Gibbs sampling (GS) procedure to avoid undesired local optima. Numerical results show that our proposed algorithm can significantly outperform conventional fixed-position antennas (FPAs) and achieve performance comparable to the alternating optimization (AO) algorithm with dramatically lower complexity.

* 5 pages, 5 figures

Large Language Newsvendor: Decision Biases and Cognitive Mechanisms

Dec 14, 2025

Jifei Liu, Zhi Chen, Yuanguang Zhong

Abstract:Problem definition: Although large language models (LLMs) are increasingly integrated into business decision making, their potential to replicate and even amplify human cognitive biases poses a significant, yet not well-understood, risk. This is particularly critical in high-stakes operational contexts such as supply chain management. To address this, we investigate the decision-making patterns of leading LLMs using the canonical newsvendor problem in a dynamic setting, aiming to identify the nature and origins of their cognitive biases. Methodology/results: Through dynamic, multi-round experiments with GPT-4, GPT-4o, and LLaMA-8B, we tested for five established decision biases. We found that LLMs consistently replicated the classic "Too Low/Too High" ordering bias and significantly amplified other tendencies, such as demand-chasing behavior, compared with human benchmarks. Our analysis uncovered a "paradox of intelligence": the more sophisticated GPT-4 demonstrated the greatest irrationality through overthinking, while the efficiency-optimized GPT-4o performed near-optimally. Because these biases persist even when optimal formulas are provided, we conclude that they stem from architectural constraints rather than knowledge gaps. Managerial implications: First, managers should select models based on the specific task, as our results show that efficiency-optimized models can outperform more complex ones on certain optimization problems. Second, the significant amplification of bias by LLMs highlights the urgent need for robust human-in-the-loop oversight in high-stakes decisions to prevent costly errors. Third, our findings suggest that designing structured, rule-based prompts is a practical and effective strategy for managers to constrain models' heuristic tendencies and improve the reliability of AI-assisted decisions.
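
For reference, the benchmark against which newsvendor ordering biases are measured is the critical-fractile solution. The sketch below assumes uniformly distributed demand on [a, b], unit price p, unit cost c and zero salvage value, so that q* = a + (p - c)/p · (b - a); these parameters are illustrative assumptions, since the abstract does not specify the experimental setup. The helper names are hypothetical.

```python
def critical_ratio(price: float, cost: float) -> float:
    """Underage / (underage + overage) cost, i.e. (p - c) / p with zero salvage."""
    underage, overage = price - cost, cost
    return underage / (underage + overage)


def optimal_order(a: float, b: float, price: float, cost: float) -> float:
    """q* = F^{-1}(critical ratio) for demand D ~ Uniform[a, b]."""
    return a + critical_ratio(price, cost) * (b - a)


def expected_profit(q: float, a: float, b: float, price: float, cost: float) -> float:
    """p * E[min(q, D)] - c * q for D ~ Uniform[a, b] and a <= q <= b."""
    expected_sales = q - (q - a) ** 2 / (2 * (b - a))  # q minus expected leftovers
    return price * expected_sales - cost * q


def order_bias(q: float, a: float, b: float, price: float, cost: float,
               tol: float = 1e-6) -> str:
    """Label an order relative to q*, in the spirit of the Too Low/Too High pattern."""
    q_star = optimal_order(a, b, price, cost)
    if abs(q - q_star) <= tol:
        return "optimal"
    return "too low" if q < q_star else "too high"
```

With demand on [0, 100], p = 12 and c = 3, the optimum is q* = 75 with expected profit 337.5, whereas ordering the mean demand of 50 yields 300. Raising the cost to c = 9 moves the optimum down to 25, so the same mean-anchored order of 50 would now be labelled too high.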

Rotatable Antenna Array-Enhanced Null Steering: Performance Analysis and Optimization

Dec 13, 2025

Yingqi Wen, Weidong Mei, Yike Xie, Beixiong Zheng, Zhi Chen, Boyu Ning

Abstract:Conventional fixed-orientation antenna (FOA) arrays offer limited degrees of freedom (DoF) for flexible beamforming such as null steering. To address this limitation, we propose in this paper a new rotatable antenna array (RAA) architecture, which enables three-dimensional (3D) rotational control of an antenna array to provide enhanced spatial flexibility for null steering. To characterize its performance, we jointly optimize the 3D rotational angles of the RAA to maximize the beam gain in a given desired direction while nulling the beam gains in multiple interference directions under zero-forcing (ZF) beamforming. However, this problem is non-convex and challenging to tackle due to the highly nonlinear dependence of the beam gain on the rotational angles. To gain insights, we first examine several special cases, including both isotropic and directional antenna radiation patterns, and derive the conditions under which full beam gain can be achieved in the desired direction while meeting the nulling constraints for the interference directions. These conditions clearly indicate that, compared with FOA arrays, RAAs can significantly relax the angular separation requirement for achieving effective null steering. For other general cases, we propose a sequential update algorithm that iteratively refines the 3D rotational angles by discretizing the 3D angular search space. To avoid undesired local optima, a Gibbs sampling (GS) procedure is also employed between two consecutive rounds of sequential update for solution exploration. Simulation results verify our analytical results and show that RAAs achieve superior null-steering performance compared with FOA arrays.

* Submitted to IEEE ICC 2026 (Signal Processing for Communications)
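
Under ZF beamforming with a unit-norm beamformer, the gain toward the desired direction equals the squared norm of the desired steering vector after projecting out every interference steering vector, so full gain N requires the desired vector to be orthogonal to all interferers. The sketch below illustrates this in a deliberately simplified setting: a half-wavelength uniform linear array of isotropic elements rotated in-plane by a single angle `psi`, with a plain grid search standing in for the paper's 3D sequential update and Gibbs sampling. All function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple

Vector = List[complex]


def ula_steering(n: int, theta: float, psi: float = 0.0) -> Vector:
    """Steering vector of an n-element half-wavelength ULA rotated in-plane by psi."""
    return [cmath.exp(1j * math.pi * k * math.sin(theta - psi)) for k in range(n)]


def inner(u: Sequence[complex], v: Sequence[complex]) -> complex:
    """Hermitian inner product u^H v."""
    return sum(x.conjugate() * y for x, y in zip(u, v))


def zf_beamformer(desired: Vector, interferers: Sequence[Vector]) -> Tuple[float, Vector]:
    """Unit-norm ZF beamformer w and its gain |a_d^H w|^2 = ||P_perp a_d||^2."""
    basis: List[Vector] = []
    for v in interferers:  # Gram-Schmidt: orthonormal basis of the interference span
        r = list(v)
        for q in basis:
            c = inner(q, r)
            r = [ri - c * qi for ri, qi in zip(r, q)]
        norm = math.sqrt(inner(r, r).real)
        if norm > 1e-12:
            basis.append([ri / norm for ri in r])
    resid = list(desired)
    for q in basis:  # remove every interference component from a_d
        c = inner(q, resid)
        resid = [ri - c * qi for ri, qi in zip(resid, q)]
    gain = inner(resid, resid).real
    norm = math.sqrt(gain)
    w = [ri / norm for ri in resid] if norm > 1e-12 else [0j] * len(resid)
    return gain, w


def best_rotation(n: int, theta_d: float, thetas_i: Sequence[float],
                  grid: int = 720) -> Tuple[float, float]:
    """Grid search over psi in [-pi/2, pi/2) for the largest ZF beam gain."""
    best_gain, best_psi = -1.0, 0.0
    for g in range(grid):
        psi = -math.pi / 2 + math.pi * g / grid
        gain, _ = zf_beamformer(ula_steering(n, theta_d, psi),
                                [ula_steering(n, t, psi) for t in thetas_i])
        if gain > best_gain:
            best_gain, best_psi = gain, psi
    return best_gain, best_psi
```

In this toy setting, rotation helps because it changes the spacing between sin(theta_d - psi) and sin(theta_i - psi), and with it the correlation between desired and interference steering vectors that ZF has to remove.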

Fine-Grained Zero-Shot Learning with Attribute-Centric Representations

Dec 13, 2025

Zhi Chen, Jingcai Guo, Taotao Cai, Yuxiang Cai

Abstract:Recognizing unseen fine-grained categories demands a model that can distinguish subtle visual differences. This is typically achieved by transferring visual-attribute relationships from seen classes to unseen classes. The core challenge is attribute entanglement, where conventional models collapse distinct attributes such as color, shape, and texture into a single visual embedding. This causes interference that masks these critical distinctions. The post-hoc solutions of previous work are insufficient, as they operate on representations that are already mixed. We propose a zero-shot learning framework that learns Attribute-Centric Representations (ACR) to tackle this problem by imposing attribute disentanglement during representation learning. ACR is achieved with two mixture-of-experts components: Mixture of Patch Experts (MoPE) and Mixture of Attribute Experts (MoAE). First, MoPE is inserted into the transformer using a dual-level routing mechanism to conditionally dispatch image patches to specialized experts. This ensures that coherent attribute families are processed by dedicated experts. Then, the MoAE head projects these expert-refined features into sparse, part-aware attribute maps for robust zero-shot classification. On the zero-shot learning benchmark datasets CUB, AwA2, and SUN, ACR achieves consistent state-of-the-art results.

* Preprint
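
Conditionally dispatching image patches to specialized experts is usually implemented with a learned router. The sketch below shows generic single-level top-k routing with a linear gate and a softmax, renormalizing the weights of the selected experts; ACR's dual-level routing is not described in enough detail here to reproduce, so the router, the names and the toy dimensions are all assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(
    patches: Sequence[Sequence[float]],
    gate_weights: Sequence[Sequence[float]],
    k: int,
) -> Dict[int, List[Tuple[int, float]]]:
    """Send each patch to its k highest-scoring experts.

    Returns {expert: [(patch_index, gate_weight), ...]}; for every patch the
    k selected gate weights are renormalized to sum to 1.
    """
    dispatch: Dict[int, List[Tuple[int, float]]] = {e: [] for e in range(len(gate_weights))}
    for i, x in enumerate(patches):
        # Linear router: one logit per expert from a dot product with the patch feature.
        logits = [sum(wj * xj for wj, xj in zip(w, x)) for w in gate_weights]
        probs = softmax(logits)
        top = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:k]
        mass = sum(probs[e] for e in top)
        for e in top:
            dispatch[e].append((i, probs[e] / mass))
    return dispatch
```

Practical MoE layers typically add a load-balancing term so that experts receive comparable numbers of patches; it is omitted here for brevity.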
