Kai Wu

EVOM: Agentic Meta-Evolution of Actor-Critic Architectures for Reinforcement Learning

Jun 24, 2026

Boyun Zhang, Chao Wang, Kai Wu

Abstract:In actor-critic reinforcement learning, network architectures are typically designed by hand. Automating this design is challenging because each candidate must be trained before it can be evaluated, and the design space is open-ended. To address these challenges, we introduce EVOM, an agentic meta-evolution framework for discovering high-performance actor-critic architectures. We frame architecture search as a bi-level optimization: an inner loop trains weights via low-fidelity proximal policy optimization (PPO), while an outer loop drives meta-evolution by iteratively refining architecture programs. Crucially, this outer loop is powered by an LLM-based design agent that operates purely as an architecture designer, completely decoupled from policy execution and environment control. Experiments show that EVOM outperforms the manually designed baseline, an LLM-guided random search, and the state-of-the-art LLM-guided programmatic policy search method MLES, delivering superior performance on Ant-v4 and HalfCheetah-v4. Ablation studies validate that both the meta-evolution loop and the LLM Design Agent are indispensable for final performance.
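
The bi-level structure described in this abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: `low_fidelity_score` is a hypothetical closed-form stand-in for a short PPO training run, `propose` is a random mutation standing in for the LLM design agent, and an architecture is reduced to a tuple of hidden-layer widths rather than a program.

```python
import random


def low_fidelity_score(arch):
    # Hypothetical stand-in for the inner loop (a short PPO run).
    # Toy objective: prefers two hidden layers with widths near 64.
    return -sum((w - 64) ** 2 for w in arch) - 100 * abs(len(arch) - 2)


def propose(parent, rng):
    # Hypothetical stand-in for the LLM design agent: it only edits the
    # architecture description and never touches policy execution.
    child = list(parent)
    move = rng.choice(["widen", "narrow", "add", "drop"])
    i = rng.randrange(len(child))
    if move == "widen":
        child[i] = min(512, child[i] * 2)
    elif move == "narrow":
        child[i] = max(8, child[i] // 2)
    elif move == "add":
        child.insert(i, child[i])
    elif move == "drop" and len(child) > 1:
        child.pop(i)
    return tuple(child)


def meta_evolve(seed_arch, generations=30, children=4, seed=0):
    # Outer loop: keep the best architecture seen so far and refine it.
    rng = random.Random(seed)
    best = tuple(seed_arch)
    best_score = low_fidelity_score(best)
    history = [best_score]
    for _ in range(generations):
        for _ in range(children):
            candidate = propose(best, rng)
            score = low_fidelity_score(candidate)  # inner-loop evaluation
            if score > best_score:
                best, best_score = candidate, score
        history.append(best_score)
    return best, best_score, history


best_arch, best_score, history = meta_evolve((256,))
```

Because the outer loop is elitist, `history` never decreases. The point of the sketch is only the separation of roles: the proposer sees architectures and scores, while training happens entirely inside the scoring function.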

Feature to Dynamics: Feature-space to Autoregression strategy for Zero-shot Time Series Forecasting

May 31, 2026

Yifan Wu, Junjie Wu, Kai Wu, Xiaoyu Zhang, Jian Lou

Abstract:Zero-shot time series forecasting aims to predict future values for previously unseen series, requiring models to generalize temporal dynamics beyond the training distribution. While recent foundation models achieve strong in-domain performance through large-scale pretraining, their effectiveness often relies on broad data coverage and implicit pattern memorization, which can limit generalization when data are scarce or source and target domains are disjoint. In this work, we propose FSA, a feature-to-strategy framework for controlled zero-shot univariate forecasting. Instead of directly modeling raw sequences in the observation space, FSA learns a structured mapping from an interpretable feature space to an autoregressive strategy space. This design introduces explicit inductive biases that disentangle global trends, periodic components, and local temporal dynamics, enabling the model to capture transferable time-series structure with fewer data assumptions. Empirical results show that, under identical pretraining data, training protocol, and comparable parameter budgets, FSA outperforms Transformer-based architectures in our controlled zero-shot setting.

ReactBench: A Cause-Driven Benchmark for Multimodal Hallucination via Systematic Evaluation

May 28, 2026

Shizhe Zhou, Bohan Jia, Kai Wu, Yan Shen, Tongyun Li, Yuyang Wu, Shaohui Lin

Abstract:While multimodal large language models (MLLMs) have achieved rapid progress in vision-language understanding, they remain prone to multimodal hallucinations, producing responses that are inconsistent with the visual input. Existing benchmarks predominantly focus on detecting hallucination outcomes rather than evaluating the underlying causes of these failures. Moreover, many benchmarks rely on simplistic scenarios and limited evaluation formats that no longer challenge state-of-the-art models. To address these limitations, we introduce ReactBench, a cause-driven hallucination benchmark featuring multiple tasks and an exam-style evaluation format. By generating adversarial images and hallucination-inducing queries, ReactBench introduces four targeted tasks: Relational Erasure, Counterfactual Attribute, Alteration Tracing, and Dense Counting. These tasks systematically expose co-occurrence bias, language priors, cross-image comparative perception deficiencies, and fine-grained perceptual bottlenecks. Beyond standard accuracy-based evaluation, we leverage Chain-of-Thought reasoning to identify fine-grained sub-causes of hallucination within each task. Extensive evaluations reveal that current MLLMs remain notably vulnerable to cause-specific hallucination triggers, demonstrating the value of ReactBench as a systematic and interpretable testbed for diagnosing and improving multimodal model robustness. The project page is available at https://reactbench.github.io/.

Posterior-Aware Differential Channel Tracking for Reliable Single-Stream DAB+ Passive Radar

May 23, 2026

Kai Wu, Brendan Hall, Zhongqin Wang, Andrew Zhang, Jay Guo

Abstract:Digital audio broadcasting plus (DAB+) is an attractive illuminator for passive radar because it provides persistent, high-power, and geographically widespread very high frequency (VHF) orthogonal frequency-division multiplexing (OFDM) signals. A channel state information (CSI) sensing approach can convert a single received DAB+ stream into a CSI sequence for radar sensing, avoiding the need for a separately received reference signal in conventional passive radars. However, CSI estimation in DAB+ is challenging because the communication symbols are differentially encoded across time. A wrong symbol transition estimate leads to a persistent multiplicative error in the subsequent CSI sequence within a DAB+ frame. This paper formulates single-stream DAB+ passive radar as a posterior-probability-aware differential CSI tracking problem. The proposed method uses the previously tracked CSI as a channel prior, performs prediction-aided maximum a posteriori detection of the current symbol, converts posterior transition reliability into observation uncertainty, and applies linear minimum mean squared error fusion to obtain a stable tracked CSI. A reliability-informed CSI fusion strategy is also introduced to preserve weak target information. Theoretical analysis is provided, showing a guaranteed performance gain in symbol and CSI estimation. Simulation results show that the proposed method can reduce CSI estimation error by over 15 dB compared with prior art. It also improves median target-to-background ratio by more than 11 dB in random fading scenes. Experiments in Sydney, Australia demonstrate improved range-Doppler maps for commercial aircraft sensing.
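
The persistent multiplicative error mentioned in this abstract follows directly from differential encoding, and a small noise-free sketch makes it concrete. The setup is an assumption for illustration only and is not the paper's method: a single subcarrier, a static channel `h`, a π/4-shifted DQPSK transition alphabet, and the helper names `dqpsk_symbols` and `csi_from_decisions` are all hypothetical.

```python
import cmath
import math


def dqpsk_symbols(transitions, ref=1 + 0j):
    # Differential encoding across time: s_k = s_{k-1} * d_k.
    symbols = [ref]
    for d in transitions:
        symbols.append(symbols[-1] * d)
    return symbols


def csi_from_decisions(received, decided_transitions, ref=1 + 0j):
    # Rebuild the symbol sequence from the decided transitions, then
    # estimate the CSI as h_hat_k = y_k / s_hat_k.
    s_hat = dqpsk_symbols(decided_transitions, ref)
    return [y / s for y, s in zip(received, s_hat)]


# Assumed transition alphabet: phases pi/4 + m*pi/2 for m = 0..3.
PHASES = [cmath.exp(1j * (math.pi / 4 + m * math.pi / 2)) for m in range(4)]

true_d = [PHASES[m] for m in (0, 2, 1, 3, 0, 1)]
h = 0.5 * cmath.exp(1j * 0.3)            # assumed static, noise-free channel
rx = [h * s for s in dqpsk_symbols(true_d)]

wrong_d = list(true_d)
wrong_d[2] = PHASES[3]                   # one wrong decision (true was PHASES[1])

h_hat = csi_from_decisions(rx, wrong_d)
ratios = [est / h for est in h_hat]      # 1 before the error, -1 from then on
```

The single wrong transition differs from the true one by a phase of π, so every CSI estimate from that symbol onward is the true channel multiplied by the same factor of -1, while the estimates before it are unaffected.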

* 13 pages, 9 figures, submitted to IEEE Trans

Learning from Medical Entity Trees: An Entity-Centric Medical Data Engineering Framework for MLLMs

Apr 28, 2026

Jianghang Lin, Haihua Yang, Deli Yu, Kai Wu, Kai Ye, Jinghao Lin, Zihan Wang, Yuhang Wu, Liujuan Cao

Abstract:Multimodal Large Language Models (MLLMs) have shown transformative potential in medical applications, yet their performance is hindered by conventional data curation strategies that rely on coarse-grained partitioning by modality or department. Such fragmented approaches fail to capture the hierarchical and interconnected nature of clinical medical knowledge, limiting the models' ability to perform fine-grained recognition and complex reasoning. In this paper, we propose a novel Entity-Centric Medical Data Engineering framework. We automatically extract entities from authoritative medical literature to construct a Medical Entity Tree (MET), a hierarchical structure that systematically encodes diseases, anatomical structures, modalities, and symptoms into a unified knowledge repository. Building upon the MET, we propose an advanced data engine that includes: (1) node-guided retrieval to anchor raw data to specific medical concepts, (2) a two-stage hybrid filtering and alignment pipeline to ensure precise visual-semantic correspondence, and (3) knowledge-aware data synthesis to generate enriched captions and targeted reasoning VQA pairs, leveraging structural constraints. Extensive evaluations across six medical benchmarks demonstrate that our approach significantly enhances the medical capabilities of general-purpose MLLMs, improving their ability to handle complex clinical queries and achieve state-of-the-art performance in diverse medical contexts.

Can LLMs Fool Graph Learning? Exploring Universal Adversarial Attacks on Text-Attributed Graphs

Mar 22, 2026

Zihui Chen, Yuling Wang, Pengfei Jiao, Kai Wu, Xiao Wang, Xiang Ao, Dalin Zhang

Abstract:Text-attributed graphs (TAGs) enhance graph learning by integrating rich textual semantics and topological context for each node. While boosting expressiveness, they also expose new vulnerabilities in graph learning through text-based adversarial surfaces. Recent advances leverage diverse backbones, such as graph neural networks (GNNs) and pre-trained language models (PLMs), to capture both structural and textual information in TAGs. This diversity raises a key question: How can we design universal adversarial attacks that generalize across architectures to assess the security of TAG models? The challenge arises from the stark contrast in how different backbones (GNNs and PLMs) perceive and encode graph patterns, coupled with the fact that many PLMs are only accessible via APIs, limiting attacks to black-box settings. To address this, we propose BadGraph, a novel attack framework that deeply elicits large language models' (LLMs') understanding of general graph knowledge to jointly perturb both node topology and textual semantics. Specifically, we design a target influencer retrieval module that leverages graph priors to construct cross-modally aligned attack shortcuts, thereby enabling efficient LLM-based perturbation reasoning. Experiments show that BadGraph achieves universal and effective attacks across GNN- and LLM-based reasoners, with up to a 76.3% performance drop, while theoretical and empirical analyses confirm its stealthy yet interpretable nature.

* Accepted by TheWebConf (WWW) 2026

Uplink Networked Sensing via Multiuser Correlation Exploitation

Mar 17, 2026

Jingying Bao, J. Andrew Zhang, Kai Wu, Christos Masouros, Y. Jay Guo

Abstract:In this correspondence, we investigate networked sensing in perceptive mobile networks under a bistatic multi-transmitter single-receiver uplink topology, where multiple user equipments (UEs) transmit signals over orthogonal frequency-division multiple access (OFDMA) resources and a single base station performs joint sensing. Uplink clock asynchronism introduces offsets that destroy inter-packet coherence and hinder high-resolution sensing, while multi-user observations exhibit exploitable cross-user correlation. We therefore formulate an asynchronous multi-user uplink OFDMA sensing model and exploit common delay-cluster sparsity across UEs. A line-of-sight (LoS)-referenced calibration first suppresses the offsets, after which a shared-private delay-domain sparse Bayesian learning (SBL) model is used for delay support recovery and user grouping. Doppler and angle of arrival are then estimated from temporal and spatial phase differences. Simulation results show that the proposed scheme outperforms per-user processing, particularly under limited subcarrier budgets and in low signal-to-noise ratio (SNR) regimes.

Spherical Latent Motion Prior for Physics-Based Simulated Humanoid Control

Mar 01, 2026

Jing Tan, Weisheng Xu, Xiangrui Jiang, Jiaxi Zhang, Kun Yang, Kai Wu, Jiaqi Xiong, Shiting Chen, Yangfan Li, Yixiao Feng(+4 more)

Abstract:Learning motion priors for physics-based humanoid control is an active research topic. Existing approaches mainly include variational autoencoders (VAE) and adversarial motion priors (AMP). VAE introduces information loss, and random latent sampling may sometimes produce invalid behaviors. AMP suffers from mode collapse and struggles to capture diverse motion skills. We present the Spherical Latent Motion Prior (SLMP), a two-stage method for learning motion priors. In the first stage, we train a high-quality motion tracking controller. In the second stage, we distill the tracking controller into a spherical latent space. A combination of distillation, a discriminator, and a discriminator-guided local semantic consistency constraint shapes a structured latent action space, allowing stable random sampling without information loss. To evaluate SLMP, we collect a two-hour human combat motion capture dataset and show that SLMP preserves fine motion detail without information loss, and random sampling yields semantically valid and stable behaviors. When applied to a two-agent physics-based combat task, SLMP produces human-like and physically plausible combat behaviors only using simple rule-based rewards. Furthermore, SLMP generalizes across different humanoid robot morphologies, demonstrating its transferability beyond a single simulated avatar.

Iterative Closed-Loop Motion Synthesis for Scaling the Capabilities of Humanoid Control

Feb 25, 2026

Weisheng Xu, Qiwei Wu, Jiaxi Zhang, Tan Jing, Yangfan Li, Yuetong Fang, Jiaqi Xiong, Kai Wu, Rong Ou, Renjing Xu

Abstract:Physics-based humanoid control relies on training with motion datasets that have diverse data distributions. However, the fixed difficulty distribution of datasets limits the performance ceiling of the trained control policies. Additionally, acquiring high-quality data through professional motion capture systems is constrained by cost, making it difficult to scale. To address these issues, we propose a closed-loop framework for automated motion data generation and iteration. It can generate high-quality motion data with rich action semantics, including martial arts, dance, combat, sports, gymnastics, and more. Furthermore, our framework enables difficulty iteration of policies and data through physical metrics and objective evaluations, allowing the trained tracker to break through its original difficulty limits. On the PHC single-primitive tracker, using only approximately 1/10 of the AMASS dataset size, the average failure rate on the test set (2201 clips) is reduced by 45% compared to the baseline. Finally, we conduct comprehensive ablation and comparative experiments to highlight the rationality and advantages of our framework.

MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs

Feb 16, 2026

Baorong Shi, Bo Cui, Boyuan Jiang, Deli Yu, Fang Qian, Haihua Yang, Huichao Wang, Jiale Chen, Jianfei Pan, Jieqiong Cao(+10 more)

Abstract:We present MedXIAOHE, a medical vision-language foundation model designed to advance general-purpose medical understanding and reasoning in real-world clinical applications. MedXIAOHE achieves state-of-the-art performance across diverse medical benchmarks and surpasses leading closed-source multimodal systems on multiple capabilities. To achieve this, we propose an entity-aware continual pretraining framework that organizes heterogeneous medical corpora to broaden knowledge coverage and reduce long-tail gaps (e.g., rare diseases). For medical expert-level reasoning and interaction, MedXIAOHE incorporates diverse medical reasoning patterns via reinforcement learning and tool-augmented agentic training, enabling multi-step diagnostic reasoning with verifiable decision traces. To improve reliability in real-world use, MedXIAOHE integrates user-preference rubrics, evidence-grounded reasoning, and low-hallucination long-form report generation, with improved adherence to medical instructions. We release this report to document our practical design choices, scaling insights, and evaluation framework, hoping to inspire further research.

* XIAOHE Medical AI team. Currently, the model is exclusively available on XIAOHE AI Doctor, accessible via both the App Store and the Douyin Mini Program
