Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xinyu Li

EBench: Elemental Diagnosis of Generalist Mobile Manipulation Policies

Jun 16, 2026

Ning Gao, Jinliang Zheng, Xing Gao, Haoxiang Ma, Hanqing Wang, Yukai Wang, Jiantong Chen, Zanxin Chen, Shujie Zhang, Mingda Jia(+15 more)

Abstract:We present EBench, a simulation benchmark that diagnoses generalist mobile manipulation policies beyond a single success-rate scalar. EBench comprises 26 diverse and challenging manipulation tasks annotated along 5 capability dimensions and 4 generalization dimensions. We evaluate state-of-the-art generalist manipulation models including $π_0$, $π_{0.5}$, XVLA, and InternVLA-A1, and reveal that models with near success rates exhibit strikingly different capability profiles: $π_{0.5}$ achieves the highest test success rate and the best train--test retention, whereas InternVLA-A1 dominates mobile manipulation but collapses on dexterous tasks, and XVLA exhibits strengths on a disjoint set of atomic skills compared to other policies. Beyond capability profiling, EBench analyzes the generalization ability from 4 representative perspectives, identifying the impact of different distribution shift factors. The results reveal strengths and weaknesses of models behind an overall score. We hope this benchmark offers a broad set of diagnostic signals to guide iteration on generalist manipulation models.

Via

Access Paper or Ask Questions

Traits Run Deeper: Trait-Specific Asymmetric Fusion for Personality Assessment

Jun 09, 2026

Jia Li, Qian Chen, Wei Wang, Xinyu Li, Zhenzhen Hu, Dongsheng Shao, Richang Hong, Meng Wang

Abstract:Personality assessment aims to infer stable personality traits from dynamic behaviors across language, voice, and facial cues. Since different personality dimensions are revealed through distinct behavioral perspectives, modeling trait-specific evidence is challenging. However, most existing approaches adopt a uniform multimodal fusion strategy across all dimensions, assuming identical modality contributions. This overlooks trait-specific modality preferences and introduces cross-modal interference. To address this issue, we propose a novel personality assessment framework called Traits Run Deeper, which consists of three components. Specifically, the Multimodal Foundation Representation (MFR) module constructs personality-oriented multimodal inputs and leverages psychology-informed semantic templates as anchors, enabling foundation models to capture trait-relevant information. Building upon MFR, the Trait-Specific Modality Fusion (TSMF) module acts as an asymmetric fusion mechanism, allowing each dimension to selectively exploit different modality pathways from modality-specific modeling to complementary fusion. Thus, TSMF captures heterogeneous modality preferences while reducing cross-modal contamination. Furthermore, the Distribution-Calibrated Personality Regression (DCPR) module mitigates label imbalance and central tendency bias through target distribution calibration, improving robustness and stability. Experimental results on the AVI Challenge 2026 validation set demonstrate the effectiveness of the proposed framework, reducing mean squared error (MSE) by approximately 25% compared with the baseline. Consistent improvements are observed on the official test set, where our method achieves the best performance and ranks first in the Personality Assessment Track. The source code will be made available at https://github.com/MSA-LMC/AVI2026.

Via

Access Paper or Ask Questions

Causal Ensemble Agent: Hierarchical Causal Discovery with LLM-guided Expert Reweighting

Jun 09, 2026

Xinyu Li, Yuanyuan Wang, Haoxuan Li, Chuan Zhou, Erdun Gao, Bo Han, Tongliang Liu, Kun Zhang, Howard Bondell, Mingming Gong

Abstract:Causal discovery aims to uncover causal structures from observational data, which is crucial for real-world decision-making. However, different causal discovery algorithms can produce divergent results that conflict with each other, complicating the identification of accurate causal graphs. Traditional approaches rely on numerical values and statistical assumptions, often ignoring rich domain-specific information, such as feature descriptions, which could also help structure learning. While recent works explore using Large Language Models (LLMs) to infer causal relations via direct queries, such methods can be unreliable due to a lack of alignment with the actual data. To address these limitations, we propose Causal Ensemble Agent (CEA), a novel framework that aggregates structural insights from statistical discovery experts across different graph levels via linear opinion pooling, and uses an LLM as a meta-referee to dynamically reweight experts when the aggregated confidence is close to the decision boundary, thereby composing an improved and more complete causal graph. Extensive experiments on both synthetic and real-world datasets demonstrate that CEA achieves the strongest overall performance across a wide range of causal discovery methods, highlighting the effectiveness of using LLMs for meta-analysis in causal discovery.

Via

Access Paper or Ask Questions

Self-Regulated Learning in Essay Writing: Consistency of Strategies and Impact on Outcomes

May 14, 2026

Gloria Fernández-Nieto, Kiyoshige Garcés, Mladen Raković, Tongguang Li, Xinyu Li, Linxuan Zhao, Dragan Gašević

Abstract:Background: Abilities for effective self-regulated learning (SRL) are critical for lifelong learning, particularly during adolescence when these skills consolidate and strongly influence future learning. Their importance has grown with the rise of online and blended education. Yet, little is known about how secondary school students self-regulate in online environments, how their SRL processes and strategies evolve, or how they affect outcomes. In secondary education, understanding these processes can reveal patterns and indicators of learning success, informing the design of online support mechanisms. Evidence from repeated-measures designs remains scarce. Objectives: This study aims to examine how secondary school students enact SRL strategies during online essay writing, how these strategies change over time, and how they relate to learning outcomes. Methods: We analysed metacognition-related trace data collected from secondary students during a two-wave online essay-writing task conducted one week apart in two Colombian schools (N = 93 for session 1, N = 95 for session 2) via a digital learning platform. Using a combination of process mining and unsupervised machine learning techniques, we identified dominant SRL strategies grounded in established SRL processes and examined their stability and association with learning outcomes. Results and conclusions: Three dominant SRL strategies were identified. Results showed variability: many students remained in or shifted to Read first, write next, while none used Write intensively, read selectively in session 2. Although less common, latter strategy was positively associated with learning outcomes.

* 16 pages, 4 figures, submitted to Journal of Computer Assisted Learning (JCAL) [Under Review]

Via

Access Paper or Ask Questions

Signature Approach for Contextual Bandits with Nonlinear and Path-dependent Rewards

May 11, 2026

Xin Guo, Grace He, Xinyu Li

Abstract:We study contextual bandits with nonlinear and path-dependent rewards through a novel signature-transform-based approach. Leveraging the universal nonlinearity property of signatures, we approximate continuous path-dependent reward functionals by linear functionals in the signature space. This representation enables the use of efficient linear contextual bandit methods while preserving expressive sequential structure. Building on this framework, we propose \texttt{DisSigUCB}, a signature-based disjoint upper confidence bound (UCB) algorithm. Under boundedness and non-degeneracy assumptions, we prove a high-probability data-dependent sublinear regret bound of order $\tilde{\mathcal O}(\sqrt{(d+m)KT})$ where $d$ is the context dimension and $m$ is the signature feature dimension. Synthetic experiments and numerical applications on temperature sensor monitoring, sleep-stage classification, and hospital nurse staffing demonstrate that \texttt{DisSigUCB} consistently outperforms classical linear and kernelized contextual bandit baselines in nonlinear and path-dependent settings.

Via

Access Paper or Ask Questions

Gradients with Respect to Semantics Preserving Embeddings Tell the Uncertainty of Large Language Models

May 06, 2026

Mingda Li, Rundong Lv, Xinyu Li, Weinan Zhang, Ting Liu

Abstract:Uncertainty quantification (UQ) is an important technique for ensuring the trustworthiness of LLMs, given their tendency to hallucinate. Existing state-of-the-art UQ approaches for free-form generation rely heavily on sampling, which incurs high computational cost and variance. In this work, we propose the first gradient-based UQ method for free-form generation, SemGrad, which is sampling-free and computationally efficient. Unlike prior gradient-based methods developed for classification tasks that operates in parameter space, we propose to consider gradients in semantic space. Our method builds on the key intuition that a confident LLM should maintain stable output distributions under semantically equivalent input perturbations. We interpret the stability as the gradients in semantic space and introduce a Semantic Preservation Score (SPS) to identify embeddings that best capture semantics, with respect to which gradients are computed. We further propose HybridGrad, which combines the strengths of SemGrad and parameter gradients. Experiments demonstrate that both of our methods provide efficient and effective uncertainty estimates, achieving superior performance than state-of-the-art methods, particularly in settings with multiple valid responses.

* Accepted by ICML 2026

Via

Access Paper or Ask Questions

Variational Matrix-Learning Fourier Networks for Parametric Multiphysics Surrogates

May 04, 2026

Xinyu Li, Jianhua Zhang, Liang Chen

Abstract:Multiphysics simulation is critical for system-technology co-optimization (STCO) in chiplet-based design, but repeated finite-element solutions of PDE-governed problems are computationally expensive in parametric design exploration. This paper proposes a variational matrix-learning Fourier network (VMLFN) for efficient parametric multiphysics surrogate modeling. VMLFN constructs a log-space sine neural representation with randomly sampled spectral frequencies, frequency-dependent decay regulation, and embedded Dirichlet boundary conditions. With fixed hidden-layer parameters, the output-layer weights are determined by reformulating the governing PDEs into variational weak forms and enforcing the stationarity condition of the resulting energy functional. This converts physics-informed training into a linear matrix-solving problem, requiring only first-order derivatives and avoiding both high-order automatic differentiation and penalty-coefficient tuning. A heuristic frequency-scanning algorithm is further introduced to select a problem-adaptive maximum frequency that covers the dominant spectral range of the target problem. The proposed method is validated on heat conduction, solid mechanics, and Helmholtz wave propagation problems. Results from five benchmark cases demonstrate that VMLFN delivers accurate full-field predictions with substantial speedup over conventional physics-informed neural networks and repeated finite-element simulations.

Via

Access Paper or Ask Questions

STORM: End-to-End Referring Multi-Object Tracking in Videos

Apr 12, 2026

Zijia Lu, Jingru Yi, Jue Wang, Yuxiao Chen, Junwen Chen, Xinyu Li, Davide Modolo

Abstract:Referring multi-object tracking (RMOT) is a task of associating all the objects in a video that semantically match with given textual queries or referring expressions. Existing RMOT approaches decompose object grounding and tracking into separated modules and exhibit limited performance due to the scarcity of training videos, ambiguous annotations, and restricted domains. In this work, we introduce STORM, an end-to-end MLLM that jointly performs grounding and tracking within a unified framework, eliminating external detectors and enabling coherent reasoning over appearance, motion, and language. To improve data efficiency, we propose a task-composition learning (TCL) strategy that decomposes RMOT into image grounding and object tracking, allowing STORM to leverage data-rich sub-tasks and learn structured spatial--temporal reasoning. We further construct STORM-Bench, a new RMOT dataset with accurate trajectories and diverse, unambiguous referring expressions generated through a bottom-up annotation pipeline. Extensive experiments show that STORM achieves state-of-the-art performance on image grounding, single-object tracking, and RMOT benchmarks, demonstrating strong generalization and robust spatial--temporal grounding in complex real-world scenarios. STORM-Bench is released at https://github.com/amazon-science/storm-referring-multi-object-grounding.

* CVPR 2026 Findings

Via

Access Paper or Ask Questions

DSevolve: Enabling Real-Time Adaptive Scheduling on Dynamic Shop Floor with LLM-Evolved Heuristic Portfolios

Mar 29, 2026

Jin Huang, Jie Yang, XinLei Zhou, Qihao Liu, Liang Gao, Xinyu Li

Abstract:In dynamic manufacturing environments, disruptions such as machine breakdowns and new order arrivals continuously shift the optimal dispatching strategy, making adaptive rule selection essential. Existing LLM-powered Automatic Heuristic Design (AHD) frameworks evolve toward a single elite rule that cannot meet this adaptability demand. To address this, we present DSevolve, an industrial scheduling framework that evolves a quality-diverse portfolio of dispatching rules offline and adaptively deploys them online with second-level response time. Multi-persona seeding and topology-aware evolutionary operators produce a behaviorally diverse rule archive indexed by a MAP-Elites feature space. Upon each disruption event, a probe-based fingerprinting mechanism characterizes the current shop floor state, retrieves high-quality candidate rules from an offline knowledge base, and selects the best one via rapid look-ahead simulation. Evaluated on 500 dynamic flexible job shop instances derived from real industrial data, DSevolve outperforms state-of-the-art AHD frameworks, classical dispatching rules, genetic programming, and deep reinforcement learning, offering a practical and deployable solution for intelligent shop floor scheduling.

Via

Access Paper or Ask Questions

IncreRTL: Traceability-Guided Incremental RTL Generation under Requirement Evolution

Mar 26, 2026

Luanrong Chen, Renzhi Chen, Xinyu Li, Shanshan Li, Rui Gong, Lei Wang

Abstract:Large language models (LLMs) have shown promise in generating RTL code from natural-language descriptions, but existing methods remain static and struggle to adapt to evolving design requirements, potentially causing structural drift and costly full regeneration. We propose IncreRTL, a LLM-driven framework for incremental RTL generation under requirement evolution. By constructing requirement-code traceability links to locate and regenerate affected code segments, IncreRTL achieves accurate and consistent updates. Evaluated on our newly constructed EvoRTL-Bench, IncreRTL demonstrates notable improvements in regeneration consistency and efficiency, advancing LLM-based RTL generation toward practical engineering deployment.

Via

Access Paper or Ask Questions