Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xinyi Wang

University of Maryland, College Park & Beijing Normal University

CAMP-VQA: Caption-Embedded Multimodal Perception for No-Reference Quality Assessment of Compressed Video

Nov 10, 2025

Xinyi Wang, Angeliki Katsenou, Junxiao Shen, David Bull

Abstract:The prevalence of user-generated content (UGC) on platforms such as YouTube and TikTok has rendered no-reference (NR) perceptual video quality assessment (VQA) vital for optimizing video delivery. Nonetheless, the characteristics of non-professional acquisition and the subsequent transcoding of UGC video on sharing platforms present significant challenges for NR-VQA. Although NR-VQA models attempt to infer mean opinion scores (MOS), their modeling of subjective scores for compressed content remains limited due to the absence of fine-grained perceptual annotations of artifact types. To address these challenges, we propose CAMP-VQA, a novel NR-VQA framework that exploits the semantic understanding capabilities of large vision-language models. Our approach introduces a quality-aware prompting mechanism that integrates video metadata (e.g., resolution, frame rate, bitrate) with key fragments extracted from inter-frame variations to guide the BLIP-2 pretraining approach in generating fine-grained quality captions. A unified architecture has been designed to model perceptual quality across three dimensions: semantic alignment, temporal characteristics, and spatial characteristics. These multimodal features are extracted and fused, then regressed to video quality scores. Extensive experiments on a wide variety of UGC datasets demonstrate that our model consistently outperforms existing NR-VQA methods, achieving improved accuracy without the need for costly manual fine-grained annotations. Our method achieves the best performance in terms of average rank and linear correlation (SRCC: 0.928, PLCC: 0.938) compared to state-of-the-art methods. The source code and trained models, along with a user-friendly demo, are available at: https://github.com/xinyiW915/CAMP-VQA.

* 14 pages, 6 figures

Via

Access Paper or Ask Questions

Encoder-Decoder or Decoder-Only? Revisiting Encoder-Decoder Large Language Model

Oct 30, 2025

Biao Zhang, Yong Cheng, Siamak Shakeri, Xinyi Wang, Min Ma, Orhan Firat

Abstract:Recent large language model (LLM) research has undergone an architectural shift from encoder-decoder modeling to nowadays the dominant decoder-only modeling. This rapid transition, however, comes without a rigorous comparative analysis especially \textit{from the scaling perspective}, raising concerns that the potential of encoder-decoder models may have been overlooked. To fill this gap, we revisit encoder-decoder LLM (RedLLM), enhancing it with recent recipes from decoder-only LLM (DecLLM). We conduct a comprehensive comparison between RedLLM, pretrained with prefix language modeling (LM), and DecLLM, pretrained with causal LM, at different model scales, ranging from $\sim$150M to $\sim$8B. Using RedPajama V1 (1.6T tokens) for pretraining and FLAN for instruction tuning, our experiments show that RedLLM produces compelling scaling properties and surprisingly strong performance. While DecLLM is overall more compute-optimal during pretraining, RedLLM demonstrates comparable scaling and context length extrapolation capabilities. After instruction tuning, RedLLM achieves comparable and even better results on various downstream tasks while enjoying substantially better inference efficiency. We hope our findings could inspire more efforts on re-examining RedLLM, unlocking its potential for developing powerful and efficient LLMs.

* The scaling study inspiring T5Gemma

Via

Access Paper or Ask Questions

Exploration of Summarization by Generative Language Models for Automated Scoring of Long Essays

Oct 26, 2025

Haowei Hua, Hong Jiao, Xinyi Wang

Abstract:BERT and its variants are extensively explored for automated scoring. However, a limit of 512 tokens for these encoder-based models showed the deficiency in automated scoring of long essays. Thus, this research explores generative language models for automated scoring of long essays via summarization and prompting. The results revealed great improvement of scoring accuracy with QWK increased from 0.822 to 0.8878 for the Learning Agency Lab Automated Essay Scoring 2.0 dataset.

* 19 pages, 5 Tables 7 Figures, Presentation at Artificial Intelligence in Measurement and Education Conference (AIME-Con)

Via

Access Paper or Ask Questions

HINT: Helping Ineffective Rollouts Navigate Towards Effectiveness

Oct 10, 2025

Xinyi Wang, Jinyi Han, Zishang Jiang, Tingyun Li, Jiaqing Liang, Sihang Jiang, Zhaoqian Dai, Shuguang Ma, Fei Yu, Yanghua Xiao

Abstract:Reinforcement Learning (RL) has become a key driver for enhancing the long chain-of-thought (CoT) reasoning capabilities of Large Language Models (LLMs). However, prevalent methods like GRPO often fail when task difficulty exceeds the model's capacity, leading to reward sparsity and inefficient training. While prior work attempts to mitigate this using off-policy data, such as mixing RL with Supervised Fine-Tuning (SFT) or using hints, they often misguide policy updates In this work, we identify a core issue underlying these failures, which we term low training affinity. This condition arises from a large distributional mismatch between external guidance and the model's policy. To diagnose this, we introduce Affinity, the first quantitative metric for monitoring exploration efficiency and training stability. To improve Affinity, we propose HINT: Helping Ineffective rollouts Navigate Towards effectiveness, an adaptive hinting framework. Instead of providing direct answers, HINT supplies heuristic hints that guide the model to discover solutions on its own, preserving its autonomous reasoning capabilities. Extensive experiments on mathematical reasoning tasks show that HINT consistently outperforms existing methods, achieving state-of-the-art results with models of various scales, while also demonstrating significantly more stable learning and greater data efficiency.Code is available on Github.

Via

Access Paper or Ask Questions

AI Foundation Model for Time Series with Innovations Representation

Oct 02, 2025

Lang Tong, Xinyi Wang

Figure 1 for AI Foundation Model for Time Series with Innovations Representation

Figure 2 for AI Foundation Model for Time Series with Innovations Representation

Figure 3 for AI Foundation Model for Time Series with Innovations Representation

Figure 4 for AI Foundation Model for Time Series with Innovations Representation

Abstract:This paper introduces an Artificial Intelligence (AI) foundation model for time series in engineering applications, where causal operations are required for real-time monitoring and control. Since engineering time series are governed by physical, rather than linguistic, laws, large-language-model-based AI foundation models may be ineffective or inefficient. Building on the classical innovations representation theory of Wiener, Kallianpur, and Rosenblatt, we propose Time Series GPT (TS-GPT) -- an innovations-representation-based Generative Pre-trained Transformer for engineering monitoring and control. As an example of foundation model adaptation, we consider Probabilistic Generative Forecasting, which produces future time series samples from conditional probability distributions given past realizations. We demonstrate the effectiveness of TS-GPT in forecasting real-time locational marginal prices using historical data from U.S. independent system operators.

Via

Access Paper or Ask Questions

A Novel Symbol Level Precoding based AFDM Transmission Framework: Offloading Equalization Burden to Transmitter Side

Aug 17, 2025

Shuntian Tang, Zesong Fei, Xinyi Wang, Dongkai Zhou, Zhiqiang Wei, Christos Masouros

Abstract:Affine Frequency Division Multiplexing (AFDM) has attracted considerable attention for its robustness to Doppler effects. However, its high receiver-side computational complexity remains a major barrier to practical deployment. To address this, we propose a novel symbol-level precoding (SLP)-based AFDM transmission framework, which shifts the signal processing burden in downlink communications from user side to the base station (BS), enabling direct symbol detection without requiring channel estimation or equalization at the receiver. Specifically, in the uplink phase, we propose a Sparse Bayesian Learning (SBL) based channel estimation algorithm by exploiting the inherent sparsity of affine frequency (AF) domain channels. In particular, the sparse prior is modeled via a hierarchical Laplace distribution, and parameters are iteratively updated using the Expectation-Maximization (EM) algorithm. We also derive the Bayesian Cramer-Rao Bound (BCRB) to characterize the theoretical performance limit. In the downlink phase, the BS employs the SLP technology to design the transmitted waveform based on the estimated uplink channel state information (CSI) and channel reciprocity. The resulting optimization problem is formulated as a second-order cone programming (SOCP) problem, and its dual problem is investigated by Lagrangian function and Karush-Kuhn-Tucker conditions. Simulation results demonstrate that the proposed SBL estimator outperforms traditional orthogonal matching pursuit (OMP) in accuracy and robustness to off-grid effects, while the SLP-based waveform design scheme achieves performance comparable to conventional AFDM receivers while significantly reducing the computational complexity at receiver, validating the practicality of our approach.

* 13 pages, 9 figures; submitted to IEEE journals for possible publication

Via

Access Paper or Ask Questions

LEDOM: An Open and Fundamental Reverse Language Model

Jul 02, 2025

Xunjian Yin, Sitao Cheng, Yuxi Xie, Xinyu Hu, Li Lin, Xinyi Wang, Liangming Pan, William Yang Wang, Xiaojun Wan

Abstract:We introduce LEDOM, the first purely reverse language model, trained autoregressively on 435B tokens with 2B and 7B parameter variants, which processes sequences in reverse temporal order through previous token prediction. For the first time, we present the reverse language model as a potential foundational model across general tasks, accompanied by a set of intriguing examples and insights. Based on LEDOM, we further introduce a novel application: Reverse Reward, where LEDOM-guided reranking of forward language model outputs leads to substantial performance improvements on mathematical reasoning tasks. This approach leverages LEDOM's unique backward reasoning capability to refine generation quality through posterior evaluation. Our findings suggest that LEDOM exhibits unique characteristics with broad application potential. We will release all models, training code, and pre-training data to facilitate future research.

* Work in progress

Via

Access Paper or Ask Questions

Think Twice, Act Once: Token-Aware Compression and Action Reuse for Efficient Inference in Vision-Language-Action Models

May 27, 2025

Xudong Tan, Yaoxin Yang, Peng Ye, Jialin Zheng, Bizhe Bai, Xinyi Wang, Jia Hao, Tao Chen

Figure 1 for Think Twice, Act Once: Token-Aware Compression and Action Reuse for Efficient Inference in Vision-Language-Action Models

Figure 2 for Think Twice, Act Once: Token-Aware Compression and Action Reuse for Efficient Inference in Vision-Language-Action Models

Figure 3 for Think Twice, Act Once: Token-Aware Compression and Action Reuse for Efficient Inference in Vision-Language-Action Models

Figure 4 for Think Twice, Act Once: Token-Aware Compression and Action Reuse for Efficient Inference in Vision-Language-Action Models

Abstract:Vision-Language-Action (VLA) models have emerged as a powerful paradigm for general-purpose robot control through natural language instructions. However, their high inference cost-stemming from large-scale token computation and autoregressive decoding-poses significant challenges for real-time deployment and edge applications. While prior work has primarily focused on architectural optimization, we take a different perspective by identifying a dual form of redundancy in VLA models: (i) high similarity across consecutive action steps, and (ii) substantial redundancy in visual tokens. Motivated by these observations, we propose FlashVLA, the first training-free and plug-and-play acceleration framework that enables action reuse in VLA models. FlashVLA improves inference efficiency through a token-aware action reuse mechanism that avoids redundant decoding across stable action steps, and an information-guided visual token selection strategy that prunes low-contribution tokens. Extensive experiments on the LIBERO benchmark show that FlashVLA reduces FLOPs by 55.7% and latency by 36.0%, with only a 0.7% drop in task success rate. These results demonstrate the effectiveness of FlashVLA in enabling lightweight, low-latency VLA inference without retraining.

Via

Access Paper or Ask Questions

A system identification approach to clustering vector autoregressive time series

May 20, 2025

Zuogong Yue, Xinyi Wang, Victor Solo

Abstract:Clustering of time series based on their underlying dynamics is keeping attracting researchers due to its impacts on assisting complex system modelling. Most current time series clustering methods handle only scalar time series, treat them as white noise, or rely on domain knowledge for high-quality feature construction, where the autocorrelation pattern/feature is mostly ignored. Instead of relying on heuristic feature/metric construction, the system identification approach allows treating vector time series clustering by explicitly considering their underlying autoregressive dynamics. We first derive a clustering algorithm based on a mixture autoregressive model. Unfortunately it turns out to have significant computational problems. We then derive a `small-noise' limiting version of the algorithm, which we call k-LMVAR (Limiting Mixture Vector AutoRegression), that is computationally manageable. We develop an associated BIC criterion for choosing the number of clusters and model order. The algorithm performs very well in comparative simulations and also scales well computationally.

Via

Access Paper or Ask Questions

Multi-Constraint Safe Reinforcement Learning via Closed-form Solution for Log-Sum-Exp Approximation of Control Barrier Functions

May 01, 2025

Chenggang Wang, Xinyi Wang, Yutong Dong, Lei Song, Xinping Guan

Figure 1 for Multi-Constraint Safe Reinforcement Learning via Closed-form Solution for Log-Sum-Exp Approximation of Control Barrier Functions

Figure 2 for Multi-Constraint Safe Reinforcement Learning via Closed-form Solution for Log-Sum-Exp Approximation of Control Barrier Functions

Figure 3 for Multi-Constraint Safe Reinforcement Learning via Closed-form Solution for Log-Sum-Exp Approximation of Control Barrier Functions

Figure 4 for Multi-Constraint Safe Reinforcement Learning via Closed-form Solution for Log-Sum-Exp Approximation of Control Barrier Functions

Abstract:The safety of training task policies and their subsequent application using reinforcement learning (RL) methods has become a focal point in the field of safe RL. A central challenge in this area remains the establishment of theoretical guarantees for safety during both the learning and deployment processes. Given the successful implementation of Control Barrier Function (CBF)-based safety strategies in a range of control-affine robotic systems, CBF-based safe RL demonstrates significant promise for practical applications in real-world scenarios. However, integrating these two approaches presents several challenges. First, embedding safety optimization within the RL training pipeline requires that the optimization outputs be differentiable with respect to the input parameters, a condition commonly referred to as differentiable optimization, which is non-trivial to solve. Second, the differentiable optimization framework confronts significant efficiency issues, especially when dealing with multi-constraint problems. To address these challenges, this paper presents a CBF-based safe RL architecture that effectively mitigates the issues outlined above. The proposed approach constructs a continuous AND logic approximation for the multiple constraints using a single composite CBF. By leveraging this approximation, a close-form solution of the quadratic programming is derived for the policy network in RL, thereby circumventing the need for differentiable optimization within the end-to-end safe RL pipeline. This strategy significantly reduces computational complexity because of the closed-form solution while maintaining safety guarantees. Simulation results demonstrate that, in comparison to existing approaches relying on differentiable optimization, the proposed method significantly reduces training computational costs while ensuring provable safety throughout the training process.

Via

Access Paper or Ask Questions