Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Lu Li

Scaling Latent Reasoning via Looped Language Models

Oct 29, 2025

Rui-Jie Zhu, Zixuan Wang, Kai Hua, Tianyu Zhang, Ziniu Li, Haoran Que, Boyi Wei, Zixin Wen, Fan Yin, He Xing(+23 more)

Abstract:Modern LLMs are trained to "think" primarily via explicit text generation, such as chain-of-thought (CoT), which defers reasoning to post-training and under-leverages pre-training data. We present and open-source Ouro, named after the recursive Ouroboros, a family of pre-trained Looped Language Models (LoopLM) that instead build reasoning into the pre-training phase through (i) iterative computation in latent space, (ii) an entropy-regularized objective for learned depth allocation, and (iii) scaling to 7.7T tokens. Ouro 1.4B and 2.6B models enjoy superior performance that match the results of up to 12B SOTA LLMs across a wide range of benchmarks. Through controlled experiments, we show this advantage stems not from increased knowledge capacity, but from superior knowledge manipulation capabilities. We also show that LoopLM yields reasoning traces more aligned with final outputs than explicit CoT. We hope our results show the potential of LoopLM as a novel scaling direction in the reasoning era. Our model could be found in: http://ouro-llm.github.io.

Via

Access Paper or Ask Questions

Bag-of-Word-Groups (BoWG): A Robust and Efficient Loop Closure Detection Method Under Perceptual Aliasing

Oct 26, 2025

Xiang Fei, Tina Tian, Howie Choset, Lu Li

Abstract:Loop closure is critical in Simultaneous Localization and Mapping (SLAM) systems to reduce accumulative drift and ensure global mapping consistency. However, conventional methods struggle in perceptually aliased environments, such as narrow pipes, due to vector quantization, feature sparsity, and repetitive textures, while existing solutions often incur high computational costs. This paper presents Bag-of-Word-Groups (BoWG), a novel loop closure detection method that achieves superior precision-recall, robustness, and computational efficiency. The core innovation lies in the introduction of word groups, which captures the spatial co-occurrence and proximity of visual words to construct an online dictionary. Additionally, drawing inspiration from probabilistic transition models, we incorporate temporal consistency directly into similarity computation with an adaptive scheme, substantially improving precision-recall performance. The method is further strengthened by a feature distribution analysis module and dedicated post-verification mechanisms. To evaluate the effectiveness of our method, we conduct experiments on both public datasets and a confined-pipe dataset we constructed. Results demonstrate that BoWG surpasses state-of-the-art methods, including both traditional and learning-based approaches, in terms of precision-recall and computational efficiency. Our approach also exhibits excellent scalability, achieving an average processing time of 16 ms per image across 17,565 images in the Bicocca25b dataset.

* This paper has been accepted by IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025

Via

Access Paper or Ask Questions

The Three Regimes of Offline-to-Online Reinforcement Learning

Oct 01, 2025

Lu Li, Tianwei Ni, Yihao Sun, Pierre-Luc Bacon

Figure 1 for The Three Regimes of Offline-to-Online Reinforcement Learning

Figure 2 for The Three Regimes of Offline-to-Online Reinforcement Learning

Figure 3 for The Three Regimes of Offline-to-Online Reinforcement Learning

Figure 4 for The Three Regimes of Offline-to-Online Reinforcement Learning

Abstract:Offline-to-online reinforcement learning (RL) has emerged as a practical paradigm that leverages offline datasets for pretraining and online interactions for fine-tuning. However, its empirical behavior is highly inconsistent: design choices of online-fine tuning that work well in one setting can fail completely in another. We propose a stability--plasticity principle that can explain this inconsistency: we should preserve the knowledge of pretrained policy or offline dataset during online fine-tuning, whichever is better, while maintaining sufficient plasticity. This perspective identifies three regimes of online fine-tuning, each requiring distinct stability properties. We validate this framework through a large-scale empirical study, finding that the results strongly align with its predictions in 45 of 63 cases. This work provides a principled framework for guiding design choices in offline-to-online RL based on the relative performance of the offline dataset and the pretrained policy.

Via

Access Paper or Ask Questions

Empowering Clinical Trial Design through AI: A Randomized Evaluation of PowerGPT

Sep 15, 2025

Yiwen Lu, Lu Li, Dazheng Zhang, Xinyao Jian, Tingyin Wang, Siqi Chen, Yuqing Lei, Jiayi Tong, Zhaohan Xi, Haitao Chu(+13 more)

Abstract:Sample size calculations for power analysis are critical for clinical research and trial design, yet their complexity and reliance on statistical expertise create barriers for many researchers. We introduce PowerGPT, an AI-powered system integrating large language models (LLMs) with statistical engines to automate test selection and sample size estimation in trial design. In a randomized trial to evaluate its effectiveness, PowerGPT significantly improved task completion rates (99.3% vs. 88.9% for test selection, 99.3% vs. 77.8% for sample size calculation) and accuracy (94.1% vs. 55.4% in sample size estimation, p < 0.001), while reducing average completion time (4.0 vs. 9.3 minutes, p < 0.001). These gains were consistent across various statistical tests and benefited both statisticians and non-statisticians as well as bridging expertise gaps. Already under deployment across multiple institutions, PowerGPT represents a scalable AI-driven approach that enhances accessibility, efficiency, and accuracy in statistical power analysis for clinical research.

Via

Access Paper or Ask Questions

Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning

Jun 18, 2025

Roger Creus Castanyer, Johan Obando-Ceron, Lu Li, Pierre-Luc Bacon, Glen Berseth, Aaron Courville, Pablo Samuel Castro

Figure 1 for Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning

Figure 2 for Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning

Figure 3 for Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning

Figure 4 for Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning

Abstract:Scaling deep reinforcement learning networks is challenging and often results in degraded performance, yet the root causes of this failure mode remain poorly understood. Several recent works have proposed mechanisms to address this, but they are often complex and fail to highlight the causes underlying this difficulty. In this work, we conduct a series of empirical analyses which suggest that the combination of non-stationarity with gradient pathologies, due to suboptimal architectural choices, underlie the challenges of scale. We propose a series of direct interventions that stabilize gradient flow, enabling robust performance across a range of network depths and widths. Our interventions are simple to implement and compatible with well-established algorithms, and result in an effective mechanism that enables strong performance even at large scales. We validate our findings on a variety of agents and suites of environments.

Via

Access Paper or Ask Questions

SVD-Based Graph Fractional Fourier Transform on Directed Graphs and Its Application

Jun 04, 2025

Lu Li, Haiye Huo

Abstract:Graph fractional Fourier transform (GFRFT) is an extension of graph Fourier transform (GFT) that provides an additional fractional analysis tool for graph signal processing (GSP) by generalizing temporal-vertex domain Fourier analysis to fractional orders. In recent years, a large number of studies on GFRFT based on undirected graphs have emerged, but there are very few studies on directed graphs. Therefore, in this paper, one of our main contributions is to introduce two novel GFRFTs defined on Cartesian product graph of two directed graphs, by performing singular value decomposition on graph fractional Laplacian matrices. We prove that two proposed GFRFTs can effectively express spatial-temporal data sets on directed graphs with strong correlation. Moreover, we extend the theoretical results to a generalized Cartesian product graph, which is constructed by $m$ directed graphs. Finally, the denoising performance of our proposed two GFRFTs are testified through simulation by processing hourly temperature data sets collected from 32 weather stations in the Brest region of France.

* 30 pages,14 figures

Via

Access Paper or Ask Questions

Advancing Image Super-resolution Techniques in Remote Sensing: A Comprehensive Survey

May 29, 2025

Yunliang Qi, Meng Lou, Yimin Liu, Lu Li, Zhen Yang, Wen Nie

Abstract:Remote sensing image super-resolution (RSISR) is a crucial task in remote sensing image processing, aiming to reconstruct high-resolution (HR) images from their low-resolution (LR) counterparts. Despite the growing number of RSISR methods proposed in recent years, a systematic and comprehensive review of these methods is still lacking. This paper presents a thorough review of RSISR algorithms, covering methodologies, datasets, and evaluation metrics. We provide an in-depth analysis of RSISR methods, categorizing them into supervised, unsupervised, and quality evaluation approaches, to help researchers understand current trends and challenges. Our review also discusses the strengths, limitations, and inherent challenges of these techniques. Notably, our analysis reveals significant limitations in existing methods, particularly in preserving fine-grained textures and geometric structures under large-scale degradation. Based on these findings, we outline future research directions, highlighting the need for domain-specific architectures and robust evaluation protocols to bridge the gap between synthetic and real-world RSISR scenarios.

* 31 pages,7 figures, an survey

Via

Access Paper or Ask Questions

STRICT: Stress Test of Rendering Images Containing Text

May 25, 2025

Tianyu Zhang, Xinyu Wang, Zhenghan Tai, Lu Li, Jijun Chi, Jingrui Tian, Hailin He, Suyuchen Wang

Abstract:While diffusion models have revolutionized text-to-image generation with their ability to synthesize realistic and diverse scenes, they continue to struggle to generate consistent and legible text within images. This shortcoming is commonly attributed to the locality bias inherent in diffusion-based generation, which limits their ability to model long-range spatial dependencies. In this paper, we introduce $\textbf{STRICT}$, a benchmark designed to systematically stress-test the ability of diffusion models to render coherent and instruction-aligned text in images. Our benchmark evaluates models across multiple dimensions: (1) the maximum length of readable text that can be generated; (2) the correctness and legibility of the generated text, and (3) the ratio of not following instructions for generating text. We evaluate several state-of-the-art models, including proprietary and open-source variants, and reveal persistent limitations in long-range consistency and instruction-following capabilities. Our findings provide insights into architectural bottlenecks and motivate future research directions in multimodal generative modeling. We release our entire evaluation pipeline at https://github.com/tianyu-z/STRICT-Bench.

* 13 pages

Via

Access Paper or Ask Questions

MHANet: Multi-scale Hybrid Attention Network for Auditory Attention Detection

May 21, 2025

Lu Li, Cunhang Fan, Hongyu Zhang, Jingjing Zhang, Xiaoke Yang, Jian Zhou, Zhao Lv

Figure 1 for MHANet: Multi-scale Hybrid Attention Network for Auditory Attention Detection

Figure 2 for MHANet: Multi-scale Hybrid Attention Network for Auditory Attention Detection

Figure 3 for MHANet: Multi-scale Hybrid Attention Network for Auditory Attention Detection

Figure 4 for MHANet: Multi-scale Hybrid Attention Network for Auditory Attention Detection

Abstract:Auditory attention detection (AAD) aims to detect the target speaker in a multi-talker environment from brain signals, such as electroencephalography (EEG), which has made great progress. However, most AAD methods solely utilize attention mechanisms sequentially and overlook valuable multi-scale contextual information within EEG signals, limiting their ability to capture long-short range spatiotemporal dependencies simultaneously. To address these issues, this paper proposes a multi-scale hybrid attention network (MHANet) for AAD, which consists of the multi-scale hybrid attention (MHA) module and the spatiotemporal convolution (STC) module. Specifically, MHA combines channel attention and multi-scale temporal and global attention mechanisms. This effectively extracts multi-scale temporal patterns within EEG signals and captures long-short range spatiotemporal dependencies simultaneously. To further improve the performance of AAD, STC utilizes temporal and spatial convolutions to aggregate expressive spatiotemporal representations. Experimental results show that the proposed MHANet achieves state-of-the-art performance with fewer trainable parameters across three datasets, 3 times lower than that of the most advanced model. Code is available at: https://github.com/fchest/MHANet.

Via

Access Paper or Ask Questions

ListenNet: A Lightweight Spatio-Temporal Enhancement Nested Network for Auditory Attention Detection

May 15, 2025

Cunhang Fan, Xiaoke Yang, Hongyu Zhang, Ying Chen, Lu Li, Jian Zhou, Zhao Lv

Figure 1 for ListenNet: A Lightweight Spatio-Temporal Enhancement Nested Network for Auditory Attention Detection

Figure 2 for ListenNet: A Lightweight Spatio-Temporal Enhancement Nested Network for Auditory Attention Detection

Figure 3 for ListenNet: A Lightweight Spatio-Temporal Enhancement Nested Network for Auditory Attention Detection

Figure 4 for ListenNet: A Lightweight Spatio-Temporal Enhancement Nested Network for Auditory Attention Detection

Abstract:Auditory attention detection (AAD) aims to identify the direction of the attended speaker in multi-speaker environments from brain signals, such as Electroencephalography (EEG) signals. However, existing EEG-based AAD methods overlook the spatio-temporal dependencies of EEG signals, limiting their decoding and generalization abilities. To address these issues, this paper proposes a Lightweight Spatio-Temporal Enhancement Nested Network (ListenNet) for AAD. The ListenNet has three key components: Spatio-temporal Dependency Encoder (STDE), Multi-scale Temporal Enhancement (MSTE), and Cross-Nested Attention (CNA). The STDE reconstructs dependencies between consecutive time windows across channels, improving the robustness of dynamic pattern extraction. The MSTE captures temporal features at multiple scales to represent both fine-grained and long-range temporal patterns. In addition, the CNA integrates hierarchical features more effectively through novel dynamic attention mechanisms to capture deep spatio-temporal correlations. Experimental results on three public datasets demonstrate the superiority of ListenNet over state-of-the-art methods in both subject-dependent and challenging subject-independent settings, while reducing the trainable parameter count by approximately 7 times. Code is available at:https://github.com/fchest/ListenNet.

Via

Access Paper or Ask Questions