Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Rui Chen

LLM Hallucination Detection: A Fast Fourier Transform Method Based on Hidden Layer Temporal Signals

Sep 16, 2025

Jinxin Li, Gang Tu, ShengYu Cheng, Junjie Hu, Jinting Wang, Rui Chen, Zhilong Zhou, Dongbo Shan

Figure 1 for LLM Hallucination Detection: A Fast Fourier Transform Method Based on Hidden Layer Temporal Signals

Figure 2 for LLM Hallucination Detection: A Fast Fourier Transform Method Based on Hidden Layer Temporal Signals

Figure 3 for LLM Hallucination Detection: A Fast Fourier Transform Method Based on Hidden Layer Temporal Signals

Figure 4 for LLM Hallucination Detection: A Fast Fourier Transform Method Based on Hidden Layer Temporal Signals

Abstract:Hallucination remains a critical barrier for deploying large language models (LLMs) in reliability-sensitive applications. Existing detection methods largely fall into two categories: factuality checking, which is fundamentally constrained by external knowledge coverage, and static hidden-state analysis, that fails to capture deviations in reasoning dynamics. As a result, their effectiveness and robustness remain limited. We propose HSAD (Hidden Signal Analysis-based Detection), a novel hallucination detection framework that models the temporal dynamics of hidden representations during autoregressive generation. HSAD constructs hidden-layer signals by sampling activations across layers, applies Fast Fourier Transform (FFT) to obtain frequency-domain representations, and extracts the strongest non-DC frequency component as spectral features. Furthermore, by leveraging the autoregressive nature of LLMs, HSAD identifies optimal observation points for effective and reliable detection. Across multiple benchmarks, including TruthfulQA, HSAD achieves over 10 percentage points improvement compared to prior state-of-the-art methods. By integrating reasoning-process modeling with frequency-domain analysis, HSAD establishes a new paradigm for robust hallucination detection in LLMs.

Via

Access Paper or Ask Questions

UniView: Enhancing Novel View Synthesis From A Single Image By Unifying Reference Features

Sep 05, 2025

Haowang Cui, Rui Chen, Tao Luo, Rui Li, Jiaze Wang

Abstract:The task of synthesizing novel views from a single image is highly ill-posed due to multiple explanations for unobserved areas. Most current methods tend to generate unseen regions from ambiguity priors and interpolation near input views, which often lead to severe distortions. To address this limitation, we propose a novel model dubbed as UniView, which can leverage reference images from a similar object to provide strong prior information during view synthesis. More specifically, we construct a retrieval and augmentation system and employ a multimodal large language model (MLLM) to assist in selecting reference images that meet our requirements. Additionally, a plug-and-play adapter module with multi-level isolation layers is introduced to dynamically generate reference features for the target views. Moreover, in order to preserve the details of an original input image, we design a decoupled triple attention mechanism, which can effectively align and integrate multi-branch features into the synthesis process. Extensive experiments have demonstrated that our UniView significantly improves novel view synthesis performance and outperforms state-of-the-art methods on the challenging datasets.

* Submitted to ACM TOMM

Via

Access Paper or Ask Questions

Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play?

Sep 03, 2025

Ouxiang Li, Yuan Wang, Xinting Hu, Huijuan Huang, Rui Chen, Jiarong Ou, Xin Tao, Pengfei Wan, Fuli Feng

Abstract:Text-to-image (T2I) generation aims to synthesize images from textual prompts, which jointly specify what must be shown and imply what can be inferred, thereby corresponding to two core capabilities: composition and reasoning. However, with the emerging advances of T2I models in reasoning beyond composition, existing benchmarks reveal clear limitations in providing comprehensive evaluations across and within these capabilities. Meanwhile, these advances also enable models to handle more complex prompts, whereas current benchmarks remain limited to low scene density and simplified one-to-one reasoning. To address these limitations, we propose T2I-CoReBench, a comprehensive and complex benchmark that evaluates both composition and reasoning capabilities of T2I models. To ensure comprehensiveness, we structure composition around scene graph elements (instance, attribute, and relation) and reasoning around the philosophical framework of inference (deductive, inductive, and abductive), formulating a 12-dimensional evaluation taxonomy. To increase complexity, driven by the inherent complexities of real-world scenarios, we curate each prompt with high compositional density for composition and multi-step inference for reasoning. We also pair each prompt with a checklist that specifies individual yes/no questions to assess each intended element independently to facilitate fine-grained and reliable evaluation. In statistics, our benchmark comprises 1,080 challenging prompts and around 13,500 checklist questions. Experiments across 27 current T2I models reveal that their composition capability still remains limited in complex high-density scenarios, while the reasoning capability lags even further behind as a critical bottleneck, with all models struggling to infer implicit elements from prompts. Our project page: https://t2i-corebench.github.io/.

* Project Page: https://t2i-corebench.github.io/

Via

Access Paper or Ask Questions

A Soft Fabric-Based Thermal Haptic Device for VR and Teleoperation

Aug 28, 2025

Rui Chen, Domenico Chiaradia, Antonio Frisoli, Daniele Leonardis

Abstract:This paper presents a novel fabric-based thermal-haptic interface for virtual reality and teleoperation. It integrates pneumatic actuation and conductive fabric with an innovative ultra-lightweight design, achieving only 2~g for each finger unit. By embedding heating elements within textile pneumatic chambers, the system delivers modulated pressure and thermal stimuli to fingerpads through a fully soft, wearable interface. Comprehensive characterization demonstrates rapid thermal modulation with heating rates up to 3$^{\circ}$C/s, enabling dynamic thermal feedback for virtual or teleoperation interactions. The pneumatic subsystem generates forces up to 8.93~N at 50~kPa, while optimization of fingerpad-actuator clearance enhances cooling efficiency with minimal force reduction. Experimental validation conducted with two different user studies shows high temperature identification accuracy (0.98 overall) across three thermal levels, and significant manipulation improvements in a virtual pick-and-place tasks. Results show enhanced success rates (88.5\% to 96.4\%, p = 0.029) and improved force control precision (p = 0.013) when haptic feedback is enabled, validating the effectiveness of the integrated thermal-haptic approach for advanced human-machine interaction applications.

Via

Access Paper or Ask Questions

VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Aug 26, 2025

Lin Li, Zehuan Huang, Haoran Feng, Gengxiong Zhuang, Rui Chen, Chunchao Guo, Lu Sheng

Figure 1 for VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Figure 2 for VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Figure 3 for VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Figure 4 for VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Abstract:3D local editing of specified regions is crucial for game industry and robot interaction. Recent methods typically edit rendered multi-view images and then reconstruct 3D models, but they face challenges in precisely preserving unedited regions and overall coherence. Inspired by structured 3D generative models, we propose VoxHammer, a novel training-free approach that performs precise and coherent editing in 3D latent space. Given a 3D model, VoxHammer first predicts its inversion trajectory and obtains its inverted latents and key-value tokens at each timestep. Subsequently, in the denoising and editing phase, we replace the denoising features of preserved regions with the corresponding inverted latents and cached key-value tokens. By retaining these contextual features, this approach ensures consistent reconstruction of preserved areas and coherent integration of edited parts. To evaluate the consistency of preserved regions, we constructed Edit3D-Bench, a human-annotated dataset comprising hundreds of samples, each with carefully labeled 3D editing regions. Experiments demonstrate that VoxHammer significantly outperforms existing methods in terms of both 3D consistency of preserved regions and overall quality. Our method holds promise for synthesizing high-quality edited paired data, thereby laying the data foundation for in-context 3D generation. See our project page at https://huanngzh.github.io/VoxHammer-Page/.

* Project page: https://huanngzh.github.io/VoxHammer-Page/

Via

Access Paper or Ask Questions

Disentanglement in T-space for Faster and Distributed Training of Diffusion Models with Fewer Latent-states

Aug 20, 2025

Samarth Gupta, Raghudeep Gadde, Rui Chen, Aleix M. Martinez

Abstract:We challenge a fundamental assumption of diffusion models, namely, that a large number of latent-states or time-steps is required for training so that the reverse generative process is close to a Gaussian. We first show that with careful selection of a noise schedule, diffusion models trained over a small number of latent states (i.e. $T \sim 32$) match the performance of models trained over a much large number of latent states ($T \sim 1,000$). Second, we push this limit (on the minimum number of latent states required) to a single latent-state, which we refer to as complete disentanglement in T-space. We show that high quality samples can be easily generated by the disentangled model obtained by combining several independently trained single latent-state models. We provide extensive experiments to show that the proposed disentangled model provides 4-6$\times$ faster convergence measured across a variety of metrics on two different datasets.

Via

Access Paper or Ask Questions

LET-US: Long Event-Text Understanding of Scenes

Aug 10, 2025

Rui Chen, Xingyu Chen, Shaoan Wang, Shihan Kong, Junzhi Yu

Abstract:Event cameras output event streams as sparse, asynchronous data with microsecond-level temporal resolution, enabling visual perception with low latency and a high dynamic range. While existing Multimodal Large Language Models (MLLMs) have achieved significant success in understanding and analyzing RGB video content, they either fail to interpret event streams effectively or remain constrained to very short sequences. In this paper, we introduce LET-US, a framework for long event-stream--text comprehension that employs an adaptive compression mechanism to reduce the volume of input events while preserving critical visual details. LET-US thus establishes a new frontier in cross-modal inferential understanding over extended event sequences. To bridge the substantial modality gap between event streams and textual representations, we adopt a two-stage optimization paradigm that progressively equips our model with the capacity to interpret event-based scenes. To handle the voluminous temporal information inherent in long event streams, we leverage text-guided cross-modal queries for feature reduction, augmented by hierarchical clustering and similarity computation to distill the most representative event features. Moreover, we curate and construct a large-scale event-text aligned dataset to train our model, achieving tighter alignment of event features within the LLM embedding space. We also develop a comprehensive benchmark covering a diverse set of tasks -- reasoning, captioning, classification, temporal localization and moment retrieval. Experimental results demonstrate that LET-US outperforms prior state-of-the-art MLLMs in both descriptive accuracy and semantic comprehension on long-duration event streams. All datasets, codes, and models will be publicly available.

Via

Access Paper or Ask Questions

Multibeam High Throughput Satellite: Hardware Foundation, Resource Allocation, and Precoding

Aug 01, 2025

Rui Chen, Wen-Xuan Long, Bing-Qian Wang, Yuan He, Rui-Jin Sun, Nan Cheng, Gan Zheng, Dusit Niyato

Figure 1 for Multibeam High Throughput Satellite: Hardware Foundation, Resource Allocation, and Precoding

Figure 2 for Multibeam High Throughput Satellite: Hardware Foundation, Resource Allocation, and Precoding

Figure 3 for Multibeam High Throughput Satellite: Hardware Foundation, Resource Allocation, and Precoding

Figure 4 for Multibeam High Throughput Satellite: Hardware Foundation, Resource Allocation, and Precoding

Abstract:With its wide coverage and uninterrupted service, satellite communication is a critical technology for next-generation 6G communications. High throughput satellite (HTS) systems, utilizing multipoint beam and frequency multiplexing techniques, enable satellite communication capacity of up to Tbps to meet the growing traffic demand. Therefore, it is imperative to review the-state-of-the-art of multibeam HTS systems and identify their associated challenges and perspectives. Firstly, we summarize the multibeam HTS hardware foundations, including ground station systems, on-board payloads, and user terminals. Subsequently, we review the flexible on-board radio resource allocation approaches of bandwidth, power, time slot, and joint allocation schemes of HTS systems to optimize resource utilization and cater to non-uniform service demand. Additionally, we survey multibeam precoding methods for the HTS system to achieve full-frequency reuse and interference cancellation, which are classified according to different deployments such as single gateway precoding, multiple gateway precoding, on-board precoding, and hybrid on-board/on-ground precoding. Finally, we disscuss the challenges related to Q/V band link outage, time and frequency synchronization of gateways, the accuracy of channel state information (CSI), payload light-weight development, and the application of deep learning (DL). Research on these topics will contribute to enhancing the performance of HTS systems and finally delivering high-speed data to areas underserved by terrestrial networks.

* 38 pages, 18 figures

Via

Access Paper or Ask Questions

Imbalance in Balance: Online Concept Balancing in Generation Models

Jul 17, 2025

Yukai Shi, Jiarong Ou, Rui Chen, Haotian Yang, Jiahao Wang, Xin Tao, Pengfei Wan, Di Zhang, Kun Gai

Abstract:In visual generation tasks, the responses and combinations of complex concepts often lack stability and are error-prone, which remains an under-explored area. In this paper, we attempt to explore the causal factors for poor concept responses through elaborately designed experiments. We also design a concept-wise equalization loss function (IMBA loss) to address this issue. Our proposed method is online, eliminating the need for offline dataset processing, and requires minimal code changes. In our newly proposed complex concept benchmark Inert-CompBench and two other public test sets, our method significantly enhances the concept response capability of baseline models and yields highly competitive results with only a few codes.

* Accepted by ICCV2025

Via

Access Paper or Ask Questions

UniTEX: Universal High Fidelity Generative Texturing for 3D Shapes

May 29, 2025

Yixun Liang, Kunming Luo, Xiao Chen, Rui Chen, Hongyu Yan, Weiyu Li, Jiarui Liu, Ping Tan

Abstract:We present UniTEX, a novel two-stage 3D texture generation framework to create high-quality, consistent textures for 3D assets. Existing approaches predominantly rely on UV-based inpainting to refine textures after reprojecting the generated multi-view images onto the 3D shapes, which introduces challenges related to topological ambiguity. To address this, we propose to bypass the limitations of UV mapping by operating directly in a unified 3D functional space. Specifically, we first propose that lifts texture generation into 3D space via Texture Functions (TFs)--a continuous, volumetric representation that maps any 3D point to a texture value based solely on surface proximity, independent of mesh topology. Then, we propose to predict these TFs directly from images and geometry inputs using a transformer-based Large Texturing Model (LTM). To further enhance texture quality and leverage powerful 2D priors, we develop an advanced LoRA-based strategy for efficiently adapting large-scale Diffusion Transformers (DiTs) for high-quality multi-view texture synthesis as our first stage. Extensive experiments demonstrate that UniTEX achieves superior visual quality and texture integrity compared to existing approaches, offering a generalizable and scalable solution for automated 3D texture generation. Code will available in: https://github.com/YixunLiang/UniTEX.

* 10 pages, 9 figures

Via

Access Paper or Ask Questions