Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xiao Li

StreamGS: Online Generalizable Gaussian Splatting Reconstruction for Unposed Image Streams

Mar 08, 2025

Yang LI, Jinglu Wang, Lei Chu, Xiao Li, Shiu-hong Kao, Ying-Cong Chen, Yan Lu

Figure 1 for StreamGS: Online Generalizable Gaussian Splatting Reconstruction for Unposed Image Streams

Figure 2 for StreamGS: Online Generalizable Gaussian Splatting Reconstruction for Unposed Image Streams

Figure 3 for StreamGS: Online Generalizable Gaussian Splatting Reconstruction for Unposed Image Streams

Figure 4 for StreamGS: Online Generalizable Gaussian Splatting Reconstruction for Unposed Image Streams

Abstract:The advent of 3D Gaussian Splatting (3DGS) has advanced 3D scene reconstruction and novel view synthesis. With the growing interest of interactive applications that need immediate feedback, online 3DGS reconstruction in real-time is in high demand. However, none of existing methods yet meet the demand due to three main challenges: the absence of predetermined camera parameters, the need for generalizable 3DGS optimization, and the necessity of reducing redundancy. We propose StreamGS, an online generalizable 3DGS reconstruction method for unposed image streams, which progressively transform image streams to 3D Gaussian streams by predicting and aggregating per-frame Gaussians. Our method overcomes the limitation of the initial point reconstruction \cite{dust3r} in tackling out-of-domain (OOD) issues by introducing a content adaptive refinement. The refinement enhances cross-frame consistency by establishing reliable pixel correspondences between adjacent frames. Such correspondences further aid in merging redundant Gaussians through cross-frame feature aggregation. The density of Gaussians is thereby reduced, empowering online reconstruction by significantly lowering computational and memory costs. Extensive experiments on diverse datasets have demonstrated that StreamGS achieves quality on par with optimization-based approaches but does so 150 times faster, and exhibits superior generalizability in handling OOD scenes.

* 8 pages

Via

Access Paper or Ask Questions

Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective

Feb 24, 2025

Chengyin Xu, Kaiyuan Chen, Xiao Li, Ke Shen, Chenggang Li

Abstract:The rapid advancements in computing dramatically increase the scale and cost of training Large Language Models (LLMs). Accurately predicting downstream task performance prior to model training is crucial for efficient resource allocation, yet remains challenging due to two primary constraints: (1) the "emergence phenomenon", wherein downstream performance metrics become meaningful only after extensive training, which limits the ability to use smaller models for prediction; (2) Uneven task difficulty distributions and the absence of consistent scaling laws, resulting in substantial metric variability. Existing performance prediction methods suffer from limited accuracy and reliability, thereby impeding the assessment of potential LLM capabilities. To address these challenges, we propose a Clustering-On-Difficulty (COD) downstream performance prediction framework. COD first constructs a predictable support subset by clustering tasks based on difficulty features, strategically excluding non-emergent and non-scalable clusters. The scores on the selected subset serve as effective intermediate predictors of downstream performance on the full evaluation set. With theoretical support, we derive a mapping function that transforms performance metrics from the predictable subset to the full evaluation set, thereby ensuring accurate extrapolation of LLM downstream performance. The proposed method has been applied to predict performance scaling for a 70B LLM, providing actionable insights for training resource allocation and assisting in monitoring the training process. Notably, COD achieves remarkable predictive accuracy on the 70B LLM by leveraging an ensemble of small models, demonstrating an absolute mean deviation of 1.36% across eight important LLM evaluation benchmarks.

* 21 pages,6 figures

Via

Access Paper or Ask Questions

Guaranteed Conditional Diffusion: 3D Block-based Models for Scientific Data Compression

Feb 18, 2025

Jaemoon Lee, Xiao Li, Liangji Zhu, Sanjay Ranka, Anand Rangarajan

Abstract:This paper proposes a new compression paradigm -- Guaranteed Conditional Diffusion with Tensor Correction (GCDTC) -- for lossy scientific data compression. The framework is based on recent conditional diffusion (CD) generative models, and it consists of a conditional diffusion model, tensor correction, and error guarantee. Our diffusion model is a mixture of 3D conditioning and 2D denoising U-Net. The approach leverages a 3D block-based compressing module to address spatiotemporal correlations in structured scientific data. Then, the reverse diffusion process for 2D spatial data is conditioned on the ``slices'' of content latent variables produced by the compressing module. After training, the denoising decoder reconstructs the data with zero noise and content latent variables, and thus it is entirely deterministic. The reconstructed outputs of the CD model are further post-processed by our tensor correction and error guarantee steps to control and ensure a maximum error distortion, which is an inevitable requirement in lossy scientific data compression. Our experiments involving two datasets generated by climate and chemical combustion simulations show that our framework outperforms standard convolutional autoencoders and yields competitive compression quality with an existing scientific data compression algorithm.

Via

Access Paper or Ask Questions

DPO-Shift: Shifting the Distribution of Direct Preference Optimization

Feb 11, 2025

Xiliang Yang, Feng Jiang, Qianen Zhang, Lei Zhao, Xiao Li

Figure 1 for DPO-Shift: Shifting the Distribution of Direct Preference Optimization

Figure 2 for DPO-Shift: Shifting the Distribution of Direct Preference Optimization

Figure 3 for DPO-Shift: Shifting the Distribution of Direct Preference Optimization

Figure 4 for DPO-Shift: Shifting the Distribution of Direct Preference Optimization

Abstract:Direct Preference Optimization (DPO) and its variants have become increasingly popular for aligning language models with human preferences. These methods aim to teach models to better distinguish between chosen (or preferred) and rejected (or dispreferred) responses. However, prior research has identified that the probability of chosen responses often decreases during training, and this phenomenon is known as likelihood displacement. To tackle this challenge, in this work we introduce \method to controllably shift the distribution of the chosen probability. Then, we show that \method exhibits a fundamental trade-off between improving the chosen probability and sacrificing the reward margin, as supported by both theoretical analysis and experimental validation. Furthermore, we demonstrate the superiority of \method over DPO on downstream tasks such as MT-Bench and a designed win rate experiment. We believe this study shows that the likelihood displacement issue of DPO can be effectively mitigated with a simple, theoretically grounded solution. Our code is available at https://github.com/Meaquadddd/DPO-Shift.

Via

Access Paper or Ask Questions

SparseFocus: Learning-based One-shot Autofocus for Microscopy with Sparse Content

Feb 10, 2025

Yongping Zhai, Xiaoxi Fu, Qiang Su, Jia Hu, Yake Zhang, Yunfeng Zhou, Chaofan Zhang, Xiao Li, Wenxin Wang, Dongdong Wu(+1 more)

Figure 1 for SparseFocus: Learning-based One-shot Autofocus for Microscopy with Sparse Content

Figure 2 for SparseFocus: Learning-based One-shot Autofocus for Microscopy with Sparse Content

Figure 3 for SparseFocus: Learning-based One-shot Autofocus for Microscopy with Sparse Content

Figure 4 for SparseFocus: Learning-based One-shot Autofocus for Microscopy with Sparse Content

Abstract:Autofocus is necessary for high-throughput and real-time scanning in microscopic imaging. Traditional methods rely on complex hardware or iterative hill-climbing algorithms. Recent learning-based approaches have demonstrated remarkable efficacy in a one-shot setting, avoiding hardware modifications or iterative mechanical lens adjustments. However, in this paper, we highlight a significant challenge that the richness of image content can significantly affect autofocus performance. When the image content is sparse, previous autofocus methods, whether traditional climbing-hill or learning-based, tend to fail. To tackle this, we propose a content-importance-based solution, named SparseFocus, featuring a novel two-stage pipeline. The first stage measures the importance of regions within the image, while the second stage calculates the defocus distance from selected important regions. To validate our approach and benefit the research community, we collect a large-scale dataset comprising millions of labelled defocused images, encompassing both dense, sparse and extremely sparse scenarios. Experimental results show that SparseFocus surpasses existing methods, effectively handling all levels of content sparsity. Moreover, we integrate SparseFocus into our Whole Slide Imaging (WSI) system that performs well in real-world applications. The code and dataset will be made available upon the publication of this paper.

Via

Access Paper or Ask Questions

Understanding Representation Dynamics of Diffusion Models via Low-Dimensional Modeling

Feb 09, 2025

Xiao Li, Zekai Zhang, Xiang Li, Siyi Chen, Zhihui Zhu, Peng Wang, Qing Qu

Figure 1 for Understanding Representation Dynamics of Diffusion Models via Low-Dimensional Modeling

Figure 2 for Understanding Representation Dynamics of Diffusion Models via Low-Dimensional Modeling

Figure 3 for Understanding Representation Dynamics of Diffusion Models via Low-Dimensional Modeling

Figure 4 for Understanding Representation Dynamics of Diffusion Models via Low-Dimensional Modeling

Abstract:This work addresses the critical question of why and when diffusion models, despite being designed for generative tasks, can excel at learning high-quality representations in a self-supervised manner. To address this, we develop a mathematical framework based on a low-dimensional data model and posterior estimation, revealing a fundamental trade-off between generation and representation quality near the final stage of image generation. Our analysis explains the unimodal representation dynamics across noise scales, mainly driven by the interplay between data denoising and class specification. Building on these insights, we propose an ensemble method that aggregates features across noise levels, significantly improving both clean performance and robustness under label noise. Extensive experiments on both synthetic and real-world datasets validate our findings.

* First two authors contributed equally

Via

Access Paper or Ask Questions

UVRM: A Scalable 3D Reconstruction Model from Unposed Videos

Jan 16, 2025

Shiu-hong Kao, Xiao Li, Jinglu Wang, Chi-Keung Tang, Yu-Wing Tai, Yan Lu

Figure 1 for UVRM: A Scalable 3D Reconstruction Model from Unposed Videos

Figure 2 for UVRM: A Scalable 3D Reconstruction Model from Unposed Videos

Figure 3 for UVRM: A Scalable 3D Reconstruction Model from Unposed Videos

Figure 4 for UVRM: A Scalable 3D Reconstruction Model from Unposed Videos

Abstract:Large Reconstruction Models (LRMs) have recently become a popular method for creating 3D foundational models. Training 3D reconstruction models with 2D visual data traditionally requires prior knowledge of camera poses for the training samples, a process that is both time-consuming and prone to errors. Consequently, 3D reconstruction training has been confined to either synthetic 3D datasets or small-scale datasets with annotated poses. In this study, we investigate the feasibility of 3D reconstruction using unposed video data of various objects. We introduce UVRM, a novel 3D reconstruction model capable of being trained and evaluated on monocular videos without requiring any information about the pose. UVRM uses a transformer network to implicitly aggregate video frames into a pose-invariant latent feature space, which is then decoded into a tri-plane 3D representation. To obviate the need for ground-truth pose annotations during training, UVRM employs a combination of the score distillation sampling (SDS) method and an analysis-by-synthesis approach, progressively synthesizing pseudo novel-views using a pre-trained diffusion model. We qualitatively and quantitatively evaluate UVRM's performance on the G-Objaverse and CO3D datasets without relying on pose information. Extensive experiments show that UVRM is capable of effectively and efficiently reconstructing a wide range of 3D objects from unposed videos.

Via

Access Paper or Ask Questions

Foundation Model for Lossy Compression of Spatiotemporal Scientific Data

Dec 22, 2024

Xiao Li, Jaemoon Lee, Anand Rangarajan, Sanjay Ranka

Abstract:We present a foundation model (FM) for lossy scientific data compression, combining a variational autoencoder (VAE) with a hyper-prior structure and a super-resolution (SR) module. The VAE framework uses hyper-priors to model latent space dependencies, enhancing compression efficiency. The SR module refines low-resolution representations into high-resolution outputs, improving reconstruction quality. By alternating between 2D and 3D convolutions, the model efficiently captures spatiotemporal correlations in scientific data while maintaining low computational cost. Experimental results demonstrate that the FM generalizes well to unseen domains and varying data shapes, achieving up to 4 times higher compression ratios than state-of-the-art methods after domain-specific fine-tuning. The SR module improves compression ratio by 30 percent compared to simple upsampling techniques. This approach significantly reduces storage and transmission costs for large-scale scientific simulations while preserving data integrity and fidelity.

Via

Access Paper or Ask Questions

Understanding and Mitigating Memorization in Diffusion Models for Tabular Data

Dec 15, 2024

Zhengyu Fang, Zhimeng Jiang, Huiyuan Chen, Xiao Li, Jing Li

Abstract:Tabular data generation has attracted significant research interest in recent years, with the tabular diffusion models greatly improving the quality of synthetic data. However, while memorization, where models inadvertently replicate exact or near-identical training data, has been thoroughly investigated in image and text generation, its effects on tabular data remain largely unexplored. In this paper, we conduct the first comprehensive investigation of memorization phenomena in diffusion models for tabular data. Our empirical analysis reveals that memorization appears in tabular diffusion models and increases with larger training epochs. We further examine the influence of factors such as dataset sizes, feature dimensions, and different diffusion models on memorization. Additionally, we provide a theoretical explanation for why memorization occurs in tabular diffusion models. To address this issue, we propose TabCutMix, a simple yet effective data augmentation technique that exchanges randomly selected feature segments between random same-class training sample pairs. Building upon this, we introduce TabCutMixPlus, an enhanced method that clusters features based on feature correlations and ensures that features within the same cluster are exchanged together during augmentation. This clustering mechanism mitigates out-of-distribution (OOD) generation issues by maintaining feature coherence. Experimental results across various datasets and diffusion models demonstrate that TabCutMix effectively mitigates memorization while maintaining high-quality data generation.

Via

Access Paper or Ask Questions

Safe Adaptive Cruise Control Under Perception Uncertainty: A Deep Ensemble and Conformal Tube Model Predictive Control Approach

Dec 05, 2024

Xiao Li, Anouck Girard, Ilya Kolmanovsky

Figure 1 for Safe Adaptive Cruise Control Under Perception Uncertainty: A Deep Ensemble and Conformal Tube Model Predictive Control Approach

Figure 2 for Safe Adaptive Cruise Control Under Perception Uncertainty: A Deep Ensemble and Conformal Tube Model Predictive Control Approach

Figure 3 for Safe Adaptive Cruise Control Under Perception Uncertainty: A Deep Ensemble and Conformal Tube Model Predictive Control Approach

Figure 4 for Safe Adaptive Cruise Control Under Perception Uncertainty: A Deep Ensemble and Conformal Tube Model Predictive Control Approach

Abstract:Autonomous driving heavily relies on perception systems to interpret the environment for decision-making. To enhance robustness in these safety critical applications, this paper considers a Deep Ensemble of Deep Neural Network regressors integrated with Conformal Prediction to predict and quantify uncertainties. In the Adaptive Cruise Control setting, the proposed method performs state and uncertainty estimation from RGB images, informing the downstream controller of the DNN perception uncertainties. An adaptive cruise controller using Conformal Tube Model Predictive Control is designed to ensure probabilistic safety. Evaluations with a high-fidelity simulator demonstrate the algorithm's effectiveness in speed tracking and safe distance maintaining, including in Out-Of-Distribution scenarios.

Via

Access Paper or Ask Questions