
Xiaoxu Zheng

AnchorSplat: Feed-Forward 3D Gaussian Splatting with 3D Geometric Priors

Apr 09, 2026

Xiaoxue Zhang, Xiaoxu Zheng, Yixuan Yin, Tiao Zhao, Kaihua Tang, Michael Bi Mi, Zhan Xu, Dave Zhenyu Chen

Abstract: Recent feed-forward Gaussian reconstruction models adopt a pixel-aligned formulation that maps each 2D pixel to a 3D Gaussian, entangling Gaussian representations tightly with the input images. In this paper, we propose AnchorSplat, a novel feed-forward 3DGS framework for scene-level reconstruction that represents the scene directly in 3D space. AnchorSplat introduces an anchor-aligned Gaussian representation guided by 3D geometric priors (e.g., sparse point clouds, voxels, or RGB-D point clouds), enabling a more geometry-aware set of renderable 3D Gaussians that is independent of image resolution and number of views. This design substantially reduces the number of required Gaussians, improving computational efficiency while enhancing reconstruction fidelity. Beyond the anchor-aligned design, we utilize a Gaussian Refiner to adjust the intermediate Gaussians in merely a few forward passes. Experiments on the ScanNet++ v2 NVS benchmark demonstrate state-of-the-art performance, outperforming previous methods with more view-consistent results and substantially fewer Gaussian primitives.
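The scaling argument in the abstract can be made concrete with a little arithmetic. The sketch below is only an illustration: the function names, the `gaussians_per_anchor` parameter, and the numbers in the usage note are hypothetical and are not taken from the paper.

```python
def pixel_aligned_gaussian_count(num_views: int, height: int, width: int) -> int:
    """One Gaussian per input pixel: the count grows with views and resolution."""
    return num_views * height * width


def anchor_aligned_gaussian_count(num_anchors: int, gaussians_per_anchor: int = 1) -> int:
    """Count depends only on the 3D anchors (hypothetical per-anchor factor),
    not on how many images were captured or how large they are."""
    return num_anchors * gaussians_per_anchor
```

Under these assumptions, four 640x480 views would yield 1,228,800 pixel-aligned Gaussians, and doubling the number of views doubles that figure, whereas an anchor-aligned count stays fixed for a given set of anchors.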

* CVPR 2026


WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion

Dec 22, 2025

Hanyang Kong, Xingyi Yang, Xiaoxu Zheng, Xinchao Wang

Abstract: Generating long-range, geometrically consistent video presents a fundamental dilemma: while consistency demands strict adherence to 3D geometry in pixel space, state-of-the-art generative models operate most effectively in a camera-conditioned latent space. This disconnect causes current methods to struggle with occluded areas and complex camera trajectories. To bridge this gap, we propose WorldWarp, a framework that couples a 3D structural anchor with a 2D generative refiner. To establish geometric grounding, WorldWarp maintains an online 3D geometric cache built via 3D Gaussian Splatting (3DGS). By explicitly warping historical content into novel views, this cache acts as a structural scaffold, ensuring each new frame respects prior geometry. However, static warping inevitably leaves holes and artifacts due to occlusions. We address this using a Spatio-Temporal Diffusion (ST-Diff) model designed for a "fill-and-revise" objective. Our key innovation is a spatio-temporal varying noise schedule: blank regions receive full noise to trigger generation, while warped regions receive partial noise to enable refinement. By dynamically updating the 3D cache at every step, WorldWarp maintains consistency across video chunks. Consequently, it achieves state-of-the-art fidelity by ensuring that 3D logic guides structure while diffusion logic perfects texture.
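The "full noise for blank regions, partial noise for warped regions" idea can be sketched as a per-pixel noise-level map. This is a minimal illustration, not the ST-Diff implementation: the function names, the `partial_level` value of 0.5, and the variance-preserving blend `sqrt(1 - s) * x + sqrt(s) * eps` are all assumptions made for the example.

```python
import math
import random


def noise_level_map(warped_mask, partial_level=0.5, full_level=1.0):
    """warped_mask[t][y][x] is True where warping supplied content.

    Warped pixels get a partial noise level (to be refined); holes get the
    full level (to be generated from scratch). Levels are assumed values.
    """
    return [
        [[partial_level if warped else full_level for warped in row] for row in frame]
        for frame in warped_mask
    ]


def add_noise(frames, levels, seed=0):
    """Blend each value with Gaussian noise according to its own level s.

    Assumed blend: sqrt(1 - s) * x + sqrt(s) * eps, so s = 1 discards x.
    """
    rng = random.Random(seed)
    noisy = []
    for frame, frame_levels in zip(frames, levels):
        noisy_frame = []
        for row, row_levels in zip(frame, frame_levels):
            noisy_frame.append([
                math.sqrt(1.0 - s) * x + math.sqrt(s) * rng.gauss(0.0, 1.0)
                for x, s in zip(row, row_levels)
            ])
        noisy.append(noisy_frame)
    return noisy
```

With this blend, a pixel at level 1.0 carries no trace of its input value, while a pixel at a lower level keeps part of the warped signal for the model to revise.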

* Project page: https://hyokong.github.io/worldwarp-page/


GhostRNN: Reducing State Redundancy in RNN with Cheap Operations

Nov 20, 2024

Hang Zhou, Xiaoxu Zheng, Yunhe Wang, Michael Bi Mi, Deyi Xiong, Kai Han


Abstract: Recurrent neural networks (RNNs) that are capable of modeling long-distance dependencies are widely used in various speech tasks, e.g., keyword spotting (KWS) and speech enhancement (SE). Due to the limited power and memory of low-resource devices, efficient RNN models are urgently required for real-world applications. In this paper, we propose an efficient RNN architecture, GhostRNN, which reduces hidden state redundancy with cheap operations. In particular, we observe that some dimensions of the hidden states are similar to others in trained RNN models, suggesting that redundancy exists in specific RNNs. To reduce the redundancy and hence the computational cost, we propose to first generate a few intrinsic states, and then apply cheap operations to produce ghost states based on the intrinsic states. Experiments on KWS and SE tasks demonstrate that the proposed GhostRNN significantly reduces memory usage (~40%) and computation cost while keeping performance similar.
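The "intrinsic states plus cheap ghost states" recipe can be sketched as a single recurrent step. The details here are assumptions chosen for illustration rather than the paper's design: a plain tanh RNN cell, an even split between intrinsic and ghost dimensions, and an elementwise scale-and-shift as the cheap operation. The function name `ghost_rnn_step` is hypothetical.

```python
import math


def ghost_rnn_step(x, h_prev, w_in, w_rec, ghost_scale, ghost_shift):
    """One step of a toy ghost-style RNN cell.

    w_in:  n_intrinsic rows, each of len(x)       (dense input weights)
    w_rec: n_intrinsic rows, each of len(h_prev)  (dense recurrent weights)
    ghost_scale, ghost_shift: n_intrinsic values each (the assumed cheap op)
    """
    # Expensive part: dense projections, computed for the intrinsic half only.
    intrinsic = []
    for row_in, row_rec in zip(w_in, w_rec):
        pre = sum(w * xi for w, xi in zip(row_in, x))
        pre += sum(w * hj for w, hj in zip(row_rec, h_prev))
        intrinsic.append(math.tanh(pre))

    # Cheap part: each ghost state is derived from one intrinsic state.
    ghost = [
        math.tanh(a * s + b)
        for s, a, b in zip(intrinsic, ghost_scale, ghost_shift)
    ]

    # The full hidden state concatenates intrinsic and ghost states.
    return intrinsic + ghost
```

In this toy setup, a hidden size H with input size D needs (H/2)(D + H) + H weights instead of H(D + H) for the dense cell, which is where the savings in such a scheme would come from.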

* Proc. INTERSPEECH 2023, 226-230


Overcoming the Trade-off Between Accuracy and Plausibility in 3D Hand Shape Reconstruction

May 01, 2023

Ziwei Yu, Chen Li, Linlin Yang, Xiaoxu Zheng, Michael Bi Mi, Gim Hee Lee, Angela Yao


Abstract: Direct mesh fitting for 3D hand shape reconstruction is highly accurate. However, the reconstructed meshes are prone to artifacts and do not appear as plausible hand shapes. Conversely, parametric models like MANO ensure plausible hand shapes but are not as accurate as the non-parametric methods. In this work, we introduce a novel weakly-supervised hand shape estimation framework that integrates non-parametric mesh fitting with the MANO model in an end-to-end fashion. Our joint model overcomes the trade-off between accuracy and plausibility to yield well-aligned and high-quality 3D meshes, especially in challenging two-hand and hand-object interaction scenarios.

* CVPR 2023


Improving Deep Regression with Ordinal Entropy

Jan 21, 2023

Shihao Zhang, Linlin Yang, Michael Bi Mi, Xiaoxu Zheng, Angela Yao


Abstract: In computer vision, it is often observed that formulating a regression problem as a classification task yields better performance. We investigate this curious phenomenon and provide a derivation to show that classification, with the cross-entropy loss, outperforms regression with a mean squared error loss in its ability to learn high-entropy feature representations. Based on the analysis, we propose an ordinal entropy loss to encourage higher-entropy feature spaces while maintaining ordinal relationships to improve the performance of regression tasks. Experiments on synthetic and real-world regression tasks demonstrate the importance and benefits of increasing entropy for regression.
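One loose way to picture "spreading features apart while respecting target order" is a pairwise term that rewards feature distance more strongly for samples whose targets are far apart. The sketch below is an assumption-laden illustration, not the paper's loss: the function name `spread_term`, the Euclidean distance, and the absolute-difference weighting are all choices made for this example.

```python
import math


def spread_term(features, targets):
    """Negative mean of pairwise feature distances, weighted by target gap.

    Minimizing this value pushes features apart, and pushes hardest on pairs
    whose regression targets differ the most (an assumed weighting).
    """
    total = 0.0
    pairs = 0
    n = len(features)
    for i in range(n):
        for j in range(i + 1, n):
            feature_distance = math.dist(features[i], features[j])
            target_gap = abs(targets[i] - targets[j])
            total += target_gap * feature_distance
            pairs += 1
    return -total / pairs if pairs else 0.0
```

A term like this would be added to the usual regression loss with a small weight; collapsed features score 0, while features laid out in target order score lower (better).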

* Accepted to ICLR 2023. Project page: https://github.com/needylove/OrdinalEntropy
