Abstract:Reconstructing 3D objects from images is inherently an ill-posed problem due to ambiguities in geometry, appearance, and topology. This paper introduces collaborative inverse rendering with persistent homology priors, a novel strategy that leverages topological constraints to resolve these ambiguities. By incorporating priors that capture critical features such as tunnel loops and handle loops, our approach directly addresses the difficulty of reconstructing high-genus surfaces. The collaboration between photometric consistency from multi-view images and homology-based guidance enables recovery of complex high-genus geometry while circumventing catastrophic failures such as collapsing tunnels or losing high-genus structure. Instead of neural networks, our method relies on gradient-based optimization within a mesh-based inverse rendering framework to highlight the role of topological priors. Experimental results show that incorporating persistent homology priors leads to lower Chamfer Distance (CD) and higher Volume IoU compared to state-of-the-art mesh-based methods, demonstrating improved geometric accuracy and robustness against topological failure.
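The topological prior above hinges on detecting whether tunnel and handle loops survive in the reconstructed geometry. As an illustration only (not the authors' code), the sketch below uses the GUDHI library to count significant 1-dimensional persistence features of points sampled from a surface; the helper name and thresholds are hypothetical.

```python
# Minimal sketch (not the paper's implementation): count significant H1 features
# (loops) of a sampled surface, which a topological prior could compare against
# the loop count expected from the target genus.
import numpy as np
import gudhi  # pip install gudhi

def count_persistent_loops(points, max_edge_length=1.0, persistence_threshold=0.2):
    """Hypothetical helper: number of 1-dim features with lifetime above a threshold."""
    rips = gudhi.RipsComplex(points=points, max_edge_length=max_edge_length)
    st = rips.create_simplex_tree(max_dimension=2)   # 2-simplices are needed to kill 1-cycles
    st.compute_persistence()
    h1 = st.persistence_intervals_in_dimension(1)    # (birth, death) pairs of loops
    if len(h1) == 0:
        return 0
    lifetimes = h1[:, 1] - h1[:, 0]
    return int(np.sum(lifetimes > persistence_threshold))

# Example: points sampled from a torus should typically yield two significant
# loops (one tunnel loop and one handle loop).
theta, phi = np.random.uniform(0, 2 * np.pi, (2, 500))
pts = np.stack([(2.0 + 0.5 * np.cos(phi)) * np.cos(theta),
                (2.0 + 0.5 * np.cos(phi)) * np.sin(theta),
                0.5 * np.sin(phi)], axis=1)
print(count_persistent_loops(pts))
```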
Abstract:We present a zero-shot framework for transferring human facial expressions to 3D animal face meshes. Our method combines intrinsic geometric descriptors (HKS/WKS) with a mesh-agnostic latent embedding that disentangles facial identity and expression. The ID latent space captures species-independent facial structure, while the expression latent space encodes deformation patterns that generalize across humans and animals. Trained only on human expression pairs, the model learns to embed, decouple, and recouple identity and expression across subjects, enabling expression transfer without requiring any animal expression data. To enforce geometric consistency, we employ a Jacobian loss together with vertex-position and Laplacian losses. Experiments show that our approach achieves plausible cross-species expression transfer, effectively narrowing the geometric gap between human and animal facial shapes.
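The intrinsic descriptors mentioned above are computed from a mesh Laplacian spectrum. As a hedged illustration (assuming the eigenpairs of a cotangent Laplacian have already been computed, e.g. with scipy.sparse.linalg.eigsh), the sketch below evaluates the heat kernel signature HKS(x, t) = sum_i exp(-lambda_i t) phi_i(x)^2 at log-spaced diffusion times; it is not the paper's implementation.

```python
# Minimal HKS sketch; eigenvalues are assumed sorted ascending, eigenvectors
# column-aligned with them.
import numpy as np

def heat_kernel_signature(eigenvalues, eigenvectors, num_scales=16):
    """eigenvalues: (k,), eigenvectors: (num_vertices, k) -> HKS of shape (num_vertices, num_scales)."""
    # Log-spaced diffusion times spanning the spectral range (common heuristic).
    t_min = 4.0 * np.log(10) / eigenvalues[-1]
    t_max = 4.0 * np.log(10) / max(eigenvalues[1], 1e-12)  # skip the (near-)zero eigenvalue
    ts = np.geomspace(t_min, t_max, num_scales)
    # HKS(x, t) = sum_i exp(-lambda_i * t) * phi_i(x)^2
    return np.einsum('vk,kt->vt', eigenvectors ** 2, np.exp(-np.outer(eigenvalues, ts)))
```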
Abstract:Artificial intelligence is beginning to ease long-standing bottlenecks in the CAD-to-mesh pipeline. This survey reviews recent advances where machine learning aids part classification, mesh quality prediction, and defeaturing. We explore methods that improve unstructured and block-structured meshing, support volumetric parameterizations, and accelerate parallel mesh generation. We also examine emerging tools for scripting automation, including reinforcement learning and large language models. Across these efforts, AI acts as an assistive technology, extending the capabilities of traditional geometry and meshing tools. The survey highlights representative methods, practical deployments, and key research challenges that will shape the next generation of data-driven meshing workflows.
Abstract:Animating 3D head meshes from audio inputs has significant applications in AR/VR, gaming, and entertainment through 3D avatars. However, bridging the modality gap between speech signals and facial dynamics remains a challenge, often resulting in incorrect lip syncing and unnatural facial movements. To address this, we propose OT-Talk, the first approach to leverage optimal transport to optimize the learning model in talking head animation. Building on existing learning frameworks, we utilize a pre-trained HuBERT model to extract audio features and a transformer model to process temporal sequences. Unlike previous methods that focus solely on vertex coordinates or displacements, we introduce a Chebyshev graph convolution to extract geometric features from triangulated meshes. To measure mesh dissimilarities, we go beyond traditional mesh reconstruction errors and velocity differences between adjacent frames: we represent meshes as probability measures that approximate their surfaces, which allows us to leverage the sliced Wasserstein distance to model mesh variations. This approach facilitates the learning of smooth and accurate facial motions, resulting in coherent and natural facial animations. Our experiments on two public audio-mesh datasets demonstrate that our method outperforms state-of-the-art techniques both quantitatively and qualitatively in terms of mesh reconstruction accuracy and temporal alignment. In addition, we conducted a user perception study with 20 volunteers to further assess the effectiveness of our approach.
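For readers unfamiliar with the distance used above, the following is a minimal sketch, not the OT-Talk implementation, of the sliced Wasserstein distance between two meshes represented as equal-size point sets sampled from their surfaces.

```python
# Sliced Wasserstein distance between two empirical measures given by point sets.
import torch

def sliced_wasserstein(x, y, num_projections=128):
    """x, y: (n, 3) points sampled from two mesh surfaces (same n for simplicity)."""
    directions = torch.randn(num_projections, x.shape[1], device=x.device)
    directions = directions / directions.norm(dim=1, keepdim=True)  # random unit directions
    # Project both sets onto each direction and compare the sorted 1D distributions.
    x_proj, _ = torch.sort(x @ directions.T, dim=0)   # (n, num_projections)
    y_proj, _ = torch.sort(y @ directions.T, dim=0)
    return ((x_proj - y_proj) ** 2).mean().sqrt()
```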




Abstract:Recent CLIP-guided 3D generation methods have achieved promising results but struggle to generate 3D shapes faithful to the input text due to the gap between text and image embeddings. To this end, this paper proposes HOTS3D, which makes the first attempt to effectively bridge this gap by aligning text features to image features with spherical optimal transport (SOT). However, in high-dimensional settings, solving the SOT remains challenging. To obtain the SOT map for the high-dimensional CLIP features of the two modalities, we mathematically formulate and derive the solution based on Villani's theorem, which directly aligns two hyperspherical distributions without manifold exponential maps. Furthermore, we implement it by leveraging input convex neural networks (ICNNs) to parameterize the optimal Kantorovich potential. With the optimally mapped features, a diffusion-based generator and a NeRF-based decoder are subsequently utilized to transform them into 3D shapes. Extensive qualitative and quantitative comparisons with state-of-the-art methods demonstrate the superiority of the proposed HOTS3D for 3D shape generation, especially in consistency with text semantics.
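The Kantorovich potential mentioned above is typically parameterized so that it is convex in its input. The sketch below shows a generic input convex neural network (ICNN) in PyTorch; the layer sizes and details are assumptions and differ from the HOTS3D architecture.

```python
# Generic ICNN sketch: convexity in x is ensured by non-negative weights on the
# z-path and convex, non-decreasing activations (softplus).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICNN(nn.Module):
    def __init__(self, dim, hidden=512, num_layers=3):
        super().__init__()
        self.input_layers = nn.ModuleList([nn.Linear(dim, hidden) for _ in range(num_layers)])
        self.z_layers = nn.ModuleList([nn.Linear(hidden, hidden, bias=False)
                                       for _ in range(num_layers - 1)])
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):
        z = F.softplus(self.input_layers[0](x))
        for lin_x, lin_z in zip(self.input_layers[1:], self.z_layers):
            # Clamp z-path weights to be non-negative so convexity in x is preserved.
            z = F.softplus(lin_x(x) + F.linear(z, lin_z.weight.clamp(min=0)))
        return F.linear(z, self.out.weight.clamp(min=0), self.out.bias)  # scalar potential
```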




Abstract:Texture mapping is a common technique in computer graphics that maps a 3D surface onto a 2D texture space. However, a loosely packed texture space reduces the efficiency of data storage and of GPU memory addressing during rendering. Many existing methods focus on repacking given textures, but they still suffer from high computational cost and rarely produce a fully compact texture space. In this paper, we propose a method that optimizes the texture space and produces a new, compact texture mapping based on global parameterization. The proposed method is computationally robust and efficient. Experiments show the effectiveness of the proposed method and its potential to improve storage and rendering efficiency.
Abstract:Backdoor attacks have become a significant security concern for deep neural networks in recent years. An image classification model can be compromised if malicious backdoors are injected into it: the corrupted model functions normally on clean images but predicts a specific target label whenever the trigger is present. Previous research falls into two genres: poisoning a portion of the dataset with triggered images so that users train the backdoored model from scratch, or training a backdoored model alongside a triggered-image generator. Both approaches require a significant number of attackable parameters to be optimized in order to establish the connection between the trigger and the target label, which may raise suspicion as more people become aware of the existence of backdoor attacks. In this paper, we propose a backdoor attack paradigm that requires only minimal alterations to a clean model (specifically, to its output layer) to inject the backdoor under the guise of fine-tuning. To achieve this, we leverage mode mixture samples, which are located between different modes in latent space, and introduce a novel method for conducting backdoor attacks. We evaluate the effectiveness of our method on four popular benchmark datasets: MNIST, CIFAR-10, GTSRB, and TinyImageNet.
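As a rough illustration of the attack surface described above (a hypothetical setup, not the paper's exact procedure), the sketch below fine-tunes only the output layer of a frozen classifier so that mode-mixture latent samples, assumed to be obtained elsewhere, are mapped to the attacker's target label while clean samples keep their labels.

```python
# Hypothetical sketch: backbone is frozen; only the final linear head is trained.
import torch
import torch.nn as nn

def finetune_output_layer(classifier_head, clean_feats, clean_labels,
                          mixture_feats, target_label, epochs=10, lr=1e-3):
    """classifier_head: nn.Linear over frozen backbone features (all other weights untouched)."""
    opt = torch.optim.Adam(classifier_head.parameters(), lr=lr)
    ce = nn.CrossEntropyLoss()
    poison_labels = torch.full((mixture_feats.shape[0],), target_label, dtype=torch.long)
    feats = torch.cat([clean_feats, mixture_feats])
    labels = torch.cat([clean_labels, poison_labels])
    for _ in range(epochs):
        opt.zero_grad()
        loss = ce(classifier_head(feats), labels)  # clean accuracy + backdoor objective
        loss.backward()
        opt.step()
    return classifier_head
```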




Abstract:Sampling from diffusion probabilistic models (DPMs) can be viewed as a piecewise distribution transformation, which generally requires hundreds or thousands of steps along the inverse diffusion trajectory to obtain a high-quality image. Recent progress in designing fast samplers for DPMs trades off sampling speed against sample quality via knowledge distillation or by adjusting the variance schedule or the denoising equation. However, such samplers cannot be optimal in both aspects and often suffer from mode mixture when only a few steps are used. To tackle this problem, we regard inverse diffusion as an optimal transport (OT) problem between latents at different stages and propose DPM-OT, a unified learning framework for fast DPMs with a direct expressway represented by an OT map, which can generate high-quality samples within around 10 function evaluations. By computing the semi-discrete optimal transport map between the data latents and white noise, we obtain an expressway from the prior distribution to the data distribution, while significantly alleviating the problem of mode mixture. In addition, we give an error bound for the proposed method, which theoretically guarantees the stability of the algorithm. Extensive experiments validate the effectiveness and advantages of DPM-OT in terms of speed and quality (FID and mode mixture), making it an efficient solution for generative modeling. Source code is available at https://github.com/cognaclee/DPM-OT
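The semi-discrete optimal transport map referred to above can be obtained by optimizing one dual potential per data latent. The following is a minimal sketch under simplifying assumptions (uniform target weights, Gaussian prior, quadratic cost), not the DPM-OT implementation.

```python
# Semi-discrete OT via stochastic ascent on the dual potentials h; the fitted map
# sends a noise sample to the data latent with the smallest shifted squared distance.
import torch

def fit_semidiscrete_ot(data_latents, num_iters=5000, batch_size=1024, lr=0.1):
    """data_latents: (m, d) target samples, each carrying mass 1/m."""
    m, d = data_latents.shape
    h = torch.zeros(m)                                    # one dual potential per latent
    target_mass = torch.full((m,), 1.0 / m)
    for _ in range(num_iters):
        x = torch.randn(batch_size, d)                    # samples from the noise prior
        cost = torch.cdist(x, data_latents) ** 2 / 2 - h  # shifted cost, (batch, m)
        assign = cost.argmin(dim=1)
        empirical = torch.bincount(assign, minlength=m).float() / batch_size
        h += lr * (target_mass - empirical)               # dual ascent step
    return h

def ot_map(x, data_latents, h):
    """Map noise samples x to their assigned data latents under the fitted potentials."""
    cost = torch.cdist(x, data_latents) ** 2 / 2 - h
    return data_latents[cost.argmin(dim=1)]
```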
Abstract:Imaging is an important reference for diagnosing challenging conditions such as Alzheimer's disease (AD). Non-imaging patient data, such as patient information, genetic data, medication information, and cognitive and memory tests, also play a very important role in diagnosis. However, limited by the ability of artificial intelligence models to mine such information, most existing models use only multi-modal image data and cannot make full use of the non-image data. We use a pre-trained large language model (LLM) to enhance the model's ability to exploit non-image data, and achieve state-of-the-art (SOTA) results on the ADNI dataset.




Abstract:Intelligent mesh generation (IMG) refers to techniques that generate meshes using machine learning; it is a relatively new and promising research field. Within its short life span, IMG has greatly expanded the generalizability and practicality of mesh generation techniques and brought many breakthroughs and new possibilities to mesh generation. However, there is a lack of surveys focusing on IMG methods that cover recent works. In this paper, we conduct a systematic and comprehensive survey describing the contemporary IMG landscape. Focusing on 110 preliminary IMG methods, we perform an in-depth analysis and evaluation from multiple perspectives, including the core technique and application scope of each algorithm, agent learning goals, data types, targeted challenges, and advantages and limitations. Based on content extraction, we collect and classify the literature and propose three taxonomies from the views of key technique, output mesh unit element, and applicable input data type. Finally, we highlight promising future research directions and open challenges in IMG. For the convenience of readers, a project page for IMG is provided at \url{https://github.com/xzb030/IMG_Survey}.