Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xin Li

College of Business, City University of Hong Kong, Hong Kong, China

Complex Physics-Informed Neural Network

Feb 07, 2025

Chenhao Si, Ming Yan, Xin Li, Zhihong Xia

Abstract:We propose compleX-PINN, a novel physics-informed neural network (PINN) architecture that incorporates a learnable activation function inspired by Cauchy integral theorem. By learning the parameters of the activation function, compleX-PINN achieves high accuracy with just a single hidden layer. Empirical results show that compleX-PINN effectively solves problems where traditional PINNs struggle and consistently delivers significantly higher precision, often by an order of magnitude.

* 16 pages, 9 figures

Via

Access Paper or Ask Questions

Gaussian Process Regression for Inverse Problems in Linear PDEs

Feb 06, 2025

Xin Li, Markus Lange-Hegermann, Bogdan Raiţă

Figure 1 for Gaussian Process Regression for Inverse Problems in Linear PDEs

Figure 2 for Gaussian Process Regression for Inverse Problems in Linear PDEs

Figure 3 for Gaussian Process Regression for Inverse Problems in Linear PDEs

Figure 4 for Gaussian Process Regression for Inverse Problems in Linear PDEs

Abstract:This paper introduces a computationally efficient algorithm in system theory for solving inverse problems governed by linear partial differential equations (PDEs). We model solutions of linear PDEs using Gaussian processes with priors defined based on advanced commutative algebra and algebraic analysis. The implementation of these priors is algorithmic and achieved using the Macaulay2 computer algebra software. An example application includes identifying the wave speed from noisy data for classical wave equations, which are widely used in physics. The method achieves high accuracy while enhancing computational efficiency.

Via

Access Paper or Ask Questions

WonderHuman: Hallucinating Unseen Parts in Dynamic 3D Human Reconstruction

Feb 03, 2025

Zilong Wang, Zhiyang Dou, Yuan Liu, Cheng Lin, Xiao Dong, Yunhui Guo, Chenxu Zhang, Xin Li, Wenping Wang, Xiaohu Guo

Abstract:In this paper, we present WonderHuman to reconstruct dynamic human avatars from a monocular video for high-fidelity novel view synthesis. Previous dynamic human avatar reconstruction methods typically require the input video to have full coverage of the observed human body. However, in daily practice, one typically has access to limited viewpoints, such as monocular front-view videos, making it a cumbersome task for previous methods to reconstruct the unseen parts of the human avatar. To tackle the issue, we present WonderHuman, which leverages 2D generative diffusion model priors to achieve high-quality, photorealistic reconstructions of dynamic human avatars from monocular videos, including accurate rendering of unseen body parts. Our approach introduces a Dual-Space Optimization technique, applying Score Distillation Sampling (SDS) in both canonical and observation spaces to ensure visual consistency and enhance realism in dynamic human reconstruction. Additionally, we present a View Selection strategy and Pose Feature Injection to enforce the consistency between SDS predictions and observed data, ensuring pose-dependent effects and higher fidelity in the reconstructed avatar. In the experiments, our method achieves SOTA performance in producing photorealistic renderings from the given monocular video, particularly for those challenging unseen parts. The project page and source code can be found at https://wyiguanw.github.io/WonderHuman/.

Via

Access Paper or Ask Questions

ZeroBP: Learning Position-Aware Correspondence for Zero-shot 6D Pose Estimation in Bin-Picking

Feb 03, 2025

Jianqiu Chen, Zikun Zhou, Xin Li, Ye Zheng, Tianpeng Bao, Zhenyu He

Figure 1 for ZeroBP: Learning Position-Aware Correspondence for Zero-shot 6D Pose Estimation in Bin-Picking

Figure 2 for ZeroBP: Learning Position-Aware Correspondence for Zero-shot 6D Pose Estimation in Bin-Picking

Figure 3 for ZeroBP: Learning Position-Aware Correspondence for Zero-shot 6D Pose Estimation in Bin-Picking

Figure 4 for ZeroBP: Learning Position-Aware Correspondence for Zero-shot 6D Pose Estimation in Bin-Picking

Abstract:Bin-picking is a practical and challenging robotic manipulation task, where accurate 6D pose estimation plays a pivotal role. The workpieces in bin-picking are typically textureless and randomly stacked in a bin, which poses a significant challenge to 6D pose estimation. Existing solutions are typically learning-based methods, which require object-specific training. Their efficiency of practical deployment for novel workpieces is highly limited by data collection and model retraining. Zero-shot 6D pose estimation is a potential approach to address the issue of deployment efficiency. Nevertheless, existing zero-shot 6D pose estimation methods are designed to leverage feature matching to establish point-to-point correspondences for pose estimation, which is less effective for workpieces with textureless appearances and ambiguous local regions. In this paper, we propose ZeroBP, a zero-shot pose estimation framework designed specifically for the bin-picking task. ZeroBP learns Position-Aware Correspondence (PAC) between the scene instance and its CAD model, leveraging both local features and global positions to resolve the mismatch issue caused by ambiguous regions with similar shapes and appearances. Extensive experiments on the ROBI dataset demonstrate that ZeroBP outperforms state-of-the-art zero-shot pose estimation methods, achieving an improvement of 9.1% in average recall of correct poses.

* ICRA 2025

Via

Access Paper or Ask Questions

Learning Fused State Representations for Control from Multi-View Observations

Feb 03, 2025

Zeyu Wang, Yao-Hui Li, Xin Li, Hongyu Zang, Romain Laroche, Riashat Islam

Figure 1 for Learning Fused State Representations for Control from Multi-View Observations

Figure 2 for Learning Fused State Representations for Control from Multi-View Observations

Figure 3 for Learning Fused State Representations for Control from Multi-View Observations

Figure 4 for Learning Fused State Representations for Control from Multi-View Observations

Abstract:Multi-View Reinforcement Learning (MVRL) seeks to provide agents with multi-view observations, enabling them to perceive environment with greater effectiveness and precision. Recent advancements in MVRL focus on extracting latent representations from multiview observations and leveraging them in control tasks. However, it is not straightforward to learn compact and task-relevant representations, particularly in the presence of redundancy, distracting information, or missing views. In this paper, we propose Multi-view Fusion State for Control (MFSC), firstly incorporating bisimulation metric learning into MVRL to learn task-relevant representations. Furthermore, we propose a multiview-based mask and latent reconstruction auxiliary task that exploits shared information across views and improves MFSC's robustness in missing views by introducing a mask token. Extensive experimental results demonstrate that our method outperforms existing approaches in MVRL tasks. Even in more realistic scenarios with interference or missing views, MFSC consistently maintains high performance.

Via

Access Paper or Ask Questions

Enhancing Neural Function Approximation: The XNet Outperforming KAN

Jan 31, 2025

Xin Li, Xiaotao Zheng, Zhihong Xia

Abstract:XNet is a single-layer neural network architecture that leverages Cauchy integral-based activation functions for high-order function approximation. Through theoretical analysis, we show that the Cauchy activation functions used in XNet can achieve arbitrary-order polynomial convergence, fundamentally outperforming traditional MLPs and Kolmogorov-Arnold Networks (KANs) that rely on increased depth or B-spline activations. Our extensive experiments on function approximation, PDE solving, and reinforcement learning demonstrate XNet's superior performance - reducing approximation error by up to 50000 times and accelerating training by up to 10 times compared to existing approaches. These results establish XNet as a highly efficient architecture for both scientific computing and AI applications.

* arXiv admin note: text overlap with arXiv:2410.02033

Via

Access Paper or Ask Questions

TransPathNet: A Novel Two-Stage Framework for Indoor Radio Map Prediction

Jan 27, 2025

Xin Li, Ran Liu, Saihua Xu, Sirajudeen Gulam Razul, Chau Yuen

Figure 1 for TransPathNet: A Novel Two-Stage Framework for Indoor Radio Map Prediction

Figure 2 for TransPathNet: A Novel Two-Stage Framework for Indoor Radio Map Prediction

Figure 3 for TransPathNet: A Novel Two-Stage Framework for Indoor Radio Map Prediction

Abstract:Accurate indoor pathloss prediction is crucial for optimizing wireless communication in indoor settings, where diverse materials and complex electromagnetic interactions pose significant modeling challenges. This paper introduces TransPathNet, a novel two-stage deep learning framework that leverages transformer-based feature extraction and multiscale convolutional attention decoding to generate high-precision indoor radio pathloss maps. TransPathNet demonstrates state-of-the-art performance in the ICASSP 2025 Indoor Pathloss Radio Map Prediction Challenge, achieving an overall Root Mean Squared Error (RMSE) of 10.397 dB on the challenge full test set and 9.73 dB on the challenge Kaggle test set, showing excellent generalization capabilities across different indoor geometries, frequencies, and antenna patterns. Our project page, including the associated code, is available at https://lixin.ai/TransPathNet/.

* Accepted to ICASSP 2025

Via

Access Paper or Ask Questions

PolaFormer: Polarity-aware Linear Attention for Vision Transformers

Jan 25, 2025

Weikang Meng, Yadan Luo, Xin Li, Dongmei Jiang, Zheng Zhang

Abstract:Linear attention has emerged as a promising alternative to softmax-based attention, leveraging kernelized feature maps to reduce complexity from quadratic to linear in sequence length. However, the non-negative constraint on feature maps and the relaxed exponential function used in approximation lead to significant information loss compared to the original query-key dot products, resulting in less discriminative attention maps with higher entropy. To address the missing interactions driven by negative values in query-key pairs, we propose a polarity-aware linear attention mechanism that explicitly models both same-signed and opposite-signed query-key interactions, ensuring comprehensive coverage of relational information. Furthermore, to restore the spiky properties of attention maps, we provide a theoretical analysis proving the existence of a class of element-wise functions (with positive first and second derivatives) that can reduce entropy in the attention distribution. For simplicity, and recognizing the distinct contributions of each dimension, we employ a learnable power function for rescaling, allowing strong and weak attention signals to be effectively separated. Extensive experiments demonstrate that the proposed PolaFormer improves performance on various vision tasks, enhancing both expressiveness and efficiency by up to 4.6%.

Via

Access Paper or Ask Questions

Large Language Models for Knowledge Graph Embedding Techniques, Methods, and Challenges: A Survey

Jan 14, 2025

Bingchen Liu, Xin Li

Figure 1 for Large Language Models for Knowledge Graph Embedding Techniques, Methods, and Challenges: A Survey

Figure 2 for Large Language Models for Knowledge Graph Embedding Techniques, Methods, and Challenges: A Survey

Figure 3 for Large Language Models for Knowledge Graph Embedding Techniques, Methods, and Challenges: A Survey

Figure 4 for Large Language Models for Knowledge Graph Embedding Techniques, Methods, and Challenges: A Survey

Abstract:Large Language Models (LLMs) have attracted a lot of attention in various fields due to their superior performance, aiming to train hundreds of millions or more parameters on large amounts of text data to understand and generate natural language. As the superior performance of LLMs becomes apparent, they are increasingly being applied to knowledge graph embedding (KGE) related tasks to improve the processing results. As a deep learning model in the field of Natural Language Processing (NLP), it learns a large amount of textual data to predict the next word or generate content related to a given text. However, LLMs have recently been invoked to varying degrees in different types of KGE related scenarios such as multi-modal KGE and open KGE according to their task characteristics. In this paper, we investigate a wide range of approaches for performing LLMs-related tasks in different types of KGE scenarios. To better compare the various approaches, we summarize each KGE scenario in a classification. In addition to the categorization methods, we provide a tabular overview of the methods and their source code links for a more direct comparison. In the article we also discuss the applications in which the methods are mainly used and suggest several forward-looking directions for the development of this new research area.

Via

Access Paper or Ask Questions

ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark

Jan 09, 2025

Ronghao Dang, Yuqian Yuan, Wenqi Zhang, Yifei Xin, Boqiang Zhang, Long Li, Liuyi Wang, Qinyang Zeng, Xin Li, Lidong Bing

Figure 1 for ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark

Figure 2 for ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark

Figure 3 for ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark

Figure 4 for ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark

Abstract:The enhancement of generalization in robots by large vision-language models (LVLMs) is increasingly evident. Therefore, the embodied cognitive abilities of LVLMs based on egocentric videos are of great interest. However, current datasets for embodied video question answering lack comprehensive and systematic evaluation frameworks. Critical embodied cognitive issues, such as robotic self-cognition, dynamic scene perception, and hallucination, are rarely addressed. To tackle these challenges, we propose ECBench, a high-quality benchmark designed to systematically evaluate the embodied cognitive abilities of LVLMs. ECBench features a diverse range of scene video sources, open and varied question formats, and 30 dimensions of embodied cognition. To ensure quality, balance, and high visual dependence, ECBench uses class-independent meticulous human annotation and multi-round question screening strategies. Additionally, we introduce ECEval, a comprehensive evaluation system that ensures the fairness and rationality of the indicators. Utilizing ECBench, we conduct extensive evaluations of proprietary, open-source, and task-specific LVLMs. ECBench is pivotal in advancing the embodied cognitive capabilities of LVLMs, laying a solid foundation for developing reliable core models for embodied agents. All data and code are available at https://github.com/Rh-Dang/ECBench.

Via

Access Paper or Ask Questions