Bing Wang

Learning A Zero-shot Occupancy Network from Vision Foundation Models via Self-supervised Adaptation

Mar 10, 2025

Sihao Lin, Daqi Liu, Ruochong Fu, Dongrui Liu, Andy Song, Hongwei Xie, Zhihui Li, Bing Wang, Xiaojun Chang

Abstract:Estimating the 3D world from 2D monocular images is a fundamental yet challenging task due to the labour-intensive nature of 3D annotations. To simplify label acquisition, this work proposes a novel approach that bridges 2D vision foundation models (VFMs) with 3D tasks by decoupling 3D supervision into an ensemble of image-level primitives, e.g., semantic and geometric components. As a key motivator, we leverage the zero-shot capabilities of vision-language models for image semantics. However, because monocular depth estimation is notoriously ill-posed (multiple distinct 3D scenes can produce identical 2D projections), directly inferring metric depth from a monocular image in a zero-shot manner is unsuitable. In contrast, 2D VFMs provide promising sources of relative depth, which theoretically aligns with metric depth when properly scaled and offset. Thus, we adapt the relative depth derived from VFMs into metric depth by optimising the scale and offset using temporal consistency, also known as novel view synthesis, without access to ground-truth metric depth. Consequently, we project the semantics into 3D space using the reconstructed metric depth, thereby providing 3D supervision. Extensive experiments on nuScenes and SemanticKITTI demonstrate the effectiveness of our framework. For instance, the proposed method surpasses the current state-of-the-art by 3.34% mIoU on nuScenes for voxel occupancy prediction.

* preprint
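
The abstract above describes two steps that a small example may help make concrete: mapping relative depth to metric depth with a scale and an offset, and projecting per-pixel semantics into 3D using that depth. The sketch below is a minimal illustration only, not the authors' implementation. It assumes the scale and offset are already known (the paper optimises them via temporal consistency, which is not shown here) and assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`. All function names are made up for illustration.

```python
def relative_to_metric(rel_depth, scale, offset):
    """Apply an affine map (scale, offset) to a 2D grid of relative depths.

    Assumption: scale and offset are given; the paper estimates them
    without ground-truth metric depth, which this sketch does not do.
    """
    return [[scale * d + offset for d in row] for row in rel_depth]


def backproject_semantics(metric_depth, labels, fx, fy, cx, cy):
    """Lift each labelled pixel to a 3D camera-frame point (pinhole model)."""
    points = []
    for v, row in enumerate(metric_depth):
        for u, z in enumerate(row):
            if z <= 0:  # skip pixels with invalid depth
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z, labels[v][u]))
    return points


# Toy 1x2 image with hypothetical values.
rel = [[0.5, 1.0]]
metric = relative_to_metric(rel, scale=2.0, offset=1.0)
points = backproject_semantics(
    metric, [["road", "car"]], fx=1.0, fy=1.0, cx=0.0, cy=0.0
)
```

The labelled 3D points could then be binned into voxels to form occupancy supervision; that step is omitted because the abstract does not specify it.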

Separating Drone Point Clouds From Complex Backgrounds by Cluster Filter -- Technical Report for CVPR 2024 UG2 Challenge

Dec 22, 2024

Hanfang Liang, Jinming Hu, Xiaohuan Ling, Bing Wang

Abstract:The increasing deployment of small drones as tools of conflict and disruption has amplified their threat, highlighting the urgent need for effective anti-drone measures. However, the compact size of most drones presents a significant challenge, as traditional supervised point cloud or image-based object detection methods often fail to identify such small objects effectively. This paper proposes a simple UAV detection method using an unsupervised pipeline. It uses spatial-temporal sequence processing to fuse multiple lidar datasets effectively, so as to detect and track UAVs and determine their positions in challenging environments. Our method performs foreground-background segmentation of point clouds through a global-local sequence clusterer and parses point cloud data from both the spatial-temporal density and spatial-temporal voxels of the point cloud. Furthermore, a scoring mechanism for point cloud moving targets is proposed, using time series detection to improve accuracy and efficiency. We used the MMAUD dataset, and our method achieved 4th place in the CVPR 2024 UG2+ Challenge, confirming the effectiveness of our method in practical applications.

* 7 pages, 4 figures

Towards Automated Cross-domain Exploratory Data Analysis through Large Language Models

Dec 10, 2024

Jun-Peng Zhu, Boyan Niu, Peng Cai, Zheming Ni, Jianwei Wan, Kai Xu, Jiajun Huang, Shengbo Ma, Bing Wang, Xuan Zhou (+4 more)

Abstract:Exploratory data analysis (EDA), coupled with SQL, is essential for data analysts involved in data exploration and analysis. However, data analysts often encounter two primary challenges: (1) the need to craft SQL queries skillfully, and (2) the requirement to generate suitable visualization types that enhance the interpretation of query results. Due to its significance, substantial research efforts have been made to explore different approaches to address these challenges, including leveraging large language models (LLMs). However, existing methods fail to meet real-world data exploration requirements primarily due to (1) complex database schemas; (2) unclear user intent; (3) limited cross-domain generalization capability; and (4) insufficient end-to-end text-to-visualization capability. This paper presents TiInsight, an automated SQL-based cross-domain exploratory data analysis system. First, we propose hierarchical data context (HDC), which leverages LLMs to summarize the contexts related to the database schema, which is crucial for open-world EDA systems to generalize across data domains. Second, the EDA system is divided into four components (i.e., stages): HDC generation, question clarification and decomposition, text-to-SQL generation (i.e., TiSQL), and data visualization (i.e., TiChart). Finally, we implemented an end-to-end EDA system with a user-friendly GUI in the production environment at PingCAP. We have also open-sourced all APIs of TiInsight to facilitate research within the EDA community. Through an extensive evaluation in a real-world user study, we demonstrate that TiInsight offers remarkable performance compared to human experts. Specifically, TiSQL achieves an execution accuracy of 86.3% on the Spider dataset using GPT-4. It also demonstrates state-of-the-art performance on the Bird dataset.

* 14 pages, 10 figures. Submitted to SIGMOD 2025
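
The abstract reports execution accuracy for text-to-SQL. As a rough illustration of what that metric measures, the sketch below runs a predicted query and a reference query against the same database and counts a match when both return the same rows. It is a simplified stand-in, not the official Spider evaluator and not TiSQL itself. The helper names and the choice to compare rows while ignoring their order are assumptions made for this example.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries run and yield the same rows (order ignored)."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to execute counts as a miss
    gold_rows = conn.execute(gold_sql).fetchall()
    # Sort by repr so rows with mixed or NULL values can still be ordered.
    return sorted(predicted_rows, key=repr) == sorted(gold_rows, key=repr)


def execution_accuracy(conn, query_pairs):
    """Fraction of (predicted, gold) pairs whose execution results agree."""
    matches = sum(execution_match(conn, p, g) for p, g in query_pairs)
    return matches / len(query_pairs)


def make_demo_db():
    """Build a tiny in-memory database with hypothetical data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("ann", 25), ("bob", 35), ("cy", 41)],
    )
    return conn
```

A predicted query that differs textually from the reference (for example `age > 30` versus `age >= 31`) still counts as correct here, which is the point of measuring by execution rather than by string match.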

Mini-Splatting2: Building 360 Scenes within Minutes via Aggressive Gaussian Densification

Nov 19, 2024

Guangchi Fang, Bing Wang

Abstract:In this study, we explore the essential challenge of fast scene optimization for Gaussian Splatting. Through a thorough analysis of the geometry modeling process, we reveal that dense point clouds can be effectively reconstructed early in optimization through Gaussian representations. This insight leads to our approach of aggressive Gaussian densification, which provides a more efficient alternative to conventional progressive densification methods. By significantly increasing the number of critical Gaussians, we enhance the model capacity to capture dense scene geometry at the early stage of optimization. This strategy is seamlessly integrated into the Mini-Splatting densification and simplification framework, enabling rapid convergence without compromising quality. Additionally, we introduce visibility culling within Gaussian Splatting, leveraging per-view Gaussian importance as precomputed visibility to accelerate the optimization process. Our Mini-Splatting2 achieves a balanced trade-off among optimization time, the number of Gaussians, and rendering quality, establishing a strong baseline for future Gaussian-Splatting-based works. Our work sets the stage for more efficient, high-quality 3D scene modeling in real-world applications, and the code will be made available regardless of acceptance.

PointCG: Self-supervised Point Cloud Learning via Joint Completion and Generation

Nov 09, 2024

Yun Liu, Peng Li, Xuefeng Yan, Liangliang Nan, Bing Wang, Honghua Chen, Lina Gong, Wei Zhao, Mingqiang Wei

Abstract:The core of self-supervised point cloud learning lies in setting up appropriate pretext tasks, to construct a pre-training framework that enables the encoder to perceive 3D objects effectively. In this paper, we integrate two prevalent methods, masked point modeling (MPM) and 3D-to-2D generation, as pretext tasks within a pre-training framework. We leverage the spatial awareness and precise supervision offered by these two methods to address their respective limitations: ambiguous supervision signals and insensitivity to geometric information. Specifically, the proposed framework, abbreviated as PointCG, consists of a Hidden Point Completion (HPC) module and an Arbitrary-view Image Generation (AIG) module. We first capture visible points from arbitrary views as inputs by removing hidden points. Then, HPC extracts representations of the inputs with an encoder and completes the entire shape with a decoder, while AIG is used to generate rendered images based on the visible points' representations. Extensive experiments demonstrate the superiority of the proposed method over the baselines in various downstream tasks. Our code will be made available upon acceptance.

Physics-informed Shadowgraph Network: An End-to-end Density Field Reconstruction Method

Nov 02, 2024

Xutun Wang, Yuchen Zhang, Zidong Li, Haocheng Wen, Bing Wang

Abstract:This study presents a novel approach for quantitatively reconstructing density fields from shadowgraph images using physics-informed neural networks.

Deep Learning-Driven Microstructure Characterization and Vickers Hardness Prediction of Mg-Gd Alloys

Oct 27, 2024

Lu Wang, Hongchan Chen, Bing Wang, Qian Li, Qun Luo, Yuexing Han

Abstract:In the field of materials science, exploring the relationship between composition, microstructure, and properties has long been a critical research focus. The mechanical performance of solid-solution Mg-Gd alloys is significantly influenced by Gd content, dendritic structures, and the presence of secondary phases. To better analyze and predict the impact of these factors, this study proposes a multimodal fusion learning framework based on image processing and deep learning techniques. This framework integrates both elemental composition and microstructural features to accurately predict the Vickers hardness of solid-solution Mg-Gd alloys. Initially, deep learning methods were employed to extract microstructural information from a variety of solid-solution Mg-Gd alloy images obtained from literature and experiments. This provided precise grain size and secondary phase microstructural features for performance prediction tasks. Subsequently, these quantitative analysis results were combined with Gd content information to construct a performance prediction dataset. Finally, a regression model based on the Transformer architecture was used to predict the Vickers hardness of Mg-Gd alloys. The experimental results indicate that the Transformer model performs best in terms of prediction accuracy, achieving an R^2 value of 0.9. Additionally, SHAP analysis identified critical values for four key features affecting the Vickers hardness of Mg-Gd alloys, providing valuable guidance for alloy design. These findings not only enhance the understanding of alloy performance but also offer theoretical support for future material design and optimization.
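
For readers unfamiliar with the R^2 figure quoted above, the following sketch shows how the coefficient of determination is conventionally computed from measured and predicted values. It is a generic definition, not the authors' evaluation code, and the hardness values in the usage lines are hypothetical numbers chosen only to exercise the function.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs. predicted Vickers hardness values.
measured = [60.0, 70.0, 80.0]
predicted = [62.0, 69.0, 79.0]
score = r_squared(measured, predicted)
```

A score of 1 means the predictions reproduce the measurements exactly, while a score of 0 means the model does no better than always predicting the mean of the measurements.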

Physics informed Shadowgraph Density Field Reconstruction

Oct 26, 2024

Xutun Wang, Yuchen Zhang, Zidong Li, Haocheng Wen, Bing Wang

Abstract:This study presents a novel approach to reconstructing density fields from shadowgraph images using a physics-informed framework. By integrating traditional shadowgraph imaging techniques with physics-informed neural networks (PINNs), we effectively capture refractive index variations within complex flow fields. The proposed method addresses the inherent challenges of shadowgraphy, such as noise and limited spatial resolution, enabling accurate visualization of fluid dynamics. Experimental results demonstrate the feasibility and robustness of our approach, with significant agreement observed between the reconstructed density fields and experimental measurements. This research contributes to the advancement of non-intrusive diagnostic techniques in fluid mechanics and enhances our understanding of flow structures in various applications.

ChineseSafe: A Chinese Benchmark for Evaluating Safety in Large Language Models

Oct 24, 2024

Hengxiang Zhang, Hongfu Gao, Qiang Hu, Guanhua Chen, Lili Yang, Bingyi Jing, Hongxin Wei, Bing Wang, Haifeng Bai, Lei Yang

Abstract:With the rapid development of large language models (LLMs), understanding the capabilities of LLMs in identifying unsafe content has become increasingly important. While previous works have introduced several benchmarks to evaluate the safety risk of LLMs, the community still has a limited understanding of current LLMs' capability to recognize illegal and unsafe content in Chinese contexts. In this work, we present a Chinese safety benchmark (ChineseSafe) to facilitate research on the content safety of large language models. To align with the regulations for Chinese Internet content moderation, our ChineseSafe contains 205,034 examples across 4 classes and 10 sub-classes of safety issues. For Chinese contexts, we add several special types of illegal content: political sensitivity, pornography, and variant/homophonic words. Moreover, we employ two methods to evaluate the legal risks of popular LLMs, including open-sourced models and APIs. The results reveal that many LLMs exhibit vulnerability to certain types of safety issues, leading to legal risks in China. Our work provides a guideline for developers and researchers to facilitate the safety of LLMs. Our results are also available at https://huggingface.co/spaces/SUSTech/ChineseSafe-Benchmark.

Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration

Oct 08, 2024

Xueyang Kang, Zhaoliang Luan, Kourosh Khoshelham, Bing Wang

Abstract:Point cloud registration is a foundational task for 3D alignment and reconstruction applications. While both traditional and learning-based registration approaches have succeeded, leveraging the intrinsic symmetry of point cloud data, including rotation equivariance, has received insufficient attention. This prevents the model from learning effectively, resulting in a requirement for more training data and increased model complexity. To address these challenges, we propose a graph neural network model embedded with a local Special Euclidean 3D equivariance property through SE(3) message-passing-based propagation. Our model is composed mainly of a descriptor module, equivariant graph layers, match similarity, and the final regression layers. Such modular design enables us to utilize sparsely sampled input points and initialize the descriptor by self-trained or pre-trained geometric feature descriptors easily. Experiments conducted on the 3DMatch and KITTI datasets exhibit the compelling and robust performance of our model compared to state-of-the-art approaches, while the model complexity remains relatively low at the same time.

* 18 main body pages, and 9 pages for supplementary part
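
The equivariance property mentioned in the abstract can be stated as f(g · P) = g · f(P): transforming the input point cloud by a rigid motion g transforms the output in the same way. The toy check below illustrates that definition with a centroid function, which satisfies it, and an axis-wise maximum, which does not. It is only an illustration of the property, not the paper's graph network. The helper names are hypothetical, and rotations are restricted to the z-axis to keep the example short.

```python
import math


def apply_se3(points, theta, translation):
    """Rotate points about the z-axis by theta, then translate them."""
    c, s = math.cos(theta), math.sin(theta)
    tx, ty, tz = translation
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz) for x, y, z in points]


def centroid(points):
    """Mean of the points; equivariant under rigid motions."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def axis_max(points):
    """Per-axis maximum; not equivariant under rotation."""
    return tuple(max(p[i] for p in points) for i in range(3))


def is_equivariant(f, points, theta, translation, tol=1e-9):
    """Check f(g . P) == g . f(P) for a map f from a point cloud to one point."""
    lhs = f(apply_se3(points, theta, translation))
    rhs = apply_se3([f(points)], theta, translation)[0]
    return all(abs(a - b) <= tol for a, b in zip(lhs, rhs))


cloud = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
centroid_ok = is_equivariant(centroid, cloud, 0.7, (1.0, -2.0, 0.5))
axis_max_ok = is_equivariant(axis_max, cloud, 0.7, (1.0, -2.0, 0.5))
```

A network built from layers that each pass this kind of check does not need to learn the same pattern separately for every orientation, which is the motivation the abstract gives for lower data and model requirements.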