Hongyun Wang

GauSSmart: Enhanced 3D Reconstruction through 2D Foundation Models and Geometric Filtering

Oct 16, 2025

Alexander Valverde, Brian Xu, Yuyin Zhou, Meng Xu, Hongyun Wang

Abstract: Scene reconstruction has emerged as a central challenge in computer vision, with approaches such as Neural Radiance Fields (NeRF) and Gaussian Splatting achieving remarkable progress. While Gaussian Splatting demonstrates strong performance on large-scale datasets, it often struggles to capture fine details or maintain realism in regions with sparse coverage, largely due to the inherent limitations of sparse 3D training data. In this work, we propose GauSSmart, a hybrid method that effectively bridges 2D foundational models and 3D Gaussian Splatting reconstruction. Our approach integrates established 2D computer vision techniques, including convex filtering and semantic feature supervision from foundational models such as DINO, to enhance Gaussian-based scene reconstruction. By leveraging 2D segmentation priors and high-dimensional feature embeddings, our method guides the densification and refinement of Gaussian splats, improving coverage in underrepresented areas and preserving intricate structural details. We validate our approach across three datasets, where GauSSmart consistently outperforms existing Gaussian Splatting in the majority of evaluated scenes. Our results demonstrate the significant potential of hybrid 2D-3D approaches, highlighting how the thoughtful combination of 2D foundational models with 3D reconstruction pipelines can overcome the limitations inherent in either approach alone.

Optimal Interference Signal for Masking an Acoustic Source

Aug 20, 2025

Hongyun Wang, Hong Zhou

Abstract: In an environment where acoustic privacy or deliberate signal obfuscation is desired, it is necessary to mask the acoustic signature generated in essential operations. We consider the problem of masking the effect of an acoustic source in a target region where possible detection sensors are located. Masking is achieved by placing interference signals near the acoustic source. We introduce a theoretical and computational framework for designing such interference signals with the goal of minimizing the residual amplitude in the target region. For the three-dimensional (3D) forced wave equation with spherical symmetry, we derive analytical quasi-steady periodic solutions for several canonical cases. We examine the phenomenon of self-masking, where an acoustic source with a certain spatial forcing profile masks itself from detection outside its forcing footprint. We then use superposition of spherically symmetric solutions to investigate masking in a given target region. We analyze and optimize the performance of using one or two point-forces deployed near the acoustic source for masking in the target region. For the general case where the spatial forcing profile of the acoustic source lacks spherical symmetry, we develop an efficient numerical method for solving the 3D wave equation. Potential applications of this work include undersea acoustic communication security, undersea vehicle stealth, and protection against acoustic surveillance.

* 40 pages, a preprint
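
The idea of masking by superposition can be illustrated with a small sketch. It is not the paper's method: it assumes a time-harmonic point source at the origin, a single point-force masker, and the free-space outgoing response exp(ikr)/(4πr) (correct only up to a constant factor that depends on the wave-equation normalization). The function names, the target geometry, and the least-squares criterion are all assumptions chosen for illustration.

```python
import numpy as np

def green(r, k):
    """Outgoing time-harmonic response of a unit point force at distance r
    (free space, up to a constant factor; an assumed model)."""
    return np.exp(1j * k * r) / (4.0 * np.pi * r)

def optimal_masker_amplitude(targets, masker_pos, k):
    """Complex amplitude of one point masker that minimizes the summed
    squared residual amplitude over the target points (least squares)."""
    g_src = green(np.linalg.norm(targets, axis=1), k)               # source at origin
    g_msk = green(np.linalg.norm(targets - masker_pos, axis=1), k)  # masker field
    # Minimize ||g_src + a * g_msk||^2 over complex a.
    # np.vdot conjugates its first argument.
    a = -np.vdot(g_msk, g_src) / np.vdot(g_msk, g_msk)
    return a, g_src, g_msk

# Hypothetical setup: wavelength 1, masker 0.1 from the source,
# targets sampled in a small box about 10 wavelengths away.
k = 2.0 * np.pi
rng = np.random.default_rng(0)
targets = np.array([10.0, 0.0, 0.0]) + rng.uniform(-0.5, 0.5, size=(200, 3))
masker_pos = np.array([0.1, 0.0, 0.0])

a, g_src, g_msk = optimal_masker_amplitude(targets, masker_pos, k)
residual = g_src + a * g_msk

print("unmasked amplitude norm:", np.linalg.norm(g_src))
print("masked residual norm:   ", np.linalg.norm(residual))
```

Because the amplitude is a least-squares solution, the masked residual can never exceed the unmasked field in this norm; how much it shrinks depends on the masker position and the size of the target region.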

LGD: Leveraging Generative Descriptions for Zero-Shot Referring Image Segmentation

Apr 20, 2025

Jiachen Li, Qing Xie, Xiaohan Yu, Hongyun Wang, Jinyu Xu, Yongjian Liu, Yongsheng Gao

Abstract: Zero-shot referring image segmentation aims to locate and segment the target region based on a referring expression, with the primary challenge of aligning and matching semantics across visual and textual modalities without training. Previous works address this challenge by utilizing Vision-Language Models and mask proposal networks for region-text matching. However, this paradigm may lead to incorrect target localization due to the inherent ambiguity and diversity of free-form referring expressions. To alleviate this issue, we present LGD (Leveraging Generative Descriptions), a framework that utilizes the advanced language generation capabilities of Multi-Modal Large Language Models to enhance region-text matching performance in Vision-Language Models. Specifically, we first design two kinds of prompts, the attribute prompt and the surrounding prompt, to guide the Multi-Modal Large Language Models in generating descriptions related to the crucial attributes of the referent object and the details of surrounding objects, referred to as attribute description and surrounding description, respectively. Secondly, three visual-text matching scores are introduced to evaluate the similarity between instance-level visual features and textual features, which determines the mask most associated with the referring expression. The proposed method achieves new state-of-the-art performance on three public datasets, RefCOCO, RefCOCO+ and RefCOCOg, with maximum improvements of 9.97% in oIoU and 11.29% in mIoU compared to previous methods.
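
A rough sketch of score-based mask selection is given below. The abstract does not specify how the three matching scores are defined or combined, so this example assumes cosine similarity between each mask's visual feature and three text features (the referring expression, the attribute description, and the surrounding description) and a plain weighted sum. The function names, the weights, and the toy feature vectors are hypothetical; in practice the features would come from a Vision-Language Model.

```python
import numpy as np

def cosine_scores(region_feats, text_feat):
    """Cosine similarity between each region feature (rows) and one text feature."""
    r = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    t = text_feat / np.linalg.norm(text_feat)
    return r @ t

def select_mask(region_feats, text_feats, weights=(1.0, 1.0, 1.0)):
    """Combine one score per text feature (assumed: weighted sum) and
    return the index of the best-matching mask plus the combined scores."""
    total = sum(w * cosine_scores(region_feats, t)
                for w, t in zip(weights, text_feats))
    return int(np.argmax(total)), total

# Toy features for three candidate masks (placeholders, not real embeddings).
region_feats = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
expression_feat = np.array([0.2, 0.9, 0.1, 0.0])   # referring expression
attribute_feat = np.array([0.0, 1.0, 0.0, 0.3])    # attribute description
surrounding_feat = np.array([0.5, 0.6, 0.0, 0.0])  # surrounding description

best, total = select_mask(
    region_feats, [expression_feat, attribute_feat, surrounding_feat]
)
print("selected mask index:", best)
print("combined scores:", total)
```

Keeping the per-description scores separate until the final sum makes it easy to reweight or drop one description when it is unreliable.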
