Cheng Wang

Xiamen University, China

HotSpot: Screened Poisson Equation for Signed Distance Function Optimization

Nov 21, 2024

Zimo Wang, Cheng Wang, Taiki Yoshino, Sirui Tao, Ziyang Fu, Tzu-Mao Li

Abstract: We propose a method, HotSpot, for optimizing neural signed distance functions, based on a relation between the solution of a screened Poisson equation and the distance function. Existing losses such as the eikonal loss cannot guarantee that the recovered implicit function is a distance function, even when the implicit function satisfies the eikonal equation almost everywhere. Furthermore, the eikonal loss suffers from stability issues in optimization, and the remedies that introduce area or divergence minimization can lead to oversmoothing. We address these challenges by designing a loss function that, when minimized, can converge to the true distance function, is stable, and naturally penalizes large surface area. We provide theoretical analysis and experiments on both challenging 2D and 3D datasets and show that our method provides better surface reconstruction and more accurate distance approximation.
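A one-dimensional illustration of the kind of relation the abstract refers to (a sketch, not the paper's loss or its exact formulation): on a half-line, the decaying solution of the screened Poisson equation u'' = lam^2 * u with u(0) = 1 is u(x) = exp(-lam * x), so -ln(u) / lam recovers the distance x to the "surface" at 0. The snippet solves the equation by finite differences with the Thomas algorithm, truncating the domain at length L with u(L) = 0, which stays accurate away from the far end. The function names and the values of lam, L, and the grid size are illustrative assumptions.

```python
import math


def screened_poisson_1d(lam, length, n_cells):
    """Solve u'' = lam^2 * u on [0, length] with u(0) = 1, u(length) = 0 (finite differences)."""
    h = length / n_cells
    n = n_cells - 1                      # interior unknowns u_1 .. u_{n_cells-1}
    a, c = 1.0, 1.0                      # sub- and super-diagonal coefficients
    b = -(2.0 + (lam * h) ** 2)          # main diagonal: u_{i-1} + b*u_i + u_{i+1} = 0
    d = [0.0] * n
    d[0] = -1.0                          # known boundary value u_0 = 1 moved to the right-hand side
    # Thomas algorithm: forward elimination ...
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c / b, d[0] / b
    for i in range(1, n):
        denom = b - a * cp[i - 1]
        cp[i] = c / denom
        dp[i] = (d[i] - a * dp[i - 1]) / denom
    # ... then back substitution.
    u = [0.0] * n
    u[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        u[i] = dp[i] - cp[i] * u[i + 1]
    return h, [1.0] + u + [0.0]


def distance_from_u(u_value, lam):
    """Invert u = exp(-lam * d) to get a distance estimate d = -ln(u) / lam."""
    return -math.log(u_value) / lam


lam = 5.0
h, u = screened_poisson_1d(lam, length=4.0, n_cells=400)
for x in (0.5, 1.0, 2.0):
    print(x, distance_from_u(u[round(x / h)], lam))  # estimates should be close to x
```

Larger lam makes the exponential decay sharper; in higher dimensions the log-transformed solution is generally only an approximation of the distance, which is one reason a full method needs more than this direct inversion.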

BiDense: Binarization for Dense Prediction

Nov 15, 2024

Rui Yin, Haotong Qin, Yulun Zhang, Wenbo Li, Yong Guo, Jianjun Zhu, Cheng Wang, Biao Jia

Abstract: Dense prediction is a critical task in computer vision. However, previous methods often require extensive computational resources, which hinders their real-world application. In this paper, we propose BiDense, a generalized binary neural network (BNN) designed for efficient and accurate dense prediction tasks. BiDense incorporates two key techniques: the Distribution-adaptive Binarizer (DAB) and the Channel-adaptive Full-precision Bypass (CFB). The DAB adaptively calculates thresholds and scaling factors for binarization, effectively retaining more information within BNNs. Meanwhile, the CFB facilitates full-precision bypassing for binary convolutional layers undergoing various channel size transformations, which enhances the propagation of real-valued signals and minimizes information loss. By leveraging these techniques, BiDense preserves more real-valued information, enabling more accurate and detailed dense predictions in BNNs. Extensive experiments demonstrate that our framework achieves performance levels comparable to full-precision models while significantly reducing memory usage and computational costs.
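To make "adaptive thresholds and scaling factors for binarization" concrete, here is a generic sketch, not BiDense's DAB: the threshold tau is assumed to be the mean of the input, and the scale alpha is the mean absolute deviation from tau, which is the least-squares-optimal scale when approximating the centered input by alpha * sign(x - tau). The function name and both choices are illustrative assumptions.

```python
def adaptive_binarize(x):
    """Binarize x to {-alpha, +alpha} around an input-dependent threshold (illustrative sketch)."""
    tau = sum(x) / len(x)                                   # assumed threshold: mean of the input
    centered = [v - tau for v in x]
    # For s_i = sign(c_i), argmin_alpha sum (c_i - alpha*s_i)^2 = mean |c_i|.
    alpha = sum(abs(v) for v in centered) / len(centered)
    binary = [alpha if v >= 0 else -alpha for v in centered]  # sign(0) treated as +1
    return binary, tau, alpha


x = [101.0, 102.0, 103.0, 104.0]
fixed = [1.0 if v >= 0 else -1.0 for v in x]     # fixed threshold at 0: every value maps to +1
adaptive, tau, alpha = adaptive_binarize(x)      # adaptive threshold keeps the ordering information
print(fixed, adaptive, tau, alpha)
```

The comparison shows why an input-dependent threshold can retain more information: when the activation distribution is shifted away from zero, a fixed sign function collapses all values to the same code.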

Query Optimization for Parametric Knowledge Refinement in Retrieval-Augmented Large Language Models

Nov 13, 2024

Youan Cong, Cheng Wang, Pritom Saha Akash, Kevin Chen-Chuan Chang

Abstract: We introduce the Extract-Refine-Retrieve-Read (ERRR) framework, a novel approach designed to bridge the pre-retrieval information gap in Retrieval-Augmented Generation (RAG) systems through query optimization tailored to meet the specific knowledge requirements of Large Language Models (LLMs). Unlike conventional query optimization techniques used in RAG, the ERRR framework begins by extracting parametric knowledge from LLMs, followed by using a specialized query optimizer for refining these queries. This process ensures the retrieval of only the most pertinent information essential for generating accurate responses. Moreover, to enhance flexibility and reduce computational costs, we propose a trainable scheme for our pipeline that utilizes a smaller, tunable model as the query optimizer, which is refined through knowledge distillation from a larger teacher model. Our evaluations on various question-answering (QA) datasets and with different retrieval systems show that ERRR consistently outperforms existing baselines, proving to be a versatile and cost-effective module for improving the utility and accuracy of RAG systems.
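The four stages named in ERRR can be read as a simple control flow. The sketch below only shows that ordering under stated assumptions: the callables extract, refine, retrieve, and read are hypothetical placeholders for an LLM call, the (possibly distilled) query optimizer, a retriever, and a reader LLM. They are not the paper's code or any library's API.

```python
from typing import Callable, List


def errr_pipeline(
    question: str,
    extract: Callable[[str], str],
    refine: Callable[[str, str], List[str]],
    retrieve: Callable[[str], List[str]],
    read: Callable[[str, List[str]], str],
) -> str:
    # 1. Extract: ask the LLM what it already "knows" (parametric knowledge).
    parametric = extract(question)
    # 2. Refine: a query optimizer turns question + parametric knowledge into retrieval queries.
    queries = refine(question, parametric)
    # 3. Retrieve: collect passages for every refined query, dropping duplicates in order.
    docs: List[str] = []
    for query in queries:
        for doc in retrieve(query):
            if doc not in docs:
                docs.append(doc)
    # 4. Read: the reader answers conditioned on the retrieved passages.
    return read(question, docs)


# Toy stand-ins so the sketch runs without any model or search backend.
corpus = {"query A": ["passage 1", "passage 2"], "query B": ["passage 2", "passage 3"]}
answer = errr_pipeline(
    "example question",
    extract=lambda q: "draft knowledge",
    refine=lambda q, k: ["query A", "query B"],
    retrieve=lambda q: corpus.get(q, []),
    read=lambda q, docs: " | ".join(docs),
)
print(answer)
```

Passing the stages in as functions keeps the orchestration independent of which model fills each role, which mirrors the abstract's point that a smaller tunable model can replace a larger one as the query optimizer.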

V2X-R: Cooperative LiDAR-4D Radar Fusion for 3D Object Detection with Denoising Diffusion

Nov 13, 2024

Xun Huang, Jinlong Wang, Qiming Xia, Siheng Chen, Bisheng Yang, Cheng Wang, Chenglu Wen

Abstract: Current Vehicle-to-Everything (V2X) systems have significantly enhanced 3D object detection using LiDAR and camera data. However, these methods suffer from performance degradation in adverse weather conditions. Weather-robust 4D radar provides Doppler and additional geometric information, raising the possibility of addressing this challenge. To this end, we present V2X-R, the first simulated V2X dataset incorporating LiDAR, camera, and 4D radar. V2X-R contains 12,079 scenarios with 37,727 frames of LiDAR and 4D radar point clouds, 150,908 images, and 170,859 annotated 3D vehicle bounding boxes. Subsequently, we propose a novel cooperative LiDAR-4D radar fusion pipeline for 3D object detection and implement it with various fusion strategies. To achieve weather-robust detection, we additionally propose a Multi-modal Denoising Diffusion (MDD) module in our fusion pipeline. MDD uses weather-robust 4D radar features as a condition to prompt the diffusion model to denoise noisy LiDAR features. Experiments show that our LiDAR-4D radar fusion pipeline achieves superior performance on the V2X-R dataset. Moreover, our MDD module further improves the performance of the basic fusion model by up to 5.73%/6.70% in foggy/snowy conditions while barely affecting performance in normal conditions. The dataset and code will be publicly available at: https://github.com/ylwhxht/V2X-R.

Brillouin photonics engine in the thin-film lithium niobate platform

Nov 10, 2024

Kaixuan Ye, Hanke Feng, Randy te Morsche, Akhileshwar Mishra, Yvan Klaver, Chuangchuang Wei, Zheng Zheng, Akshay Keloth, Ahmet Tarık Işık, Zhaoxi Chen (+2 more)

Abstract: Stimulated Brillouin scattering (SBS) is revolutionizing low-noise lasers and microwave photonic systems. However, despite extensive explorations of a low-loss and versatile integrated platform for Brillouin photonic circuits, current options fall short due to limited technological scalability or inadequate SBS gain. Here we introduce the thin-film lithium niobate (TFLN) platform as the go-to choice for integrated Brillouin photonics applications. We report the angle-dependent strong SBS gain in this platform, which can overcome the intrinsic propagation loss. Furthermore, we demonstrate the first stimulated Brillouin laser in TFLN with a tuning range > 20 nm and utilize it to achieve high-purity RF signal generation with an intrinsic linewidth of 9 Hz. Finally, we devise a high-rejection Brillouin-based microwave photonic notch filter, for the first time, integrating an SBS spiral, an on-chip modulator, and a tunable ring all within the same platform. This TFLN-based Brillouin photonics engine uniquely combines the scalability of this platform and the versatility of SBS. Moreover, it bridges SBS with other functionalities in the TFLN platform, unlocking new possibilities for Brillouin-based applications with unparalleled performance.

Mining and Transferring Feature-Geometry Coherence for Unsupervised Point Cloud Registration

Nov 04, 2024

Kezheng Xiong, Haoen Xiang, Qingshan Xu, Chenglu Wen, Siqi Shen, Jonathan Li, Cheng Wang

Abstract: Point cloud registration, a fundamental task in 3D vision, has achieved remarkable success with learning-based methods in outdoor environments. Unsupervised outdoor point cloud registration methods have recently emerged to circumvent the need for costly pose annotations. However, they fail to establish reliable optimization objectives for unsupervised training, either relying on overly strong geometric assumptions or suffering from poor-quality pseudo-labels due to inadequate integration of low-level geometric and high-level contextual information. We have observed that in the feature space, latent new inlier correspondences tend to cluster around respective positive anchors that summarize features of existing inliers. Motivated by this observation, we propose a novel unsupervised registration method termed INTEGER to incorporate high-level contextual information for reliable pseudo-label mining. Specifically, we propose the Feature-Geometry Coherence Mining module to dynamically adapt the teacher for each mini-batch of data during training and discover reliable pseudo-labels by considering both high-level feature representations and low-level geometric cues. Furthermore, we propose Anchor-Based Contrastive Learning to facilitate contrastive learning with anchors for a robust feature space. Lastly, we introduce a Mixed-Density Student to learn density-invariant features, addressing challenges related to density variation and low overlap in the outdoor scenario. Extensive experiments on KITTI and nuScenes datasets demonstrate that INTEGER achieves competitive performance in terms of accuracy and generalizability.

* Accepted by NeurIPS 2024
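As a rough illustration of "contrastive learning with anchors", the sketch below uses a standard InfoNCE loss in which the positive is an anchor summarizing existing inlier features. Assumptions, not taken from the paper: an anchor is the L2-normalized mean of a set of features, similarity is cosine similarity, the temperature tau is 0.1, and the function names are hypothetical. INTEGER's actual loss and anchor construction may differ.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def make_anchor(features):
    """Assumed anchor: L2-normalized mean of a group of feature vectors."""
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / len(features) for d in range(dim)]
    return normalize(mean)


def info_nce(query, positive_anchor, negative_anchors, tau=0.1):
    """Standard InfoNCE: -log softmax score of the positive among all anchors."""
    q = normalize(query)
    anchors = [positive_anchor] + negative_anchors
    logits = [sum(a * b for a, b in zip(q, normalize(k))) / tau for k in anchors]
    m = max(logits)  # log-sum-exp with max subtraction for numerical stability
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


pos = make_anchor([[1.0, 0.1], [0.9, -0.1]])  # summarizes features of existing inliers
neg = make_anchor([[0.0, 1.0], [0.1, 0.9]])   # summarizes features of outliers
print(info_nce([1.0, 0.0], pos, [neg]), info_nce([0.0, 1.0], pos, [neg]))
```

A feature that lies near the positive anchor gets a small loss, while one near a negative anchor is penalized, which is the clustering behavior the abstract describes for latent inlier correspondences.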

P4GCN: Vertical Federated Social Recommendation with Privacy-Preserving Two-Party Graph Convolution Networks

Oct 16, 2024

Zheng Wang, Wanwan Wang, Yimin Huang, Zhaopeng Peng, Ziqi Yang, Cheng Wang, Xiaoliang Fan

Abstract: In recent years, graph neural networks (GNNs) have been commonly utilized for social recommendation systems. However, real-world scenarios often present challenges related to user privacy and business constraints, inhibiting direct access to valuable social information from other platforms. While many existing methods have tackled matrix factorization-based social recommendations without direct social data access, developing GNN-based federated social recommendation models under similar conditions remains largely unexplored. To address this issue, we propose a novel vertical federated social recommendation method leveraging privacy-preserving two-party graph convolution networks (P4GCN) to enhance recommendation accuracy without requiring direct access to sensitive social information. First, we introduce a Sandwich-Encryption module to ensure comprehensive data privacy during the collaborative computing process. Second, we provide a thorough theoretical analysis of the privacy guarantees, considering the participation of both curious and honest parties. Extensive experiments on four real-world datasets demonstrate that P4GCN outperforms state-of-the-art methods in terms of recommendation accuracy. The code is available at https://github.com/WwZzz/P4GCN.

Federated Graph Learning for Cross-Domain Recommendation

Oct 10, 2024

Ziqi Yang, Zhaopeng Peng, Zihui Wang, Jianzhong Qi, Chaochao Chen, Weike Pan, Chenglu Wen, Cheng Wang, Xiaoliang Fan

Abstract: Cross-domain recommendation (CDR) offers a promising solution to the data sparsity problem by enabling knowledge transfer across source and target domains. However, many recent CDR models overlook crucial issues such as privacy as well as the risk of negative transfer (which degrades model performance), especially in multi-domain settings. To address these challenges, we propose FedGCDR, a novel federated graph learning framework that securely and effectively leverages positive knowledge from multiple source domains. First, we design a positive knowledge transfer module that ensures privacy during inter-domain knowledge transmission. This module employs differential privacy-based knowledge extraction combined with a feature mapping mechanism, transforming source domain embeddings from federated graph attention networks into reliable domain knowledge. Second, we design a knowledge activation module to filter out potentially harmful or conflicting knowledge from source domains, addressing the issues of negative transfer. This module enhances target domain training by expanding the graph of the target domain to generate reliable domain attentions and fine-tunes the target model for improved negative knowledge filtering and more accurate predictions. We conduct extensive experiments on 16 popular domains of the Amazon dataset, demonstrating that FedGCDR significantly outperforms state-of-the-art methods.

* Accepted by NeurIPS 2024

Calibrating Verbalized Probabilities for Large Language Models

Oct 09, 2024

Cheng Wang, Gyuri Szarvas, Georges Balazs, Pavel Danchenko, Patrick Ernst

Abstract: Calibrating verbalized probabilities presents a novel approach for reliably assessing and leveraging outputs from black-box Large Language Models (LLMs). Recent methods have demonstrated improved calibration by applying techniques like Platt scaling or temperature scaling to the confidence scores generated by LLMs. In this paper, we explore the calibration of verbalized probability distributions for discriminative tasks. First, we investigate the capability of LLMs to generate probability distributions over categorical labels. We theoretically and empirically identify the issue of re-softmax arising from the scaling of verbalized probabilities, and propose using the invert softmax trick to approximate the "logit" by inverting verbalized probabilities. Through extensive evaluation on three public datasets, we demonstrate: (1) the robust capability of LLMs in generating class distributions, and (2) the effectiveness of the invert softmax trick in estimating logits, which, in turn, facilitates post-calibration adjustments.

* 21 pages
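The re-softmax issue and the inversion can be shown with a few lines of arithmetic. Because softmax(log p) = p, the element-wise log of a verbalized distribution is a valid logit vector up to an additive constant, so temperature scaling can be applied to it; applying softmax directly to the probabilities instead flattens the distribution. This is a minimal sketch assuming natural-log inversion with a small epsilon for zero probabilities; the function names, epsilon, and temperature value are illustrative, not the paper's code.

```python
import math


def invert_softmax(probs, eps=1e-12):
    """Approximate logits from probabilities: log(p), valid up to an additive constant."""
    return [math.log(max(p, eps)) for p in probs]


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


verbalized = [0.7, 0.2, 0.1]                                # probabilities stated by an LLM
recovered = softmax(invert_softmax(verbalized))             # round-trips to the input
re_softmax = softmax(verbalized)                            # probabilities treated as logits: flattened
calibrated = softmax(invert_softmax(verbalized), temperature=2.0)  # temperature scaling on "logits"
print(recovered, re_softmax, calibrated)
```

In practice the temperature would be fitted on held-out data rather than fixed at 2.0; the point here is only that scaling must act on the inverted logits, not on the probabilities themselves.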

HiSC4D: Human-centered interaction and 4D Scene Capture in Large-scale Space Using Wearable IMUs and LiDAR

Sep 09, 2024

Yudi Dai, Zhiyong Wang, Xiping Lin, Chenglu Wen, Lan Xu, Siqi Shen, Yuexin Ma, Cheng Wang

Abstract: We introduce HiSC4D, a novel Human-centered interaction and 4D Scene Capture method, aimed at accurately and efficiently creating a dynamic digital world containing large-scale indoor-outdoor scenes, diverse human motions, rich human-human interactions, and human-environment interactions. By utilizing body-mounted IMUs and a head-mounted LiDAR, HiSC4D can capture egocentric human motions in unconstrained space without the need for external devices or pre-built maps. This affords great flexibility and accessibility for human-centered interaction and 4D scene capture in various environments. Since IMUs can capture spatially unrestricted human poses but are prone to drift over long periods of use, while LiDAR is stable for global localization but coarse for local positions and orientations, HiSC4D employs a joint optimization method that harmonizes all sensors and uses environment cues, yielding promising results for long-term capture in large scenes. To promote research on egocentric human interaction in large scenes and facilitate downstream tasks, we also present a dataset containing 8 sequences in 4 large scenes (200 to 5,000 m²), providing 36k frames of accurate 4D human motions with SMPL annotations and dynamic scenes, 31k frames of cropped human point clouds, and the scene mesh of each environment. A variety of scenarios, such as a basketball gym and a commercial street, alongside challenging human motions, such as daily greeting, one-on-one basketball playing, and tour guiding, demonstrate the effectiveness and generalization ability of HiSC4D. The dataset and code will be published at www.lidarhumanmotion.net/hisc4d for research purposes.

* 17 pages, 10 figures, Journal
