Abstract: In recent years, orthogonal chirp division modulation (OCDM) has gained attention as a robust communication waveform due to its strong resistance to both time-domain and frequency-domain interference. However, similar to orthogonal frequency division multiplexing (OFDM), OCDM suffers from a high peak-to-average power ratio (PAPR), resulting in increased hardware costs and reduced energy efficiency of the transmitter's power amplifiers. In this work, we introduce a novel unitary transform called the Generalized Discrete Fresnel Transform (GDFnT) and propose a new waveform based on this transform, named Generalized Orthogonal Chirp Division Modulation (GOCDM). In GOCDM, data symbols from the constellation diagram are independently placed in the Generalized Fresnel (GF) domain. We derive the GF-domain channel matrix for the GOCDM system under time-frequency doubly selective channels and leverage the sparsity of the GF-domain channel matrix to design an iterative receiver based on the message-passing algorithm. Simulation results demonstrate that GOCDM achieves better PAPR performance than OCDM without compromising bit error rate (BER) performance.
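Since the comparison above hinges on PAPR, here is a minimal sketch of how PAPR is typically measured for a discrete baseband waveform. The example QPSK mapping and the specific chirp basis `Phi` (one common discrete Fresnel transform form for even N) are illustrative assumptions, not the GDFnT/GOCDM modulator from the paper.

```python
import numpy as np

def papr_db(x):
    """Peak-to-average power ratio of a complex baseband signal, in dB."""
    power = np.abs(x) ** 2
    return 10 * np.log10(power.max() / power.mean())

# Example: QPSK symbols spread by a discrete Fresnel (chirp) basis.
N = 64
n = np.arange(N)
# A common DFnT form for even N; treated here as an assumption, not the GDFnT.
Phi = np.exp(-1j * np.pi / 4) / np.sqrt(N) * \
      np.exp(1j * np.pi / N * (n[:, None] - n[None, :]) ** 2)
symbols = (np.random.choice([-1, 1], N) + 1j * np.random.choice([-1, 1], N)) / np.sqrt(2)
print(f"PAPR = {papr_db(Phi.conj().T @ symbols):.2f} dB")
```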
Abstract: Robust and realistic rendering of large-scale road scenes is essential in autonomous driving simulation. Recently, 3D Gaussian Splatting (3D-GS) has made groundbreaking progress in neural rendering, but the fidelity of large-scale road scene renderings is often limited by the input imagery, which usually has a narrow field of view and focuses mainly on the street-level local area. Intuitively, data from the drone's perspective can provide a complementary viewpoint to data from the ground vehicle's perspective, enhancing the completeness of scene reconstruction and rendering. However, training naively with aerial and ground images, which exhibit large view disparity, poses a significant convergence challenge for 3D-GS and does not yield notable performance improvements on road views. To enhance novel view synthesis of road views and to use aerial information effectively, we design an uncertainty-aware training method that allows aerial images to assist in the synthesis of areas where ground images have poor learning outcomes, instead of weighting all pixels equally in 3D-GS training as prior work does. We are the first to introduce cross-view uncertainty into 3D-GS by matching the car-view ensemble-based rendering uncertainty to aerial images, weighting the contribution of each pixel to the training process. Additionally, to systematically quantify evaluation metrics, we assemble a high-quality synthesized dataset comprising both aerial and ground images for road scenes.
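As a rough illustration of the per-pixel weighting idea above, the sketch below (PyTorch) down-weights aerial pixels in regions that an ensemble of car-view renderings already agrees on. The tensor layout, the variance-based uncertainty, and the L1 photometric loss are assumptions for illustration, not the paper's exact formulation of cross-view uncertainty.

```python
import torch

def weighted_aerial_loss(rendered, aerial_gt, car_view_ensemble):
    """Weight each aerial pixel by how uncertain the car-view ensemble is about it.

    rendered, aerial_gt: (H, W, 3) current render and aerial ground truth.
    car_view_ensemble:   (E, H, W, 3) ensemble renderings matched to the aerial view.
    """
    # Per-pixel uncertainty: variance across ensemble members, averaged over RGB.
    uncertainty = car_view_ensemble.var(dim=0).mean(dim=-1, keepdim=True)  # (H, W, 1)
    weight = uncertainty / (uncertainty.max() + 1e-8)                      # normalize to [0, 1]
    # Pixels the ground-view model already renders confidently get little aerial supervision.
    return (weight * (rendered - aerial_gt).abs()).mean()
```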
Abstract: In this paper, we study multi-target domain adaptation of scene understanding models. While previous methods achieved commendable results through inter-domain consistency losses, they often assumed unrealistic simultaneous access to images from all target domains, overlooking constraints such as data transfer bandwidth limitations and data privacy concerns. Given these challenges, we pose the question: how can we merge models adapted independently on distinct domains while bypassing the need for direct access to training data? Our solution to this problem involves two components: merging model parameters and merging model buffers (i.e., normalization layer statistics). For merging model parameters, empirical analyses of mode connectivity surprisingly reveal that linear merging suffices when employing the same pretrained backbone weights for adapting separate models. For merging model buffers, we model the real-world distribution with a Gaussian prior and estimate new statistics from the buffers of separately trained models. Our method is simple yet effective, achieving performance comparable to data combination training baselines while eliminating the need to access training data. Project page: https://air-discover.github.io/ModelMerging
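A minimal sketch of the two merging steps described above, written against PyTorch state dicts. The 50/50 interpolation weight and the exact way per-domain normalization statistics are pooled under the Gaussian prior are illustrative assumptions rather than the paper's precise recipe.

```python
import torch

def merge_adapted_models(sd_a, sd_b, buffer_keys):
    """Merge two models adapted from the same pretrained backbone (state dicts sd_a, sd_b)."""
    merged = {}
    for k in sd_a:
        if k in buffer_keys and k.endswith("running_mean"):
            # Buffers: treat each domain's statistics as a Gaussian and pool first/second moments.
            var_k = k.replace("running_mean", "running_var")
            mean = 0.5 * (sd_a[k] + sd_b[k])
            second_moment = 0.5 * ((sd_a[var_k] + sd_a[k] ** 2) +
                                   (sd_b[var_k] + sd_b[k] ** 2))
            merged[k] = mean
            merged[var_k] = second_moment - mean ** 2
        elif k in buffer_keys and k.endswith("running_var"):
            continue  # handled together with the matching running_mean
        elif k in buffer_keys:
            merged[k] = sd_a[k]  # e.g. num_batches_tracked: keep one copy
        else:
            # Parameters: plain linear interpolation (the mode-connectivity argument).
            merged[k] = 0.5 * (sd_a[k] + sd_b[k])
    return merged
```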
Abstract: Fairness is an important topic for medical image analysis, driven by the challenge of unbalanced training data among diverse target groups and the societal demand for equitable medical quality. In response to this issue, our research adopts a data-driven strategy: enhancing data balance by integrating synthetic images. However, in terms of generating synthetic images, previous works either lack paired labels or fail to precisely control the boundaries of synthetic images to align with those labels. To address this, we formulate the problem as a joint optimization, in which three networks are optimized towards the goals of empirical risk minimization and fairness maximization. On the implementation side, our solution features an innovative Point-Image Diffusion architecture, which leverages 3D point clouds for improved control over mask boundaries through a point-mask-image synthesis pipeline. This method significantly outperforms existing techniques in synthesizing scanning laser ophthalmoscopy (SLO) fundus images. By combining synthetic data with real data during the training phase using a proposed Equal Scale approach, our model achieves superior fairness segmentation performance compared to state-of-the-art fairness learning models. Code is available at https://github.com/wenyi-li/FairDiff.
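To make the data-balancing step concrete, here is a hedged sketch of how real and synthetic samples might be combined per identity group. The rule of topping every group up to the size of the largest one is an illustrative assumption, not necessarily the paper's exact Equal Scale procedure.

```python
import random

def equal_scale_mix(real_by_group, synthetic_by_group):
    """Top up each group with synthetic samples until all groups reach the largest group's size."""
    target = max(len(samples) for samples in real_by_group.values())
    mixed = {}
    for group, real_samples in real_by_group.items():
        need = target - len(real_samples)
        pool = synthetic_by_group.get(group, [])
        extra = random.sample(pool, min(need, len(pool)))  # synthetic samples fill the gap
        mixed[group] = list(real_samples) + extra
    return mixed
```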
Abstract: Fine-tuning large pre-trained foundation models, such as the 175B GPT-3, has recently attracted growing attention for downstream tasks. While parameter-efficient fine-tuning methods have been proposed and proven effective without retraining all model parameters, their performance is limited by the capacity of incremental modules, especially under constrained parameter budgets. To overcome this challenge, we propose CapaBoost, a simple yet effective strategy that enhances model capacity by leveraging low-rank updates through parallel weight modules in target layers. By applying static random masks to the shared weight matrix, CapaBoost constructs a diverse set of weight matrices, effectively increasing the rank of incremental weights without adding parameters. Notably, our approach can be seamlessly integrated into various existing parameter-efficient fine-tuning methods. We extensively validate the efficacy of CapaBoost through experiments on diverse downstream tasks, including natural language understanding, question answering, and image classification. Our results demonstrate significant improvements over baselines without incurring additional computation or storage costs. Our code is available at https://github.com/LINs-lab/CapaBoost.
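To illustrate the parallel masked-module idea, the sketch below applies it to a LoRA-style linear adapter in PyTorch. The number of branches, mask density, initialization, and masking only the `A` factor are all illustrative assumptions rather than CapaBoost's exact design.

```python
import torch
import torch.nn as nn

class CapaBoostLoRALinear(nn.Module):
    """LoRA-style adapter whose shared low-rank factors are reused by several
    statically masked parallel branches."""
    def __init__(self, in_dim, out_dim, rank=8, n_branches=2, density=0.5):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)  # stands in for the frozen pretrained layer
        for p in self.base.parameters():
            p.requires_grad_(False)
        # One shared pair of low-rank factors: extra branches add no trainable parameters.
        self.A = nn.Parameter(torch.randn(rank, in_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_dim, rank))
        # Static (non-trainable) random binary masks, one per parallel branch.
        masks = torch.stack([(torch.rand(rank, in_dim) < density).float()
                             for _ in range(n_branches)])
        self.register_buffer("masks", masks)

    def forward(self, x):
        # Summing differently masked copies of the same low-rank update raises its effective rank.
        delta = sum(self.B @ (m * self.A) for m in self.masks)
        return self.base(x) + x @ delta.T
```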
Abstract: In-context learning (ICL) allows LLMs to learn from examples without changing their weights, which is a particularly promising capability for long-context LLMs that can potentially learn from many examples. Recently, Lin et al. (2024) proposed URIAL, a method using only three in-context examples to align base LLMs, achieving non-trivial instruction following performance. In this work, we show that, while effective, ICL alignment with URIAL still underperforms compared to instruction fine-tuning on established benchmarks such as MT-Bench and AlpacaEval 2.0 (LC), especially with more capable base LMs. Unlike for tasks such as classification, translation, or summarization, adding more ICL demonstrations for long-context LLMs does not systematically improve instruction following performance. To address this limitation, we derive a greedy selection approach for ICL examples that noticeably improves performance, yet without bridging the gap to instruction fine-tuning. Finally, we provide a series of ablation studies to better understand the reasons behind the remaining gap, and we show how some aspects of ICL depart from the existing knowledge and are specific to the instruction tuning setting. Overall, our work advances the understanding of ICL as an alignment technique. We provide our code at https://github.com/tml-epfl/icl-alignment.
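A hedged sketch of what a greedy demonstration-selection loop of this kind could look like. The scoring callback `evaluate_alignment` (for example, a judge score over a small development set given the candidate prompt) and the early-stopping rule are hypothetical stand-ins, not the paper's actual objective or implementation.

```python
def greedy_select_demos(candidates, evaluate_alignment, max_demos=3):
    """Greedily grow the set of in-context demonstrations one example at a time."""
    selected, best_score = [], float("-inf")
    for _ in range(max_demos):
        best_cand, best_cand_score = None, best_score
        for cand in candidates:
            if cand in selected:
                continue
            score = evaluate_alignment(selected + [cand])  # score the prompt with this demo added
            if score > best_cand_score:
                best_cand, best_cand_score = cand, score
        if best_cand is None:  # no remaining candidate improves the score; stop early
            break
        selected.append(best_cand)
        best_score = best_cand_score
    return selected
```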
Abstract: Camera relocalization is a crucial problem in computer vision and robotics. Recent advancements in neural radiance fields (NeRFs) have shown promise in synthesizing photo-realistic images. Several works have utilized NeRFs for refining camera poses, but they do not account for lighting changes that can affect scene appearance and shadow regions, degrading the pose optimization process. In this paper, we propose a two-stage pipeline that normalizes images with varying lighting and shadow conditions to improve camera relocalization. We implement our scene representation upon a hash-encoded NeRF, which significantly speeds up the pose optimization process. To address the noisy image gradient computation problem in grid-based NeRFs, we further propose a re-devised truncated dynamic low-pass filter (TDLF) and a numerical gradient averaging technique to smooth the optimization. Experimental results on several datasets with varying lighting conditions demonstrate that our method achieves state-of-the-art camera relocalization performance. Code and data will be made publicly available.
Abstract: Due to limited model capacity, leveraging distributed Neural Radiance Fields (NeRFs) for modeling extensive urban environments has become a necessity. However, current distributed NeRF registration approaches encounter aliasing artifacts arising from discrepancies in rendering resolutions and suboptimal pose precision. These factors collectively deteriorate the fidelity of pose estimation within NeRF frameworks, resulting in occlusion artifacts during the NeRF blending stage. In this paper, we present a distributed NeRF system with tri-stage pose optimization. In the first stage, precise image poses are obtained by bundle-adjusting Mip-NeRF 360 with a coarse-to-fine strategy. In the second stage, we incorporate inverting Mip-NeRF 360, coupled with a truncated dynamic low-pass filter, to obtain robust and precise poses, termed Frame2Model optimization. On top of this, we obtain a coarse transformation between NeRFs in different coordinate systems. In the third stage, we fine-tune the transformation between NeRFs via Model2Model pose optimization. After obtaining precise transformation parameters, we proceed to NeRF blending, showcasing superior performance metrics in both real-world and simulation scenarios. Code and data will be publicly available at https://github.com/boilcy/Distributed-NeRF.
Abstract: Despite significant advancements in Neural Radiance Fields (NeRFs), the renderings may still suffer from aliasing and blurring artifacts, since it remains a fundamental challenge to effectively and efficiently characterize the anisotropic areas induced by the cone-casting procedure. This paper introduces a Ripmap-Encoded Platonic Solid representation to precisely and efficiently featurize 3D anisotropic areas, achieving high-fidelity anti-aliased renderings. Central to our approach are two key components: Platonic Solid Projection and Ripmap encoding. The Platonic Solid Projection factorizes 3D space onto the unparalleled faces of a certain Platonic solid, such that anisotropic 3D areas can be projected onto planes with distinguishable characterization. Meanwhile, each face of the Platonic solid is encoded by the Ripmap encoding, which is constructed by anisotropically pre-filtering a learnable feature grid, enabling the projected anisotropic areas to be featurized both precisely and efficiently via anisotropic area-sampling. Extensive experiments on both well-established synthetic datasets and a newly captured real-world dataset demonstrate that our Rip-NeRF attains state-of-the-art rendering quality, particularly excelling in the fine details of repetitive structures and textures, while maintaining relatively swift training times.
Abstract: Recently, 3D Gaussian Splatting, as a novel 3D representation, has garnered attention for its fast rendering speed and high rendering quality. However, this comes with high memory consumption; for example, a well-trained Gaussian field may use three million Gaussian primitives and over 700 MB of memory. We attribute this high memory footprint to the lack of consideration of the relationships between primitives. In this paper, we propose a memory-efficient Gaussian field named SUNDAE with spectral pruning and neural compensation. On one hand, we construct a graph on the set of Gaussian primitives to model their relationships and design a spectral down-sampling module to prune primitives while preserving the desired signals. On the other hand, to compensate for the quality loss of pruning Gaussians, we exploit a lightweight neural network head to mix splatted features, which effectively compensates for quality losses while capturing the relationships between primitives in its weights. We demonstrate the performance of SUNDAE with extensive results. For example, on the Mip-NeRF360 dataset, SUNDAE achieves 26.80 PSNR at 145 FPS using 104 MB of memory, while the vanilla Gaussian splatting algorithm achieves 25.60 PSNR at 160 FPS using 523 MB of memory. Code is publicly available at https://runyiyang.github.io/projects/SUNDAE/.
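As a rough sketch of what graph-based pruning over Gaussian primitives can look like: build a k-NN graph on the Gaussian centers, score each primitive by the high-frequency energy of its features under the graph Laplacian, and keep the top fraction. The scoring rule, neighborhood size, and keep ratio are illustrative assumptions, not SUNDAE's actual spectral down-sampling module.

```python
import torch

def spectral_prune(centers, features, k=8, keep_ratio=0.5):
    """Keep the Gaussian primitives carrying the most high-frequency signal on a k-NN graph.

    centers:  (n, 3) Gaussian means; features: (n, d) per-primitive attributes.
    """
    n = centers.shape[0]
    dists = torch.cdist(centers, centers)                  # (n, n) pairwise distances
    knn = dists.topk(k + 1, largest=False).indices[:, 1:]  # nearest neighbors, excluding self
    # Unnormalized graph Laplacian applied to features: degree * f - sum of neighbor features.
    neighbor_feats = features[knn]                         # (n, k, d)
    lap = k * features - neighbor_feats.sum(dim=1)
    score = lap.norm(dim=-1)                               # high-frequency energy per primitive
    keep = score.topk(int(keep_ratio * n)).indices         # indices of primitives to retain
    return keep
```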