Abstract:Neural Radiance Fields (NeRF) achieve impressive 3D representation learning and novel view synthesis results with high-quality multi-view images as input. However, motion blur often occurs in images of low-light and high-speed motion scenes, and it significantly degrades the reconstruction quality of NeRF. Previous deblurring NeRF methods struggle to estimate information during the exposure time and therefore cannot accurately model the motion blur. In contrast, the bio-inspired event camera, which measures intensity changes with high temporal resolution, compensates for this information deficiency. In this paper, we propose Event-driven Bundle Adjustment for Deblurring Neural Radiance Fields (EBAD-NeRF) to jointly optimize the learnable poses and NeRF parameters by leveraging the hybrid event-RGB data. An intensity-change-metric event loss and a photo-metric blur loss are introduced to strengthen the explicit modeling of camera motion blur. Experimental results on both synthetic data and real captured data demonstrate that EBAD-NeRF can obtain accurate camera poses during the exposure time and learn sharper 3D representations compared to prior works.
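The two losses named in the abstract above can be illustrated with the standard physical models of blur and events. The following is a minimal sketch, not the EBAD-NeRF implementation: it assumes a blurry pixel is the average of virtual sharp renderings across the exposure, and that an event count multiplied by a contrast threshold approximates the log-intensity change. All function names, the `contrast_threshold` value, and the flat-list image representation are hypothetical.

```python
import math

def synthesize_blur(sharp_frames):
    """Average virtual sharp frames (each a list of pixel intensities)
    rendered at poses sampled within the exposure time."""
    n = len(sharp_frames)
    return [sum(pixels) / n for pixels in zip(*sharp_frames)]

def photometric_blur_loss(sharp_frames, observed_blurry):
    """Mean squared error between the synthesized and observed blurry image."""
    blurry = synthesize_blur(sharp_frames)
    return sum((b - o) ** 2 for b, o in zip(blurry, observed_blurry)) / len(blurry)

def event_loss(frame_start, frame_end, event_counts,
               contrast_threshold=0.2, eps=1e-6):
    """Compare the rendered log-intensity change with the change implied
    by the signed event counts (assumed model: change = threshold * count)."""
    total = 0.0
    for i0, i1, count in zip(frame_start, frame_end, event_counts):
        predicted = math.log(i1 + eps) - math.log(i0 + eps)
        measured = contrast_threshold * count
        total += (predicted - measured) ** 2
    return total / len(event_counts)
```

In a joint optimization, both losses would be differentiated with respect to the camera poses and the radiance field parameters; this sketch only evaluates them on given pixel values.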
Abstract:In the mobile internet era, the Online Food Ordering Service (OFOS) has emerged as an integral component of inclusive finance owing to the convenience it brings to people. OFOS platforms offer dynamically allocated incentives to users and merchants through diverse marketing campaigns to encourage payments while maintaining the platforms' budget efficiency. Despite significant progress, the marketing domain continues to face two primary challenges: (i) how to allocate a limited budget with greater efficiency, which demands precision in predicting users' monotonic response (i.e., sensitivity) to incentives, and (ii) how to ensure spatio-temporal adaptability and robustness in diverse marketing campaigns across different times and locations. To address these issues, we propose a Constrained Monotonic Adaptive Network (CoMAN) method for spatio-temporal perception within marketing pricing. Specifically, we capture spatio-temporal preferences within attribute features through two foundational spatio-temporal perception modules. To better capture differences in user sensitivity to incentives across varied times and locations, we design modules for learning spatio-temporal convexity and concavity as well as for expressing sensitivity functions. CoMAN can achieve a more efficient allocation of incentive investments during pricing, thus increasing the conversion rate and orders while maintaining budget efficiency. Extensive offline and online experimental results within our diverse marketing campaigns demonstrate the effectiveness of the proposed approach, which outperforms the state-of-the-art monotonic method.
Abstract:Since the secrecy rate (SR) performance improvement obtained by a secure directional modulation (DM) network is limited, an active intelligent reflective surface (IRS)-assisted DM network is considered to attain a high SR. To address the SR maximization problem, a novel method based on the Lagrangian dual transform and a closed-form fractional programming algorithm (LDT-CFFP) is proposed, which yields the solutions to the base station (BS) beamforming vectors and the IRS reflection coefficient matrix. However, the computational complexity of the LDT-CFFP method is high. To reduce this complexity, a blocked IRS-assisted DM network is designed. To meet the network performance requirements, a power allocation (PA) strategy is proposed and adopted in the system. Specifically, the system power between the BS and the IRS, as well as the transmission power for confidential messages (CM) and artificial noise (AN) from the BS, are allocated separately. We then put forward a scheme that combines the null-space projection (NSP) method, the maximum-ratio-reflecting (MRR) algorithm, and the PA strategy (NSP-MRR-PA) to solve the SR maximization problem. The closed-form (CF) solutions to the BS beamforming vectors and the IRS reflection coefficient matrix are obtained via the NSP and MRR algorithms, respectively. For the PA factors, we use the exhaustive search (ES) algorithm, particle swarm optimization (PSO), and the simulated annealing (SA) algorithm to search for the solutions. Simulation results verify that the LDT-CFFP method achieves a higher SR gain than the NSP-MRR-PA method. For the NSP-MRR-PA method, the number of IRS units in each block has a significant impact on SR performance. In addition, the proposed PA strategies, namely the ES, PSO, and SA methods, outperform the PA strategies with fixed PA factors.
Abstract:With the benefit of deep learning techniques, recent research has made significant progress in image compression artifacts reduction. Despite their improved performance, prevailing methods only focus on learning a mapping from the compressed image to the original one but ignore the intrinsic attributes of the given compressed images, which greatly harms the performance of downstream parsing tasks. Different from these methods, we propose to decouple the intrinsic attributes into two complementary features for artifacts reduction, i.e., the compression-insensitive features to regularize the high-level semantic representations during training and the compression-sensitive features to be aware of the compression degree. To achieve this, we first employ adversarial training to regularize the compressed and original encoded features for retaining high-level semantics, and we then develop the compression quality-aware feature encoder for compression-sensitive features. Based on these dual complementary features, we propose a Dual Awareness Guidance Network (DAGN) to utilize these awareness features as transformation guidance during the decoding phase. In our proposed DAGN, we develop a cross-feature fusion module to maintain the consistency of compression-insensitive features by fusing compression-insensitive features into the artifacts reduction baseline. Our method achieves an average PSNR gain of 2.06 dB on BSD500, outperforming state-of-the-art methods, and requires only 29.7 ms to process one image on BSD500. In addition, the experimental results on LIVE1 and LIU4K also demonstrate the efficiency, effectiveness, and superiority of the proposed method in terms of quantitative metrics, visual quality, and downstream machine vision tasks.
Abstract:As one of the emerging challenges in Automated Machine Learning, Hardware-aware Neural Architecture Search (HW-NAS) tasks can be treated as black-box multi-objective optimization problems (MOPs). An important application of HW-NAS is real-time semantic segmentation, which plays a pivotal role in autonomous driving scenarios. HW-NAS for real-time semantic segmentation inherently needs to balance multiple optimization objectives, including model accuracy, inference speed, and hardware-specific considerations. Despite its importance, benchmarks have yet to be developed to frame such a challenging task as multi-objective optimization. To bridge the gap, we introduce a tailored streamline to transform the task of HW-NAS for real-time semantic segmentation into standard MOPs. Building upon the streamline, we present a benchmark test suite, CitySeg/MOP, comprising fifteen MOPs derived from the Cityscapes dataset. The CitySeg/MOP test suite is integrated into the EvoXBench platform to provide seamless interfaces with various programming languages (e.g., Python and MATLAB) for instant fitness evaluations. We comprehensively assessed the CitySeg/MOP test suite on various multi-objective evolutionary algorithms, showcasing its versatility and practicality. Source code is available at https://github.com/EMI-Group/evoxbench.
Abstract:Spike cameras, leveraging spike-based integration sampling and high temporal resolution, offer distinct advantages over standard cameras. However, existing approaches reliant on spike cameras often assume optimal illumination, a condition frequently unmet in real-world scenarios. To address this, we introduce SpikeNeRF, the first work that derives a NeRF-based volumetric scene representation from spike camera data. Our approach leverages NeRF's multi-view consistency to establish robust self-supervision, effectively eliminating erroneous measurements and uncovering coherent structures within exceedingly noisy input amidst diverse real-world illumination scenarios. The framework comprises two core elements: a spike generation model incorporating an integrate-and-fire neuron layer and parameters accounting for non-idealities, such as threshold variation, and a spike rendering loss capable of generalizing across varying illumination conditions. We describe how to effectively optimize neural radiance fields to render photorealistic novel views from the novel continuous spike stream, demonstrating advantages over other vision sensors in certain scenes. Empirical evaluations conducted on both real and novel realistically simulated sequences affirm the efficacy of our methodology. The dataset and source code are released at https://github.com/BIT-Vision/SpikeNeRF.
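The integrate-and-fire behavior mentioned in the abstract above can be illustrated for a single pixel. This is a minimal sketch under stated assumptions, not the SpikeNeRF spike generation model: it uses one common convention (subtract the threshold on firing), ignores the non-idealities the abstract mentions such as threshold variation, and the function name and default threshold are hypothetical.

```python
def integrate_and_fire(intensities, threshold=1.0):
    """Simulate one integrate-and-fire pixel.

    Each time step adds the incoming intensity to an accumulator.
    When the accumulator reaches the threshold, the pixel emits a
    spike (1) and the threshold is subtracted; otherwise it emits 0.
    """
    accumulator = 0.0
    spikes = []
    for value in intensities:
        accumulator += value
        if accumulator >= threshold:
            spikes.append(1)
            accumulator -= threshold  # assumed reset-by-subtraction convention
        else:
            spikes.append(0)
    return spikes
```

Under this convention a brighter pixel fires more often, so the spike rate encodes intensity: a constant input of 0.5 with threshold 1.0 fires on every second step, while an input of 0.0 never fires.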
Abstract:In this paper, a hybrid IRS-aided amplify-and-forward (AF) relay wireless network is put forward, where the hybrid IRS is made up of passive and active elements. To maximize the signal-to-noise ratio (SNR), a low-complexity method based on successive convex approximation and fractional programming (LC-SCA-FP) is proposed to jointly optimize the beamforming matrix at the AF relay and the reflecting coefficient matrices at the IRS. Simulation results verify that the rate achieved by the proposed LC-SCA-FP method surpasses those of the benchmark schemes, namely the passive IRS-aided AF relay network and the AF-relay-only network.
Abstract:Due to its intrinsic ability to combat the double fading effect, the active intelligent reflective surface (IRS) has become popular. A key feature of the active IRS is that it must be supplied with power. This naturally raises the problem of how to allocate the total power between the base station (BS) and the IRS so as to fully exploit the rate gain achievable by power allocation (PA) and to close the rate gap between existing PA strategies and the optimal exhaustive search (ES). First, the signal-to-noise ratio (SNR) expression is derived as a function of the PA factor beta in [0, 1]. Then, to improve the rate performance of the conventional gradient ascent (GA), an equal-spacing-multiple-point-initialization GA (ESMPI-GA) method is proposed. However, because iterative GA converges only linearly, the proposed ESMPI-GA has high complexity. Finally, to reduce this high complexity, a low-complexity closed-form PA method with third-order Taylor expansion (TTE) centered at the point beta0 = 0.5 is proposed. Simulation results show that the proposed ESMPI-GA achieves a gain of about 0.5 bit over conventional GA and gains of 1.2 and 0.8 bits over existing methods such as equal PA and Taylor polynomial approximation (TPA) for small-scale IRS, and that the proposed TTE performs much better than TPA and fixed PA strategies at extremely low complexity.
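The idea of starting gradient ascent from several equally spaced points in [0, 1] and keeping the best result can be sketched as follows. This is one reading of the method's name, not the paper's algorithm: the objective `toy_objective` is a hypothetical two-peak function standing in for the SNR expression, which the abstract does not give, and the step size, iteration count, and number of starting points are arbitrary choices.

```python
def numerical_gradient(f, x, eps=1e-6):
    """Central-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def projected_gradient_ascent(f, beta0, step=0.5, iters=200):
    """Gradient ascent on a scalar beta, clipped to [0, 1] after each step."""
    beta = beta0
    for _ in range(iters):
        beta = beta + step * numerical_gradient(f, beta)
        beta = min(1.0, max(0.0, beta))
    return beta

def multi_start_gradient_ascent(f, num_starts=5, **kwargs):
    """Run gradient ascent from equally spaced starts and keep the best."""
    starts = [i / (num_starts - 1) for i in range(num_starts)]
    candidates = [projected_gradient_ascent(f, s, **kwargs) for s in starts]
    return max(candidates, key=f)

def toy_objective(beta):
    """Hypothetical objective with local maxima near 0.2 and 0.8;
    the small linear term makes the peak near 0.8 the global one."""
    return -((beta - 0.2) ** 2) * ((beta - 0.8) ** 2) + 0.01 * beta

single = projected_gradient_ascent(toy_objective, 0.0)
best = multi_start_gradient_ascent(toy_objective)
```

On this toy objective, a single run started at 0 stops at the lower peak near 0.2, while the multi-start version reaches the higher peak near 0.8. The cost grows with the number of starting points, which is consistent with the abstract's remark that the multi-start method has high complexity.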
Abstract:Learning multi-label image recognition with incomplete annotation is gaining popularity due to its superior performance and significant labor savings when compared to training with fully labeled datasets. Existing literature mainly focuses on label completion and co-occurrence learning but faces difficulties with the most common single-positive label setting. To tackle this problem, we present a semantic contrastive bootstrapping (Scob) approach to gradually recover the cross-object relationships by introducing class activation as semantic guidance. With this learning guidance, we then propose a recurrent semantic masked transformer to extract iconic object-level representations and delve into the contrastive learning problems on multi-label classification tasks. We further propose a bootstrapping framework in an Expectation-Maximization fashion that iteratively optimizes the network parameters and refines semantic guidance to alleviate possible disturbance caused by wrong semantic guidance. Extensive experimental results demonstrate that the proposed joint learning framework surpasses the state-of-the-art models by a large margin on four public multi-label image recognition benchmarks. Code can be found at https://github.com/iCVTEAM/Scob.
Abstract:Few-shot learning aims to recognize novel queries with limited support samples by learning from base knowledge. Recent progress in this setting assumes that the base knowledge and novel query samples are distributed in the same domains, which is usually infeasible for realistic applications. To address this issue, we consider the cross-domain few-shot learning problem where only extremely few samples are available in target domains. Under this realistic setting, we focus on the fast adaptation capability of meta-learners by proposing an effective dual adaptive representation alignment approach. In our approach, a prototypical feature alignment is first proposed to recalibrate support instances as prototypes and reproject these prototypes with a differentiable closed-form solution. Therefore, the feature spaces of learned knowledge can be adaptively transformed to query spaces by the cross-instance and cross-prototype relations. Besides the feature alignment, we further present a normalized distribution alignment module, which exploits prior statistics of query samples to address the covariate shifts between the support and query samples. With these two modules, a progressive meta-learning framework is constructed to perform fast adaptation with extremely few-shot samples while maintaining its generalization capabilities. Experimental evidence demonstrates that our approach achieves new state-of-the-art results on 4 CDFSL benchmarks and 4 fine-grained cross-domain benchmarks.