Abstract:The fields of imaging in the nighttime dynamic and other extremely dark conditions have seen impressive and transformative advancements in recent years, partly driven by the rise of novel sensing approaches, e.g., near-infrared (NIR) cameras with high sensitivity and event cameras with minimal blur. However, inappropriate exposure ratios of near-infrared cameras make them susceptible to distortion and blur. Event cameras are also highly sensitive to weak signals at night yet prone to interference, often generating substantial noise and significantly degrading observations and analysis. Herein, we develop a new framework for low-light imaging combined with NIR imaging and event-based techniques, named synergistic neuromorphic imaging, which can jointly achieve NIR image deblurring and event denoising. Harnessing cross-modal features of NIR images and visible events via spectral consistency and higher-order interaction, the NIR images and events are simultaneously fused, enhanced, and bootstrapped. Experiments on real and realistically simulated sequences demonstrate the effectiveness of our method and indicate better accuracy and robustness than other methods in practical scenarios. This study gives impetus to enhance both NIR images and events, which paves the way for high-fidelity low-light imaging and neuromorphic reasoning.
Abstract:Lensless imaging offers a lightweight, compact alternative to traditional lens-based systems, ideal for exploration in space-constrained environments. However, the absence of a focusing lens and limited lighting in such environments often result in low-light conditions, where the measurements suffer from complex noise interference due to insufficient capture of photons. This study presents a robust reconstruction method for high-quality imaging in low-light scenarios, employing two complementary perspectives: model-driven and data-driven. First, we apply a physic-model-driven perspective to reconstruct in the range space of the pseudo-inverse of the measurement model as a first guidance to extract information in the noisy measurements. Then, we integrate a generative-model based perspective to suppress residual noises as the second guidance to suppress noises in the initial noisy results. Specifically, a learnable Wiener filter-based module generates an initial noisy reconstruction. Then, for fast and, more importantly, stable generation of the clear image from the noisy version, we implement a modified conditional generative diffusion module. This module converts the raw image into the latent wavelet domain for efficiency and uses modified bidirectional training processes for stabilization. Simulations and real-world experiments demonstrate substantial improvements in overall visual quality, advancing lensless imaging in challenging low-light environments.
Abstract:Recent advancements in 3D Gaussian Splatting (3D-GS) have demonstrated the potential of using 3D Gaussian primitives for high-speed, high-fidelity, and cost-efficient novel view synthesis from continuously calibrated input views. However, conventional methods require high-frame-rate dense and high-quality sharp images, which are time-consuming and inefficient to capture, especially in dynamic environments. Event cameras, with their high temporal resolution and ability to capture asynchronous brightness changes, offer a promising alternative for more reliable scene reconstruction without motion blur. In this paper, we propose SweepEvGS, a novel hardware-integrated method that leverages event cameras for robust and accurate novel view synthesis across various imaging settings from a single sweep. SweepEvGS utilizes the initial static frame with dense event streams captured during a single camera sweep to effectively reconstruct detailed scene views. We also introduce different real-world hardware imaging systems for real-world data collection and evaluation for future research. We validate the robustness and efficiency of SweepEvGS through experiments in three different imaging settings: synthetic objects, real-world macro-level, and real-world micro-level view synthesis. Our results demonstrate that SweepEvGS surpasses existing methods in visual rendering quality, rendering speed, and computational efficiency, highlighting its potential for dynamic practical applications.
Abstract:3D part segmentation is a crucial and challenging task in 3D perception, playing a vital role in applications such as robotics, 3D generation, and 3D editing. Recent methods harness the powerful Vision Language Models (VLMs) for 2D-to-3D knowledge distillation, achieving zero-shot 3D part segmentation. However, these methods are limited by their reliance on text prompts, which restricts the scalability to large-scale unlabeled datasets and the flexibility in handling part ambiguities. In this work, we introduce SAMPart3D, a scalable zero-shot 3D part segmentation framework that segments any 3D object into semantic parts at multiple granularities, without requiring predefined part label sets as text prompts. For scalability, we use text-agnostic vision foundation models to distill a 3D feature extraction backbone, allowing scaling to large unlabeled 3D datasets to learn rich 3D priors. For flexibility, we distill scale-conditioned part-aware 3D features for 3D part segmentation at multiple granularities. Once the segmented parts are obtained from the scale-conditioned part-aware 3D features, we use VLMs to assign semantic labels to each part based on the multi-view renderings. Compared to previous methods, our SAMPart3D can scale to the recent large-scale 3D object dataset Objaverse and handle complex, non-ordinary objects. Additionally, we contribute a new 3D part segmentation benchmark to address the lack of diversity and complexity of objects and parts in existing benchmarks. Experiments show that our SAMPart3D significantly outperforms existing zero-shot 3D part segmentation methods, and can facilitate various applications such as part-level editing and interactive segmentation.
Abstract:Computational neuromorphic imaging (CNI) with event cameras offers advantages such as minimal motion blur and enhanced dynamic range, compared to conventional frame-based methods. Existing event-based radiance field rendering methods are built on neural radiance field, which is computationally heavy and slow in reconstruction speed. Motivated by the two aspects, we introduce Ev-GS, the first CNI-informed scheme to infer 3D Gaussian splatting from a monocular event camera, enabling efficient novel view synthesis. Leveraging 3D Gaussians with pure event-based supervision, Ev-GS overcomes challenges such as the detection of fast-moving objects and insufficient lighting. Experimental results show that Ev-GS outperforms the method that takes frame-based signals as input by rendering realistic views with reduced blurring and improved visual quality. Moreover, it demonstrates competitive reconstruction quality and reduced computing occupancy compared to existing methods, which paves the way to a highly efficient CNI approach for signal processing.
Abstract:Neuromorphic imaging is a bio-inspired technique that imitates the human retina to sense variations in a dynamic scene. It responds to pixel-level brightness changes by asynchronous streaming events and boasts microsecond temporal precision over a high dynamic range, yielding blur-free recordings under extreme illumination. Nevertheless, such a modality falls short in spatial resolution and leads to a low level of visual richness and clarity. Pursuing hardware upgrades is expensive and might cause compromised performance due to more burdens on computational requirements. Another option is to harness offline, plug-in-play neuromorphic super-resolution solutions. However, existing ones, which demand substantial sample volumes for lengthy training on massive computing resources, are largely restricted by real data availability owing to the current imperfect high-resolution devices, as well as the randomness and variability of motion. To tackle these challenges, we introduce the first self-supervised neuromorphic super-resolution prototype. It can be self-adaptive to per input source from any low-resolution camera to estimate an optimal, high-resolution counterpart of any scale, without the need of side knowledge and prior training. Evaluated on downstream event-driven tasks, such a simple yet effective method can obtain competitive results against the state-of-the-arts, significantly promoting flexibility but not sacrificing accuracy. It also delivers enhancements for inferior natural images and optical micrographs acquired under non-ideal imaging conditions, breaking through the limitations that are challenging to overcome with traditional frame techniques. In the current landscape where the use of high-resolution cameras for event-based sensing remains an open debate, our solution serves as a cost-efficient and practical alternative, paving the way for more intelligent imaging systems.
Abstract:Phase recovery, calculating the phase of a light wave from its intensity measurements, is essential for various applications, such as coherent diffraction imaging, adaptive optics, and biomedical imaging. It enables the reconstruction of an object's refractive index distribution or topography as well as the correction of imaging system aberrations. In recent years, deep learning has been proven to be highly effective in addressing phase recovery problems. Two main deep learning phase recovery strategies are data-driven (DD) with supervised learning mode and physics-driven (PD) with self-supervised learning mode. DD and PD achieve the same goal in different ways and lack the necessary study to reveal similarities and differences. Therefore, in this paper, we comprehensively compare these two deep learning phase recovery strategies in terms of time consumption, accuracy, generalization ability, ill-posedness adaptability, and prior capacity. What's more, we propose a co-driven (CD) strategy of combining datasets and physics for the balance of high- and low-frequency information. The codes for DD, PD, and CD are publicly available at https://github.com/kqwang/DLPR.
Abstract:The ability of snapshot compressive imaging (SCI) systems to efficiently capture high-dimensional (HD) data depends on the advent of novel optical designs to sample the HD data as two-dimensional (2D) compressed measurements. Nonetheless, the traditional SCI scheme is fundamentally limited, due to the complete disregard for high-level information in the sampling process. To tackle this issue, in this paper, we pave the first mile toward the advanced design of adaptive coding masks for SCI. Specifically, we propose an efficient and effective algorithm to generate coding masks with the assistance of saliency detection, in a low-cost and low-power fashion. Experiments demonstrate the effectiveness and efficiency of our approach. Code is available at: https://github.com/IndigoPurple/SASA
Abstract:Neuromorphic imaging reacts to per-pixel brightness changes of a dynamic scene with high temporal precision and responds with asynchronous streaming events as a result. It also often supports a simultaneous output of an intensity image. Nevertheless, the raw events typically involve a great amount of noise due to the high sensitivity of the sensor, while capturing fast-moving objects at low frame rates results in blurry images. These deficiencies significantly degrade human observation and machine processing. Fortunately, the two information sources are inherently complementary -- events with microsecond temporal resolution, which are triggered by the edges of objects that are recorded in latent sharp images, can supply rich motion details missing from the blurry images. In this work, we bring the two types of data together and propose a simple yet effective unifying algorithm to jointly reconstruct blur-free images and noise-robust events, where an event-regularized prior offers auxiliary motion features for blind deblurring, and image gradients serve as a reference to regulate neuromorphic noise removal. Extensive evaluations on real and synthetic samples present our superiority over other competing methods in restoration quality and greater robustness to some challenging realistic scenarios. Our solution gives impetus to the improvement of both sensing data and paves the way for highly accurate neuromorphic reasoning and analysis.
Abstract:Bio-inspired neuromorphic cameras asynchronously record pixel brightness changes and generate sparse event streams. They can capture dynamic scenes with little motion blur and more details in extreme illumination conditions. Due to the multidimensional address-event structure, most existing vision algorithms cannot properly handle asynchronous event streams. While several event representations and processing methods have been developed to address such an issue, they are typically driven by a large number of events, leading to substantial overheads in runtime and memory. In this paper, we propose a new graph representation of the event data and couple it with a Graph Transformer to perform accurate neuromorphic classification. Extensive experiments show that our approach leads to better results and excels at the challenging realistic situations where only a small number of events and limited computational resources are available, paving the way for neuromorphic applications embedded into mobile facilities.