Kaiwei Wang

EI-Nexus: Towards Unmediated and Flexible Inter-Modality Local Feature Extraction and Matching for Event-Image Data

Oct 29, 2024

Zhonghua Yi, Hao Shi, Qi Jiang, Kailun Yang, Ze Wang, Diyang Gu, Yufan Zhang, Kaiwei Wang

Abstract: Event cameras offer high temporal resolution and high dynamic range, yet inter-modality local feature extraction and matching between event and image data has received little research attention. We propose EI-Nexus, an unmediated and flexible framework that integrates two modality-specific keypoint extractors and a feature matcher. To achieve keypoint extraction across viewpoint and modality changes, we introduce Local Feature Distillation (LFD), which transfers the viewpoint consistency of a well-trained image extractor to the event extractor, ensuring robust feature correspondence. Furthermore, Context Aggregation (CA) yields a marked improvement in feature matching. We further establish the first two inter-modality feature matching benchmarks, MVSEC-RPE and EC-RPE, to assess relative pose estimation on event-image data. Our approach outperforms traditional methods that rely on explicit modal transformation, offering more unmediated and adaptable feature extraction and matching, and achieves better keypoint similarity and state-of-the-art results on the MVSEC-RPE and EC-RPE benchmarks. The source code and benchmarks will be made publicly available at https://github.com/ZhonghuaYi/EI-Nexus_official.

* Accepted to WACV 2025.
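
The abstract does not spell out the matcher's internals. As a point of reference only, the sketch below shows mutual nearest-neighbor matching, a common hand-crafted baseline for pairing keypoint descriptors produced by two extractors; it is not EI-Nexus's learned matcher or its Context Aggregation. It assumes each extractor yields one fixed-length descriptor per keypoint, and the function name `mutual_nn_match` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def _normalize(v: Descriptor) -> List[float]:
    """Scale a descriptor to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def mutual_nn_match(
    desc_event: List[Descriptor],
    desc_image: List[Descriptor],
    min_sim: float = 0.0,
) -> List[Tuple[int, int]]:
    """Pair event keypoint i with image keypoint j only if each is the other's best match."""
    if not desc_event or not desc_image:
        return []
    a = [_normalize(d) for d in desc_event]
    b = [_normalize(d) for d in desc_image]
    # sim[i][j]: cosine similarity between event descriptor i and image descriptor j
    sim = [[sum(x * y for x, y in zip(u, v)) for v in b] for u in a]
    best_j = [max(range(len(b)), key=lambda j: row[j]) for row in sim]
    best_i = [max(range(len(a)), key=lambda i: sim[i][j]) for j in range(len(b))]
    return [
        (i, j)
        for i, j in enumerate(best_j)
        if best_i[j] == i and sim[i][j] >= min_sim
    ]
```

Requiring agreement in both directions discards one-to-many candidates: if two event keypoints both prefer the same image keypoint, only the one that image keypoint also prefers survives.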

E-3DGS: Gaussian Splatting with Exposure and Motion Events

Oct 22, 2024

Xiaoting Yin, Hao Shi, Yuhan Bao, Zhenshan Bing, Yiyi Liao, Kailun Yang, Kaiwei Wang

Abstract: Estimating Neural Radiance Fields (NeRFs) from images captured under optimal conditions has been extensively explored in the vision community. However, robotic applications often face challenges such as motion blur, insufficient illumination, and high computational overhead, which adversely affect downstream tasks such as navigation, inspection, and scene visualization. To address these challenges, we propose E-3DGS, a novel event-based approach that partitions events into motion events (from camera or object movement) and exposure events (from camera exposure), using the former to handle fast-motion scenes and the latter to reconstruct grayscale images for high-quality training and optimization of event-based 3D Gaussian Splatting (3DGS). We introduce a novel integration of 3DGS with exposure events for high-quality reconstruction of explicit scene representations. Our versatile framework can operate on motion events alone for 3D reconstruction, enhance quality using exposure events, or adopt a hybrid mode that balances quality and effectiveness by first optimizing with exposure events and then with high-speed motion events. We also introduce EME-3D, a real-world 3D dataset with exposure events, motion events, camera calibration parameters, and sparse point clouds. Our method is faster and delivers better reconstruction quality than event-based NeRF, and, because it uses a single event sensor, it is more cost-effective than NeRF methods that combine event and RGB data. By combining motion and exposure events, E-3DGS sets a new benchmark for event-based 3D reconstruction, with robust performance in challenging conditions and lower hardware demands. The source code and dataset will be available at https://github.com/MasterHow/E-3DGS.
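
For readers new to event data: under the commonly used event generation model, each event reports that the log intensity at a pixel has changed by roughly a contrast threshold C, with the polarity giving the sign. The sketch below integrates events over a time window into per-pixel log-intensity changes. The constant threshold `contrast=0.2`, the `Event` record, and the function name are assumptions for illustration; this is not E-3DGS's motion/exposure partitioning or its grayscale reconstruction.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Event:
    x: int    # pixel column
    y: int    # pixel row
    t: float  # timestamp in seconds
    p: int    # polarity: +1 brighter, -1 darker


def integrate_log_intensity(
    events: Iterable[Event], t0: float, t1: float, contrast: float = 0.2
) -> Dict[Tuple[int, int], float]:
    """Sum signed contrast steps per pixel for events with t0 <= t < t1.

    Pixels with no events in the window are omitted (zero change).
    """
    delta: Dict[Tuple[int, int], float] = {}
    for e in events:
        if t0 <= e.t < t1:
            key = (e.x, e.y)
            delta[key] = delta.get(key, 0.0) + contrast * e.p
    return delta
```

In practice the contrast threshold varies per pixel and with noise, so a fixed C is only a first approximation.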

P2U-SLAM: A Monocular Wide-FoV SLAM System Based on Point Uncertainty and Pose Uncertainty

Sep 16, 2024

Yufan Zhang, Kailun Yang, Ze Wang, Kaiwei Wang

Abstract: This paper presents P2U-SLAM, a visual Simultaneous Localization And Mapping (SLAM) system for a wide Field of View (FoV) camera that exploits pose uncertainty and point uncertainty. While the wide FoV enables many repeated observations of historical map points for cross-view feature matching, the data properties of the historical map points and the poses of historical keyframes change during the optimization process. Neglecting these changes leaves part of the information matrix absent from the optimization and risks degrading long-term positioning performance. Our research aims to reduce this risk for SLAM systems with wide-FoV visual input. Based on a conditional probability model, this work reveals the definite impact of these data property changes on the optimization process, concretizes it as point uncertainty and pose uncertainty, and gives each a specific mathematical form. P2U-SLAM embeds point uncertainty and pose uncertainty into the tracking module and the local mapping module, respectively, and updates these uncertainties after each optimization operation, including local mapping, map merging, and loop closing. We present an exhaustive evaluation on 27 sequences from two popular public datasets with wide-FoV visual input. P2U-SLAM shows excellent performance compared with other state-of-the-art methods. The source code will be made publicly available at https://github.com/BambValley/P2U-SLAM.
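
In a standard least-squares SLAM back end, each reprojection residual r is weighted by an information matrix, the inverse of its covariance, giving the cost r^T Sigma^{-1} r. The sketch below assumes, purely for illustration, that a map point's uncertainty has already been propagated into a 2x2 image-plane covariance and is simply added to the measurement noise; P2U-SLAM derives its own mathematical form, which this does not reproduce. All function names are hypothetical.

```python
from typing import Optional, Tuple

Vec2 = Tuple[float, float]
Mat2 = Tuple[Tuple[float, float], Tuple[float, float]]


def mat2_add(a: Mat2, b: Mat2) -> Mat2:
    return ((a[0][0] + b[0][0], a[0][1] + b[0][1]),
            (a[1][0] + b[1][0], a[1][1] + b[1][1]))


def mat2_inv(m: Mat2) -> Mat2:
    """Inverse of a 2x2 matrix; here it turns a covariance into an information matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det <= 0.0:
        raise ValueError("expected a symmetric positive-definite covariance")
    return ((m[1][1] / det, -m[0][1] / det),
            (-m[1][0] / det, m[0][0] / det))


def weighted_sq_error(residual: Vec2, meas_cov: Mat2, point_cov: Optional[Mat2] = None) -> float:
    """Mahalanobis cost r^T Sigma^-1 r.

    Hypothetical model: projected map-point covariance is added to pixel noise,
    so uncertain points are down-weighted in the optimization.
    """
    cov = meas_cov if point_cov is None else mat2_add(meas_cov, point_cov)
    info = mat2_inv(cov)
    rx, ry = residual
    return (rx * (info[0][0] * rx + info[0][1] * ry)
            + ry * (info[1][0] * rx + info[1][1] * ry))
```

With unit pixel noise, a residual of (1, 0) costs 1.0; adding a unit point covariance halves that cost to 0.5, which is the sense in which ignoring point uncertainty over-trusts stale map points.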

Towards Single-Lens Controllable Depth-of-Field Imaging via All-in-Focus Aberration Correction and Monocular Depth Estimation

Sep 15, 2024

Xiaolong Qian, Qi Jiang, Yao Gao, Shaohua Gao, Zhonghua Yi, Lei Sun, Kai Wei, Haifeng Li, Kailun Yang, Kaiwei Wang(+1 more)

Abstract: Controllable Depth-of-Field (DoF) imaging commonly produces striking visual effects but relies on heavy and expensive high-end lenses. However, given the increasing demand for mobile scenarios, a lightweight solution based on Minimalist Optical Systems (MOS) is desirable. This work targets two major limitations of MOS, i.e., severe optical aberrations and uncontrollable DoF, to achieve single-lens controllable DoF imaging via computational methods. We propose a Depth-aware Controllable DoF Imaging (DCDI) framework equipped with All-in-Focus (AiF) aberration correction and monocular depth estimation, in which the recovered image and the corresponding depth map are used to produce imaging results under the diverse DoFs of any high-end lens via patch-wise convolution. To address depth-varying optical degradation, we introduce a Depth-aware Degradation-adaptive Training (DA2T) scheme. At the dataset level, we establish a Depth-aware Aberration MOS (DAMOS) dataset based on simulated Point Spread Functions (PSFs) at different object distances. Additionally, we design two plug-and-play depth-aware mechanisms that embed depth information into aberration image recovery to better handle depth-aware degradation. Furthermore, we propose a storage-efficient Omni-Lens-Field model to represent the 4D PSF library of various lenses. With the predicted depth map, the recovered image, and the depth-aware PSF map inferred by Omni-Lens-Field, single-lens controllable DoF imaging is achieved. Comprehensive experimental results demonstrate that the proposed framework improves recovery performance and attains impressive single-lens controllable DoF imaging results, providing a seminal baseline for this field. The source code and the established dataset will be publicly available at https://github.com/XiaolongQian/DCDI.
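
Patch-wise DoF rendering can be illustrated with the thin-lens circle of confusion: for aperture diameter A, focal length f, focus distance S1, and object depth S2, the blur diameter is c = A * |S2 - S1| / S2 * f / (S1 - f). The sketch below averages depth per patch, converts c to a pixel radius, and blurs that patch with a box kernel. The box kernel is a stand-in for a real PSF (DCDI instead uses depth-aware PSFs inferred by Omni-Lens-Field), and all names and parameter values are assumptions.

```python
from typing import List

Image = List[List[float]]


def coc_diameter(depth: float, focus_dist: float, focal_len: float, aperture: float) -> float:
    """Thin-lens circle-of-confusion diameter, in the same length unit as the inputs.

    Assumes focus_dist > focal_len and depth > 0.
    """
    return aperture * abs(depth - focus_dist) / depth * focal_len / (focus_dist - focal_len)


def _box_mean(img: Image, y: int, x: int, r: int) -> float:
    """Mean of img over the (2r+1)x(2r+1) window at (y, x), clipped at image borders."""
    h, w = len(img), len(img[0])
    total, count = 0.0, 0
    for yy in range(max(0, y - r), min(h, y + r + 1)):
        for xx in range(max(0, x - r), min(w, x + r + 1)):
            total += img[yy][xx]
            count += 1
    return total / count


def render_dof(
    img: Image, depth: Image, patch: int,
    focus_dist: float, focal_len: float, aperture: float, pixel_pitch: float,
) -> Image:
    """Blur each patch of an all-in-focus image by its depth-dependent box PSF."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for py in range(0, h, patch):
        for px in range(0, w, patch):
            ys = range(py, min(h, py + patch))
            xs = range(px, min(w, px + patch))
            mean_depth = sum(depth[y][x] for y in ys for x in xs) / (len(ys) * len(xs))
            blur = coc_diameter(mean_depth, focus_dist, focal_len, aperture)
            radius = round(blur / pixel_pitch / 2)  # CoC diameter -> pixel radius
            if radius == 0:
                continue  # patch is (nearly) in focus: keep it sharp
            for y in ys:
                for x in xs:
                    out[y][x] = _box_mean(img, y, x, radius)
    return out
```

Changing `focus_dist` or `aperture` re-renders the same all-in-focus image with a different DoF, which is the control the framework aims to provide without a high-end lens.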

A Flexible Framework for Universal Computational Aberration Correction via Automatic Lens Library Generation and Domain Adaptation

Sep 09, 2024

Qi Jiang, Yao Gao, Shaohua Gao, Zhonghua Yi, Lei Sun, Hao Shi, Kailun Yang, Kaiwei Wang, Jian Bai

Abstract: Emerging universal Computational Aberration Correction (CAC) paradigms offer a promising route to lightweight, high-quality imaging without repeating data preparation and model training for each new lens design. However, the training databases in these approaches, i.e., the lens libraries (LensLibs), cover only a limited range of real-world aberration behaviors. In this work, we set up OmniLens, a framework for universal CAC that considers both generalization ability and flexibility. OmniLens extends the idea of universal CAC to a broader concept, where a base model is trained for three cases: zero-shot CAC with the pre-trained model, few-shot CAC with a small amount of lens-specific data for fine-tuning, and domain-adaptive CAC using domain adaptation for lenses whose descriptions are unknown. For OmniLens's data foundation, we first propose an Evolution-based Automatic Optical Design (EAOD) pipeline to construct a LensLib automatically, coined AODLib, whose diversity is enriched by an evolution framework with comprehensive constraints and a hybrid optimization strategy for achieving realistic aberration behaviors. For network design, we introduce the guidance of high-quality codebook priors to facilitate zero-shot and few-shot CAC, which enhances the model's generalization ability while also speeding its convergence in the few-shot case. Furthermore, based on a statistical observation of dark channel priors in optical degradation, we design an unsupervised regularization term to adapt the base model to a target lens with unknown descriptions using its aberration images without ground truth. We validate OmniLens on 4 manually designed low-end lenses with various structures and aberration behaviors. Remarkably, the base model trained on AODLib exhibits strong generalization, achieving 97% of the lens-specific performance in a zero-shot setting.
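
The dark channel, as defined for the dark channel prior in single-image dehazing, is at each pixel the minimum intensity over all color channels within a local window. The sketch below computes it in pure Python and adds a hypothetical `mean_dark_channel` score of the kind a regularizer could be built on. The abstract does not give the exact form of OmniLens's unsupervised term, so this is an illustration only; the window radius and names are assumptions.

```python
from typing import List, Sequence

RGBImage = List[List[Sequence[float]]]


def dark_channel(img: RGBImage, radius: int = 7) -> List[List[float]]:
    """Per-pixel minimum over RGB channels and over a (2*radius+1)^2 window."""
    h, w = len(img), len(img[0])
    min_rgb = [[min(px) for px in row] for row in img]  # channel-wise minimum
    out = []
    for y in range(h):
        ys = range(max(0, y - radius), min(h, y + radius + 1))
        row_out = []
        for x in range(w):
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            row_out.append(min(min_rgb[yy][xx] for yy in ys for xx in xs))
        out.append(row_out)
    return out


def mean_dark_channel(img: RGBImage, radius: int = 7) -> float:
    """Hypothetical scalar score: average dark-channel value over the image."""
    dc = dark_channel(img, radius)
    return sum(map(sum, dc)) / (len(dc) * len(dc[0]))
```

Because the score needs no sharp reference image, a term built on it can be evaluated on aberrated captures alone, which matches the ground-truth-free setting the abstract describes.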

SF-TIM: A Simple Framework for Enhancing Quadrupedal Robot Jumping Agility by Combining Terrain Imagination and Measurement

Aug 01, 2024

Ze Wang, Yang Li, Long Xu, Hao Shi, Zunwang Ma, Zhen Chu, Chao Li, Fei Gao, Kailun Yang, Kaiwei Wang

Abstract: Dynamic jumping onto high platforms and over gaps distinguishes legged robots from their wheeled counterparts. Compared with walking on rough terrain, dynamic locomotion on abrupt surfaces requires fusing proprioceptive and exteroceptive perception for explosive movements. In this paper, we propose SF-TIM (Simple Framework combining Terrain Imagination and Measurement), a single-policy method that enhances quadrupedal robot jumping agility while preserving the robot's fundamental blind-walking capabilities. In addition, we introduce a terrain-guided reward design specifically to assist quadrupedal robots in high jumping, improving their performance in this task. To narrow the simulation-to-reality gap in quadrupedal robot learning, we introduce a stable, high-speed elevation map generation framework, enabling zero-shot simulation-to-reality transfer of locomotion ability. Our algorithm has been deployed and validated on both small- and large-size quadrupedal robots, demonstrating its effectiveness in real-world applications: the robots successfully traversed various high platforms and gaps, showing the robustness of our approach. A demo video is available at https://flysoaryun.github.io/SF-TIM.
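
An elevation map is a 2.5D grid storing terrain height per cell. A minimal sketch, assuming a robot-centric square grid and points already transformed into the robot frame: each point is binned into a cell, and the cell keeps the maximum height seen. Cells without measurements stay `None`. This is not SF-TIM's generation framework; the resolution, extent, and max-height rule are illustrative choices.

```python
import math
from typing import Iterable, List, Optional, Tuple

Grid = List[List[Optional[float]]]


def elevation_map(
    points: Iterable[Tuple[float, float, float]],
    resolution: float = 0.05,
    half_size: float = 1.0,
) -> Grid:
    """Bin robot-frame points (x, y, z) into a square grid, keeping the max z per cell.

    The grid spans [-half_size, half_size) in x and y; grid[i][j] covers x-bin i, y-bin j.
    Cells with no points remain None (unobserved).
    """
    n = int(round(2 * half_size / resolution))
    grid: Grid = [[None] * n for _ in range(n)]
    for x, y, z in points:
        i = math.floor((x + half_size) / resolution)
        j = math.floor((y + half_size) / resolution)
        if 0 <= i < n and 0 <= j < n:
            current = grid[i][j]
            if current is None or z > current:
                grid[i][j] = z
    return grid
```

Keeping the maximum rather than the mean preserves platform edges, which matter for jumping; a real pipeline would also handle sensor noise, occlusion, and fusion over time.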

Microsaccade-inspired Event Camera for Robotics

May 28, 2024

Botao He, Ze Wang, Yuan Zhou, Jingxi Chen, Chahat Deep Singh, Haojia Li, Yuman Gao, Shaojie Shen, Kaiwei Wang, Yanjun Cao(+4 more)

Abstract: Neuromorphic vision sensors, or event cameras, have made visual perception with extremely low reaction time possible, opening new avenues for highly dynamic robotics applications. An event camera's output depends on both motion and texture. However, event cameras fail to capture object edges that are parallel to the camera motion. This problem is intrinsic to the sensor and therefore challenging to solve algorithmically. Human vision counters perceptual fading through an active mechanism of small involuntary eye movements, the most prominent of which are called microsaccades. By moving the eyes constantly and slightly during fixation, microsaccades substantially maintain texture stability and persistence. Inspired by microsaccades, we designed an event-based perception system capable of simultaneously maintaining low reaction time and stable texture. In this design, a rotating wedge prism is mounted in front of the aperture of an event camera to redirect light and trigger events. The geometrical optics of the rotating wedge prism allow algorithmic compensation of the additional rotational motion, resulting in a stable texture appearance and high informational output independent of external motion. The hardware device and software solution are integrated into a system we call the Artificial MIcrosaccade-enhanced EVent camera (AMI-EV). Benchmark comparisons validate the superior data quality of AMI-EV recordings in scenarios where both standard cameras and event cameras fail to deliver. Various real-world experiments demonstrate the system's potential to facilitate robotic perception for both low-level and high-level vision tasks.

* Published in Science Robotics, June 2024 issue
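
The compensation idea can be sketched under a simplified model: a thin wedge prism with refractive index n and apex angle alpha deviates light by roughly delta = (n - 1) * alpha, so as the prism spins at angular velocity omega, the image of a distant scene is displaced along a circle of radius about f * tan(delta), with f in pixels. Subtracting that known circular offset at each event's timestamp removes the prism-induced motion. The ideal-thin-prism assumption, the `phase` parameter, and the function names are assumptions; AMI-EV's actual calibration and compensation may differ.

```python
import math
from typing import Tuple


def thin_prism_deviation(n: float, apex_angle: float) -> float:
    """Small-angle deviation (radians) of a thin wedge prism: (n - 1) * apex angle."""
    return (n - 1.0) * apex_angle


def compensate_event(
    x: float, y: float, t: float,
    focal_px: float, deviation: float, omega: float, phase: float = 0.0,
) -> Tuple[float, float]:
    """Remove the circular image shift induced by a prism rotating at omega rad/s."""
    radius = focal_px * math.tan(deviation)  # shift magnitude in pixels
    theta = omega * t + phase                # prism orientation at time t
    return x - radius * math.cos(theta), y - radius * math.sin(theta)
```

Because the offset depends only on the timestamp and known prism parameters, the correction is a per-event closed-form shift and adds negligible latency.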

Design, analysis, and manufacturing of a glass-plastic hybrid minimalist aspheric panoramic annular lens

May 05, 2024

Shaohua Gao, Qi Jiang, Yiqi Liao, Yi Qiu, Wanglei Ying, Kailun Yang, Kaiwei Wang, Benhao Zhang, Jian Bai

Abstract: We propose a high-performance glass-plastic hybrid minimalist aspheric panoramic annular lens (ASPAL) to overcome several major limitations of the traditional panoramic annular lens (PAL), such as large size, high weight, and system complexity. The field of view (FoV) of the ASPAL is 360°×(35° to 110°), and its imaging quality is close to the diffraction limit. This large-FoV ASPAL consists of only 4 lenses. Moreover, we establish a physical structure model of the PAL using ray tracing and study the influence of its physical parameters on the compactness ratio. In addition, to evaluate local tolerances of annular surfaces, we propose a tolerance analysis method suited to ASPAL. This method effectively analyzes surface irregularities on annular surfaces and provides clear guidance on manufacturing tolerances for ASPAL. Benefiting from high-precision glass molding and injection-molded aspheric lens manufacturing techniques, we manufactured 20 ASPALs in a small batch. An ASPAL prototype weighs only 8.5 g. Our framework provides promising insights for applying panoramic systems in space- and weight-constrained environmental sensing scenarios such as intelligent security, micro-UAVs, and micro-robots.

* Accepted to Optics & Laser Technology
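
Aspheric surfaces are commonly described by the even-asphere sag equation z(r) = c r^2 / (1 + sqrt(1 - (1 + k) c^2 r^2)) + A4 r^4 + A6 r^6 + ..., with curvature c, conic constant k, and polynomial coefficients A4, A6, and so on. The sketch below evaluates it and computes a crude peak-to-valley (PV) deviation between a nominal and a perturbed surface over an annular zone r_inner <= r <= r_outer. The PV helper is a hypothetical illustration, not the tolerance analysis method proposed in the paper.

```python
import math
from typing import Callable, Sequence


def even_asphere_sag(r: float, curvature: float, conic: float = 0.0,
                     coeffs: Sequence[float] = ()) -> float:
    """Even-asphere sag; coeffs[i] multiplies r**(2*i + 4), i.e. A4, A6, A8, ..."""
    arg = 1.0 - (1.0 + conic) * curvature ** 2 * r ** 2
    if arg < 0.0:
        raise ValueError("radial height lies outside the surface's valid aperture")
    z = curvature * r ** 2 / (1.0 + math.sqrt(arg))
    for i, a in enumerate(coeffs):
        z += a * r ** (2 * i + 4)
    return z


def annular_pv(nominal: Callable[[float], float], perturbed: Callable[[float], float],
               r_inner: float, r_outer: float, samples: int = 200) -> float:
    """Peak-to-valley of (perturbed - nominal) sampled uniformly across an annular zone."""
    radii = [r_inner + (r_outer - r_inner) * k / (samples - 1) for k in range(samples)]
    diffs = [perturbed(r) - nominal(r) for r in radii]
    return max(diffs) - min(diffs)
```

With k = 0 and no polynomial terms the formula reduces to a sphere of radius 1/c, which is a quick sanity check on any implementation.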

Global Search Optics: Automatically Exploring Optimal Solutions to Compact Computational Imaging Systems

Apr 30, 2024

Yao Gao, Qi Jiang, Shaohua Gao, Lei Sun, Kailun Yang, Kaiwei Wang

Abstract: The popularity of mobile vision creates a demand for advanced compact computational imaging systems, which require both a lightweight optical system and an effective image reconstruction model. Recently, joint design pipelines, in which these two components are optimized simultaneously via data-driven learning to realize the optimal system design, have come to the research forefront. However, the effectiveness of these designs depends heavily on the initial setup of the optical system, a problem compounded by a non-convex solution space that impedes reaching a globally optimal solution. In this work, we present Global Search Optics (GSO), which automatically designs compact computational imaging systems in two parts: (i) Fused Optimization Method for Automatic Optical Design (OptiFusion), which searches for diverse initial optical systems under given design specifications; and (ii) Efficient Physic-aware Joint Optimization (EPJO), which jointly optimizes the initial optical systems and image reconstruction networks in parallel under physical constraints, culminating in the selection of the optimal solution. Extensive experimental results on the design of three-piece (3P) spherical computational imaging systems show that GSO serves as a transformative end-to-end lens design paradigm with superior ability to search for globally optimal structures, providing compact computational imaging systems with higher imaging quality than traditional methods. The source code will be made publicly available at https://github.com/wumengshenyou/GSO.
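
Why the initial setup matters on a non-convex landscape can be shown with a toy one-dimensional merit function: plain gradient descent converges to whichever local minimum lies downhill of its start, whereas restarting from many initial points and keeping the best result is far more likely to reach the global minimum. The sketch uses f(x) = x^2/10 + sin(3x) as a stand-in for a lens-design merit function; the multi-start strategy is a generic technique, not GSO's OptiFusion or EPJO, and all names are hypothetical.

```python
import math
import random
from typing import Callable, Tuple


def toy_merit(x: float) -> float:
    """Non-convex stand-in for a lens merit function (several local minima)."""
    return x * x / 10 + math.sin(3 * x)


def local_descent(
    f: Callable[[float], float], x0: float,
    lr: float = 0.01, steps: int = 2000, h: float = 1e-5,
) -> float:
    """Gradient descent with a central-difference gradient; returns the final x."""
    x = x0
    for _ in range(steps):
        grad = (f(x + h) - f(x - h)) / (2 * h)
        x -= lr * grad
    return x


def multi_start(
    f: Callable[[float], float], bounds: Tuple[float, float],
    n_starts: int = 20, seed: int = 0,
) -> Tuple[float, float]:
    """Run local descent from n_starts uniform random starts; keep the best."""
    rng = random.Random(seed)
    lo, hi = bounds
    finals = [local_descent(f, rng.uniform(lo, hi)) for _ in range(n_starts)]
    best = min(finals, key=f)
    return best, f(best)
```

Starting a single descent at x = 1.5 settles near x = 1.54 with a merit of about -0.76, while the multi-start search over [-4, 4] finds the deeper minimum near x = -0.51 (merit about -0.97).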

MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model

Apr 19, 2024

Kang Zeng, Hao Shi, Jiacheng Lin, Siyu Li, Jintao Cheng, Kaiwei Wang, Zhiyong Li, Kailun Yang

Abstract: LiDAR-based Moving Object Segmentation (MOS) aims to locate and segment moving objects in the point cloud of the current scan using motion information from previous scans. Despite the promising results of previous MOS methods, several key issues, such as the weak coupling of temporal and spatial information, still need further study. In this paper, we propose a novel LiDAR-based 3D Moving Object Segmentation method with a Motion-aware State Space Model, termed MambaMOS. First, we develop a novel embedding module, Time Clue Bootstrapping Embedding (TCBE), to strengthen the coupling of temporal and spatial information in point clouds and alleviate the issue of overlooked temporal clues. Second, we introduce the Motion-aware State Space Model (MSSM) to give the model the capacity to understand the temporal correlations of the same object across different time steps. Specifically, MSSM emphasizes the motion states of the same object at different time steps through two distinct temporal modeling and correlation steps. We use an improved state space model to represent these motion differences, effectively modeling the motion states. Finally, extensive experiments on the SemanticKITTI-MOS and KITTI-Road benchmarks demonstrate that MambaMOS achieves state-of-the-art performance. The source code of this work will be made publicly available at https://github.com/Terminal-K/MambaMOS.
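
For context on the state space models underlying Mamba-style architectures: a diagonal linear SSM updates a hidden state h_t = A_bar h_{t-1} + B_bar x_t and reads out y_t = C h_t, where A_bar and B_bar come from discretizing continuous parameters with a step size Delta. In selective (Mamba-style) SSMs, Delta depends on the input. The sketch below implements this recurrence for a scalar input sequence with zero-order-hold discretization of a diagonal A. The softplus parameterization of Delta and all names are assumptions, and it does not reproduce MambaMOS's MSSM.

```python
import math
from typing import List, Sequence


def softplus(v: float) -> float:
    """Smooth positive mapping used here to keep the step size Delta > 0."""
    return math.log1p(math.exp(v))


def selective_scan(
    xs: Sequence[float], a: Sequence[float], b: Sequence[float], c: Sequence[float],
    w_delta: float = 1.0, b_delta: float = 0.0,
) -> List[float]:
    """Diagonal SSM over a scalar sequence with an input-dependent step size.

    a: continuous-time diagonal entries (negative and nonzero for stability)
    b, c: per-state input and output weights
    """
    h = [0.0] * len(a)
    ys: List[float] = []
    for x in xs:
        delta = softplus(w_delta * x + b_delta)  # selective: step depends on input
        for k in range(len(a)):
            a_bar = math.exp(delta * a[k])        # zero-order-hold A
            b_bar = (a_bar - 1.0) / a[k] * b[k]   # zero-order-hold B (diagonal case)
            h[k] = a_bar * h[k] + b_bar * x
        ys.append(sum(ck * hk for ck, hk in zip(c, h)))
    return ys
```

With a single state, a = -1, and a fixed step of ln 2, an impulse input produces the decaying response 0.5, 0.25, 0.125: the hidden state carries information across time steps, which is what lets such models relate an object's states across scans.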