Laurent Kneip

RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery

Sep 19, 2023
Jiaxin Wei, Xibin Song, Weizhe Liu, Laurent Kneip, Hongdong Li, Pan Ji

While showing promising results, recent RGB-D-based category-level object pose estimation methods have restricted applicability due to their heavy reliance on depth sensors. RGB-only methods provide an alternative yet suffer from the inherent scale ambiguity of monocular observations. In this paper, we propose a novel pipeline that decouples 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations. Specifically, we leverage a pre-trained monocular estimator to extract local geometric information, primarily to facilitate the search for inlier 2D-3D correspondences. Meanwhile, a separate branch is designed to directly recover the metric scale of the object based on category-level statistics. Finally, we advocate using the RANSAC-PnP algorithm to robustly solve for the 6D object pose. Extensive experiments on both synthetic and real datasets demonstrate the superior performance of our method over previous state-of-the-art RGB-based approaches, especially in terms of rotation accuracy.
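
For reference, the robust pose-solving step the abstract advocates can be illustrated with OpenCV's off-the-shelf RANSAC-PnP solver. This is a minimal sketch of that generic step, not the paper's full pipeline; the correspondences, intrinsics, and threshold values are placeholders.

```python
import numpy as np
import cv2

def estimate_pose(points_3d, points_2d, K):
    """Robust 6D pose from 2D-3D correspondences via RANSAC-PnP."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d.astype(np.float64),   # (N, 3) object points in metric scale
        points_2d.astype(np.float64),   # (N, 2) pixel coordinates
        K,                              # 3x3 camera intrinsics
        None,                           # zero lens distortion assumed
        reprojectionError=3.0,          # inlier threshold in pixels
        iterationsCount=1000,
        flags=cv2.SOLVEPNP_EPNP,
    )
    if not ok:
        raise RuntimeError("RANSAC-PnP found no consistent pose")
    R, _ = cv2.Rodrigues(rvec)          # axis-angle vector -> 3x3 rotation
    return R, tvec, inliers             # inliers: indices of surviving matches
```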

MV-ROPE: Multi-view Constraints for Robust Category-level Object Pose and Size Estimation

Aug 17, 2023
Jiaqi Yang, Yucong Chen, Xiangting Meng, Chenxin Yan, Min Li, Ran Chen, Lige Liu, Tao Sun, Laurent Kneip

We propose a novel framework for RGB-based category-level 6D object pose and size estimation. Our approach relies on the prediction of the normalized object coordinate space (NOCS), which serves as an efficient and effective canonical object representation that can be extracted from RGB images. Unlike previous approaches that rely heavily on additional depth readings as input, our novelty lies in leveraging multi-view information, which is commonly available in practical scenarios where a moving camera continuously observes the environment. By introducing multi-view constraints, we obtain accurate camera pose and depth estimates from a monocular dense SLAM framework. Additionally, by incorporating constraints on the relative camera poses, we can apply trimming strategies and robust pose averaging to the multi-view object poses, resulting in more accurate and robust category-level object pose estimates even in the absence of direct depth readings. Furthermore, we introduce a novel NOCS prediction network that significantly improves performance. Our experimental results demonstrate the strong performance of the proposed method, which is even comparable to state-of-the-art RGB-D methods across public dataset sequences. Additionally, we showcase the generalization ability of our method by evaluating it on self-collected datasets.
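
The "trimming + robust pose averaging" idea can be sketched for the rotation component as follows. The chordal mean, the iteration scheme, and the 10-degree threshold are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def chordal_mean(rotations):
    """Chordal L2 mean of an (N, 3, 3) stack of rotation matrices."""
    U, _, Vt = np.linalg.svd(rotations.sum(axis=0))
    R = U @ Vt
    if np.linalg.det(R) < 0:                 # project back onto SO(3)
        R = U @ np.diag([1.0, 1.0, -1.0]) @ Vt
    return R

def geodesic_deg(Ra, Rb):
    """Angular distance between two rotations, in degrees."""
    c = (np.trace(Ra.T @ Rb) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

def trimmed_rotation_average(rotations, thresh_deg=10.0, n_iters=5):
    """Average, trim hypotheses far from the mean, repeat."""
    keep = np.ones(len(rotations), dtype=bool)
    for _ in range(n_iters):
        R_mean = chordal_mean(rotations[keep])
        dists = np.array([geodesic_deg(R, R_mean) for R in rotations])
        keep = dists < thresh_deg            # assumes a consistent majority
    return chordal_mean(rotations[keep]), keep
```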

Revisiting Event-based Video Frame Interpolation

Jul 24, 2023
Jiaben Chen, Yichen Zhu, Dongze Lian, Jiaqi Yang, Yifu Wang, Renrui Zhang, Xinhang Liu, Shenhan Qian, Laurent Kneip, Shenghua Gao

Dynamic vision sensors, or event cameras, provide rich complementary information for video frame interpolation. Existing state-of-the-art methods follow the paradigm of combining synthesis-based and warping networks. However, few of these methods fully respect the intrinsic characteristics of event streams. Given that event cameras only encode intensity changes and polarity rather than color intensities, estimating optical flow from events is arguably more difficult than from RGB information. We therefore propose to incorporate RGB information in an event-guided optical flow refinement strategy. Moreover, in light of the quasi-continuous nature of the time signals provided by event cameras, we propose a divide-and-conquer strategy in which event-based intermediate frame synthesis happens incrementally in multiple simplified stages rather than in a single, long stage. Extensive experiments on both synthetic and real-world datasets show that these modifications lead to more reliable and realistic intermediate frame results than previous video frame interpolation methods. Our findings underline that a careful consideration of event characteristics such as high temporal density and elevated noise benefits interpolation accuracy.

* Accepted by IROS 2023. Project site: https://jiabenchen.github.io/revisit_event
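
The divide-and-conquer idea can be summarized in a short Python sketch: rather than synthesizing the target frame in one long step, the interpolation hops through intermediate timestamps, consuming only the events of each hop. Here `synthesize_step` stands in for a learned synthesis network and is purely hypothetical.

```python
import numpy as np

def interpolate_incremental(frame0, events, t0, t_target,
                            synthesize_step, n_stages=4):
    """Hop from t0 to t_target in n_stages simplified synthesis steps.

    events: iterable of (t, x, y, polarity) tuples sorted by time.
    synthesize_step: hypothetical network mapping (frame, events) -> frame.
    """
    hops = np.linspace(t0, t_target, n_stages + 1)
    frame = frame0
    for ta, tb in zip(hops[:-1], hops[1:]):
        chunk = [e for e in events if ta <= e[0] < tb]  # events of this hop only
        frame = synthesize_step(frame, chunk)           # one short synthesis stage
    return frame
```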

Scale jump-aware pose graph relaxation for monocular SLAM with re-initializations

Jul 23, 2023
Runze Yuan, Ran Cheng, Lige Liu, Tao Sun, Laurent Kneip

Pose graph relaxation has become an indispensable addition to SLAM, enabling efficient global registration of sensor reference frames under the objective of satisfying pairwise relative transformation constraints. These constraints may be given by incremental motion estimation or by global place recognition. While place recognition enables loop closures and drift compensation, care has to be taken in the monocular case, in which local estimates of structure and displacement can differ from reality not just in terms of noise, but also in terms of a scale factor. Owing to the accumulation of scale propagation errors, this scale factor drifts over time, hence scale-drift-aware pose graph relaxation has been introduced. We extend this idea to cases in which the relative scale between subsequent sensor frames is unknown, a situation that can easily occur if monocular SLAM enters re-initialization and no reliable overlap between successive local maps can be identified. The approach is realized by a hybrid pose graph formulation that combines the regular similarity consistency terms with novel, scale-blind constraints. We apply the technique to the practically relevant case of small indoor service robots capable of effectuating purely rotational displacements, a condition that can easily cause tracking failures. We demonstrate that globally consistent trajectories can be recovered even if multiple re-initializations occur along the loop, and present an in-depth study of success and failure cases.

* 8 pages, 23 figures, International Conference on Intelligent Robots and Systems 2023 
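
The hybrid formulation can be illustrated by contrasting the two residual types: a regular relative-pose consistency term, and a scale-blind term used across re-initialization boundaries that penalizes only rotation and translation direction. The simplified parameterization below (no explicit Sim(3) scale states) is an assumption meant only to convey the idea.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def rot_log(R):
    """Map a rotation-matrix discrepancy to an so(3) error vector."""
    return Rotation.from_matrix(R).as_rotvec()

def regular_residual(Ri, ti, Rj, tj, R_ij, t_ij):
    """Relative-pose consistency: the full relative translation is penalized."""
    r_rot = rot_log(R_ij.T @ (Ri.T @ Rj))
    r_trans = Ri.T @ (tj - ti) - t_ij
    return np.concatenate([r_rot, r_trans])

def scale_blind_residual(Ri, ti, Rj, tj, R_ij, t_dir_ij):
    """Across a re-initialization the relative scale is unknown, so only
    rotation and the *direction* of the relative translation are penalized."""
    r_rot = rot_log(R_ij.T @ (Ri.T @ Rj))
    t_rel = Ri.T @ (tj - ti)
    norm = np.linalg.norm(t_rel)
    r_dir = t_rel / norm - t_dir_ij if norm > 1e-9 else np.zeros(3)
    return np.concatenate([r_rot, r_dir])
```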

Cross-modal Place Recognition in Image Databases using Event-based Sensors

Jul 03, 2023
Xiang Ji, Jiaxin Wei, Yifu Wang, Huiliang Shang, Laurent Kneip

Visual place recognition (VPR) is an important problem for global localization in many robotics tasks. One of the biggest challenges is that it may suffer from illumination or appearance changes in the surrounding environment. Event cameras are interesting alternatives to frame-based sensors, as their high dynamic range enables robust perception in difficult illumination conditions. However, current event-based place recognition methods rely only on event information, which restricts downstream applications of VPR. In this paper, we present the first cross-modal visual place recognition framework that is capable of retrieving regular images from a database given an event query. Our method demonstrates promising results with respect to state-of-the-art frame-based and event-based methods on the Brisbane-Event-VPR dataset under different scenarios. We also verify the effectiveness of the combination of retrieval and classification, which can boost performance by a large margin.
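
At query time, the retrieval backbone of such a framework reduces to nearest-neighbor search in a shared embedding space. A minimal sketch under the assumption of precomputed descriptors; the cross-modal embedding networks themselves are the learned contribution and are omitted here.

```python
import numpy as np

def retrieve(query_desc, db_descs, top_k=5):
    """Return indices and similarities of the top-k database images.

    query_desc: (D,) event-derived descriptor.
    db_descs:   (N, D) descriptors of the regular-image database.
    """
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = db @ q                        # cosine similarity, one per image
    order = np.argsort(-sims)[:top_k]    # best matches first
    return order, sims[order]
```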

Accelerating Globally Optimal Consensus Maximization in Geometric Vision

Apr 11, 2023
Xinyue Zhang, Liangzu Peng, Wanting Xu, Laurent Kneip

Branch-and-bound-based consensus maximization stands out due to its ability to retrieve the globally optimal solution to outlier-affected geometric problems. However, while the discovery of such solutions carries high scientific value, its application in practical scenarios is often prohibited by a computational complexity that grows exponentially with the dimensionality of the problem at hand. In this work, we convey a novel, general technique that allows us to branch over an (n-1)-dimensional space for an n-dimensional problem. The remaining degree of freedom can be solved globally optimally within each bound calculation by applying the efficient interval stabbing technique. While each individual bound derivation is harder to compute owing to the additional need for solving a sorting problem, the reduced number of intervals and tighter bounds in practice lead to a significant reduction in the overall number of required iterations. Besides an abstract introduction of the approach, we present applications to three fundamental geometric computer vision problems: camera resectioning, relative camera pose estimation, and point set registration. Through exhaustive tests, we demonstrate significant speed-up factors, at times exceeding two orders of magnitude, thereby increasing the viability of globally optimal consensus maximizers in online application scenarios.
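
The interval stabbing sub-routine that resolves the remaining degree of freedom is a classic O(m log m) sweep and easy to state; how each correspondence maps to a 1-D inlier interval is problem-specific and omitted in this sketch.

```python
def interval_stabbing(intervals):
    """Find the value hit by the most closed intervals.

    intervals: list of (lo, hi) pairs, one per correspondence.
    Returns (best_count, stabbing_value).
    """
    events = []
    for lo, hi in intervals:
        events.append((lo, +1))          # interval opens
        events.append((hi, -1))          # interval closes
    # Sort by coordinate; opens before closes at ties so touching
    # endpoints still count as stabbed.
    events.sort(key=lambda e: (e[0], -e[1]))
    best, count, best_x = 0, 0, None
    for x, delta in events:
        count += delta
        if count > best:
            best, best_x = count, x
    return best, best_x
```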

Multi-embodiment Legged Robot Control as a Sequence Modeling Problem

Dec 18, 2022
Chen Yu, Weinan Zhang, Hang Lai, Zheng Tian, Laurent Kneip, Jun Wang

Robots are traditionally bound to a fixed embodiment during their operational lifetime, which limits their ability to adapt to their surroundings. Co-optimizing the control and morphology of a robot, however, is often inefficient due to the complex interplay between controller and morphology. In this paper, we propose a learning-based control method that inherently takes morphology into consideration, such that once the control policy is trained in the simulator, it can easily be deployed to robots with different embodiments in the real world. In particular, we present the Embodiment-aware Transformer (EAT), an architecture that casts this control problem as conditional sequence modeling. EAT outputs the optimal actions by leveraging a causally masked Transformer. By conditioning an autoregressive model on the desired robot embodiment, past states, and actions, our EAT model can generate future actions that best fit the current robot embodiment. Experimental results show that EAT outperforms all other alternatives in embodiment-varying tasks and succeeds in an example real-world evolution task: stepping down a stair by updating the morphology alone. We hope that EAT will inspire a new push toward real-world evolution across many domains, where algorithms like EAT can blaze a trail by bridging the fields of evolutionary robotics and big-data sequence modeling.
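
A minimal sketch of the conditioning mechanism: prepend an embodiment token to the interleaved (state, action) sequence and decode actions under a causal mask. The dimensions, token layout, and use of a vanilla PyTorch encoder are illustrative assumptions; the paper's actual EAT architecture differs in its details.

```python
import torch
import torch.nn as nn

class EmbodimentConditionedPolicy(nn.Module):
    def __init__(self, state_dim, act_dim, embod_dim, d_model=128, n_layers=3):
        super().__init__()
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Linear(act_dim, d_model)
        self.embed_embod = nn.Linear(embod_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, act_dim)

    def forward(self, embodiment, states, actions):
        # Interleave (s_t, a_t) tokens and prepend the embodiment as token 0.
        B, T, _ = states.shape
        tokens = torch.stack(
            [self.embed_state(states), self.embed_action(actions)], dim=2
        ).reshape(B, 2 * T, -1)
        tokens = torch.cat([self.embed_embod(embodiment)[:, None], tokens], dim=1)
        # Causal mask: each token attends only to itself and the past.
        L = tokens.size(1)
        mask = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
        h = self.encoder(tokens, mask=mask)
        # Predict a_t from the hidden state at each state token (odd indices).
        return self.head(h[:, 1::2])
```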

Fast geometric trim fitting using partial incremental sorting and accumulation

Sep 05, 2022
Min Li, Laurent Kneip

We present an algorithmic contribution that improves the efficiency of robust trim fitting in outlier-affected geometric regression problems. The method relies heavily on the quicksort algorithm, and we present two important insights. First, partial sorting is sufficient for the incremental calculation of the x-th percentile value. Second, the normal equations in linear fitting problems may be updated incrementally by logging swap operations across the x-th percentile boundary during sorting. Besides linear fitting problems, we demonstrate how the technique can additionally be applied to closed-form, non-linear energy minimization problems, thus enabling efficient trim fitting under geometrically optimal objectives. We apply our method to two distinct camera resectioning algorithms and demonstrate highly efficient and reliable geometric trim fitting.

* 9 pages, 7 figures, conference 
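
The first insight maps directly onto numpy's selection-based partition, which exposes the x-th percentile without a full sort; the second insight (incremental normal-equation updates via logged swaps) is the paper-specific part and is not reproduced in this sketch.

```python
import numpy as np

def trimmed_threshold(residuals, x=0.8):
    """Return the x-th percentile residual and the mask of kept samples.

    np.partition performs a partial sort (introselect): everything left of
    index k is <= part[k], which is all that trim fitting needs.
    """
    k = max(int(x * len(residuals)) - 1, 0)
    part = np.partition(residuals, k)
    thresh = part[k]
    return thresh, residuals <= thresh
```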

VECtor: A Versatile Event-Centric Benchmark for Multi-Sensor SLAM

Jul 04, 2022
Ling Gao, Yuxuan Liang, Jiaqi Yang, Shaoxun Wu, Chenyu Wang, Jiaben Chen, Laurent Kneip

Event cameras have recently gained popularity as they hold strong potential to complement regular cameras in situations of high dynamics or challenging illumination. An important problem that may benefit from the addition of an event camera is Simultaneous Localization and Mapping (SLAM). However, in order to ensure progress on event-inclusive multi-sensor SLAM, novel benchmark sequences are needed. Our contribution is the first complete set of benchmark datasets captured with a multi-sensor setup containing an event-based stereo camera, a regular stereo camera, multiple depth sensors, and an inertial measurement unit. The setup is fully hardware-synchronized and underwent accurate extrinsic calibration. All sequences come with ground truth data captured by highly accurate external reference devices such as a motion capture system. Individual sequences include both small- and large-scale environments, and cover the specific challenges targeted by dynamic vision sensors.

* IEEE Robotics and Automation Letters, 2022  

Accurate Instance-Level CAD Model Retrieval in a Large-Scale Database

Jul 04, 2022
Jiaxin Wei, Lan Hu, Chenyu Wang, Laurent Kneip

We present a new solution for the fine-grained retrieval of clean CAD models from a large-scale database in order to recover detailed object shape geometries for RGB-D scans. Unlike previous work, which simply indexes into a moderately small database using an object shape descriptor and accepts the top retrieval result, we argue that in the case of a large-scale database a more accurate model may be found within a neighborhood of the descriptor. More importantly, we propose that the distinctiveness deficiency of shape descriptors at the instance level can be compensated for by a geometry-based re-ranking of their neighborhood. Our approach first leverages the discriminative power of learned representations to distinguish between different categories of models, and then uses a novel robust point set distance metric to re-rank the CAD neighborhood, enabling fine-grained retrieval in a large shape database. Evaluation on a real-world dataset shows that our geometry-based re-ranking is a conceptually simple but highly effective method that can lead to a significant improvement in retrieval accuracy compared to the state-of-the-art.

* Accepted by IROS 2022 
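
The retrieve-then-re-rank structure can be sketched as follows; a plain symmetric chamfer distance stands in for the paper's robust point set metric, and the descriptors and point clouds are assumed precomputed.

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer(a, b):
    """Symmetric chamfer distance between (N, 3) and (M, 3) point sets."""
    da, _ = cKDTree(b).query(a)          # nearest neighbor in b for each a
    db, _ = cKDTree(a).query(b)          # and vice versa
    return da.mean() + db.mean()

def retrieve_cad(scan_desc, db_descs, scan_pts, db_pts, k=20):
    """Descriptor kNN shortlist, then geometric re-ranking.

    scan_desc: (D,) descriptor of the scan; db_descs: (N, D).
    scan_pts:  (P, 3) scan points; db_pts: list of (Q_i, 3) CAD point sets.
    """
    d = np.linalg.norm(db_descs - scan_desc, axis=1)
    shortlist = np.argsort(d)[:k]                    # coarse, category-level
    scores = [chamfer(scan_pts, db_pts[i]) for i in shortlist]
    return shortlist[np.argsort(scores)]             # fine, instance-level
```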