Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jiri Matas

BOP: Benchmark for 6D Object Pose Estimation

Aug 24, 2018

Tomas Hodan, Frank Michel, Eric Brachmann, Wadim Kehl, Anders Glent Buch, Dirk Kraft, Bertram Drost, Joel Vidal, Stephan Ihrke, Xenophon Zabulis(+6 more)

Figure 1 for BOP: Benchmark for 6D Object Pose Estimation

Figure 2 for BOP: Benchmark for 6D Object Pose Estimation

Figure 3 for BOP: Benchmark for 6D Object Pose Estimation

Figure 4 for BOP: Benchmark for 6D Object Pose Estimation

Abstract:We propose a benchmark for 6D pose estimation of a rigid object from a single RGB-D input image. The training data consists of a texture-mapped 3D object model or images of the object in known 6D poses. The benchmark comprises of: i) eight datasets in a unified format that cover different practical scenarios, including two new datasets focusing on varying lighting conditions, ii) an evaluation methodology with a pose-error function that deals with pose ambiguities, iii) a comprehensive evaluation of 15 diverse recent methods that captures the status quo of the field, and iv) an online evaluation system that is open for continuous submission of new results. The evaluation shows that methods based on point-pair features currently perform best, outperforming template matching methods, learning-based methods and methods based on 3D local features. The project website is available at bop.felk.cvut.cz.

* ECCV 2018

Via

Access Paper or Ask Questions

Performance Analysis and Robustification of Single-query 6-DoF Camera Pose Estimation

Aug 17, 2018

Junsheng Fu, Said Pertuz, Jiri Matas, Joni-Kristian Kämäräinen

Figure 1 for Performance Analysis and Robustification of Single-query 6-DoF Camera Pose Estimation

Figure 2 for Performance Analysis and Robustification of Single-query 6-DoF Camera Pose Estimation

Figure 3 for Performance Analysis and Robustification of Single-query 6-DoF Camera Pose Estimation

Figure 4 for Performance Analysis and Robustification of Single-query 6-DoF Camera Pose Estimation

Abstract:We consider a single-query 6-DoF camera pose estimation with reference images and a point cloud, i.e. the problem of estimating the position and orientation of a camera by using reference images and a point cloud. In this work, we perform a systematic comparison of three state-of-the-art strategies for 6-DoF camera pose estimation, i.e. feature-based, photometric-based and mutual-information-based approaches. The performance of the studied methods is evaluated on two standard datasets in terms of success rate, translation error and max orientation error. Building on the results analysis, we propose a hybrid approach that combines feature-based and mutual-information-based pose estimation methods since it provides complementary properties for pose estimation. Experiments show that (1) in cases with large environmental variance, the hybrid approach outperforms feature-based and mutual-information-based approaches by an average of 25.1% and 5.8% in terms of success rate, respectively; (2) in cases where query and reference images are captured at similar imaging conditions, the hybrid approach performs similarly as the feature-based approach, but outperforms both photometric-based and mutual-information-based approaches with a clear margin; (3) the feature-based approach is consistently more accurate than mutual-information-based and photometric-based approaches when at least 4 consistent matching points are found between the query and reference images.

Via

Access Paper or Ask Questions

Dichromatic Gray Pixel for Camera-agnostic Color Constancy

Jul 05, 2018

Yanlin Qian, Ke Chen, Jarno Nikkanen, Joni-Kristian Kämäräinen, Jiri Matas

Figure 1 for Dichromatic Gray Pixel for Camera-agnostic Color Constancy

Figure 2 for Dichromatic Gray Pixel for Camera-agnostic Color Constancy

Figure 3 for Dichromatic Gray Pixel for Camera-agnostic Color Constancy

Figure 4 for Dichromatic Gray Pixel for Camera-agnostic Color Constancy

Abstract:We present a statistical color constancy method that relies on novel gray pixel detection and mean shift clustering. The method, called Mean Shifted Grey Pixel -- MSGP -- is compact, easy to compute and requires no training. Experiments on different datasets show that the proposed approach outperforms state-of-the-art methods in the camera-agnostic scenario. In the setting where the camera is known, MSGP outperforms all statistical methods.

* updated and submitted to ACCV 2018

Via

Access Paper or Ask Questions

Fast Motion Deblurring for Feature Detection and Matching Using Inertial Measurements

May 22, 2018

Janne Mustaniemi, Juho Kannala, Simo Särkkä, Jiri Matas, Janne Heikkilä

Figure 1 for Fast Motion Deblurring for Feature Detection and Matching Using Inertial Measurements

Figure 2 for Fast Motion Deblurring for Feature Detection and Matching Using Inertial Measurements

Figure 3 for Fast Motion Deblurring for Feature Detection and Matching Using Inertial Measurements

Figure 4 for Fast Motion Deblurring for Feature Detection and Matching Using Inertial Measurements

Abstract:Many computer vision and image processing applications rely on local features. It is well-known that motion blur decreases the performance of traditional feature detectors and descriptors. We propose an inertial-based deblurring method for improving the robustness of existing feature detectors and descriptors against the motion blur. Unlike most deblurring algorithms, the method can handle spatially-variant blur and rolling shutter distortion. Furthermore, it is capable of running in real-time contrary to state-of-the-art algorithms. The limitations of inertial-based blur estimation are taken into account by validating the blur estimates using image data. The evaluation shows that when the method is used with traditional feature detector and descriptor, it increases the number of detected keypoints, provides higher repeatability and improves the localization accuracy. We also demonstrate that such features will lead to more accurate and complete reconstructions when used in the application of 3D visual reconstruction.

Via

Access Paper or Ask Questions

Improving CNN classifiers by estimating test-time priors

May 21, 2018

Milan Sulc, Jiri Matas

Figure 1 for Improving CNN classifiers by estimating test-time priors

Figure 2 for Improving CNN classifiers by estimating test-time priors

Figure 3 for Improving CNN classifiers by estimating test-time priors

Figure 4 for Improving CNN classifiers by estimating test-time priors

Abstract:The problem of different training and test set class priors is addressed in the context of CNN classifiers. An EM-based algorithm for test-time class priors estimation is evaluated on fine-grained computer vision problems for both the batch and on-line situations. Experimental results show a significant improvement on the fine-grained classification tasks using the known evaluation-time priors, increasing the top-1 accuracy by 4.0% on the FGVC iNaturalist 2018 validation set and by 3.9% on the FGVCx Fungi 2018 validation set. Iterative estimation of test-time priors on the PlantCLEF 2017 dataset increased the image classification accuracy by 3.4%, allowing a single CNN model to achieve state-of-the-art results and outperform the competition-winning ensemble of 12 CNNs.

Via

Access Paper or Ask Questions

DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks

Apr 03, 2018

Orest Kupyn, Volodymyr Budzan, Mykola Mykhailych, Dmytro Mishkin, Jiri Matas

Figure 1 for DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks

Figure 2 for DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks

Figure 3 for DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks

Figure 4 for DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks

Abstract:We present DeblurGAN, an end-to-end learned method for motion deblurring. The learning is based on a conditional GAN and the content loss . DeblurGAN achieves state-of-the art performance both in the structural similarity measure and visual appearance. The quality of the deblurring model is also evaluated in a novel way on a real-world problem -- object detection on (de-)blurred images. The method is 5 times faster than the closest competitor -- DeepDeblur. We also introduce a novel method for generating synthetic motion blurred images from sharp ones, allowing realistic dataset augmentation. The model, code and the dataset are available at https://github.com/KupynOrest/DeblurGAN

* CVPR 2018 camera-ready

Via

Access Paper or Ask Questions

MAGSAC: marginalizing sample consensus

Mar 20, 2018

Daniel Barath, Jiri Matas

Figure 1 for MAGSAC: marginalizing sample consensus

Figure 2 for MAGSAC: marginalizing sample consensus

Figure 3 for MAGSAC: marginalizing sample consensus

Figure 4 for MAGSAC: marginalizing sample consensus

Abstract:A method called sigma-consensus is proposed to eliminate the need for a user-defined inlier-outlier threshold in RANSAC. Instead of estimating sigma, it is marginalized over a range of noise scales using a Bayesian estimator, i.e. the optimized model is obtained as the weighted average using the posterior probabilities as weights. Applying sigma-consensus, two methods are proposed: (i) a post-processing step which always improved the model quality on a wide range of vision problems without noticeable deterioration in processing time, i.e. at most 1-2 milliseconds; and (ii) a locally optimized RANSAC, called LO-MAGSAC, which includes sigma-consensus to the local optimization of LO-RANSAC. The method is superior to the state-of-the-art in terms of geometric accuracy on publicly available real world datasets for epipolar geometry (F and E), homography and affine transformation estimation.

Via

Access Paper or Ask Questions

E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text

Jan 30, 2018

Yash Patel, Michal Bušta, Jiri Matas

Figure 1 for E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text

Figure 2 for E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text

Figure 3 for E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text

Figure 4 for E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text

Abstract:An end-to-end method for multi-language scene text localization, recognition and script identification is proposed. The approach is based on a set of convolutional neural nets. The method, called E2E-MLT, achieves state-of-the-art performance for both joint localization and script identification in natural images and in cropped word script identification. E2E-MLT is the first published multi-language OCR for scene text. The experiments show that obtaining accurate multi-language multi-script annotations is a challenging problem.

Via

Access Paper or Ask Questions

Working hard to know your neighbor's margins: Local descriptor learning loss

Jan 12, 2018

Anastasiya Mishchuk, Dmytro Mishkin, Filip Radenovic, Jiri Matas

Figure 1 for Working hard to know your neighbor's margins: Local descriptor learning loss

Figure 2 for Working hard to know your neighbor's margins: Local descriptor learning loss

Figure 3 for Working hard to know your neighbor's margins: Local descriptor learning loss

Figure 4 for Working hard to know your neighbor's margins: Local descriptor learning loss

Abstract:We introduce a novel loss for learning local feature descriptors which is inspired by the Lowe's matching criterion for SIFT. We show that the proposed loss that maximizes the distance between the closest positive and closest negative patch in the batch is better than complex regularization methods; it works well for both shallow and deep convolution network architectures. Applying the novel loss to the L2Net CNN architecture results in a compact descriptor -- it has the same dimensionality as SIFT (128) that shows state-of-art performance in wide baseline stereo, patch verification and instance retrieval benchmarks. It is fast, computing a descriptor takes about 1 millisecond on a low-end GPU.

* Post-NIPS-2017 update. Better hyperparameters and better results on HPatches + Brown dataset, + couple of references

Via

Access Paper or Ask Questions

Graph-Cut RANSAC

Nov 16, 2017

Daniel Barath, Jiri Matas

Abstract:A novel method for robust estimation, called Graph-Cut RANSAC, GC-RANSAC in short, is introduced. To separate inliers and outliers, it runs the graph-cut algorithm in the local optimization (LO) step which is applied when a so-far-the-best model is found. The proposed LO step is conceptually simple, easy to implement, globally optimal and efficient. GC-RANSAC is shown experimentally, both on synthesized tests and real image pairs, to be more geometrically accurate than state-of-the-art methods on a range of problems, e.g. line fitting, homography, affine transformation, fundamental and essential matrix estimation. It runs in real-time for many problems at a speed approximately equal to that of the less accurate alternatives (in milliseconds on standard CPU).

Via

Access Paper or Ask Questions