Jiri Matas

Text Recognition -- Real World Data and Where to Find Them

Jul 17, 2020

Klára Janoušková, Jiri Matas, Lluis Gomez, Dimosthenis Karatzas

Abstract: We present a method for exploiting weakly annotated images to improve text extraction pipelines. The approach uses an arbitrary end-to-end text recognition system to obtain text region proposals and their possibly erroneous transcriptions. The proposed method includes matching of imprecise transcriptions to weak annotations and an edit-distance-guided neighbourhood search. It produces nearly error-free, localised instances of scene text, which we treat as "pseudo ground truth" (PGT). We apply the method to two weakly annotated datasets. Training with the extracted PGT consistently improves the accuracy of a state-of-the-art recognition model, by 3.7% on average across different benchmark datasets (image domains) and by 24.5% on one of the weakly annotated datasets.

* 10 pages
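
The matching step mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the lower-casing, and the `max_dist` threshold are assumptions made for illustration. It only shows how a noisy transcription could be paired with the closest word from an image's weak annotation using the Levenshtein edit distance.

```python
from typing import List, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def match_to_weak_annotation(
    transcription: str,
    annotations: List[str],
    max_dist: int = 1,  # hypothetical acceptance threshold
) -> Optional[Tuple[str, int]]:
    """Return (closest annotation, distance), or None if none is within max_dist."""
    if not annotations:
        return None
    query = transcription.lower()
    best = min(annotations, key=lambda w: edit_distance(query, w.lower()))
    dist = edit_distance(query, best.lower())
    return (best, dist) if dist <= max_dist else None
```

A call such as `match_to_weak_annotation("STARBUCK5", ["coffee", "Starbucks", "open"])` pairs the noisy reading with "Starbucks", while a transcription far from every annotation is rejected.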

Learning Surrogates via Deep Embedding

Jul 17, 2020

Yash Patel, Tomas Hodan, Jiri Matas

Abstract: This paper proposes a technique for training a neural network by minimizing a surrogate loss that approximates the target evaluation metric, which may be non-differentiable. The surrogate is learned via a deep embedding where the Euclidean distance between the prediction and the ground truth corresponds to the value of the evaluation metric. The effectiveness of the proposed technique is demonstrated in a post-tuning setup, where a trained model is tuned using the learned surrogate. Without significant computational overhead or any bells and whistles, improvements are demonstrated on challenging and practical tasks of scene-text recognition and detection. In the recognition task, the model is tuned using a surrogate approximating the edit distance metric and achieves up to 39% relative improvement in the total edit distance. In the detection task, the surrogate approximates the intersection over union metric for rotated bounding boxes and yields up to 4.25% relative improvement in the F1 score.

* ECCV 2020 camera-ready version
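
The surrogate described in the abstract above can be written compactly. The notation here is assumed for illustration and is not taken from the paper: $h_\phi$ denotes the learned embedding, $z$ a prediction, $y$ the ground truth, and $e$ the evaluation metric. The squared-error fitting objective is likewise only one plausible choice.

```latex
\hat{e}_\phi(z, y) = \lVert h_\phi(z) - h_\phi(y) \rVert_2 \approx e(z, y),
\qquad
\phi^\ast = \arg\min_\phi \; \mathbb{E}_{(z, y)}
  \big( \hat{e}_\phi(z, y) - e(z, y) \big)^2
```

Because $\hat{e}_\phi$ is differentiable in $z$, it can stand in for $e$ when tuning the model, even when $e$ itself (such as the edit distance) is not differentiable.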

Guiding Monocular Depth Estimation Using Depth-Attention Volume

Apr 06, 2020

Lam Huynh, Phong Nguyen-Ha, Jiri Matas, Esa Rahtu, Janne Heikkila

Abstract: Recovering the scene depth from a single image is an ill-posed problem that requires additional priors, often referred to as monocular depth cues, to disambiguate different 3D interpretations. In recent works, those priors have been learned in an end-to-end manner from large datasets by using deep neural networks. In this paper, we propose guiding depth estimation to favor planar structures, which are ubiquitous especially in indoor environments. This is achieved by incorporating a non-local coplanarity constraint into the network with a novel attention mechanism called depth-attention volume (DAV). Experiments on two popular indoor datasets, namely NYU-Depth-v2 and ScanNet, show that our method achieves state-of-the-art depth estimation results while using only a fraction of the number of parameters needed by the competing methods.

EPOS: Estimating 6D Pose of Objects with Symmetries

Apr 01, 2020

Tomas Hodan, Daniel Barath, Jiri Matas

Abstract: We present a new method for estimating the 6D pose of rigid objects with available 3D models from a single RGB input image. The method is applicable to a broad range of objects, including challenging ones with global or partial symmetries. An object is represented by compact surface fragments which allow handling symmetries in a systematic manner. Correspondences between densely sampled pixels and the fragments are predicted using an encoder-decoder network. At each pixel, the network predicts: (i) the probability of each object's presence, (ii) the probability of the fragments given the object's presence, and (iii) the precise 3D location on each fragment. A data-dependent number of corresponding 3D locations is selected per pixel, and poses of possibly multiple object instances are estimated using a robust and efficient variant of the PnP-RANSAC algorithm. In the BOP Challenge 2019, the method outperforms all RGB and most RGB-D and D methods on the T-LESS and LM-O datasets. On the YCB-V dataset, it is superior to all competitors, with a large margin over the second-best RGB method. Source code is at: cmp.felk.cvut.cz/epos.

* Accepted to CVPR 2020

A Benchmark for Temporal Color Constancy

Mar 08, 2020

Yanlin Qian, Jani Käpylä, Joni-Kristian Kämäräinen, Samu Koskinen, Jiri Matas

Abstract: Temporal Color Constancy (CC) is a recently proposed approach that challenges the conventional single-frame color constancy. The conventional approach is to use a single frame, the shot frame, to estimate the scene illumination color. In temporal CC, multiple frames from the viewfinder sequence are used to estimate the color. However, there are no realistic large-scale temporal color constancy datasets for method evaluation. In this work, a new temporal CC benchmark is introduced. The benchmark comprises (1) 600 real-world sequences recorded with a high-resolution mobile phone camera, (2) a fixed train-test split which ensures consistent evaluation, and (3) a baseline method which achieves high accuracy on the new benchmark and on the dataset used in previous works. Results for more than 20 well-known color constancy methods, including recent state-of-the-art methods, are reported in our experiments.

* 16 pages, 6 figures

Image Matching across Wide Baselines: From Paper to Practice

Mar 03, 2020

Yuhe Jin, Dmytro Mishkin, Anastasiia Mishchuk, Jiri Matas, Pascal Fua, Kwang Moo Yi, Eduard Trulls

Abstract: We introduce a comprehensive benchmark for local features and robust estimation algorithms, focusing on the downstream task -- the accuracy of the reconstructed camera pose -- as our primary metric. Our pipeline's modular structure allows us to easily integrate, configure, and combine methods and heuristics. We demonstrate this by embedding dozens of popular algorithms and evaluating them, from seminal works to the cutting edge of machine learning research. We show that with proper settings, classical solutions may still outperform the perceived state of the art. Besides establishing the actual state of the art, the experiments conducted in this paper reveal unexpected properties of SfM pipelines that can be exploited to help improve their performance, for both algorithmic and learned methods. Data and code are online at https://github.com/vcg-uvic/image-matching-benchmark, providing an easy-to-use and flexible framework for the benchmarking of local feature and robust estimation methods, both alongside and against top-performing methods. This work provides the basis for an open challenge on wide-baseline image matching: https://vision.uvic.ca/image-matching-challenge.

* 19 pages, 25 figures

MAGSAC++, a fast, reliable and accurate robust estimator

Dec 11, 2019

Daniel Barath, Jana Noskova, Maksym Ivashechkin, Jiri Matas

Abstract: A new method for robust estimation, MAGSAC++, is proposed. It introduces a new model quality (scoring) function that does not require the inlier-outlier decision, and a novel marginalization procedure formulated as an iteratively re-weighted least-squares approach. We also propose a new sampler, Progressive NAPSAC, for RANSAC-like robust estimators. Exploiting the fact that nearby points often originate from the same model in real-world data, it finds local structures earlier than global samplers. The progressive transition from local to global sampling does not suffer from the weaknesses of purely localized samplers. On six publicly available real-world datasets for homography and fundamental matrix fitting, MAGSAC++ produces results superior to state-of-the-art robust methods. It is faster, more geometrically accurate and fails less often.

* arXiv admin note: substantial text overlap with arXiv:1906.02295
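
To make the term "iteratively re-weighted least squares" in the abstract above concrete, here is a generic sketch on a 2D line-fitting problem. It is not MAGSAC++: the Cauchy-style weight function, the `scale` parameter, and the function names are assumptions chosen for illustration, whereas the paper derives its weights from a marginalization over the noise scale. The sketch only shows the alternation between a weighted fit and a residual-based reweighting, with no hard inlier-outlier threshold.

```python
from typing import List, Sequence, Tuple


def weighted_line_fit(
    xs: Sequence[float], ys: Sequence[float], ws: Sequence[float]
) -> Tuple[float, float]:
    """Closed-form weighted least squares for y = a*x + b.

    Assumes the x values are not all identical (otherwise sxx is zero).
    """
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def irls_line(
    xs: Sequence[float],
    ys: Sequence[float],
    scale: float = 1.0,  # hypothetical residual scale for the weight function
    iters: int = 20,
) -> Tuple[float, float]:
    """Alternate a weighted fit with Cauchy-style reweighting of the residuals."""
    ws: List[float] = [1.0] * len(xs)
    a, b = 0.0, 0.0
    for _ in range(iters):
        a, b = weighted_line_fit(xs, ys, ws)
        # Large residuals receive small weights; nothing is hard-rejected.
        ws = [1.0 / (1.0 + ((y - (a * x + b)) / scale) ** 2)
              for x, y in zip(xs, ys)]
    return a, b
```

On points sampled from a line with one gross outlier, the unweighted fit is pulled toward the outlier, while the reweighted fit stays close to the line through the remaining points.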

DAL -- A Deep Depth-aware Long-term Tracker

Dec 02, 2019

Yanlin Qian, Alan Lukežič, Matej Kristan, Joni-Kristian Kämäräinen, Jiri Matas

Abstract: The best RGBD trackers provide high accuracy but are slow to run. On the other hand, the best RGB trackers are fast but clearly inferior on the RGBD datasets. In this work, we propose a deep depth-aware long-term tracker that achieves state-of-the-art RGBD tracking performance and is fast to run. We reformulate the deep discriminative correlation filter (DCF) to embed the depth information into deep features. Moreover, the same depth-aware correlation filter is used for target re-detection. Comprehensive evaluations show that the proposed tracker achieves state-of-the-art performance on the Princeton RGBD, STC, and the newly released CDTB benchmarks and runs at 20 fps.

* 10 pages

Sub-frame Appearance and 6D Pose Estimation of Fast Moving Objects

Nov 25, 2019

Denys Rozumnyi, Jan Kotera, Filip Sroubek, Jiri Matas

Abstract: We propose a novel method that tracks fast moving objects, mainly non-uniform spherical, in full 6 degrees of freedom, estimating simultaneously their 3D motion trajectory, 3D pose and object appearance changes with a time step that is a fraction of the video frame exposure time. The sub-frame object localization and appearance estimation allows realistic temporal super-resolution and precise shape estimation. The method, called TbD-3D (Tracking by Deblatting in 3D), relies on a novel reconstruction algorithm which solves a piece-wise deblurring and matting problem. The 3D rotation is estimated by minimizing the reprojection error. As a second contribution, we present a new challenging dataset with fast moving objects that change their appearance and distance to the camera. High-speed camera recordings with zero lag between frame exposures were used to generate videos with different frame rates annotated with ground-truth trajectory and pose.

ICDAR2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and Recognition -- RRC-MLT-2019

Jul 01, 2019

Nibal Nayef, Yash Patel, Michal Busta, Pinaki Nath Chowdhury, Dimosthenis Karatzas, Wafa Khlif, Jiri Matas, Umapada Pal, Jean-Christophe Burie, Cheng-lin Liu (+1 more)

Abstract: With the growing cosmopolitan culture of modern cities, the need for robust Multi-Lingual scene Text (MLT) detection and recognition systems has never been greater. With the goal of systematically benchmarking and pushing the state of the art forward, the proposed competition builds on top of the RRC-MLT-2017 with an additional end-to-end task, an additional language in the real images dataset, a large-scale multi-lingual synthetic dataset to assist the training, and a baseline End-to-End recognition method. The real dataset consists of 20,000 images containing text from 10 languages. The challenge has 4 tasks covering various aspects of multi-lingual scene text: (a) text detection, (b) cropped word script classification, (c) joint text detection and script classification and (d) end-to-end detection and recognition. In total, the competition received 60 submissions from the research and industrial communities. This paper presents the dataset, the tasks and the findings of the presented RRC-MLT-2019 challenge.

* ICDAR'19 camera-ready version. Competition available at https://rrc.cvc.uab.es/?ch=15. The first two authors contributed equally
