Giorgos Tolias

Training Ensembles with Inliers and Outliers for Semi-supervised Active Learning

Jul 07, 2023
Vladan Stojnić, Zakaria Laskar, Giorgos Tolias

Deep active learning in the presence of outlier examples poses a realistic yet challenging scenario. Acquiring unlabeled data for annotation requires a delicate balance between avoiding outliers to conserve the annotation budget and prioritizing useful inlier examples for effective training. In this work, we present an approach that leverages three highly synergistic key components: joint classifier training with inliers and outliers, semi-supervised learning through pseudo-labeling, and model ensembling. Our work demonstrates that ensembling significantly enhances the accuracy of pseudo-labeling and improves the quality of data acquisition. By enabling semi-supervision through the joint training process, where outliers are properly handled, we observe a substantial boost in classifier accuracy through the use of all available unlabeled examples. Notably, we reveal that joint training renders explicit outlier detection, a conventional component of acquisition in prior work, unnecessary. The three key components align seamlessly with numerous existing approaches, and through empirical evaluations we show that their combined use leads to a performance increase. Remarkably, despite its simplicity, our proposed approach outperforms all other methods. Code: https://github.com/vladan-stojnic/active-outliers
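
As a rough illustration of the ensembling-plus-pseudo-labeling idea, the sketch below averages softmax outputs over an ensemble and keeps only confident predictions as pseudo-labels. The threshold, the extra outlier class, and all names are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the authors' code): ensemble-averaged pseudo-labeling.
import torch

@torch.no_grad()
def ensemble_pseudo_labels(models, unlabeled, threshold=0.9, outlier_class=None):
    """Average softmax outputs over an ensemble and keep confident predictions."""
    probs = torch.stack([m(unlabeled).softmax(dim=1) for m in models]).mean(dim=0)
    conf, labels = probs.max(dim=1)
    keep = conf >= threshold
    if outlier_class is not None:
        # Under joint inlier/outlier training the classifier has an extra
        # outlier class; predictions falling into it are not pseudo-labeled.
        keep &= labels != outlier_class
    return labels[keep], keep
```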

The 2023 Video Similarity Dataset and Challenge

Jun 15, 2023
Ed Pizzi, Giorgos Kordopatis-Zilos, Hiral Patel, Gheorghe Postelnicu, Sugosh Nagavara Ravindra, Akshay Gupta, Symeon Papadopoulos, Giorgos Tolias, Matthijs Douze

This work introduces a dataset, benchmark, and challenge for the problem of video copy detection and localization. The problem comprises two distinct but related tasks: determining whether a query video shares content with a reference video ("detection"), and additionally temporally localizing the shared content within each video ("localization"). The benchmark is designed to evaluate methods on these two tasks, and simulates a realistic needle-in-haystack setting, where the majority of both query and reference videos are "distractors" containing no copied content. We propose a metric that reflects both detection and localization accuracy. The associated challenge consists of two corresponding tracks, each with restrictions that reflect real-world settings. We provide implementation code for evaluation and baselines. We also analyze the results and methods of the top submissions to the challenge. The dataset, baseline methods, and evaluation code are publicly available and will be discussed at a dedicated CVPR'23 workshop.

HSCNet++: Hierarchical Scene Coordinate Classification and Regression for Visual Localization with Transformer

May 05, 2023
Shuzhe Wang, Zakaria Laskar, Iaroslav Melekhov, Xiaotian Li, Yi Zhao, Giorgos Tolias, Juho Kannala

Visual localization is critical to many applications in computer vision and robotics. To address single-image RGB localization, state-of-the-art feature-based methods match local descriptors between a query image and a pre-built 3D model. Recently, deep neural networks have been exploited to regress the mapping between raw pixels and 3D coordinates in the scene, so that the matching is implicitly performed by the forward pass through the network. However, in a large and ambiguous environment, learning such a regression task directly can be difficult for a single network. In this work, we present a new hierarchical scene coordinate network that predicts pixel scene coordinates in a coarse-to-fine manner from a single RGB image. The proposed method, an extension of HSCNet, allows us to train compact models that scale robustly to large environments. It sets a new state of the art for single-image localization on the 7-Scenes, 12 Scenes, and Cambridge Landmarks datasets, as well as on the combined indoor scenes.
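
The coarse-to-fine idea can be sketched as a per-pixel region classifier whose output conditions a fine coordinate regressor. This is a heavily simplified illustration under assumed feature and region dimensions, not the published HSCNet++ architecture, which has multiple hierarchy levels and a transformer.

```python
# Minimal sketch of coarse-to-fine scene-coordinate prediction.
import torch
import torch.nn as nn

class CoarseToFineCoords(nn.Module):
    def __init__(self, feat_dim=256, num_regions=64):
        super().__init__()
        self.region_head = nn.Conv2d(feat_dim, num_regions, 1)     # coarse classification
        self.coord_head = nn.Conv2d(feat_dim + num_regions, 3, 1)  # fine regression

    def forward(self, feats):
        region_logits = self.region_head(feats)                    # (B, R, H, W)
        region_probs = region_logits.softmax(dim=1)
        # Condition the fine regression on the coarse region prediction.
        coords = self.coord_head(torch.cat([feats, region_probs], dim=1))
        return region_logits, coords                               # coords: (B, 3, H, W)
```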

Self-Supervised Video Similarity Learning

Apr 06, 2023
Giorgos Kordopatis-Zilos, Giorgos Tolias, Christos Tzelepis, Ioannis Kompatsiaris, Ioannis Patras, Symeon Papadopoulos

We introduce S²VS, a video similarity learning approach with self-supervision. Self-Supervised Learning (SSL) is typically used to train deep models on a proxy task so as to have strong transferability to target tasks after fine-tuning. Here, in contrast to prior work, SSL is used to perform video similarity learning and to address multiple retrieval and detection tasks at once with no use of labeled data. This is achieved by learning via instance discrimination with task-tailored augmentations and the widely used InfoNCE loss, together with an additional loss operating jointly on self-similarity and hard-negative similarity. We benchmark our method on tasks where video relevance is defined with varying granularity, ranging from video copies to videos depicting the same incident or event. We learn a single universal model that achieves state-of-the-art performance on all tasks, surpassing previously proposed methods that use labeled data. The code and pretrained models are publicly available at: https://github.com/gkordo/s2vs
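
The instance-discrimination backbone of the objective is the standard InfoNCE loss named above; a minimal sketch follows, with the temperature and batch layout as assumptions, and without the additional self-similarity/hard-negative loss of S²VS.

```python
# Minimal sketch of InfoNCE over two augmented views of a batch of videos.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.07):
    """z1, z2: L2-normalized embeddings of two augmented views, shape (B, D)."""
    logits = z1 @ z2.t() / tau                  # (B, B) pairwise similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    # Diagonal entries are the positives (two views of the same video).
    return F.cross_entropy(logits, targets)
```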

Rethinking matching-based few-shot action recognition

Mar 28, 2023
Juliette Bertrand, Yannis Kalantidis, Giorgos Tolias

Few-shot action recognition, i.e. recognizing new action classes given only a few examples, benefits from incorporating temporal information. Prior work either encodes such information in the representation itself and learns classifiers at test time, or obtains frame-level features and performs pairwise temporal matching. We first evaluate a number of matching-based approaches using features from spatio-temporal backbones, a comparison missing from the literature, and show that the gap in performance between simple baselines and more complicated methods is significantly reduced. Inspired by this, we propose Chamfer++, a non-temporal matching function that achieves state-of-the-art results in few-shot action recognition. We show that, when starting from temporal features, our parameter-free and interpretable approach can outperform all other matching-based and classifier methods for one-shot action recognition on three common datasets without using temporal information in the matching stage. Project page: https://jbertrand89.github.io/matching-based-fsar
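
A Chamfer-style matching function between two sets of frame features can be sketched as follows. This is a simplified illustration of the matching family Chamfer++ belongs to, not the exact proposed function.

```python
# Minimal sketch of a non-temporal Chamfer matching score between two videos.
import torch

def chamfer_similarity(q, s):
    """q: (Nq, D), s: (Ns, D) L2-normalized frame features; higher means more similar."""
    sim = q @ s.t()                             # (Nq, Ns) frame-to-frame similarities
    # For each query frame take its best-matching support frame, then average;
    # no temporal alignment is used anywhere in the matching.
    return sim.max(dim=1).values.mean()
```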

* Accepted at SCIA 2023 

Large-to-small Image Resolution Asymmetry in Deep Metric Learning

Oct 11, 2022
Pavel Suma, Giorgos Tolias

Deep metric learning for vision is trained by optimizing a representation network to map (non-)matching image pairs to (non-)similar representations. During testing, which typically corresponds to image retrieval, both database and query examples are processed by the same network to obtain the representation used for similarity estimation and ranking. In this work, we explore an asymmetric setup in which the query is processed lightly at a small image resolution to enable fast representation extraction. The goal is to obtain a network for database examples that is trained to operate on large-resolution images and benefits from fine-grained image details, and a second network for query examples that operates on small-resolution images but preserves a representation space aligned with that of the database network. We achieve this with a distillation approach that transfers knowledge from a fixed teacher network to a student via a loss that operates per image and relies solely on coupled augmentations, without the use of any labels. In contrast to prior work that explores such asymmetry from the point of view of different network architectures, this work uses the same architecture but modifies the image resolution. We conclude that resolution asymmetry is a better way to optimize the performance/efficiency trade-off than architecture asymmetry. Evaluation is performed on three standard deep metric learning benchmarks, namely CUB200, Cars196, and SOP. Code: https://github.com/pavelsuma/raml
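
A minimal sketch of one resolution-asymmetric distillation step follows, assuming cosine alignment of L2-normalized embeddings and illustrative resolutions; the paper's exact loss and augmentation coupling are not reproduced here.

```python
# Minimal sketch: fixed teacher sees the large-resolution image, the student
# sees a downscaled version of the same image, and a label-free per-image
# loss aligns their embeddings.
import torch
import torch.nn.functional as F

def distill_step(teacher, student, images, small=224, large=512):
    with torch.no_grad():
        t = F.normalize(teacher(F.interpolate(images, size=large,
                                              mode='bilinear', align_corners=False)), dim=1)
    s = F.normalize(student(F.interpolate(images, size=small,
                                          mode='bilinear', align_corners=False)), dim=1)
    return (1 - (s * t).sum(dim=1)).mean()      # cosine distance, no labels needed
```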

* WACV 2023 

Edge Augmentation for Large-Scale Sketch Recognition without Sketches

Feb 26, 2022
Nikos Efthymiadis, Giorgos Tolias, Ondrej Chum

This work addresses scaling up the sketch classification task to a large number of categories. Collecting sketches for training is a slow and tedious process that has so far precluded attempts at large-scale sketch recognition. We overcome the lack of training sketch data by exploiting labeled collections of natural images, which are easier to obtain. To bridge the domain gap, we present a novel augmentation technique tailored to the task of learning sketch recognition from a training set of natural images: randomization is introduced in the parameters of edge detection and edge selection, translating natural images to a pseudo-novel domain called "randomized Binary Thin Edges" (rBTE), which is used as the training domain instead of natural images. The ability to scale up is demonstrated by training CNN-based sketch recognition on more than 2.5 times as many categories as used previously. For this purpose, a dataset of natural images from 874 categories, selected to be suitable for sketch recognition, is constructed by combining a number of popular computer vision datasets. To estimate performance, a subset of 393 categories with sketches is also collected.
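
Randomized edge extraction can be sketched as follows; the edge detector, the threshold ranges, and the selection step are illustrative assumptions rather than the exact rBTE pipeline.

```python
# Minimal sketch: turn a natural image into a randomized binary edge map.
import random
import cv2

def random_edge_map(image_bgr):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    low = random.randint(40, 120)               # randomized detector parameters
    high = int(low * random.uniform(2.0, 3.0))
    edges = cv2.Canny(gray, low, high)
    # Randomized edge selection: occasionally suppress weak, isolated edges.
    if random.random() < 0.5:
        kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
        edges = cv2.morphologyEx(edges, cv2.MORPH_OPEN, kernel)
    return edges                                # binary edge image used for training
```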

Results and findings of the 2021 Image Similarity Challenge

Feb 08, 2022
Zoë Papakipos, Giorgos Tolias, Tomas Jenicek, Ed Pizzi, Shuhei Yokoo, Wenhao Wang, Yifan Sun, Weipu Zhang, Yi Yang, Sanjay Addicam, Sergio Manuel Papadakis, Cristian Canton Ferrer, Ondrej Chum, Matthijs Douze

The 2021 Image Similarity Challenge introduced a dataset to serve as a new benchmark for evaluating recent image copy detection methods. The competition attracted 200 participants. This paper presents a quantitative and qualitative analysis of the top submissions. It appears that the most difficult image transformations involve either severe crops or hiding the image inside unrelated images, combined with local pixel perturbations. The key algorithmic elements in the winning submissions are: training on strong augmentations, self-supervised learning, score normalization, explicit overlay detection, and global descriptor matching followed by pairwise image comparison.

The Met Dataset: Instance-level Recognition for Artworks

Feb 03, 2022
Nikolaos-Antonios Ypsilantis, Noa Garcia, Guangxing Han, Sarah Ibrahimi, Nanne Van Noord, Giorgos Tolias

This work introduces a dataset for large-scale instance-level recognition in the domain of artworks. The proposed benchmark exhibits a number of challenges, such as large inter-class similarity, a long-tail distribution, and many classes. We rely on the open access collection of The Met museum to form a large training set of about 224k classes, where each class corresponds to a museum exhibit with photos taken under studio conditions. Testing is primarily performed on photos taken by museum guests depicting exhibits, which introduces a distribution shift between training and testing. Testing is additionally performed on a set of images not related to Met exhibits, making the task resemble an out-of-distribution detection problem. The proposed benchmark follows the paradigm of other recent datasets for instance-level recognition in different domains to encourage research on domain-independent approaches. A number of suitable approaches are evaluated to offer a testbed for future comparisons. Self-supervised and supervised contrastive learning are effectively combined to train the backbone, which is then used for non-parametric classification, shown to be a promising direction. Dataset webpage: http://cmp.felk.cvut.cz/met/
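
The non-parametric classification mentioned above can be sketched as nearest-neighbor assignment in the learned embedding space; the value of k and the similarity measure are assumptions for illustration.

```python
# Minimal sketch of k-NN classification over embeddings from the trained backbone.
import torch

@torch.no_grad()
def knn_classify(query, gallery, gallery_labels, k=1):
    """query: (D,); gallery: (N, D) L2-normalized embeddings; gallery_labels: (N,)."""
    sims = gallery @ query                      # cosine similarity to every exhibit photo
    topk = sims.topk(k).indices
    # With k=1 this is plain nearest-neighbor assignment over the ~224k classes.
    return gallery_labels[topk].mode().values.item()
```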

Recall@k Surrogate Loss with Large Batches and Similarity Mixup

Aug 25, 2021
Yash Patel, Giorgos Tolias, Jiri Matas

Direct optimization of an evaluation metric by gradient descent is not possible when the metric is non-differentiable, which is the case for recall in retrieval. In this work, a differentiable surrogate loss for recall is proposed. Using an implementation that sidesteps the hardware constraints of GPU memory, the method trains with a very large batch size, which is essential for metrics computed on the entire retrieval database. It is assisted by an efficient mixup approach that operates on pairwise scalar similarities and virtually increases the batch size further. When used for deep metric learning, the proposed method achieves state-of-the-art results on several image retrieval benchmarks. For instance-level recognition, the method outperforms similar approaches that train using an approximation of average precision. The implementation will be made public.
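
One common way to build such a surrogate, sketched below under assumed temperatures, is to replace the hard step function inside the rank computation with a sigmoid. This illustrates the general recipe rather than the paper's exact loss or its similarity mixup.

```python
# Minimal sketch of a sigmoid-relaxed recall@k.
import torch

def soft_rank(sim_pos, sim_neg, tau=0.01):
    """sim_pos: scalar similarity of one positive item to the query;
    sim_neg: (N,) similarities of the remaining database items.
    The hard rank counts items scoring above the positive; replacing the
    step function with a sigmoid makes that count differentiable."""
    return 1.0 + torch.sigmoid((sim_neg - sim_pos) / tau).sum()

def soft_recall_at_k(sim_pos, sim_neg, k=10, tau_rank=0.01, tau_k=1.0):
    # Smoothly tests whether the positive's soft rank falls within the top-k;
    # 1 - soft_recall_at_k(...) can then serve as a loss to minimize.
    return torch.sigmoid((k - soft_rank(sim_pos, sim_neg, tau_rank)) / tau_k)
```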
