Bernt Schiele

A Meta-Learning Approach to Predicting Performance and Data Requirements

Mar 02, 2023
Achin Jain, Gurumurthy Swaminathan, Paolo Favaro, Hao Yang, Avinash Ravichandran, Hrayr Harutyunyan, Alessandro Achille, Onkar Dabeer, Bernt Schiele, Ashwin Swaminathan, Stefano Soatto

We propose an approach to estimate the number of samples required for a model to reach a target performance. We find that the power law, the de facto principle to estimate model performance, leads to large errors when using a small dataset (e.g., 5 samples per class) for extrapolation. This is because the log-performance error against the log-dataset size follows a nonlinear progression in the few-shot regime followed by a linear progression in the high-shot regime. We introduce a novel piecewise power law (PPL) that handles the two data regimes differently. To estimate the parameters of the PPL, we introduce a random forest regressor trained via meta-learning that generalizes across classification/detection tasks, ResNet/ViT-based architectures, and random/pre-trained initializations. The PPL improves the performance estimation on average by 37% across 16 classification and 33% across 10 detection datasets, compared to the power law. We further extend the PPL to provide a confidence bound and use it to limit the prediction horizon, which reduces over-estimation of data by 76% on classification and 91% on detection datasets.

* CVPR 2023
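
To make the baseline in this abstract concrete, here is a minimal sketch of plain power-law extrapolation: fit a straight line in log-log space, then invert it to estimate the sample count for a target error. It is not the paper's piecewise power law or its meta-learned random forest, neither of which the abstract specifies; the function names and the synthetic data are assumptions made for illustration.

```python
import math


def fit_power_law(sizes, errors):
    """Least-squares fit of log(error) = log(a) + b * log(size).

    Returns (a, b) such that error is approximately a * size ** b.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(y_mean - b * x_mean)
    return a, b


def samples_for_target(a, b, target_error):
    """Invert error = a * n ** b to get the dataset size n for a target error."""
    return (target_error / a) ** (1.0 / b)


# Synthetic, noise-free measurements (assumed for illustration): error = 2 * n^-0.5
sizes = [10, 20, 50, 100, 200]
errors = [2.0 * n ** -0.5 for n in sizes]

a, b = fit_power_law(sizes, errors)
n_needed = samples_for_target(a, b, target_error=0.1)
print(f"a={a:.3f}, b={b:.3f}, samples needed for error 0.1: {n_needed:.0f}")
```

On real learning curves the few-shot points would bend away from this straight line, which is the failure mode the abstract attributes to the single power law.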

No One Left Behind: Real-World Federated Class-Incremental Learning

Feb 02, 2023
Jiahua Dong, Yang Cong, Gan Sun, Yulun Zhang, Bernt Schiele, Dengxin Dai

Federated learning (FL) is a popular collaborative training framework that aggregates model parameters from decentralized local clients. However, most existing models unreasonably assume that the data categories of the FL framework are known and fixed in advance. This causes the global model's recognition performance on old categories to degrade significantly (i.e., catastrophic forgetting) when local clients receive new categories consecutively with limited memory for storing old categories. Moreover, new local clients that collect novel categories unseen by other clients may join FL training irregularly, which further exacerbates catastrophic forgetting on old categories. To tackle these issues, we propose a novel Local-Global Anti-forgetting (LGA) model to address local and global catastrophic forgetting on old categories, which is a pioneering work to explore a global class-incremental model in the FL field. Specifically, to surmount local forgetting by tackling the class imbalance of local clients, we develop a category-balanced gradient-adaptive compensation loss and a category gradient-induced semantic distillation loss. They balance the heterogeneous forgetting speeds of hard-to-forget and easy-to-forget old categories, while ensuring the consistency of intrinsic class relations across different incremental tasks. Moreover, a proxy server is designed to tackle global forgetting caused by Non-IID class imbalance between different clients. It collects perturbed prototype images of new categories from local clients via prototype gradient communication under privacy preservation, and augments them via self-supervised prototype augmentation to choose the best old global model and improve local distillation gain. Experiments on representative datasets verify the superior performance of our model against other comparison methods.

* 16 pages, 9 figures

SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning

Jan 26, 2023
Hao Chen, Ran Tao, Yue Fan, Yidong Wang, Jindong Wang, Bernt Schiele, Xing Xie, Bhiksha Raj, Marios Savvides

The critical challenge of Semi-Supervised Learning (SSL) is how to effectively leverage the limited labeled data and massive unlabeled data to improve the model's generalization performance. In this paper, we first revisit the popular pseudo-labeling methods via a unified sample weighting formulation and demonstrate the inherent quantity-quality trade-off problem of pseudo-labeling with thresholding, which may prohibit learning. To this end, we propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training, effectively exploiting the unlabeled data. We derive a truncated Gaussian function to weight samples based on their confidence, which can be viewed as a soft version of the confidence threshold. We further enhance the utilization of weakly-learned classes by proposing a uniform alignment approach. In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.

* ICLR 2023
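
The contrast between a hard confidence threshold and the soft weighting described in this abstract can be sketched as follows. The exact form below (weight 1 at or above a mean `mu`, Gaussian decay below it) is one plausible reading of "truncated Gaussian"; `mu` and `sigma` are passed in by hand here, and how the paper obtains them is not stated in the abstract, so treat both the names and the values as assumptions.

```python
import math


def hard_threshold_weight(confidence, threshold):
    """Conventional pseudo-labeling: keep a sample only above the threshold."""
    return 1.0 if confidence >= threshold else 0.0


def truncated_gaussian_weight(confidence, mu, sigma):
    """Soft weighting: full weight at or above mu, Gaussian decay below it.

    mu and sigma are hypothetical, hand-picked parameters in this sketch.
    """
    if confidence >= mu:
        return 1.0
    return math.exp(-((confidence - mu) ** 2) / (2.0 * sigma ** 2))


# Compare the two schemes on a few example confidences.
for c in (0.95, 0.80, 0.60):
    hard = hard_threshold_weight(c, threshold=0.9)
    soft = truncated_gaussian_weight(c, mu=0.9, sigma=0.1)
    print(f"confidence={c:.2f}  hard={hard:.1f}  soft={soft:.3f}")
```

Under the hard threshold, the samples at 0.80 and 0.60 contribute nothing; under the soft weighting they still contribute, with a weight that shrinks as confidence drops. That is the quantity-quality trade-off the abstract refers to.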

Holistically Explainable Vision Transformers

Jan 20, 2023
Moritz Böhle, Mario Fritz, Bernt Schiele

Transformers increasingly dominate the machine learning landscape across many tasks and domains, which increases the importance of understanding their outputs. While their attention modules provide partial insight into their inner workings, the attention scores have been shown to be insufficient for explaining the models as a whole. To address this, we propose B-cos transformers, which inherently provide holistic explanations for their decisions. Specifically, we formulate each model component (such as the multi-layer perceptrons, attention layers, and the tokenisation module) to be dynamic linear, which allows us to faithfully summarise the entire transformer via a single linear transform. We apply our proposed design to Vision Transformers (ViTs) and show that the resulting models, dubbed Bcos-ViTs, are highly interpretable and perform competitively with baseline ViTs on ImageNet. Code will be made available soon.

Visual Writing Prompts: Character-Grounded Story Generation with Curated Image Sequences

Jan 20, 2023
Xudong Hong, Asad Sayeed, Khushboo Mehra, Vera Demberg, Bernt Schiele

Current work on image-based story generation suffers from the fact that the existing image sequence collections do not have coherent plots behind them. We improve visual story generation by producing a new image-grounded dataset, Visual Writing Prompts (VWP). VWP contains almost 2K selected sequences of movie shots, each including 5-10 images. The image sequences are aligned with a total of 12K stories which were collected via crowdsourcing given the image sequences and a set of grounded characters from the corresponding image sequence. Our new image sequence collection and filtering process has allowed us to obtain stories that are more coherent and have more narrativity compared to previous work. We also propose a character-based story generation model driven by coherence as a strong baseline. Evaluations show that our generated stories are more coherent, visually grounded, and have more narrativity than stories generated with the current state-of-the-art model.

* Paper accepted by Transactions of the Association for Computational Linguistics (TACL). This is a pre-MIT Press publication version. 15 pages, 6 figures

DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets

Jan 15, 2023
Haiyang Wang, Chen Shi, Shaoshuai Shi, Meng Lei, Sen Wang, Di He, Bernt Schiele, Liwei Wang

Designing an efficient yet deployment-friendly 3D backbone to handle sparse point clouds is a fundamental problem in 3D object detection. Compared with the customized sparse convolution, the attention mechanism in Transformers is more appropriate for flexibly modeling long-range relationships and is easier to deploy in real-world applications. However, due to the sparse characteristics of point clouds, it is non-trivial to apply a standard transformer on sparse points. In this paper, we present Dynamic Sparse Voxel Transformer (DSVT), a single-stride window-based voxel Transformer backbone for outdoor 3D object detection. In order to efficiently process sparse points in parallel, we propose Dynamic Sparse Window Attention, which partitions a series of local regions in each window according to its sparsity and then computes the features of all regions in a fully parallel manner. To allow the cross-set connection, we design a rotated set partitioning strategy that alternates between two partitioning configurations in consecutive self-attention layers. To support effective downsampling and better encode geometric information, we also propose an attention-style 3D pooling module on sparse points, which is powerful and deployment-friendly without utilizing any customized CUDA operations. Our model achieves state-of-the-art performance on the large-scale Waymo Open Dataset with remarkable gains. More importantly, DSVT can be easily deployed by TensorRT with real-time inference speed (27Hz). Code will be available at https://github.com/Haiyang-W/DSVT.

RMM: Reinforced Memory Management for Class-Incremental Learning

Jan 14, 2023
Yaoyao Liu, Bernt Schiele, Qianru Sun

Class-Incremental Learning (CIL) [40] trains classifiers under a strict memory budget: in each incremental phase, learning is done for new data, most of which is abandoned to free space for the next phase. The preserved data are exemplars used for replaying. However, existing methods use a static and ad hoc strategy for memory allocation, which is often sub-optimal. In this work, we propose a dynamic memory management strategy that is optimized for the incremental phases and different object classes. We call our method reinforced memory management (RMM), leveraging reinforcement learning. RMM training is not naturally compatible with CIL because past and future data are strictly non-accessible during the incremental phases. We solve this by training the policy function of RMM on pseudo CIL tasks, e.g., the tasks built on the data of the 0-th phase, and then applying it to target tasks. RMM propagates two levels of actions: Level-1 determines how to split the memory between old and new classes, and Level-2 allocates memory for each specific class. In essence, it is an optimizable and general method for memory management that can be used in any replaying-based CIL method. For evaluation, we plug RMM into two top-performing baselines (LUCIR+AANets and POD+AANets [30]) and conduct experiments on three benchmarks (CIFAR-100, ImageNet-Subset, and ImageNet-Full). Our results show clear improvements, e.g., boosting POD+AANets by 3.6%, 4.4%, and 1.9% in the 25-Phase settings of the above benchmarks, respectively.

* NeurIPS 2021
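
The two-level split described in this abstract can be illustrated with a small bookkeeping sketch: a Level-1 fraction divides the exemplar budget between old and new classes, and Level-2 weights divide each share among individual classes. In the paper these actions come from a learned policy; here they are hand-picked numbers, and the function name, arguments, and class names are all hypothetical.

```python
def split_budget(budget, class_weights):
    """Divide an integer budget among classes in proportion to their weights."""
    total = sum(class_weights.values())
    shares = {c: int(budget * w / total) for c, w in class_weights.items()}
    # Hand any slots lost to rounding down to the highest-weighted classes.
    leftover = budget - sum(shares.values())
    for c in sorted(class_weights, key=class_weights.get, reverse=True)[:leftover]:
        shares[c] += 1
    return shares


def allocate_memory(total_slots, new_fraction, old_weights, new_weights):
    """Level-1: split memory between old and new classes.
    Level-2: split each share among its individual classes.
    """
    new_budget = round(total_slots * new_fraction)
    old_budget = total_slots - new_budget
    return split_budget(old_budget, old_weights), split_budget(new_budget, new_weights)


# Hand-picked actions standing in for a learned policy (assumed values).
old_alloc, new_alloc = allocate_memory(
    total_slots=100,
    new_fraction=0.4,
    old_weights={"cat": 1, "dog": 1, "bird": 1},
    new_weights={"car": 3, "bus": 1},
)
print(old_alloc, new_alloc)
```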

Online Hyperparameter Optimization for Class-Incremental Learning

Jan 11, 2023
Yaoyao Liu, Yingying Li, Bernt Schiele, Qianru Sun

Class-incremental learning (CIL) aims to train a classification model while the number of classes increases phase-by-phase. An inherent challenge of CIL is the stability-plasticity tradeoff, i.e., CIL models should remain stable to retain old knowledge and remain plastic to absorb new knowledge. However, none of the existing CIL models can achieve the optimal tradeoff in different data-receiving settings, where typically the training-from-half (TFH) setting needs more stability, but the training-from-scratch (TFS) setting needs more plasticity. To this end, we design an online learning method that can adaptively optimize the tradeoff without knowing the setting a priori. Specifically, we first introduce the key hyperparameters that influence the tradeoff, e.g., knowledge distillation (KD) loss weights, learning rates, and classifier types. Then, we formulate the hyperparameter optimization process as an online Markov Decision Process (MDP) problem and propose a specific algorithm to solve it. We apply local estimated rewards and a classic bandit algorithm, Exp3 [4], to address the issues that arise when applying online MDP methods to the CIL protocol. Our method consistently improves top-performing CIL methods in both TFH and TFS settings, e.g., boosting the average accuracy of TFH and TFS by 2.2 percentage points on ImageNet-Full, compared to the state-of-the-art [23].

* AAAI 2023 Oral. Code is available at https://class-il.mpi-inf.mpg.de/online/code/
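
For readers unfamiliar with Exp3, the bandit algorithm named in this abstract, the following is a minimal sketch of the textbook version. It is not the authors' code: the arms stand in for hypothetical hyperparameter configurations, the reward function is a toy, and the weight renormalization is a numerical convenience added here.

```python
import math
import random


def exp3(reward_fn, n_arms, n_rounds, gamma=0.1, seed=0):
    """Textbook Exp3 for adversarial bandits with rewards in [0, 1].

    Returns how often each arm was pulled and the final weights.
    """
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        total = sum(weights)
        # Mix the weight-proportional distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(arm)
        # Importance-weighted estimate keeps the reward estimate unbiased.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        pulls[arm] += 1
        # Rescale so the largest weight is 1; the probabilities are unchanged.
        top = max(weights)
        weights = [w / top for w in weights]
    return pulls, weights


# Toy setting (assumed): arm 2 always pays 1.0, the other arms pay nothing.
pulls, weights = exp3(lambda arm: 1.0 if arm == 2 else 0.0, n_arms=3, n_rounds=2000)
print(pulls)
```

The exploration term `gamma / n_arms` keeps every arm's probability bounded away from zero, so the algorithm can still notice if a previously poor configuration starts paying off in a later phase.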

Learning by Sorting: Self-supervised Learning with Group Ordering Constraints

Jan 05, 2023
Nina Shvetsova, Felix Petersen, Anna Kukleva, Bernt Schiele, Hilde Kuehne

Contrastive learning has become a prominent ingredient in learning representations from unlabeled data. However, existing methods primarily consider pairwise relations. This paper proposes a new approach towards self-supervised contrastive learning based on Group Ordering Constraints (GroCo). The GroCo loss leverages the idea of comparing groups of positive and negative images instead of pairs of images. Building on the recent success of differentiable sorting algorithms, group ordering constraints enforce that the distances of all positive samples (a positive group) are smaller than the distances of all negative images (a negative group), thus forcing positive samples to gather around an anchor. This leads to a more holistic optimization of the local neighborhoods. We evaluate the proposed setting on a suite of competitive self-supervised learning benchmarks and show that our method is not only competitive with current methods in the case of linear probing but also leads to higher consistency in local representations, as can be seen from a significantly improved k-NN performance across all benchmarks.

Urban Scene Semantic Segmentation with Low-Cost Coarse Annotation

Dec 15, 2022
Anurag Das, Yongqin Xian, Yang He, Zeynep Akata, Bernt Schiele

For best performance, today's semantic segmentation methods use large and carefully labeled datasets, requiring expensive annotation budgets. In this work, we show that coarse annotation is a low-cost but highly effective alternative for training semantic segmentation models. Considering the urban scene segmentation scenario, we leverage cheap coarse annotations for real-world captured data, as well as synthetic data to train our model and show competitive performance compared with finely annotated real-world data. Specifically, we propose a coarse-to-fine self-training framework that generates pseudo labels for unlabeled regions of the coarsely annotated data, using synthetic data to improve predictions around the boundaries between semantic classes, and using cross-domain data augmentation to increase diversity. Our extensive experimental results on Cityscapes and BDD100k datasets demonstrate that our method achieves a significantly better performance vs annotation cost tradeoff, yielding a comparable performance to fully annotated data with only a small fraction of the annotation budget. Also, when used as pretraining, our framework performs better compared to the standard fully supervised setting.

* Accepted at WACV 2023
