
Frank Hutter

A General Framework for User-Guided Bayesian Optimization

Nov 24, 2023
Carl Hvarfner, Frank Hutter, Luigi Nardi

The optimization of expensive-to-evaluate black-box functions is prevalent in various scientific disciplines. Bayesian optimization is an automatic, general and sample-efficient method to solve these problems with minimal knowledge of the underlying function dynamics. However, the ability of Bayesian optimization to incorporate prior knowledge or beliefs about the function at hand in order to accelerate the optimization is limited, which reduces its appeal for knowledgeable practitioners with tight budgets. To allow domain experts to customize the optimization routine, we propose ColaBO, the first Bayesian-principled framework for incorporating prior beliefs beyond the typical kernel structure, such as the likely location of the optimizer or the optimal value. The generality of ColaBO makes it applicable across different Monte Carlo acquisition functions and types of user beliefs. We empirically demonstrate ColaBO's ability to substantially accelerate optimization when the prior information is accurate, and to retain approximately default performance when it is misleading.
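
As a concrete illustration of prior-weighted Monte Carlo acquisition, the following numpy sketch reweights posterior sample paths by a user prior over the optimizer's location before averaging an improvement-based acquisition. The 1-D grid, the toy Gaussian-process posterior, the Gaussian prior, and the specific weighting scheme are illustrative assumptions, not ColaBO's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discretized 1-D search space and a toy GP posterior (mean, covariance).
X = np.linspace(0.0, 1.0, 200)
mean = np.sin(6 * X)                                   # stand-in posterior mean
K = 0.1 * np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2 / 0.05 ** 2)
paths = rng.multivariate_normal(mean, K + 1e-8 * np.eye(len(X)), size=256)

# User belief about the optimizer's location: a Gaussian centered at x = 0.25.
prior = np.exp(-0.5 * ((X - 0.25) / 0.1) ** 2)
prior /= prior.sum()

# Reweight each posterior sample path by the prior mass at its own maximizer
# (illustrative stand-in for a prior-weighted Monte Carlo acquisition).
argmax_idx = paths.argmax(axis=1)
weights = prior[argmax_idx]
weights /= weights.sum()

# Weighted Monte Carlo expected improvement over the current best observation.
best_so_far = 0.8
improvement = np.clip(paths - best_so_far, 0.0, None)  # (n_paths, n_grid)
acq = (weights[:, None] * improvement).sum(axis=0)

x_next = X[acq.argmax()]
print(f"next candidate under the prior-weighted acquisition: x = {x_next:.3f}")
```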

* 18 pages, 11 figures 

New Horizons in Parameter Regularization: A Constraint Approach

Nov 15, 2023
Jörg K. H. Franke, Michael Hefenbrock, Gregor Koehler, Frank Hutter

This work presents constrained parameter regularization (CPR), an alternative to traditional weight decay. Instead of applying a constant penalty uniformly to all parameters, we enforce an upper bound on a statistical measure (e.g., the L$_2$-norm) of individual parameter groups. This reformulates learning as a constrained optimization problem. To solve this, we utilize an adaptation of the augmented Lagrangian method. Our approach allows for varying regularization strengths across different parameter groups, removing the need for explicit penalty coefficients in the regularization terms. CPR only requires two hyperparameters and introduces no measurable runtime overhead. We offer empirical evidence of CPR's effectiveness through experiments in the "grokking" phenomenon, image classification, and language modeling. Our findings show that CPR can counteract the effects of grokking, and it consistently matches or surpasses the performance of traditional weight decay.
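
To make the mechanism concrete, here is a minimal PyTorch sketch that enforces an upper bound kappa on the squared L$_2$-norm of each parameter group via an augmented Lagrangian, in place of a fixed weight-decay penalty. The toy model and data, the values of mu and kappa, and the update schedule are assumptions and simplified relative to CPR itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

mu = 1.0                                        # augmented-Lagrangian penalty strength
kappa = 1.0                                     # upper bound on ||w||^2 per group
lam = {name: 0.0 for name, _ in model.named_parameters()}   # multiplier estimates

x, y = torch.randn(128, 20), torch.randn(128, 1)

for step in range(100):
    optimizer.zero_grad()
    loss = F.mse_loss(model(x), y)

    # Augmented-Lagrangian term for the inequality constraint ||w||^2 <= kappa,
    # applied independently to every parameter group.
    for name, p in model.named_parameters():
        c = p.pow(2).sum() - kappa
        loss = loss + (torch.clamp(lam[name] + mu * c, min=0.0) ** 2
                       - lam[name] ** 2) / (2 * mu)

    loss.backward()
    optimizer.step()

    # Dual update: multipliers grow only for groups that violate their bound.
    with torch.no_grad():
        for name, p in model.named_parameters():
            lam[name] = max(0.0, lam[name] + mu * (p.pow(2).sum().item() - kappa))
```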

Efficient Bayesian Learning Curve Extrapolation using Prior-Data Fitted Networks

Oct 31, 2023
Steven Adriaensen, Herilalaina Rakotoarison, Samuel Müller, Frank Hutter

Learning curve extrapolation aims to predict model performance in later epochs of training, based on the performance in earlier epochs. In this work, we argue that, while the inherent uncertainty in the extrapolation of learning curves warrants a Bayesian approach, existing methods are (i) overly restrictive, and/or (ii) computationally expensive. We describe the first application of prior-data fitted neural networks (PFNs) in this context. A PFN is a transformer, pre-trained on data generated from a prior, to perform approximate Bayesian inference in a single forward pass. We propose LC-PFN, a PFN trained to extrapolate 10 million artificial right-censored learning curves generated from a parametric prior proposed in prior art using MCMC. We demonstrate that LC-PFN can approximate the posterior predictive distribution more accurately than MCMC, while being over 10 000 times faster. We also show that the same LC-PFN achieves competitive performance extrapolating a total of 20 000 real learning curves from four learning curve benchmarks (LCBench, NAS-Bench-201, Taskset, and PD1) that stem from training a wide range of model architectures (MLPs, CNNs, RNNs, and Transformers) on 53 different datasets with varying input modalities (tabular, image, text, and protein data). Finally, we investigate its potential in the context of model selection and find that a simple LC-PFN based predictive early stopping criterion obtains 2 - 6x speed-ups on 45 of these datasets, at virtually no overhead.
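
The early-stopping criterion mentioned at the end can be sketched as follows: stop a run once the predicted probability that its final performance beats the current best drops below a threshold. The crude trend-based extrapolator below is only a stand-in for LC-PFN's posterior predictive; the threshold, window size, and Gaussian predictive are assumptions.

```python
from math import erf, sqrt

import numpy as np

def predict_final_gaussian(curve, total_epochs):
    """Crude stand-in for LC-PFN's posterior predictive over the final value:
    extrapolate the recent trend linearly and use recent variability as the
    predictive standard deviation. An actual LC-PFN would return a much better
    calibrated distribution from a single transformer forward pass."""
    t = np.arange(len(curve))
    slope, intercept = np.polyfit(t[-10:], curve[-10:], deg=1)
    mean = intercept + slope * (total_epochs - 1)
    std = max(np.std(curve[-10:]), 1e-3)
    return mean, std

def keep_training(curve, total_epochs, best_final_so_far, threshold=0.05):
    """Predictive early stopping: continue only while the predicted probability
    of beating the current best final performance stays above `threshold`."""
    mean, std = predict_final_gaussian(curve, total_epochs)
    # P(final > best) under a Gaussian predictive, via the complementary CDF.
    p_improve = 0.5 * (1.0 - erf((best_final_so_far - mean) / (std * sqrt(2.0))))
    return p_improve >= threshold

# Toy usage on a synthetic, saturating learning curve.
rng = np.random.default_rng(0)
curve = 0.7 * (1 - np.exp(-np.arange(25) / 5.0)) + rng.normal(0, 0.005, 25)
print(keep_training(curve, total_epochs=100, best_final_so_far=0.85))
```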

Managing AI Risks in an Era of Rapid Progress

Oct 26, 2023
Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atılım Güneş Baydin, Sheila McIlraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca Dragan, Philip Torr, Stuart Russell, Daniel Kahneman, Jan Brauner, Sören Mindermann

In this short consensus paper, we outline risks from upcoming, advanced AI systems. We examine large-scale social harms and malicious uses, as well as an irreversible loss of human control over autonomous AI systems. In light of rapid and continuing AI progress, we propose priorities for AI R&D and governance.

Hard View Selection for Contrastive Learning

Oct 05, 2023
Fabio Ferreira, Ivo Rapant, Frank Hutter

Many Contrastive Learning (CL) methods train their models to be invariant to different "views" of an image input, for which a good data augmentation pipeline is crucial. While considerable effort has been directed towards improving pre-text tasks, architectures, or robustness (e.g., Siamese networks or teacher-softmax centering), most of these methods remain strongly reliant on the random sampling of operations within the image augmentation pipeline, such as the random resized crop or color distortion operation. In this paper, we argue that the role of view generation and its effect on performance have so far received insufficient attention. To address this, we propose an easy, learning-free, yet powerful Hard View Selection (HVS) strategy designed to extend random view generation and expose the pretrained model to harder samples during CL training. It encompasses the following iterative steps: 1) randomly sample multiple views and create pairs of two views, 2) run forward passes for each view pair on the currently trained model, 3) adversarially select the pair yielding the worst loss, and 4) run the backward pass with the selected pair. In our empirical analysis we show that, under the hood, HVS increases task difficulty by controlling the Intersection over Union of views during pretraining. With only 300-epoch pretraining, HVS closely rivals the 800-epoch DINO baseline, a comparison that remains favorable even when factoring in the slowdown induced by HVS's additional forward passes. Additionally, HVS consistently achieves accuracy improvements on ImageNet between 0.55% and 1.9% on linear evaluation, and similar improvements on transfer tasks, across multiple CL methods such as DINO, SimSiam, and SimCLR.
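
A minimal PyTorch sketch of the four-step loop above, with a toy encoder, stand-in augmentations, and a SimSiam-style negative-cosine loss (all assumptions, not the paper's exact training setup):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
optimizer = torch.optim.SGD(encoder.parameters(), lr=1e-3)

def augment(batch):
    # Stand-in for a stochastic augmentation pipeline (crop, color jitter, ...).
    return batch + 0.1 * torch.randn_like(batch)

def pair_loss(z1, z2):
    # Negative cosine similarity between embeddings; higher loss = harder pair.
    return -F.cosine_similarity(z1, z2, dim=-1).mean()

images = torch.randn(16, 3, 32, 32)            # one toy batch
views = [augment(images) for _ in range(4)]    # 1) sample multiple views

# 2) forward every view pair and 3) adversarially select the hardest one.
with torch.no_grad():
    embeddings = [encoder(v) for v in views]
    losses = {(i, j): pair_loss(embeddings[i], embeddings[j]).item()
              for i in range(len(views)) for j in range(i + 1, len(views))}
hardest_i, hardest_j = max(losses, key=losses.get)

# 4) run the actual training step only on the selected hard pair.
optimizer.zero_grad()
loss = pair_loss(encoder(views[hardest_i]), encoder(views[hardest_j]))
loss.backward()
optimizer.step()
```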

Towards Automated Design of Riboswitches

Jul 17, 2023
Frederic Runge, Jörg K. H. Franke, Frank Hutter

Experimental screening and selection pipelines for the discovery of novel riboswitches are expensive, time-consuming, and inefficient. Using computational methods to reduce the number of candidates for the screen could drastically decrease these costs. However, existing computational approaches do not fully satisfy all requirements for the design of such initial screening libraries. In this work, we present a new method, libLEARNA, capable of providing RNA focus libraries of diverse variable-length qualified candidates. Our novel structure-based design approach considers global properties as well as desired sequence and structure features. We demonstrate the benefits of our method by designing theophylline riboswitch libraries, following a previously published protocol, and yielding 30% more unique high-quality candidates.

* 9 pages, Accepted at the 2023 ICML Workshop on Computational Biology 

Scalable Deep Learning for RNA Secondary Structure Prediction

Jul 14, 2023
Jörg K. H. Franke, Frederic Runge, Frank Hutter

The field of RNA secondary structure prediction has made significant progress with the adoption of deep learning techniques. In this work, we present the RNAformer, a lean deep learning model using axial attention and recycling in the latent space. We gain performance improvements by designing the architecture for modeling the adjacency matrix directly in the latent space and by scaling the size of the model. Our approach achieves state-of-the-art performance on the popular TS0 benchmark dataset and even outperforms methods that use external information. Further, we show experimentally that the RNAformer can learn a biophysical model of the RNA folding process.
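
For intuition, a minimal sketch of one axial-attention block over a pairwise latent of shape (batch, L, L, d): attention is applied along rows and then along columns, rather than jointly over all L×L positions, which keeps the cost at roughly O(L^3 d) instead of O(L^4 d). The layer sizes and the use of nn.MultiheadAttention are assumptions; recycling and the rest of the RNAformer architecture are omitted.

```python
import torch
import torch.nn as nn

class AxialAttentionBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, z):                       # z: (batch, L, L, dim)
        b, L, _, d = z.shape
        # Row-wise attention: each row is an independent sequence of length L.
        rows = self.norm1(z).reshape(b * L, L, d)
        z = z + self.row_attn(rows, rows, rows)[0].reshape(b, L, L, d)
        # Column-wise attention: transpose so columns become the sequence axis.
        cols = self.norm2(z).transpose(1, 2).reshape(b * L, L, d)
        z = z + self.col_attn(cols, cols, cols)[0].reshape(b, L, L, d).transpose(1, 2)
        return z

# Toy usage on a random pairwise latent for a length-16 RNA sequence.
block = AxialAttentionBlock(dim=32)
latent = torch.randn(2, 16, 16, 32)
print(block(latent).shape)                      # torch.Size([2, 16, 16, 32])
```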

* Accepted at the 2023 ICML Workshop on Computational Biology. Honolulu, Hawaii, USA, 2023 

PriorBand: Practical Hyperparameter Optimization in the Age of Deep Learning

Jun 21, 2023
Neeratyoy Mallik, Edward Bergman, Carl Hvarfner, Danny Stoll, Maciej Janowski, Marius Lindauer, Luigi Nardi, Frank Hutter

Hyperparameters of Deep Learning (DL) pipelines are crucial for their downstream performance. While a large number of methods for Hyperparameter Optimization (HPO) have been developed, their incurred costs are often untenable for modern DL. Consequently, manual experimentation is still the most prevalent approach to optimize hyperparameters, relying on the researcher's intuition, domain knowledge, and cheap preliminary explorations. To resolve this misalignment between HPO algorithms and DL researchers, we propose PriorBand, an HPO algorithm tailored to DL, able to utilize both expert beliefs and cheap proxy tasks. Empirically, we demonstrate PriorBand's efficiency across a range of DL benchmarks and show its gains under informative expert input as well as its robustness against poor expert beliefs.
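
To illustrate the idea of combining expert beliefs with cheap evaluations, the sketch below mixes uniform exploration, sampling from an expert prior, and sampling around the incumbent, with the mixture weights shifting toward the incumbent as evidence accumulates. The distributions, weights, and decay schedule are illustrative assumptions, not PriorBand's actual sampling policy.

```python
import numpy as np

rng = np.random.default_rng(0)

def expert_prior_sample():
    # Expert believes the (1-D, normalized) hyperparameter lies near 0.2.
    return float(np.clip(rng.normal(0.2, 0.05), 0.0, 1.0))

def incumbent_sample(incumbent):
    # Local perturbation around the best configuration found so far.
    return float(np.clip(rng.normal(incumbent, 0.02), 0.0, 1.0))

def sample_config(n_evaluated, incumbent):
    # Early on: trust the prior; later: trust the incumbent; always keep
    # some uniform exploration as a safeguard against a misleading prior.
    w_inc = min(0.7, 0.05 * n_evaluated)
    w_prior = (1.0 - w_inc) * 0.8
    w_uniform = 1.0 - w_inc - w_prior
    choice = rng.choice(3, p=[w_uniform, w_prior, w_inc])
    if choice == 0:
        return float(rng.uniform(0.0, 1.0))
    if choice == 1:
        return expert_prior_sample()
    return incumbent_sample(incumbent)

# Toy usage with a cheap proxy objective (lower is better, optimum near 0.3).
objective = lambda x: (x - 0.3) ** 2
incumbent, best = 0.5, np.inf
for n in range(30):
    x = sample_config(n, incumbent)
    y = objective(x)
    if y < best:
        incumbent, best = x, y
print(f"incumbent after 30 cheap evaluations: {incumbent:.3f}")
```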

Quick-Tune: Quickly Learning Which Pretrained Model to Finetune and How

Jun 11, 2023
Sebastian Pineda Arango, Fabio Ferreira, Arlind Kadra, Frank Hutter, Josif Grabocka

With the ever-increasing number of pretrained models, machine learning practitioners are continuously faced with the questions of which pretrained model to use and how to finetune it for a new dataset. In this paper, we propose a methodology that jointly searches for the optimal pretrained model and the hyperparameters for finetuning it. Our method transfers knowledge about the performance of many pretrained models with multiple hyperparameter configurations on a series of datasets. To this end, we evaluated over 20k hyperparameter configurations for finetuning 24 pretrained image classification models on 87 datasets to generate a large-scale meta-dataset. We meta-learn a multi-fidelity performance predictor on the learning curves of this meta-dataset and use it for fast hyperparameter optimization on new datasets. We empirically demonstrate that our resulting approach can quickly select an accurate pretrained model for a new dataset together with its optimal hyperparameters.
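
A minimal sketch of the resulting selection loop: a performance predictor scores (pretrained model, hyperparameter configuration) candidates at the next budget level, and the candidate with the best predicted performance is extended. The candidate list, the dummy predictor, and the budget schedule are illustrative assumptions in place of the meta-learned multi-fidelity predictor.

```python
import numpy as np

rng = np.random.default_rng(0)

candidates = [
    {"model": "resnet50", "lr": 1e-3, "epochs_run": 0, "curve": []},
    {"model": "vit_small", "lr": 3e-4, "epochs_run": 0, "curve": []},
    {"model": "efficientnet_b0", "lr": 1e-2, "epochs_run": 0, "curve": []},
]

def predict_performance(cand, next_budget):
    # Stand-in for a meta-learned multi-fidelity predictor conditioned on the
    # partial learning curve; here just an optimistic linear extrapolation.
    if not cand["curve"]:
        return 0.5                      # cold-start guess
    return cand["curve"][-1] + 0.01 * (next_budget - cand["epochs_run"])

def finetune_one_epoch(cand):
    # Stand-in for actually finetuning the chosen model for one more epoch.
    gain = rng.uniform(0.0, 0.05) / (1 + cand["epochs_run"])
    last = cand["curve"][-1] if cand["curve"] else 0.5
    cand["curve"].append(min(1.0, last + gain))
    cand["epochs_run"] += 1

for _ in range(12):                      # total finetuning budget of 12 epochs
    scores = [predict_performance(c, c["epochs_run"] + 1) for c in candidates]
    finetune_one_epoch(candidates[int(np.argmax(scores))])

best = max(candidates, key=lambda c: c["curve"][-1] if c["curve"] else 0.0)
print(best["model"], round(best["curve"][-1], 3))
```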

PFNs4BO: In-Context Learning for Bayesian Optimization

Jun 09, 2023
Samuel Müller, Matthias Feurer, Noah Hollmann, Frank Hutter

In this paper, we use Prior-data Fitted Networks (PFNs) as a flexible surrogate for Bayesian Optimization (BO). PFNs are neural processes that are trained to approximate the posterior predictive distribution (PPD) through in-context learning on any prior distribution that can be efficiently sampled from. We describe how this flexibility can be exploited for surrogate modeling in BO. We use PFNs to mimic a naive Gaussian process (GP), an advanced GP, and a Bayesian Neural Network (BNN). In addition, we show how to incorporate further information into the prior, such as allowing hints about the position of optima (user priors), ignoring irrelevant dimensions, and performing non-myopic BO by learning the acquisition function. The flexibility underlying these extensions opens up vast possibilities for using PFNs for BO. We demonstrate the usefulness of PFNs for BO in a large-scale evaluation on artificial GP samples and three different hyperparameter optimization testbeds: HPO-B, Bayesmark, and PD1. We publish code alongside trained models at https://github.com/automl/PFNs4BO.
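
Since a PFN returns an approximate posterior predictive distribution directly, a simple acquisition can be computed from it over a discretized output range. The sketch below computes expected improvement from per-bucket probabilities; representing the PPD as equal-width buckets with all mass at the bucket centers is a simplifying assumption, not the exact output format of the published models.

```python
import numpy as np

def expected_improvement(bucket_edges, bucket_probs, best_y):
    """Expected improvement (for maximization) of a candidate whose posterior
    predictive is given as probability mass over value buckets."""
    centers = 0.5 * (bucket_edges[:-1] + bucket_edges[1:])
    improvement = np.clip(centers - best_y, 0.0, None)
    return float(np.sum(bucket_probs * improvement))

# Toy usage: PPDs at two candidate points, represented over the same buckets.
edges = np.linspace(-2.0, 2.0, 101)                   # 100 buckets over the y-range
centers = 0.5 * (edges[:-1] + edges[1:])
ppd_a = np.exp(-0.5 * ((centers - 0.3) / 0.2) ** 2)   # confident, near the incumbent
ppd_a /= ppd_a.sum()
ppd_b = np.exp(-0.5 * ((centers - 0.1) / 0.6) ** 2)   # lower mean, higher uncertainty
ppd_b /= ppd_b.sum()

best_y = 0.4
print("EI(a) =", expected_improvement(edges, ppd_a, best_y))
print("EI(b) =", expected_improvement(edges, ppd_b, best_y))
```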

* Accepted at ICML 2023 