Vincent Fortuin

ProSpero: Active Learning for Robust Protein Design Beyond Wild-Type Neighborhoods

May 28, 2025

Michal Kmicikiewicz, Vincent Fortuin, Ewa Szczurek

Abstract:Designing protein sequences of both high fitness and novelty is a challenging task in data-efficient protein engineering. Exploration beyond wild-type neighborhoods often leads to biologically implausible sequences or relies on surrogate models that lose fidelity in novel regions. Here, we propose ProSpero, an active learning framework in which a frozen pre-trained generative model is guided by a surrogate updated from oracle feedback. By integrating fitness-relevant residue selection with biologically-constrained Sequential Monte Carlo sampling, our approach enables exploration beyond wild-type neighborhoods while preserving biological plausibility. We show that our framework remains effective even when the surrogate is misspecified. ProSpero consistently outperforms or matches existing methods across diverse protein engineering tasks, retrieving sequences of both high fitness and novelty.
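
The abstract does not specify the sampler's details, so the following is only a generic sketch of sequential Monte Carlo (sequential importance resampling) for building sequences residue by residue. It is not ProSpero's algorithm. The scoring function `surrogate_increment`, the constraint mask `allowed`, the uniform proposal and the temperature are all hypothetical stand-ins for the frozen generative model, the surrogate and the biological constraints. The target is proportional to the constrained uniform proposal times exp(surrogate score / temperature), so each step's incremental weight is exp(score increment / temperature).

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def surrogate_increment(prefix: str, residue: str) -> float:
    """Hypothetical surrogate: rewards hydrophobic residues (illustration only)."""
    return 1.0 if residue in "AILMFVW" else 0.0


def allowed(position: int, residue: str) -> bool:
    """Hypothetical constraint mask: forbids proline everywhere (illustration only)."""
    return residue != "P"


def effective_sample_size(weights):
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def smc_design(length: int, n_particles: int, temperature: float = 1.0, seed: int = 0):
    rng = random.Random(seed)
    particles = [""] * n_particles
    weights = [1.0] * n_particles
    for pos in range(length):
        for i in range(n_particles):
            # Propose uniformly among residues that satisfy the (hypothetical) constraint
            choices = [r for r in ALPHABET if allowed(pos, r)]
            residue = rng.choice(choices)
            # Incremental importance weight: exp(score increment / temperature)
            weights[i] *= math.exp(surrogate_increment(particles[i], residue) / temperature)
            particles[i] += residue
        # Multinomial resampling when the effective sample size falls below N / 2
        if effective_sample_size(weights) < n_particles / 2:
            particles = rng.choices(particles, weights=weights, k=n_particles)
            weights = [1.0] * n_particles
    return particles, weights
```

Usage: `seqs, w = smc_design(length=12, n_particles=50)`. The returned weights should be kept when estimating expectations, because the last step may not have triggered resampling.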

Sparse Gaussian Neural Processes

Apr 02, 2025

Tommy Rochussen, Vincent Fortuin

Abstract:Despite significant recent advances in probabilistic meta-learning, it is common for practitioners to avoid using deep learning models due to a comparative lack of interpretability. Instead, many practitioners simply use non-meta-models such as Gaussian processes with interpretable priors, and conduct the tedious procedure of training their model from scratch for each task they encounter. While this is justifiable for tasks with a limited number of data points, the cubic computational cost of exact Gaussian process inference renders this prohibitive when each task has many observations. To remedy this, we introduce a family of models that meta-learn sparse Gaussian process inference. Not only does this enable rapid prediction on new tasks with sparse Gaussian processes, but since our models have clear interpretations as members of the neural process family, it also allows manual elicitation of priors in a neural process for the first time. In meta-learning regimes for which the number of observed tasks is small or for which expert domain knowledge is available, this offers a crucial advantage.

* Proceedings of the 7th Symposium on Advances in Approximate Bayesian Inference, PMLR, 2025. 25 pages, 6 figures, 5 tables
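
To make the cubic-cost point concrete, here is a hedged sketch (not the paper's meta-learned models) contrasting the exact GP predictive mean, which solves an n×n system, with a standard inducing-point approximation (the deterministic-training-conditional / subset-of-regressors mean), which only solves an m×m system for m inducing inputs. The RBF kernel, unit lengthscale, noise level and inducing locations are illustrative assumptions. When the inducing inputs equal the training inputs, the two means coincide, which the check below relies on.

```python
import math


def rbf(a: float, b: float, lengthscale: float = 1.0) -> float:
    # Illustrative squared-exponential kernel (assumed, not taken from the paper)
    return math.exp(-0.5 * (a - b) ** 2 / lengthscale ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def exact_gp_mean(X, y, x_star, noise=0.1):
    # O(n^3): solve (K_nn + noise^2 I) alpha = y
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise ** 2 if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(K, y)
    return sum(rbf(x_star, X[i]) * alpha[i] for i in range(n))


def sparse_gp_mean(X, y, Z, x_star, noise=0.1):
    # O(n m^2): solve (noise^2 K_mm + K_mn K_nm) beta = K_mn y with m = len(Z)
    m, n = len(Z), len(X)
    Kmn = [[rbf(Z[a], X[i]) for i in range(n)] for a in range(m)]
    A = [[noise ** 2 * rbf(Z[a], Z[b]) + sum(Kmn[a][i] * Kmn[b][i] for i in range(n))
          for b in range(m)] for a in range(m)]
    rhs = [sum(Kmn[a][i] * y[i] for i in range(n)) for a in range(m)]
    beta = solve(A, rhs)
    return sum(rbf(x_star, Z[a]) * beta[a] for a in range(m))
```

With m much smaller than n, the dominant cost shifts from the n×n solve to forming the m×m system, which is what makes sparse GPs viable for tasks with many observations.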

Can Transformers Learn Full Bayesian Inference in Context?

Jan 28, 2025

Arik Reuter, Tim G. J. Rudner, Vincent Fortuin, David Rügamer

Abstract:Transformers have emerged as the dominant architecture in the field of deep learning, with a broad range of applications and remarkable in-context learning (ICL) capabilities. While not yet fully understood, ICL has already proved to be an intriguing phenomenon, allowing transformers to learn in context -- without requiring further training. In this paper, we further advance the understanding of ICL by demonstrating that transformers can perform full Bayesian inference for commonly used statistical models in context. More specifically, we introduce a general framework that builds on ideas from prior fitted networks and continuous normalizing flows which enables us to infer complex posterior distributions for methods such as generalized linear models and latent factor models. Extensive experiments on real-world datasets demonstrate that our ICL approach yields posterior samples that are similar in quality to state-of-the-art MCMC or variational inference methods not operating in context.

OneProt: Towards Multi-Modal Protein Foundation Models

Nov 07, 2024

Klemens Flöge, Srisruthi Udayakumar, Johanna Sommer, Marie Piraud, Stefan Kesselheim, Vincent Fortuin, Stephan Günnemann, Karel J van der Weg, Holger Gohlke, Alina Bazarova(+1 more)

Abstract:Recent AI advances have enabled multi-modal systems to model and translate diverse information spaces. Extending beyond text and vision, we introduce OneProt, a multi-modal AI for proteins that integrates structural, sequence, alignment, and binding site data. Using the ImageBind framework, OneProt aligns the latent spaces of modality encoders along protein sequences. It demonstrates strong performance in retrieval tasks and surpasses state-of-the-art methods in various downstream tasks, including metal ion binding classification, gene-ontology annotation, and enzyme function prediction. This work expands multi-modal capabilities in protein models, paving the way for applications in drug discovery, biocatalytic reaction planning, and protein engineering.

* 28 pages, 15 figures, 7 tables
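
The abstract says OneProt uses the ImageBind framework to align modality encoders. ImageBind-style alignment is commonly trained with a contrastive InfoNCE objective over paired embeddings, so the following is a hedged sketch of a symmetric InfoNCE loss for a batch of paired embeddings (for example sequence and structure). The temperature of 0.07 and the toy vectors are assumptions; this is not OneProt's training code.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(anchors, others, temperature=0.07):
    """Symmetric InfoNCE: row i of `anchors` is the positive pair of row i of `others`."""
    a = [normalize(v) for v in anchors]
    b = [normalize(v) for v in others]
    n = len(a)
    # Cosine-similarity logits scaled by an assumed temperature
    logits = [[sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
              for i in range(n)]

    def cross_entropy(rows):
        # Mean of -log softmax(row)[i], computed with a stable log-sum-exp
        loss = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(v - m) for v in row))
            loss += log_z - row[i]
        return loss / len(rows)

    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))
```

The loss is low when each embedding is most similar to its own partner and high when pairs are mismatched, which is what pulls the modality-specific latent spaces into a shared space.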

Stein Variational Newton Neural Network Ensembles

Nov 04, 2024

Klemens Flöge, Mohammed Abdul Moeed, Vincent Fortuin

Abstract:Deep neural network ensembles are powerful tools for uncertainty quantification, which have recently been re-interpreted from a Bayesian perspective. However, current methods inadequately leverage second-order information of the loss landscape, despite the recent availability of efficient Hessian approximations. We propose a novel approximate Bayesian inference method that modifies deep ensembles to incorporate Stein Variational Newton updates. Our approach uniquely integrates scalable modern Hessian approximations, achieving faster convergence and more accurate posterior distribution approximations. We validate the effectiveness of our method on diverse regression and classification tasks, demonstrating superior performance with a significantly reduced number of training epochs compared to existing ensemble-based methods, while enhancing uncertainty quantification and robustness against overfitting.

* ICML 2024 Workshop on Structured Probabilistic Inference & Generative Modeling, 27 pages, 14 figures
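
As a hedged illustration of what a Stein variational Newton step does, the following 1-D toy moves a few particles toward a standard normal target. Each particle follows the Stein variational gradient direction (kernel-weighted score plus a repulsive kernel-gradient term), divided by a per-particle curvature estimate in the spirit of the block-diagonal approximation from the original SVN formulation. The target, RBF bandwidth `h` and step size are assumptions. The paper applies such updates to neural-network ensembles with scalable Hessian approximations, which this toy does not attempt.

```python
import math


def grad_log_p(x: float) -> float:
    return -x  # assumed toy target: standard normal N(0, 1)


def neg_hess_log_p(x: float) -> float:
    return 1.0  # -d^2/dx^2 log N(x; 0, 1)


def svn_step(particles, step=0.3, h=1.0):
    n = len(particles)
    updated = []
    for xi in particles:
        phi, hess = 0.0, 0.0
        for xj in particles:
            k = math.exp(-((xj - xi) ** 2) / (2 * h))
            dk = -(xj - xi) / h * k  # derivative of k(xj, xi) with respect to xj
            # Stein variational gradient: attraction toward high density plus repulsion
            phi += k * grad_log_p(xj) + dk
            # Block-diagonal Newton curvature: target curvature term plus kernel-gradient term
            hess += neg_hess_log_p(xj) * k * k + dk * dk
        updated.append(xi + step * (phi / n) / (hess / n))
    return updated
```

Usage: starting from `ps = [-3.0, -1.0, 2.0, 4.0, 5.0]` and applying `ps = svn_step(ps)` a few hundred times, the particles settle around zero while the repulsive term keeps them spread out rather than collapsed onto the mode.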

Parameter-efficient Bayesian Neural Networks for Uncertainty-aware Depth Estimation

Sep 25, 2024

Richard D. Paul, Alessio Quercia, Vincent Fortuin, Katharina Nöh, Hanno Scharr

Abstract:State-of-the-art methods for computer vision tasks such as monocular depth estimation (MDE) rely heavily on large, modern Transformer-based architectures. However, their application in safety-critical domains demands reliable predictive performance and uncertainty quantification. While Bayesian neural networks provide a conceptually simple approach to meeting these requirements, they suffer from the high dimensionality of the parameter space. Parameter-efficient fine-tuning (PEFT) methods, in particular low-rank adaptations (LoRA), have emerged as a popular strategy for adapting large-scale models to downstream tasks by performing parameter inference on lower-dimensional subspaces. In this work, we investigate the suitability of PEFT methods for subspace Bayesian inference in large-scale Transformer-based vision models. We show that, indeed, combining BitFit, DiffFit, LoRA, and CoLoRA, a novel LoRA-inspired PEFT method, with Bayesian inference enables more robust and reliable predictive performance in MDE.

* Presented at the UnCV Workshop at ECCV'24
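
For readers unfamiliar with LoRA, here is a minimal sketch of the low-rank update it adds to a frozen weight matrix, h = W x + (α/r) B A x, with B initialised to zero so the adapted layer initially matches the pretrained one. Dimensions, rank, α and the initialisation scale of A are illustrative assumptions; this is plain LoRA, not the CoLoRA variant introduced in the paper. Subspace Bayesian inference would then, roughly speaking, target a posterior over the small factors A and B rather than over W.

```python
import random


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRALinear:
    """Frozen d_out x d_in weight W plus a trainable rank-r update (alpha / r) * B @ A."""

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # pretrained weights, kept frozen
        # Small random init for A (assumed scale); zero init for B
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_parameters(self):
        # Only A (r x d_in) and B (d_out x r) are trained
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

The parameter saving is what makes the subspace small: for a 1024×1024 layer with rank 8, LoRA trains 8 × (1024 + 1024) = 16,384 parameters instead of 1,048,576.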

FSP-Laplace: Function-Space Priors for the Laplace Approximation in Bayesian Deep Learning

Jul 18, 2024

Tristan Cinquin, Marvin Pförtner, Vincent Fortuin, Philipp Hennig, Robert Bamler

Abstract:Laplace approximations are popular techniques for endowing deep networks with epistemic uncertainty estimates as they can be applied without altering the predictions of the neural network, and they scale to large models and datasets. While the choice of prior strongly affects the resulting posterior distribution, computational tractability and lack of interpretability of weight space typically limit the Laplace approximation to isotropic Gaussian priors, which are known to cause pathological behavior as depth increases. As a remedy, we directly place a prior on function space. More precisely, since Lebesgue densities do not exist on infinite-dimensional function spaces, we have to recast training as finding the so-called weak mode of the posterior measure under a Gaussian process (GP) prior restricted to the space of functions representable by the neural network. Through the GP prior, one can express structured and interpretable inductive biases, such as regularity or periodicity, directly in function space, while still exploiting the implicit inductive biases that allow deep networks to generalize. After model linearization, the training objective induces a negative log-posterior density to which we apply a Laplace approximation, leveraging highly scalable methods from matrix-free linear algebra. Our method provides improved results where prior knowledge is abundant, e.g., in many scientific inference tasks. At the same time, it stays competitive for black-box regression and classification tasks where neural networks typically excel.
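
For context, the following is a minimal sketch of the standard weight-space Laplace approximation with an isotropic Gaussian prior, i.e. the baseline the abstract argues against, on a hypothetical one-parameter logistic regression p(y = 1 | x, w) = sigmoid(w x). It finds the posterior mode by Newton's method and uses the inverse curvature of the negative log posterior at that mode as the Gaussian variance. FSP-Laplace instead places a GP prior in function space, which this toy does not implement.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def laplace_1d(xs, ys, prior_var=1.0, iters=50):
    """Gaussian N(w_map, var) approximating p(w | data) for a hypothetical 1-parameter model."""

    def curvature(w):
        # Second derivative of the negative log posterior
        return 1.0 / prior_var + sum(sigmoid(w * x) * (1 - sigmoid(w * x)) * x * x for x in xs)

    w = 0.0
    for _ in range(iters):
        # Gradient of the negative log posterior (prior term minus likelihood term)
        grad = w / prior_var - sum((y - sigmoid(w * x)) * x for x, y in zip(xs, ys))
        w -= grad / curvature(w)  # Newton step toward the posterior mode
    return w, 1.0 / curvature(w)
```

Because the prior contributes 1/prior_var to the curvature, the Laplace variance is always smaller than the prior variance; the abstract's point is that this isotropic weight-space prior is hard to interpret, which motivates moving the prior to function space.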

Towards Dynamic Feature Acquisition on Medical Time Series by Maximizing Conditional Mutual Information

Jul 18, 2024

Fedor Sergeev, Paola Malsot, Gunnar Rätsch, Vincent Fortuin

Abstract:Knowing which features of a multivariate time series to measure and when is a key task in medicine, wearables, and robotics. Better acquisition policies can reduce costs while maintaining or even improving the performance of downstream predictors. Inspired by the maximization of conditional mutual information, we propose an approach to train acquirers end-to-end using only the downstream loss. We show that our method outperforms a random acquisition policy and matches a model with an unrestrained budget, but does not yet overtake a static acquisition strategy. We highlight the assumptions and outline avenues for future work.

* Presented at the ICML 2024 Next Generation of Sequence Modeling Architectures (NGSM) Workshop
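
As a reference point only, the sketch below computes the quantity that motivates the method, the conditional mutual information I(X; Y | Z), exactly for a small discrete joint distribution, using I(X; Y | Z) = Σ p(x, y, z) log [p(z) p(x, y, z) / (p(x, z) p(y, z))]. The paper trains acquirers end-to-end from the downstream loss rather than evaluating this sum; representing the joint as a dictionary is an assumption made for illustration.

```python
import math
from collections import defaultdict


def conditional_mutual_information(joint):
    """I(X; Y | Z) in bits; `joint` maps (x, y, z) tuples to probabilities summing to 1."""
    p_xz, p_yz, p_z = defaultdict(float), defaultdict(float), defaultdict(float)
    # Marginalise the joint to p(x, z), p(y, z) and p(z)
    for (x, y, z), p in joint.items():
        p_xz[(x, z)] += p
        p_yz[(y, z)] += p
        p_z[z] += p
    total = 0.0
    for (x, y, z), p in joint.items():
        if p > 0:
            total += p * math.log2(p * p_z[z] / (p_xz[(x, z)] * p_yz[(y, z)]))
    return total
```

Read as an acquisition criterion: measuring feature X is worthwhile when it still carries information about the target Y given the measurements Z already acquired; if X and Y are independent given Z, the value is zero.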

How Useful is Intermittent, Asynchronous Expert Feedback for Bayesian Optimization?

Jun 10, 2024

Agustinus Kristiadi, Felix Strieth-Kalthoff, Sriram Ganapathi Subramanian, Vincent Fortuin, Pascal Poupart, Geoff Pleiss

Abstract:Bayesian optimization (BO) is an integral part of automated scientific discovery -- the so-called self-driving lab -- where human inputs are ideally minimal or at least non-blocking. However, scientists often have strong intuition, and thus human feedback is still useful. Nevertheless, prior work on enhancing BO with expert feedback, which incorporates it either offline or online but in a blocking manner (feedback must arrive at each BO iteration), is incompatible with the spirit of self-driving labs. In this work, we study whether a small amount of randomly arriving expert feedback, incorporated in a non-blocking manner, can improve a BO campaign. To this end, we run an additional, independent computing thread on top of the BO loop to handle the feedback-gathering process. The gathered feedback is used to learn a Bayesian preference model that can readily be incorporated into the BO thread to steer its exploration-exploitation trade-off. Experiments on toy and chemistry datasets suggest that even a small amount of intermittent, asynchronous expert feedback can help improve or constrain BO. This is especially relevant for self-driving labs, which it could make more data-efficient and less costly.

* AABI 2024. Code: https://github.com/wiseodd/bo-async-feedback
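
The non-blocking design described in the abstract can be sketched with a background thread and a queue: the feedback thread pushes expert judgements whenever they arrive, and the BO loop drains the queue with `get_nowait()` at each iteration instead of waiting. Everything here is a placeholder and not the code in the linked repository: the hypothetical expert, the random-search "proposal" standing in for acquisition maximisation, and the toy objective. In the paper, the collected feedback trains a Bayesian preference model.

```python
import queue
import random
import threading
import time


def expert_feedback_thread(out_q: queue.Queue, n_items: int, seed: int = 0):
    """Hypothetical expert: emits pairwise preferences at random times."""
    rng = random.Random(seed)
    for _ in range(n_items):
        time.sleep(rng.uniform(0.0, 0.01))
        a, b = rng.random(), rng.random()
        # (candidate_a, candidate_b, expert prefers a); the expert favours values near 0.3
        out_q.put((a, b, abs(a - 0.3) < abs(b - 0.3)))


def bo_loop(n_iters: int, feedback_q: queue.Queue, seed: int = 1):
    rng = random.Random(seed)
    preferences, best = [], float("-inf")
    for _ in range(n_iters):
        # Non-blocking: consume whatever feedback has arrived, never wait for it
        while True:
            try:
                preferences.append(feedback_q.get_nowait())
            except queue.Empty:
                break
        x = rng.random()  # placeholder for maximising an acquisition function
        best = max(best, -(x - 0.3) ** 2)  # placeholder objective evaluation
        time.sleep(0.001)
    return best, preferences
```

Usage: start `threading.Thread(target=expert_feedback_thread, args=(q, 5))`, run `bo_loop(20, q)` in the main thread, then `join()` the feedback thread; any feedback that arrives after the last iteration can still be drained from the queue.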

Gaussian Stochastic Weight Averaging for Bayesian Low-Rank Adaptation of Large Language Models

May 06, 2024

Emre Onal, Klemens Flöge, Emma Caldwell, Arsen Sheverdin, Vincent Fortuin

Abstract:Fine-tuned Large Language Models (LLMs) often suffer from overconfidence and poor calibration, particularly when fine-tuned on small datasets. To address these challenges, we propose a simple combination of Low-Rank Adaptation (LoRA) with Gaussian Stochastic Weight Averaging (SWAG), facilitating approximate Bayesian inference in LLMs. Through extensive testing across several Natural Language Processing (NLP) benchmarks, we demonstrate that our straightforward and computationally efficient approach improves model generalization and calibration. We further show that our method exhibits greater robustness against distribution shift, as reflected in its performance on out-of-distribution tasks.

* 14 pages, 1 figure, 2 tables
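
A minimal sketch of the diagonal variant of SWAG, assuming we simply receive parameter snapshots collected along the SGD trajectory: keep running first and second moments, then sample weights from N(θ̄, diag(θ̄₂ − θ̄²)). Full SWAG adds a low-rank covariance term, which is omitted here. When combined with LoRA as the abstract describes, only the low-rank adapter parameters would need to be tracked.

```python
import math
import random


class DiagonalSWAG:
    """Running mean and second moment of parameter snapshots (diagonal SWAG)."""

    def __init__(self, dim: int):
        self.n = 0
        self.mean = [0.0] * dim
        self.sq_mean = [0.0] * dim

    def collect(self, params):
        # Incremental update of the first and second moments
        self.n += 1
        for i, p in enumerate(params):
            self.mean[i] += (p - self.mean[i]) / self.n
            self.sq_mean[i] += (p * p - self.sq_mean[i]) / self.n

    def variance(self):
        # Clamp tiny negative values caused by floating-point round-off
        return [max(s - m * m, 0.0) for m, s in zip(self.mean, self.sq_mean)]

    def sample(self, rng: random.Random):
        # Draw one weight vector from the diagonal Gaussian approximation
        return [rng.gauss(m, math.sqrt(v)) for m, v in zip(self.mean, self.variance())]
```

At prediction time, averaging the model's outputs over several such samples yields a Bayesian model average, which is where the calibration benefits come from.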