Samuel Kaski

Estimating treatment effects from single-arm trials via latent-variable modeling

Nov 06, 2023
Manuel Haussmann, Tran Minh Son Le, Viivi Halla-aho, Samu Kurki, Jussi Leinonen, Miika Koskinen, Samuel Kaski, Harri Lähdesmäki

Randomized controlled trials (RCTs) are the accepted standard for treatment effect estimation, but they can be infeasible for ethical reasons or due to prohibitive costs. Single-arm trials, where all patients belong to the treatment group, can be a viable alternative but require access to an external control group. We propose an identifiable deep latent-variable model for this scenario that can also account for missing covariate observations by modeling their structured missingness patterns. Our method uses amortized variational inference to learn both group-specific and identifiable shared latent representations, which can subsequently be used for (i) patient matching if treatment outcomes are not available for the treatment group, or (ii) direct treatment effect estimation assuming outcomes are available for both groups. We evaluate the model on a public benchmark as well as on a data set consisting of a published RCT study and real-world electronic health records. Compared to previous methods, our results show improved performance both for direct treatment effect estimation and for effect estimation via patient matching.
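
A minimal sketch of the matching use case (i): covariates are embedded with the model's shared-latent encoder, and each treated patient is matched to the nearest external control in that latent space. The linear `encode_shared` map and all data below are illustrative stand-ins, not the paper's trained variational encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the learned shared-latent encoder; in the
# paper this is the amortized variational encoder, here a fixed linear
# map purely for illustration.
W = rng.normal(size=(5, 2))
def encode_shared(x):
    return x @ W

# Toy covariates and outcomes: treated patients (single arm) and an
# external control group.
x_treated = rng.normal(size=(20, 5))
x_control = rng.normal(size=(200, 5))
y_treated = rng.normal(loc=1.0, size=20)
y_control = rng.normal(loc=0.0, size=200)

z_t = encode_shared(x_treated)
z_c = encode_shared(x_control)

# 1-nearest-neighbour matching in the shared latent space.
dists = np.linalg.norm(z_t[:, None, :] - z_c[None, :, :], axis=-1)
match = dists.argmin(axis=1)

# Average treatment effect on the treated, via matched controls.
att = (y_treated - y_control[match]).mean()
print(f"matched-control ATT estimate: {att:.3f}")
```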


The Fundamental Dilemma of Bayesian Active Meta-learning

Oct 23, 2023
Sabina J. Sloman, Ayush Bharti, Samuel Kaski

Many applications involve estimation of parameters that generalize across multiple diverse, but related, data-scarce task environments. Bayesian active meta-learning, a form of sequential optimal experimental design, provides a framework for solving such problems. The active meta-learner's goal is to gain transferable knowledge (estimate the transferable parameters) in the presence of idiosyncratic characteristics of the current task (task-specific parameters). We show that in such a setting, greedy pursuit of this goal can actually hurt estimation of the transferable parameters (induce so-called negative transfer). The learner faces a dilemma akin to but distinct from the exploration--exploitation dilemma: should they spend their acquisition budget pursuing transferable knowledge, or identifying the current task-specific parameters? We show theoretically that some tasks pose an inevitable and arbitrarily large threat of negative transfer, and that task identification is critical to reducing this threat. Our results generalize to analysis of prior misspecification over nuisance parameters. Finally, we empirically illustrate circumstances that lead to negative transfer.
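
To make the acquisition trade-off concrete, here is a toy sequential-design computation in a conjugate Gaussian model with one transferable parameter theta and one task-specific nuisance z; the model, priors and candidate set are invented for illustration. The greedy learner scores each candidate query by how much it shrinks the marginal posterior variance of theta, the quantity whose estimation negative transfer threatens.

```python
import numpy as np

# Toy Gaussian task model: y = theta * x + z + noise, where theta is the
# transferable parameter and z is a task-specific nuisance parameter.
prior_prec = np.eye(2)            # independent N(0, 1) priors on (theta, z)
noise_var = 0.25
candidates = np.linspace(0.0, 3.0, 7)   # candidate design points x

def posterior_var_theta(designs):
    """Marginal posterior variance of theta after observing at `designs`."""
    Phi = np.stack([designs, np.ones_like(designs)], axis=1)  # rows (x, 1)
    prec = prior_prec + Phi.T @ Phi / noise_var
    return np.linalg.inv(prec)[0, 0]

# Greedy sequential design: at each step pick the candidate query that
# most shrinks uncertainty about the transferable parameter theta.
chosen = []
for step in range(4):
    scores = [posterior_var_theta(np.array(chosen + [x])) for x in candidates]
    best = candidates[int(np.argmin(scores))]
    chosen.append(best)
    print(f"step {step}: query x={best:.2f}, var(theta)={min(scores):.4f}")
```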


Causal Similarity-Based Hierarchical Bayesian Models

Oct 19, 2023
Sophie Wharrie, Samuel Kaski

The key challenge underlying machine learning is generalisation to new data. This work studies generalisation for datasets consisting of related tasks that may differ in causal mechanisms. For example, observational medical data for complex diseases suffers from heterogeneity in causal mechanisms of disease across patients, creating challenges for machine learning algorithms that need to generalise to new patients outside of the training dataset. Common approaches for learning supervised models with heterogeneous datasets include learning a global model for the entire dataset, learning local models for each task's data, or utilising hierarchical, meta-learning and multi-task learning approaches to learn how to generalise from data pooled across multiple tasks. In this paper we propose causal similarity-based hierarchical Bayesian models to improve generalisation to new tasks by learning how to pool data from training tasks with similar causal mechanisms. We apply this general modelling principle to Bayesian neural networks and compare a variety of methods for estimating causal task similarity (for both known and unknown causal models). We demonstrate the benefits of our approach and its applicability to real-world problems through a range of experiments on simulated and real data.
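
As a concrete (and heavily simplified) instance of the pooling principle, the sketch below estimates each task's linear mechanism by least squares, uses the distance between mechanism estimates as a causal-similarity proxy, and pools data only across similar tasks. The paper's actual models are hierarchical Bayesian neural networks with richer similarity measures.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy tasks: y = X @ beta_k + noise, where beta_k encodes each task's
# (linear) causal mechanism. Two latent mechanism groups.
betas = {0: np.array([2.0, 0.0]), 1: np.array([0.0, 2.0])}
tasks = []
for k in range(6):
    X = rng.normal(size=(30, 2))
    y = X @ betas[k % 2] + 0.1 * rng.normal(size=30)
    tasks.append((X, y))

# Causal-similarity proxy: distance between per-task mechanism estimates
# (plain least-squares coefficients here; the paper studies richer
# similarity measures for both known and unknown causal models).
coefs = [np.linalg.lstsq(X, y, rcond=None)[0] for X, y in tasks]

def similar(i, j, tol=1.0):
    return np.linalg.norm(coefs[i] - coefs[j]) < tol

# Pool each task's data with causally similar tasks before fitting.
for i in range(len(tasks)):
    pool = [j for j in range(len(tasks)) if similar(i, j)]
    Xp = np.concatenate([tasks[j][0] for j in pool])
    yp = np.concatenate([tasks[j][1] for j in pool])
    beta_pooled = np.linalg.lstsq(Xp, yp, rcond=None)[0]
    print(f"task {i}: pooled with {pool}, beta ~ {beta_pooled.round(2)}")
```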


Understanding deep neural networks through the lens of their non-linearity

Oct 17, 2023
Quentin Bouniot, Ievgen Redko, Anton Mallasto, Charlotte Laclau, Karol Arndt, Oliver Struckmeier, Markus Heinonen, Ville Kyrki, Samuel Kaski

The remarkable success of deep neural networks (DNNs) is often attributed to their high expressive power and their ability to approximate functions of arbitrary complexity. Indeed, DNNs are highly non-linear models, and the activation functions introduced into them are largely responsible for this. While many works have studied the expressive power of DNNs through the lens of their approximation capabilities, quantifying the non-linearity of DNNs, or of individual activation functions, remains an open problem. In this paper, we propose the first theoretically sound solution to track non-linearity propagation in deep neural networks, with a specific focus on computer vision applications. Our proposed affinity score allows us to gain insights into the inner workings of a wide range of different architectures and learning paradigms. We provide extensive experimental results that highlight the practical utility of the proposed affinity score and its potential for far-reaching applications.
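
The snippet below computes a crude least-squares stand-in for such a score: how well an activation function is approximated by its best affine fit over a sample of inputs (1 means exactly affine, lower means more non-linear). This is only a proxy to convey the intuition, not the paper's actual affinity score.

```python
import numpy as np

def affinity_proxy(activation, xs):
    """Closeness of an activation to affine on the inputs `xs`: 1 means
    exactly affine. A least-squares proxy, not the paper's score."""
    y = activation(xs)
    A = np.stack([xs, np.ones_like(xs)], axis=1)   # best fit a*x + b
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef
    return 1.0 - np.linalg.norm(residual) / (np.linalg.norm(y - y.mean()) + 1e-12)

xs = np.linspace(-3, 3, 200)
for name, f in [("identity", lambda x: x),
                ("relu", lambda x: np.maximum(x, 0)),
                ("tanh", np.tanh)]:
    print(f"{name:8s} affinity ~ {affinity_proxy(f, xs):.3f}")
```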


Compositional Sculpting of Iterative Generative Processes

Sep 28, 2023
Timur Garipov, Sebastiaan De Peuter, Ge Yang, Vikas Garg, Samuel Kaski, Tommi Jaakkola

High training costs of generative models and the need to fine-tune them for specific tasks have created a strong interest in model reuse and composition. A key challenge in composing iterative generative processes, such as GFlowNets and diffusion models, is that to realize the desired target distribution, all steps of the generative process need to be coordinated and satisfy delicate balance conditions. In this work, we propose Compositional Sculpting: a general approach for defining compositions of iterative generative processes. We then introduce a method for sampling from these compositions built on classifier guidance. We showcase ways to accomplish compositional sculpting in both GFlowNets and diffusion models. We highlight two binary operations: the harmonic mean ($p_1 \otimes p_2$) and the contrast ($p_1 \,\unicode{x25D1}\, p_2$) between pairs, and the generalization of these operations to multiple component distributions. We offer empirical results on image and molecular generation tasks.
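
On a discrete grid the harmonic-mean composition can be written down directly, which makes the intent of the operation easy to see: mass concentrates where both components place mass. The contrast formula below is one common form and should be checked against the paper; in the actual method both operations are realized inside iterative samplers via classifier guidance rather than by grid normalization.

```python
import numpy as np

# Two component distributions on a shared grid.
x = np.linspace(-4, 4, 400)
def gaussian(mu, s):
    p = np.exp(-0.5 * ((x - mu) / s) ** 2)
    return p / p.sum()

p1, p2 = gaussian(-1.0, 1.0), gaussian(1.0, 1.0)

# Harmonic mean: high only where both p1 and p2 are high.
harmonic = p1 * p2 / (p1 + p2 + 1e-30)
harmonic /= harmonic.sum()

# One common form of the contrast, emphasizing regions where p1
# dominates p2 (check the paper for the exact definition used there).
contrast = p1 ** 2 / (p1 + p2 + 1e-30)
contrast /= contrast.sum()

print("means:", (x * p1).sum(), (x * harmonic).sum(), (x * contrast).sum())
```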

* Extended version of NeurIPS 2023 paper 

Human-in-the-Loop Causal Discovery under Latent Confounding using Ancestral GFlowNets

Sep 21, 2023
Tiago da Silva, Eliezer Silva, Adèle Ribeiro, António Góis, Dominik Heider, Samuel Kaski, Diego Mesquita

Structure learning is the crux of causal inference. Notably, causal discovery (CD) algorithms are brittle when data is scarce, possibly inferring imprecise causal relations that contradict expert knowledge -- especially when considering latent confounders. To aggravate the issue, most CD methods do not provide uncertainty estimates, making it hard for users to interpret results and improve the inference process. Surprisingly, while CD is a human-centered affair, no works have focused on building methods that both 1) output uncertainty estimates that can be verified by experts and 2) interact with those experts to iteratively refine CD. To solve these issues, we start by proposing to sample (causal) ancestral graphs proportionally to a belief distribution based on a score function, such as the Bayesian information criterion (BIC), using generative flow networks. Then, we leverage the diversity in candidate graphs and introduce an optimal experimental design to iteratively probe the expert about the relations among variables, effectively reducing the uncertainty of our belief over ancestral graphs. Finally, we update our samples to incorporate human feedback via importance sampling. Importantly, our method does not require causal sufficiency (i.e., unobserved confounders may exist). Experiments with synthetic observational data show that our method can accurately sample from distributions over ancestral graphs and that we can greatly improve inference quality with human aid.
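
A toy version of the query-and-update loop, with a small enumerated set of candidate graphs in place of GFlowNet samples: the belief is a normalized score vector, the next expert query targets the most uncertain ancestral relation (a simple stand-in for the paper's experimental-design criterion), and feedback is folded in by importance weighting. All graphs, scores and the reliability value are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Candidate ancestral graphs, each encoded by which of three possible
# ancestral relations it contains; random scores play the role of
# BIC-based log-beliefs.
graphs = np.array([[1, 0, 0], [1, 1, 0], [0, 1, 1], [1, 0, 1]])
log_belief = rng.normal(size=len(graphs))
belief = np.exp(log_belief - log_belief.max())
belief /= belief.sum()

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

# Query selection: ask the expert about the relation whose presence is
# most uncertain under the current belief.
p_yes = belief @ graphs
relation = int(np.argmax(np.minimum(p_yes, 1 - p_yes)))
print(f"query relation {relation}, H(belief)={entropy(belief):.3f}")

# The expert answers with some reliability; fold the answer in via
# importance weights and renormalize.
reliability, answer = 0.9, 1
like = np.where(graphs[:, relation] == answer, reliability, 1 - reliability)
belief *= like
belief /= belief.sum()
print(f"after feedback, H(belief)={entropy(belief):.3f}")
```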


Collaborative Learning From Distributed Data With Differentially Private Synthetic Twin Data

Aug 09, 2023
Lukas Prediger, Joonas Jälkö, Antti Honkela, Samuel Kaski

Consider a setting where multiple parties holding sensitive data aim to collaboratively learn population level statistics, but pooling the sensitive data sets is not possible. We propose a framework in which each party shares a differentially private synthetic twin of their data. We study the feasibility of combining such synthetic twin data sets for collaborative learning on real-world health data from the UK Biobank. We discover that parties engaging in collaborative learning via shared synthetic data obtain more accurate estimates of target statistics compared to using only their local data. This finding extends to the difficult case of small heterogeneous data sets. Furthermore, the more parties participate, the larger and more consistent the improvements become. Finally, we find that data sharing can especially help parties whose data contain underrepresented groups to perform better-adjusted analysis for said groups. Based on our results we conclude that sharing of synthetic twins is a viable method for enabling learning from sensitive data without violating privacy constraints, even if individual data sets are small or do not represent the overall population well. Distributed sensitive data is a common bottleneck in biomedical research; our study shows it can be alleviated with privacy-preserving collaborative learning methods.
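
The workflow, reduced to a one-dimensional toy: each party privatizes a sufficient statistic with the Gaussian mechanism, samples a synthetic twin from it, and parties pool twins with their own local data. This is only a sketch of the pipeline shape; the paper uses a proper DP generative model, and a real pipeline would privatize the spread as well as the mean.

```python
import numpy as np

rng = np.random.default_rng(3)

def dp_synthetic_twin(data, epsilon, delta=1e-5, bound=1.0):
    """Release a same-sized synthetic twin from a privatized mean."""
    data = np.clip(data, -bound, bound)
    n = len(data)
    sensitivity = 2 * bound / n                  # L2 sensitivity of the mean
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    noisy_mean = data.mean() + rng.normal(0, sigma)
    # The spread is taken from the raw data here for brevity; a real
    # DP pipeline would privatize it as well.
    return rng.normal(noisy_mean, data.std(), size=n)

parties = [rng.normal(0.3, 1.0, size=80) for _ in range(5)]
twins = [dp_synthetic_twin(d, epsilon=1.0) for d in parties]

# Party 0 augments its local data with the other parties' twins.
local_est = parties[0].mean()
pooled_est = np.concatenate([parties[0], *twins[1:]]).mean()
print(f"local mean: {local_est:.3f}, pooled-with-twins: {pooled_est:.3f}")
```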


Practical Equivariances via Relational Conditional Neural Processes

Jun 19, 2023
Daolang Huang, Manuel Haussmann, Ulpu Remes, ST John, Grégoire Clarté, Kevin Sebastian Luck, Samuel Kaski, Luigi Acerbi

Conditional Neural Processes (CNPs) are a class of meta-learning models popular for combining the runtime efficiency of amortized inference with reliable uncertainty quantification. Many relevant machine learning tasks, such as spatio-temporal modeling, Bayesian optimization and continuous control, contain equivariances -- for example to translation -- which the model can exploit for maximal performance. However, prior attempts to include equivariances in CNPs do not scale effectively beyond two input dimensions. In this work, we propose Relational Conditional Neural Processes (RCNPs), an effective approach to incorporate equivariances into any neural process model. Our proposed method extends the applicability and impact of equivariant neural processes to higher dimensions. We empirically demonstrate the competitive performance of RCNPs on a large array of tasks naturally containing equivariances.
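
The core trick can be shown in a few lines: if the encoder sees only differences between target and context inputs (plus context outputs), translating all inputs leaves predictions unchanged. The random fixed weights below stand in for trained networks; the check at the end verifies the equivariance numerically.

```python
import numpy as np

rng = np.random.default_rng(4)

# Minimal relational encoding in the spirit of RCNPs; random weights
# stand in for the trained encoder and decoder.
W1, W2 = rng.normal(size=(2, 16)), rng.normal(size=(16, 1))

def predict(x_ctx, y_ctx, x_tgt):
    # Relational features: (x_tgt - x_ctx, y_ctx) for every context point.
    diff = x_tgt[:, None] - x_ctx[None, :]
    feats = np.stack([diff, np.broadcast_to(y_ctx, diff.shape)], axis=-1)
    h = np.tanh(feats @ W1)          # pointwise encoder
    r = h.mean(axis=1)               # aggregate over the context set
    return (r @ W2).squeeze(-1)      # decoder

x_ctx, y_ctx = rng.normal(size=8), rng.normal(size=8)
x_tgt = np.array([0.0, 1.0])

shift = 5.0
out = predict(x_ctx, y_ctx, x_tgt)
out_shifted = predict(x_ctx + shift, y_ctx, x_tgt + shift)
print(np.allclose(out, out_shifted))   # True: translation equivariance
```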

* 29 pages, 5 figures 

Input gradient diversity for neural network ensembles

Jun 05, 2023
Trung Trinh, Markus Heinonen, Luigi Acerbi, Samuel Kaski

Deep Ensembles (DEs) demonstrate improved accuracy, calibration and robustness to perturbations over single neural networks partly due to their functional diversity. Particle-based variational inference (ParVI) methods enhance diversity by formalizing a repulsion term based on a network similarity kernel. However, weight-space repulsion is inefficient due to over-parameterization, while direct function-space repulsion has been found to produce little improvement over DEs. To sidestep these difficulties, we propose First-order Repulsive Deep Ensemble (FoRDE), an ensemble learning method based on ParVI, which performs repulsion in the space of first-order input gradients. As input gradients uniquely characterize a function up to translation and are much smaller in dimension than the weights, this method guarantees that ensemble members are functionally different. Intuitively, diversifying the input gradients encourages each network to learn different features, which is expected to improve the robustness of an ensemble. Experiments on image classification datasets show that FoRDE significantly outperforms the gold-standard DEs and other ensemble methods in accuracy and calibration under covariate shift due to input perturbations.
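
A simplified sketch of the idea, with the ParVI kernel-gradient update reduced to an additive pairwise penalty on normalized input gradients: members whose input gradients point in similar directions pay a higher loss. Model sizes, the bandwidth and the penalty weight are placeholders, not the paper's configuration.

```python
import torch
import torch.nn.functional as F

def repulsion_loss(models, x, y, bandwidth=1.0):
    """RBF-kernel penalty between ensemble members' normalized input
    gradients; higher when members rely on similar input directions."""
    grads = []
    for model in models:
        xi = x.clone().requires_grad_(True)
        loss = F.cross_entropy(model(xi), y)
        g, = torch.autograd.grad(loss, xi, create_graph=True)
        g = g.flatten(1)
        grads.append(g / (g.norm(dim=1, keepdim=True) + 1e-12))
    penalty = 0.0
    for i in range(len(grads)):
        for j in range(i + 1, len(grads)):
            sq = (grads[i] - grads[j]).pow(2).sum(dim=1)
            penalty = penalty + torch.exp(-sq / (2 * bandwidth**2)).mean()
    return penalty

# Usage: members' task losses plus a weighted repulsion term.
models = [torch.nn.Sequential(torch.nn.Linear(10, 3)) for _ in range(3)]
x, y = torch.randn(16, 10), torch.randint(0, 3, (16,))
task = sum(F.cross_entropy(m(x), y) for m in models)
total = task + 0.1 * repulsion_loss(models, x, y)
total.backward()
```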

* Under review 

Learning Robust Statistics for Simulation-based Inference under Model Misspecification

May 25, 2023
Daolang Huang, Ayush Bharti, Amauri Souza, Luigi Acerbi, Samuel Kaski

Simulation-based inference (SBI) methods such as approximate Bayesian computation (ABC), synthetic likelihood, and neural posterior estimation (NPE) rely on simulating statistics to infer parameters of intractable likelihood models. However, such methods are known to yield untrustworthy and misleading inference outcomes under model misspecification, thus hindering their widespread applicability. In this work, we propose the first general approach to handle model misspecification that works across different classes of SBI methods. Leveraging the fact that the choice of statistics determines the degree of misspecification in SBI, we introduce a regularized loss function that penalises those statistics that increase the mismatch between the data and the model. Taking NPE and ABC as use cases, we demonstrate the superior performance of our method on high-dimensional time-series models that are artificially misspecified. We also apply our method to real data from the field of radio propagation where the model is known to be misspecified. We show empirically that the method yields robust inference in misspecified scenarios, whilst still being accurate when the model is well-specified.
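
In sketch form, the objective is the base SBI training loss plus a penalty on the mismatch between summaries of observed and simulated data. Below, an RBF-kernel MMD plays the role of that penalty; `summary_net` and the base loss are placeholders rather than the paper's exact architecture and training criterion.

```python
import torch

def rbf_mmd2(a, b, bandwidth=1.0):
    """Squared MMD between two samples under an RBF kernel."""
    def k(u, v):
        d = (u[:, None, :] - v[None, :, :]).pow(2).sum(-1)
        return torch.exp(-d / (2 * bandwidth**2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

# Placeholder summary network mapping raw data to learned statistics.
summary_net = torch.nn.Sequential(torch.nn.Linear(50, 8))
x_sim = torch.randn(128, 50)        # simulator outputs
x_obs = torch.randn(32, 50) + 0.5   # observed data (possibly misspecified)

s_sim, s_obs = summary_net(x_sim), summary_net(x_obs)
base_loss = s_sim.pow(2).mean()     # placeholder for the NPE/ABC loss
loss = base_loss + 1.0 * rbf_mmd2(s_sim, s_obs)   # regularized objective
loss.backward()
```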
