Rebecca Willett

Embed and Emulate: Learning to estimate parameters of dynamical systems with uncertainty quantification

Nov 03, 2022

Ruoxi Jiang, Rebecca Willett

Abstract: This paper explores learning emulators for estimating the parameters of high-dimensional dynamical systems with uncertainty quantification. We assume access to a computationally expensive simulator that takes a candidate parameter as input and outputs a corresponding multichannel time series. Our task is to accurately estimate a range of likely values of the underlying parameters. Standard iterative approaches require running the simulator many times, which is computationally prohibitive. This paper describes a novel framework for learning feature embeddings of observed dynamics jointly with an emulator that can replace high-cost simulators for parameter estimation. Leveraging a contrastive learning approach, our method exploits intrinsic data properties within and across the parameter and trajectory domains. On a coupled 396-dimensional multiscale Lorenz 96 system, our method significantly outperforms a typical parameter estimation method based on predefined metrics and a classical numerical simulator, while requiring only 1.19% of the baseline's computation time. Ablation studies highlight the potential of explicitly designing learned emulators for parameter estimation by leveraging contrastive learning.

* Accepted at NeurIPS 2022
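
The simulator interface in this abstract (a parameter goes in, a multichannel time series comes out) can be made concrete with a toy stand-in. The sketch below assumes the classical single-scale Lorenz 96 model, dx_k/dt = (x_{k+1} - x_{k-2}) x_{k-1} - x_k + F with cyclic indices, treats the forcing F as the unknown parameter, and integrates it with fourth-order Runge-Kutta. It does not reproduce the paper's coupled 396-dimensional multiscale system or its contrastive emulator. The summary statistic and grid search are hypothetical, and they illustrate why a metric-based estimator that reruns the simulator for every candidate parameter becomes expensive.

```python
from typing import List

State = List[float]


def lorenz96_rhs(x: State, forcing: float) -> State:
    """Single-scale Lorenz 96: dx_k/dt = (x_{k+1} - x_{k-2}) x_{k-1} - x_k + F, cyclic indices."""
    n = len(x)
    # Python's negative indexing handles the wrap-around for k-1 and k-2.
    return [(x[(k + 1) % n] - x[k - 2]) * x[k - 1] - x[k] + forcing for k in range(n)]


def rk4_step(x: State, forcing: float, dt: float) -> State:
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base: State, slope: State, h: float) -> State:
        return [b + h * s for b, s in zip(base, slope)]

    k1 = lorenz96_rhs(x, forcing)
    k2 = lorenz96_rhs(shifted(x, k1, 0.5 * dt), forcing)
    k3 = lorenz96_rhs(shifted(x, k2, 0.5 * dt), forcing)
    k4 = lorenz96_rhs(shifted(x, k3, dt), forcing)
    return [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def simulate(forcing: float, n: int = 8, steps: int = 200, dt: float = 0.01) -> List[State]:
    """Hypothetical simulator: parameter in, multichannel time series (steps + 1 states) out."""
    x = [forcing] * n
    x[0] += 0.01  # small perturbation away from the unstable equilibrium x_k = F
    trajectory = [x]
    for _ in range(steps):
        x = rk4_step(x, forcing, dt)
        trajectory.append(x)
    return trajectory


def summary(trajectory: List[State]) -> float:
    """Hypothetical predefined metric: mean over all channels of the second half of the series."""
    tail = trajectory[len(trajectory) // 2:]
    return sum(sum(state) for state in tail) / (len(tail) * len(tail[0]))


def grid_search(observed: float, candidates: List[float]) -> float:
    """Baseline estimator: one full simulator run per candidate parameter."""
    return min(candidates, key=lambda f: abs(summary(simulate(f)) - observed))


observed = summary(simulate(10.0))
print(grid_search(observed, [6.0, 8.0, 10.0, 12.0]))
```

Every candidate in the grid costs one full simulator run; a learned emulator is meant to stand in for those runs.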

Cloud Classification with Unsupervised Deep Learning

Sep 30, 2022

Takuya Kurihana, Ian Foster, Rebecca Willett, Sydney Jenkins, Kathryn Koenig, Ruby Werman, Ricardo Barros Lourenco, Casper Neo, Elisabeth Moyer

Abstract: We present a framework for cloud characterization that leverages modern unsupervised deep learning technologies. While previous neural network-based cloud classification models have used supervised learning methods, unsupervised learning allows us to avoid restricting the model to artificial categories based on historical cloud classification schemes and enables the discovery of novel, more detailed classifications. Our framework learns cloud features directly from radiance data produced by NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) satellite instrument, deriving cloud characteristics from millions of images without relying on pre-defined cloud types during the training process. We present preliminary results showing that our method extracts physically relevant information from radiance data and produces meaningful cloud classes.

* 5 pages, 6 figures. Proceedings of the Climate Informatics Workshop 2019, Paris
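
Unsupervised cloud classes are typically obtained by clustering learned feature vectors. The sketch below assumes the features have already been produced by some encoder (not shown) and uses plain k-means (Lloyd's algorithm) as a stand-in clustering step. The abstract does not say which clustering method the framework uses, so this choice and the function names are assumptions.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 50,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # initialize from k distinct points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each feature vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members (keep it if the cluster is empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

In this hypothetical setup the points would be encoder outputs for MODIS radiance patches, and each resulting label would be a candidate cloud class to inspect for physical meaning.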

NURD: Negative-Unlabeled Learning for Online Datacenter Straggler Prediction

Mar 16, 2022

Yi Ding, Avinash Rao, Hyebin Song, Rebecca Willett, Henry Hoffmann

Abstract: Datacenters execute large computational jobs, which are composed of smaller tasks. A job completes only when all of its tasks finish, so stragglers (rare yet extremely slow tasks) are a major impediment to datacenter performance. Accurately predicting stragglers would enable proactive intervention, allowing datacenter operators to mitigate stragglers before they delay a job. While much prior work applies machine learning to predict computer system performance, these approaches rely either on complete labels (i.e., sufficient examples of all possible behaviors, including straggling and non-straggling) or on strong assumptions about the underlying latency distributions (e.g., whether they are Gaussian). Within a running job, however, none of this information is available until the stragglers reveal themselves, by which point they have already delayed the job. To predict stragglers accurately and early, without labeled positive examples or assumptions on latency distributions, this paper presents NURD, a novel Negative-Unlabeled learning approach with Reweighting and Distribution-compensation that trains only on negative and unlabeled streaming data. The key idea is to train a predictor using finished tasks of non-stragglers to predict latency for unlabeled running tasks, and then reweight each unlabeled task's prediction based on a weighting function of its feature space. We evaluate NURD on two production traces from Google and Alibaba and find that, compared to the best baseline approach, NURD increases the F1 score by 2 to 11 percentage points and improves job completion time by 4.7 to 8.8 percentage points.
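
The key idea stated above (fit a latency predictor on finished non-straggler tasks, then reweight predictions for unlabeled running tasks according to their features) can be sketched as follows. This is a simplified illustration, not NURD itself: the one-feature least-squares predictor, the `weight` callable, the division-based adjustment, and the fixed threshold are all hypothetical stand-ins for the paper's reweighting and distribution-compensation steps.

```python
from typing import Callable, List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def flag_stragglers(
    finished: List[Tuple[float, float]],  # (feature, observed latency) of finished non-straggler tasks
    running: List[Tuple[str, float]],     # (task id, feature) of unlabeled running tasks
    weight: Callable[[float], float],     # hypothetical weighting function on the feature space, in (0, 1]
    threshold: float,                     # hypothetical latency cutoff for calling a task a straggler
) -> List[str]:
    # Train only on negatives: the predictor never sees a straggler.
    slope, intercept = fit_line([x for x, _ in finished], [y for _, y in finished])
    flagged = []
    for task_id, x in running:
        # Under this hypothetical scheme, a low weight inflates the predicted latency,
        # compensating for a predictor that was fit on non-stragglers only.
        adjusted = (slope * x + intercept) / weight(x)
        if adjusted > threshold:
            flagged.append(task_id)
    return flagged
```

The design point the sketch tries to capture is that no positive (straggler) label is ever needed: the only supervised signal comes from tasks that have already finished normally.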

The Role of Linear Layers in Nonlinear Interpolating Networks

Feb 02, 2022

Greg Ongie, Rebecca Willett

Abstract: This paper explores the implicit bias of overparameterized neural networks with more than two layers. Our framework considers a family of networks of varying depth that all have the same capacity but different implicitly defined representation costs. The representation cost of a function induced by a neural network architecture is the minimum sum of squared weights needed for the network to represent the function; it reflects the function-space bias associated with the architecture. Our results show that adding linear layers to a ReLU network yields a representation cost that reflects a complex interplay between the alignment and sparsity of ReLU units. Specifically, using a neural network to fit training data with minimum representation cost yields an interpolating function that is constant in directions perpendicular to a low-dimensional subspace on which a parsimonious interpolant exists.
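
A scalar toy case, which is only an illustration and not the paper's ReLU analysis, shows how depth alone can change the representation cost defined above. Represent f(x) = c x with L stacked scalar linear layers whose weights satisfy w_1 w_2 ... w_L = c. The AM-GM inequality gives the minimum sum of squared weights:

```latex
\frac{1}{L}\sum_{i=1}^{L} w_i^2 \;\ge\; \Big(\prod_{i=1}^{L} w_i^2\Big)^{1/L} = |c|^{2/L}
\quad\Longrightarrow\quad
R_L(f) \;=\; \min_{w_1 w_2 \cdots w_L = c}\; \sum_{i=1}^{L} w_i^2 \;=\; L\,|c|^{2/L},
```

with equality when every |w_i| = |c|^{1/L}. One layer gives c^2, two layers give 2|c|, and L layers give L|c|^{2/L}: the function and the capacity are the same, but the penalty changes with depth, and its exponent 2/L shrinks as layers are added.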

Adaptive Differentially Private Empirical Risk Minimization

Oct 25, 2021

Xiaoxia Wu, Lingxiao Wang, Irina Cristali, Quanquan Gu, Rebecca Willett

Abstract: We propose an adaptive (stochastic) gradient perturbation method for differentially private empirical risk minimization. At each iteration, the random noise added to the gradient is optimally adapted to the step size; we call this process adaptive differentially private (ADP) learning. Given the same privacy budget, we prove that the ADP method considerably improves the utility guarantee compared to the standard differentially private method, in which vanilla random noise is added. Our method is particularly useful for gradient-based algorithms with time-varying learning rates, including variants of AdaGrad (Duchi et al., 2011). We provide extensive numerical experiments to demonstrate the effectiveness of the proposed adaptive differentially private algorithm.
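
Gradient perturbation for private empirical risk minimization follows a common pattern: clip each gradient, add Gaussian noise, and take a step. The sketch below shows that loop with the per-iteration noise level exposed as a callable of the iteration index and step size, which is where an adaptive scheme such as ADP would plug in. The schedule in the usage line is a hypothetical placeholder, not the paper's optimal adaptation, and the sketch performs no privacy accounting.

```python
import math
import random
from typing import Callable, List, Sequence


def clip(grad: Sequence[float], max_norm: float) -> List[float]:
    """Rescale grad so its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = 1.0 if norm <= max_norm else max_norm / norm
    return [g * scale for g in grad]


def noisy_gradient_descent(
    grad_fn: Callable[[List[float]], List[float]],
    w0: Sequence[float],
    step_sizes: Sequence[float],
    max_norm: float,
    noise_std: Callable[[int, float], float],  # (iteration, step size) -> Gaussian noise std
    seed: int = 0,
) -> List[float]:
    rng = random.Random(seed)
    w = list(w0)
    for t, eta in enumerate(step_sizes):
        g = clip(grad_fn(w), max_norm)
        sigma = noise_std(t, eta)
        # Perturb the clipped gradient coordinate-wise, then take a gradient step.
        w = [wi - eta * (gi + rng.gauss(0.0, sigma)) for wi, gi in zip(w, g)]
    return w


# Usage on f(w) = 0.5 * ||w||^2 with a decaying step size; the noise schedule is hypothetical.
steps = [0.5 / math.sqrt(t + 1) for t in range(100)]
w_private = noisy_gradient_descent(lambda w: list(w), [1.0, -2.0], steps, 1.0,
                                   noise_std=lambda t, eta: 0.1 * math.sqrt(eta))
```

Passing the step size into `noise_std` is the design choice that matters here: a vanilla method ignores it, while a time-varying learning rate (as in AdaGrad variants) gives an adaptive method something to adapt to.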

Auto-differentiable Ensemble Kalman Filters

Jul 19, 2021

Yuming Chen, Daniel Sanz-Alonso, Rebecca Willett

Abstract: Data assimilation is concerned with sequentially estimating a temporally evolving state. This task, which arises in a wide range of scientific and engineering applications, is particularly challenging when the state is high-dimensional and the state-space dynamics are unknown. This paper introduces a machine learning framework for learning dynamical systems in data assimilation. Our auto-differentiable ensemble Kalman filters (AD-EnKFs) blend ensemble Kalman filters for state recovery with machine learning tools for learning the dynamics. In doing so, AD-EnKFs leverage the ability of ensemble Kalman filters to scale to high-dimensional states and the power of automatic differentiation to train high-dimensional surrogate models for the dynamics. Numerical results using the Lorenz-96 model show that AD-EnKFs outperform existing methods that use expectation-maximization or particle filters to merge data assimilation and machine learning. In addition, AD-EnKFs are easy to implement and require minimal tuning.
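
The ensemble Kalman filter component can be illustrated with its analysis (update) step alone. The sketch below is a standard stochastic EnKF update for a scalar state observed directly (observation operator H = 1) with Gaussian observation-noise variance R. It leaves out the forecast step, the learned dynamics, and the automatic differentiation that AD-EnKFs add. Centering the observation perturbations is an assumed variance-reduction choice that makes the analysis mean equal the Kalman update of the prior mean.

```python
import math
import random
from typing import List, Sequence


def sample_variance(xs: Sequence[float]) -> float:
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def enkf_analysis(ensemble: Sequence[float], y: float, obs_var: float,
                  seed: int = 0) -> List[float]:
    """Stochastic EnKF update for a scalar state with H = 1 and observation noise variance obs_var."""
    rng = random.Random(seed)
    p = sample_variance(ensemble)           # forecast error variance estimated from the ensemble
    gain = p / (p + obs_var)                # K = P H^T (H P H^T + R)^{-1} with H = 1
    eps = [rng.gauss(0.0, math.sqrt(obs_var)) for _ in ensemble]
    eps_mean = sum(eps) / len(eps)
    eps = [e - eps_mean for e in eps]       # center the observation perturbations
    # Each member is nudged toward its own perturbed copy of the observation.
    return [x + gain * (y + e - x) for x, e in zip(ensemble, eps)]
```

Because the update uses only ensemble statistics and arithmetic, every step is differentiable, which is the property that lets a framework like AD-EnKF backpropagate through the filter to train a dynamics model.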

Pure Exploration in Kernel and Neural Bandits

Jun 22, 2021

Yinglun Zhu, Dongruo Zhou, Ruoxi Jiang, Quanquan Gu, Rebecca Willett, Robert Nowak

Abstract: We study pure exploration in bandits, where the dimension of the feature representation can be much larger than the number of arms. To overcome the curse of dimensionality, we propose to adaptively embed the feature representation of each arm into a lower-dimensional space and carefully deal with the induced model misspecification. Our approach is conceptually very different from existing work, which either handles only low-dimensional linear bandits or deals with model misspecification only passively. We showcase the application of our approach to two previously under-studied pure exploration settings: (1) the reward function belongs to a possibly infinite-dimensional Reproducing Kernel Hilbert Space, and (2) the reward function is nonlinear and can be approximated by neural networks. Our main results provide sample complexity guarantees that depend only on the effective dimension of the feature spaces in the kernel or neural representations. Extensive experiments on both synthetic and real-world datasets demonstrate the efficacy of our methods.
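
For orientation, the pure-exploration (best-arm identification) objective itself can be sketched with classic successive elimination on a small, finite set of arms. This is a standard textbook baseline, not the paper's method: it uses no feature embedding and no kernel or neural model, and its Hoeffding-style confidence radius assumes rewards in [0, 1]. The `pull` callable is hypothetical.

```python
import math
from typing import Callable, List


def successive_elimination(pull: Callable[[int], float], n_arms: int,
                           delta: float = 0.05, max_rounds: int = 100_000) -> int:
    """Return the arm believed to have the highest mean reward; every surviving arm is pulled each round."""
    active: List[int] = list(range(n_arms))
    totals = [0.0] * n_arms
    for t in range(1, max_rounds + 1):
        for arm in active:
            totals[arm] += pull(arm)
        # Hoeffding radius with a union bound over arms and rounds (rewards assumed in [0, 1]).
        radius = math.sqrt(math.log(4.0 * n_arms * t * t / delta) / (2.0 * t))
        means = {arm: totals[arm] / t for arm in active}
        best_lower = max(means.values()) - radius
        # Drop arms whose upper confidence bound falls below the best lower confidence bound.
        active = [arm for arm in active if means[arm] + radius >= best_lower]
        if len(active) == 1:
            return active[0]
    # Budget exhausted: all survivors have t pulls, so the largest total is the best empirical mean.
    return max(active, key=lambda arm: totals[arm])
```

The number of rounds this baseline needs scales with the number of arms and the gaps between their means; the abstract's point is that when arms carry high-dimensional features, guarantees should instead depend on an effective dimension.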

Prediction in the presence of response-dependent missing labels

Mar 25, 2021

Hyebin Song, Garvesh Raskutti, Rebecca Willett

Abstract: In a variety of settings, limitations of sensing technologies or other sampling mechanisms result in missing labels, where the likelihood of a missing label in the training set is an unknown function of the data. For example, satellites used to detect forest fires cannot sense fires below a certain size threshold. In such cases, training datasets consist of positive and pseudo-negative observations, where pseudo-negative observations can be either true negatives or undetected positives with small magnitudes. We develop a new methodology and a non-convex algorithm, PU-OMM (Positive-Unlabeled Occurrence-Magnitude Mixture), which jointly estimates the occurrence and detection likelihood of positive samples, utilizing prior knowledge of the detection mechanism. Our approach uses ideas from positive-unlabeled (PU) learning and zero-inflated models that jointly estimate the magnitude and occurrence of events. We provide conditions under which our model is identifiable and prove that, even though our approach leads to a non-convex objective, any local minimizer has optimal statistical error (up to a log term) and projected gradient descent has geometric convergence rates. We demonstrate on both synthetic data and a California wildfire dataset that our method outperforms existing state-of-the-art approaches.
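
Why pseudo-negatives are ambiguous can be made concrete with Bayes' rule. Suppose, as a simplifying assumption not taken from the paper, that an event occurs with probability p, that an occurring event is detected with probability d (for example, the chance that a fire exceeds the sensor's size threshold), and that non-events are never reported. Then an unlabeled observation is actually an undetected positive with probability p(1 - d) / (p(1 - d) + 1 - p). The sketch computes that posterior and simulates a thresholded detection mechanism; the threshold, magnitude distribution, and function names are hypothetical.

```python
import random
from typing import List, Tuple


def prob_hidden_positive(p_occur: float, p_detect: float) -> float:
    """P(event occurred | not detected) = p(1 - d) / (p(1 - d) + (1 - p))."""
    missed = p_occur * (1.0 - p_detect)
    return missed / (missed + (1.0 - p_occur))


def simulate_labels(n: int, p_occur: float, threshold: float,
                    seed: int = 0) -> List[Tuple[bool, bool]]:
    """Return (truly positive, observed label) pairs; events below the size threshold go undetected."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        occurred = rng.random() < p_occur
        magnitude = rng.expovariate(1.0) if occurred else 0.0  # hypothetical magnitude distribution
        observed = occurred and magnitude >= threshold
        pairs.append((occurred, observed))
    return pairs
```

With an exponential magnitude of rate 1 and threshold tau, the detection probability in this toy model is exp(-tau), so raising the threshold raises the share of hidden positives among the pseudo-negatives.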

Data-driven Cloud Clustering via a Rotationally Invariant Autoencoder

Mar 08, 2021

Takuya Kurihana, Elisabeth Moyer, Rebecca Willett, Davis Gilton, Ian Foster

Abstract: Advanced satellite-borne remote sensing instruments produce high-resolution multispectral data for much of the globe at a daily cadence. These datasets open up the possibility of improved understanding of cloud dynamics and feedback, which remain the biggest source of uncertainty in global climate model projections. As a step toward that understanding, we describe an automated rotation-invariant cloud clustering (RICC) method that leverages deep learning autoencoder technology to organize cloud imagery within large datasets in an unsupervised fashion, free from assumptions about predefined classes. We describe both the design and implementation of this method and its evaluation, which uses a sequence of testing protocols to determine whether the resulting clusters: (1) are physically reasonable (i.e., embody scientifically relevant distinctions); (2) capture information on spatial distributions, such as textures; (3) are cohesive and separable in latent space; and (4) are rotationally invariant (i.e., insensitive to the orientation of an image). Results obtained when these evaluation protocols are applied to RICC outputs suggest that the resulting novel cloud clusters capture meaningful aspects of cloud physics, are appropriately spatially coherent, and are invariant to the orientation of input images. Our results support the possibility of using an unsupervised data-driven approach for automated clustering and pattern discovery in cloud imagery.

* 21 pages, 15 figures. Under review at IEEE Transactions on Geoscience and Remote Sensing (TGRS)
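
Protocol (4) above, rotational invariance, suggests a simple numerical check: encode an image and its rotations, then measure how far apart the encodings are. The sketch below is a hypothetical version of such a test on small 2-D arrays. It only considers 90-degree rotations, and the two toy encoders exist solely to show one invariant and one non-invariant case; it is not RICC's evaluation code.

```python
from typing import Callable, List

Image = List[List[float]]
Encoder = Callable[[Image], List[float]]


def rot90(img: Image) -> Image:
    """Rotate a 2-D array 90 degrees counterclockwise (transpose, then reverse the row order)."""
    return [list(row) for row in zip(*img)][::-1]


def invariance_gap(encoder: Encoder, img: Image) -> float:
    """Largest Euclidean distance between the encoding of img and those of its 90/180/270-degree rotations."""
    reference = encoder(img)
    gap, rotated = 0.0, img
    for _ in range(3):
        rotated = rot90(rotated)
        code = encoder(rotated)
        dist = sum((a - b) ** 2 for a, b in zip(reference, code)) ** 0.5
        gap = max(gap, dist)
    return gap


def sorted_pixels(img: Image) -> List[float]:
    """Invariant toy encoder: sorted pixel values ignore where each pixel sits."""
    return sorted(v for row in img for v in row)


def first_row(img: Image) -> List[float]:
    """Non-invariant toy encoder: its output depends on orientation."""
    return list(img[0])
```

A gap of zero means the encoder is exactly invariant under these rotations; a trained autoencoder would be expected to give small but nonzero gaps, so in practice one would compare the gap to typical distances between different images.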

Deep Equilibrium Architectures for Inverse Problems in Imaging

Feb 16, 2021

Davis Gilton, Gregory Ongie, Rebecca Willett

Abstract: Recent efforts to solve inverse problems in imaging via deep neural networks use architectures inspired by a fixed number of iterations of an optimization method. The number of iterations is typically quite small due to difficulties in training networks corresponding to more iterations; the resulting solvers cannot be run for more iterations at test time without incurring significant errors. This paper describes an alternative approach corresponding to an infinite number of iterations. It yields up to a 4 dB PSNR improvement in reconstruction accuracy over state-of-the-art alternatives, and its computational budget can be selected at test time to optimize context-dependent trade-offs between accuracy and computation. The proposed approach leverages ideas from Deep Equilibrium Models, where the fixed-point iteration is constructed to incorporate a known forward model and insights from classical optimization-based reconstruction methods.
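
The fixed-point view can be sketched with a hand-written update in place of a learned network. Assuming a linear forward model y = Ax, the map below takes one gradient step on the data-fit term 0.5 ||Ax - y||^2 and then applies a stand-in regularizer (uniform shrinkage toward zero) where a deep equilibrium model would apply a trained network. Iterating until successive iterates agree to a tolerance, with the iteration cap chosen at test time, mirrors the accuracy/computation trade-off described above. The step size, shrinkage factor, and names are assumptions, and training through the fixed point is not shown.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(a: Matrix, x: Vector) -> Vector:
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def update(x: Vector, a: Matrix, y: Vector, step: float, shrink: float) -> Vector:
    """One application of the fixed-point map: data-consistency gradient step, then a stand-in regularizer."""
    residual = [r - yi for r, yi in zip(matvec(a, x), y)]
    grad = matvec(transpose(a), residual)  # gradient of 0.5 * ||Ax - y||^2
    z = [xi - step * gi for xi, gi in zip(x, grad)]
    return [shrink * zi for zi in z]       # a trained network would replace this line


def solve_fixed_point(a: Matrix, y: Vector, step: float = 0.1, shrink: float = 1.0,
                      tol: float = 1e-10, max_iters: int = 10_000) -> Tuple[Vector, int]:
    """Iterate x <- update(x) until successive iterates differ by less than tol or the budget runs out."""
    x = [0.0] * len(a[0])
    for k in range(1, max_iters + 1):
        x_next = update(x, a, y, step, shrink)
        if max(abs(u - v) for u, v in zip(x_next, x)) < tol:
            return x_next, k
        x = x_next
    return x, max_iters
```

For example, with A = [[2.0]] and y = [4.0] the map is x <- 0.6x + 0.8, whose fixed point is x = 2; stopping after three iterations returns 1.568, a cheaper but less accurate answer from the same model.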
