Yixin Wang

Augmenting Neural Networks with Priors on Function Values

Feb 10, 2022
Hunter Nisonoff, Yixin Wang, Jennifer Listgarten

The need for function estimation in label-limited settings is common in the natural sciences. At the same time, prior knowledge of function values is often available in these domains. For example, data-free biophysics-based models can be informative about protein properties, while quantum-based computations can be informative about small-molecule properties. How can we coherently leverage such prior knowledge to improve a neural network model that is quite accurate in some regions of input space (typically near the training data) but wildly wrong in others? Bayesian neural networks (BNNs) let the user specify prior information only on the network weights, not directly on the function values, and in general there is no clear mapping between the two. Herein, we tackle this problem by developing an approach to augment BNNs with prior information on the function values themselves. Our probabilistic approach yields predictions that rely more heavily on the prior information when the epistemic uncertainty is large, and more heavily on the neural network when the epistemic uncertainty is small.
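The behavior described in the final sentence can be illustrated with a minimal sketch. The prediction leans on the prior when epistemic uncertainty is large and on the network when it is small. The sketch assumes that both the prior model and the network are summarized as Gaussians over a single function value. It then combines them by precision weighting, which is a product of two Gaussians. This is a hypothetical stand-in for intuition, not the authors' method. The function and parameter names (`combine_prior_and_network`, `prior_var`, `nn_var`) are illustrative.

```python
def combine_prior_and_network(prior_mean, prior_var, nn_mean, nn_var):
    """Precision-weighted combination of two Gaussian beliefs about f(x).

    Hypothetical illustration: prior_var encodes trust in the prior model
    (e.g., a biophysics or quantum computation); nn_var stands in for the
    network's epistemic uncertainty at the input x.
    """
    prior_precision = 1.0 / prior_var
    nn_precision = 1.0 / nn_var
    post_var = 1.0 / (prior_precision + nn_precision)
    # Each source is weighted by its precision, so the more certain one dominates.
    post_mean = post_var * (prior_precision * prior_mean + nn_precision * nn_mean)
    return post_mean, post_var


# Far from the training data: the network is very uncertain, so the prior dominates.
far_mean, _ = combine_prior_and_network(0.0, 1.0, 10.0, 1e6)
# Near the training data: the network is confident, so its prediction dominates.
near_mean, _ = combine_prior_and_network(0.0, 1.0, 10.0, 1e-6)
print(round(far_mean, 3), round(near_mean, 3))
```

Precision weighting is the simplest rule with this "trust whichever source is more certain" property. A BNN would supply `nn_var` from its posterior rather than as a fixed number.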

Spatiotemporal Clustering with Neyman-Scott Processes via Connections to Bayesian Nonparametric Mixture Models

Jan 14, 2022
Yixin Wang, Anthony Degleris, Alex H. Williams, Scott W. Linderman

Neyman-Scott processes (NSPs) are point process models that generate clusters of points in time or space. They are natural models for a wide range of phenomena, from neural spike trains to document streams. The clustering property is achieved via a doubly stochastic formulation: first, a set of latent events is drawn from a Poisson process; then, each latent event generates a set of observed data points according to another Poisson process. This construction is similar to Bayesian nonparametric mixture models like the Dirichlet process mixture model (DPMM) in that the number of latent events (i.e., clusters) is a random variable, but the point process formulation makes the NSP especially well suited to modeling spatiotemporal data. While many specialized algorithms have been developed for DPMMs, comparatively few works have focused on inference in NSPs. Here, we present novel connections between NSPs and DPMMs, with the key link being a third class of Bayesian mixture models called mixture of finite mixture models (MFMMs). Leveraging this connection, we adapt the standard collapsed Gibbs sampling algorithm for DPMMs to enable scalable Bayesian inference on NSP models. We demonstrate the potential of Neyman-Scott processes on a variety of applications, including sequence detection in neural spike trains and event detection in document streams.

* 37 pages, 6 figures
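The doubly stochastic construction can be made concrete with a minimal simulation sketch of a one-dimensional (temporal) NSP. The sketch makes the following assumptions:

* latent events come from a homogeneous Poisson process on [0, T];
* each cluster is Gaussian-shaped;
* there is no background process.

The function and parameter names (`simulate_nsp`, `latent_rate`, `mean_children`, `spread`) are hypothetical. The sketch covers only data generation, not the collapsed Gibbs inference the paper develops.

```python
import math
import random


def sample_poisson(rng, rate):
    """Knuth's multiplication method; adequate for small rates."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_nsp(T, latent_rate, mean_children, spread, seed=0):
    """Simulate a hypothetical 1-D Neyman-Scott process on [0, T].

    latent_rate: intensity of the latent (parent) Poisson process.
    mean_children: expected number of observed points per latent event.
    spread: std. dev. of observed points around their latent event
            (a Gaussian kernel is an assumption; points may fall outside [0, T]).
    Returns (events, latents); events are (time, latent_index) pairs sorted by time.
    """
    rng = random.Random(seed)
    # Stage 1: latent events from a homogeneous Poisson process on [0, T].
    n_latent = sample_poisson(rng, latent_rate * T)
    latents = sorted(rng.uniform(0.0, T) for _ in range(n_latent))
    # Stage 2: each latent event spawns a Poisson number of observed points.
    events = []
    for k, center in enumerate(latents):
        for _ in range(sample_poisson(rng, mean_children)):
            events.append((rng.gauss(center, spread), k))
    events.sort()
    return events, latents


events, latents = simulate_nsp(T=100.0, latent_rate=0.05, mean_children=8.0, spread=0.5, seed=1)
print(len(latents), len(events))
```

The number of clusters, `len(latents)`, is itself random. This is the property the abstract links to DPMMs and MFMMs.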

A Survey of Visual Transformers

Nov 13, 2021
Yang Liu, Yao Zhang, Yixin Wang, Feng Hou, Jin Yuan, Jiang Tian, Yang Zhang, Zhongchao Shi, Jianping Fan, Zhiqiang He

Transformer, an attention-based encoder-decoder architecture, has revolutionized the field of natural language processing. Inspired by this significant achievement, some pioneering works have recently adapted Transformer-like architectures to computer vision (CV), demonstrating their effectiveness on various CV tasks. Relying on their competitive modeling capability, visual Transformers have achieved impressive performance on multiple benchmarks such as ImageNet, COCO, and ADE20k compared with modern convolutional neural networks (CNNs). In this paper, we provide a comprehensive review of over one hundred visual Transformers for three fundamental CV tasks (classification, detection, and segmentation), and propose a taxonomy that organizes these methods according to their motivations, structures, and usage scenarios. Because of differences in training settings and target tasks, we also evaluate these methods under different configurations, rather than only on various benchmarks, for easy and intuitive comparison. Furthermore, we reveal a series of essential but unexploited aspects that may help Transformers stand out from numerous architectures, e.g., slack high-level semantic embeddings to bridge the gap between visual and sequential Transformers. Finally, we suggest three promising directions for future research.

Identifiable Variational Autoencoders via Sparse Decoding

Oct 20, 2021
Gemma E. Moran, Dhanya Sridhar, Yixin Wang, David M. Blei

We develop the Sparse VAE, a deep generative model for unsupervised representation learning on high-dimensional data. Given a dataset of observations, the Sparse VAE learns a set of latent factors that captures its distribution. The model is sparse in the sense that each feature of the dataset (i.e., each dimension) depends on a small subset of the latent factors. For example, in ratings data each movie is described by only a few genres; in text data each word applies to only a few topics; in genomics each gene is active in only a few biological processes. We first show that the Sparse VAE is identifiable: given data drawn from the model, there exists a uniquely optimal set of factors. (In contrast, most VAE-based models are not identifiable.) The key assumption behind Sparse VAE identifiability is the existence of "anchor features": for each factor, there exists a feature that depends only on that factor. Importantly, the anchor features do not need to be known in advance. We then show how to fit the Sparse VAE with variational EM. Finally, we empirically study the Sparse VAE with both simulated and real data. We find that it recovers meaningful latent factors and has smaller held-out reconstruction error than related methods.
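The sparsity structure and the anchor-feature assumption can be illustrated with a small sketch. It assumes a known binary feature-by-factor mask and, for simplicity, a linear decoder. The actual Sparse VAE uses a neural decoder and learns the sparsity pattern with variational EM, and its anchor features need not be known in advance. All helper names here are hypothetical.

```python
def decode(z, weights, mask):
    """Hypothetical sparse linear decoder: feature j only uses factors k with mask[j][k] == 1."""
    return [sum(w * m * zk for w, m, zk in zip(w_row, m_row, z))
            for w_row, m_row in zip(weights, mask)]


def anchor_features(mask):
    """Map each factor k to the features that depend on factor k alone."""
    anchors = {k: [] for k in range(len(mask[0]))}
    for j, row in enumerate(mask):
        active = [k for k, m in enumerate(row) if m]
        if len(active) == 1:
            anchors[active[0]].append(j)
    return anchors


def has_anchor_for_every_factor(mask):
    """True when every factor has at least one anchor feature."""
    return all(anchor_features(mask).values())


# 3 features x 2 factors: feature 0 anchors factor 0, feature 1 anchors factor 1,
# and feature 2 depends on both factors.
mask = [[1, 0], [0, 1], [1, 1]]
weights = [[1.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
print(decode([2.0, 3.0], weights, mask), anchor_features(mask))
```

Features may still depend on several factors, as feature 2 does here. The assumption only requires that each factor has at least one feature tied to it alone.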

All-Around Real Label Supervision: Cyclic Prototype Consistency Learning for Semi-supervised Medical Image Segmentation

Sep 28, 2021
Zhe Xu, Yixin Wang, Donghuan Lu, Lequan Yu, Jiangpeng Yan, Jie Luo, Kai Ma, Yefeng Zheng, Raymond Kai-yu Tong

Semi-supervised learning has substantially advanced medical image segmentation because it alleviates the heavy burden of acquiring costly expert-examined annotations. In particular, consistency-based approaches have attracted attention for their superior performance. In these approaches, the real labels only supervise their paired images via a supervised loss, while the unlabeled images are exploited by enforcing perturbation-based "unsupervised" consistency without explicit guidance from those real labels. Intuitively, however, the expert-examined real labels contain more reliable supervision signals. Observing this, we ask an unexplored but interesting question: can we exploit the unlabeled data via explicit real-label supervision for semi-supervised training? To this end, we discard perturbation-based consistency and instead absorb the essence of non-parametric prototype learning. Building on the prototypical network, we propose a novel cyclic prototype consistency learning (CPCL) framework, constructed from a labeled-to-unlabeled (L2U) prototypical forward process and an unlabeled-to-labeled (U2L) backward process. The two processes synergistically enhance the segmentation network by encouraging more discriminative and compact features. In this way, our framework turns the previous "unsupervised" consistency into a new "supervised" consistency, giving our method the "all-around real label supervision" property. Extensive experiments on brain tumor segmentation from MRI and kidney segmentation from CT images show that CPCL effectively exploits unlabeled data and outperforms other state-of-the-art semi-supervised medical image segmentation methods.

* 10 pages; under review at a Transactions journal
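The L2U forward process and the U2L backward process both rest on prototype matching, which can be sketched with plain feature vectors. This is a simplified, hypothetical illustration that makes four assumptions:

* per-pixel features have already been extracted;
* prototypes come from masked average pooling;
* similarity is measured by cosine similarity;
* predictions are hard assignments.

It does not reproduce CPCL's network, losses, or soft predictions.

```python
import math


def class_prototypes(features, labels):
    """Masked average pooling: the mean feature vector of each class."""
    sums, counts = {}, {}
    for f, c in zip(features, labels):
        acc = sums.setdefault(c, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[c] = counts.get(c, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def cosine(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def label_by_prototypes(features, prototypes):
    """Hard-assign each feature to the class with the most similar prototype."""
    return [max(prototypes, key=lambda c: cosine(f, prototypes[c])) for f in features]


# Toy per-pixel features (hypothetical); classes 0 and 1.
labeled = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
labels = [0, 0, 1]
unlabeled = [[1.0, 0.2], [0.1, 1.0]]

# Forward (L2U): prototypes from labeled data produce pseudo-labels for unlabeled features.
pseudo = label_by_prototypes(unlabeled, class_prototypes(labeled, labels))
# Backward (U2L): prototypes from unlabeled data re-label the labeled features,
# so the cycle can be checked against the real labels.
recovered = label_by_prototypes(labeled, class_prototypes(unlabeled, pseudo))
print(pseudo, recovered)
```

Comparing `recovered` with `labels` is the step that lets the real labels supervise the unlabeled data, in the spirit of the "supervised" consistency the abstract describes.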

Optimization-based Causal Estimation from Heterogenous Environments

Sep 24, 2021
Mingzhang Yin, Yixin Wang, David M. Blei

This paper presents a new optimization approach to causal estimation. Given data that contain covariates and an outcome, which covariates cause the outcome, and how strong are their causal effects? In classical machine learning (ML), the goal of optimization is to maximize predictive accuracy. However, some covariates might exhibit a non-causal association with the outcome. Such spurious associations provide predictive power for classical ML, but they prevent us from causally interpreting the result. This paper proposes CoCo, an optimization algorithm that bridges the gap between pure prediction and causal inference. CoCo leverages the recently proposed idea of environments: datasets of covariates and responses where the causal relationships remain invariant but the distribution of the covariates changes from environment to environment. Given datasets from multiple environments that exhibit sufficient heterogeneity, CoCo maximizes an objective whose only solution is the causal solution. We describe the theoretical foundations of this approach and demonstrate its effectiveness on simulated and real datasets. Compared to classical ML and existing methods, CoCo provides more accurate estimates of the causal model.
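The premise behind environments can be illustrated with a small simulation: causal relationships stay invariant while spurious associations shift. This sketch is not CoCo and does not implement its objective. Its data-generating process, in which X1 causes Y and X2 is a noisy effect of Y, and its helper names are hypothetical.

```python
import random


def slope(xs, ys):
    """Univariate least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def make_environment(rng, n, cause_sd, effect_noise_sd):
    """Hypothetical environment: X1 causes Y; X2 is a noisy effect of Y.

    The causal mechanism Y = 2 * X1 + noise is invariant; only the covariate
    distribution (cause_sd) and the noise on X2 change across environments.
    """
    x1 = [rng.gauss(0.0, cause_sd) for _ in range(n)]
    y = [2.0 * a + rng.gauss(0.0, 1.0) for a in x1]
    x2 = [b + rng.gauss(0.0, effect_noise_sd) for b in y]
    return x1, x2, y


rng = random.Random(0)
environments = [make_environment(rng, 20000, 1.0, 0.1), make_environment(rng, 20000, 2.0, 3.0)]
causal = [slope(x1, y) for x1, _, y in environments]    # close to 2 in both environments
spurious = [slope(x2, y) for _, x2, y in environments]  # shifts between environments
print(causal, spurious)
```

X2 is highly predictive in the first environment, but its relationship with Y changes across environments. The X1 coefficient stays stable. Heterogeneity across environments is what lets a method tell these apart.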

Desiderata for Representation Learning: A Causal Perspective

Sep 08, 2021
Yixin Wang, Michael I. Jordan

Representation learning constructs low-dimensional representations to summarize essential features of high-dimensional data. This learning problem is often approached by describing various desiderata associated with learned representations, e.g., that they be non-spurious, efficient, or disentangled. It can be challenging, however, to turn these intuitive desiderata into formal criteria that can be measured and enhanced based on observed data. In this paper, we take a causal perspective on representation learning, formalizing non-spuriousness and efficiency (in supervised representation learning) and disentanglement (in unsupervised representation learning) using counterfactual quantities and observable consequences of causal assertions. This yields computable metrics that can be used to assess the degree to which representations satisfy the desiderata of interest, and to learn non-spurious and disentangled representations from single observational datasets.

* 67 pages

Learning Equilibria in Matching Markets from Bandit Feedback

Aug 19, 2021
Meena Jagadeesan, Alexander Wei, Yixin Wang, Michael I. Jordan, Jacob Steinhardt

Large-scale, two-sided matching platforms must find market outcomes that align with user preferences while simultaneously learning these preferences from data. However, since preferences are inherently uncertain during learning, the classical notion of stability (Gale and Shapley, 1962; Shapley and Shubik, 1971) is unattainable in these settings. To bridge this gap, we develop a framework and algorithms for learning stable market outcomes under uncertainty. Our primary setting is matching with transferable utilities, where the platform both matches agents and sets monetary transfers between them. We design an incentive-aware learning objective that captures the distance of a market outcome from equilibrium. Using this objective, we analyze the complexity of learning as a function of preference structure, casting learning as a stochastic multi-armed bandit problem. Algorithmically, we show that "optimism in the face of uncertainty," the principle underlying many bandit algorithms, applies to a primal-dual formulation of matching with transfers and leads to near-optimal regret bounds. Our work takes a first step toward elucidating when and how stable matchings arise in large, data-driven marketplaces.
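The principle of "optimism in the face of uncertainty" can be sketched in a matching setting. In each round, the sketch picks the matching that maximizes the sum of upper confidence bounds (UCB) on pairwise surplus, then updates its estimates from noisy feedback. This is a hypothetical simplification. It ignores monetary transfers and the primal-dual formulation the paper analyzes. It also finds the max-weight matching by brute force, which is only feasible for tiny markets.

```python
import itertools
import math
import random


def ucb_index(mean, count, t):
    """Optimistic estimate of a pair's surplus: empirical mean plus a confidence bonus."""
    if count == 0:
        return math.inf
    return mean + math.sqrt(2.0 * math.log(t) / count)


def ucb_matching(true_surplus, rounds, noise_sd=0.1, seed=0):
    """Each round, play the matching with the largest total UCB index.

    true_surplus[i][j] is the unknown mean surplus of matching left agent i
    with right agent j; each round reveals one noisy sample per matched pair.
    Returns the chosen matchings, where perm[i] is agent i's partner.
    """
    rng = random.Random(seed)
    n = len(true_surplus)
    counts = [[0] * n for _ in range(n)]
    means = [[0.0] * n for _ in range(n)]
    history = []
    for t in range(1, rounds + 1):
        # Brute-force max-weight matching over all permutations.
        best = max(
            itertools.permutations(range(n)),
            key=lambda perm: sum(ucb_index(means[i][j], counts[i][j], t) for i, j in enumerate(perm)),
        )
        for i, j in enumerate(best):
            reward = true_surplus[i][j] + rng.gauss(0.0, noise_sd)
            counts[i][j] += 1
            means[i][j] += (reward - means[i][j]) / counts[i][j]
        history.append(best)
    return history


history = ucb_matching([[1.0, 0.0], [0.0, 1.0]], rounds=200)
print(history.count((0, 1)), history.count((1, 0)))
```

Untried pairs get an infinite index, so every pair is explored at least once. After that, the suboptimal matching is revisited only while its confidence bonus remains large.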

ACN: Adversarial Co-training Network for Brain Tumor Segmentation with Missing Modalities

Jun 29, 2021
Yixin Wang, Yang Zhang, Yang Liu, Zihao Lin, Jiang Tian, Cheng Zhong, Zhongchao Shi, Jianping Fan, Zhiqiang He

Accurate segmentation of brain tumors from magnetic resonance imaging (MRI) is clinically relevant for diagnosis, prognosis, and surgical treatment, and it requires multiple modalities to provide complementary morphological and physiopathologic information. However, missing modalities commonly occur in clinical practice due to image corruption, artifacts, different acquisition protocols, or allergies to certain contrast agents. Although existing efforts demonstrate the possibility of a unified model for all missing situations, most of them perform poorly when more than one modality is missing. In this paper, we propose a novel Adversarial Co-training Network (ACN) to address this issue. ACN trains a series of independent yet related models, each dedicated to one missing situation, with significantly better results. Specifically, ACN adopts a novel co-training network that enables a coupled learning process for both full modality and missing modality, so that each supplements the other's domain and feature representations and, more importantly, recovers the 'missing' information of absent modalities. Then, two unsupervised modules, namely entropy and knowledge adversarial learning modules, are proposed to minimize the domain gap while enhancing prediction reliability and encouraging the alignment of latent representations, respectively. We also adapt modality-mutual information knowledge transfer learning to ACN to retain the rich mutual information among modalities. Extensive experiments on the BraTS2018 dataset show that our proposed method significantly outperforms all state-of-the-art methods under any missing situation.

* MICCAI 2021