Eric P. Xing

Semantic-aware Grad-GAN for Virtual-to-Real Urban Scene Adaption

Jul 14, 2018
Peilun Li, Xiaodan Liang, Daoyuan Jia, Eric P. Xing

Recent advances in vision tasks (e.g., segmentation) depend heavily on the availability of large-scale real-world image annotations obtained through cumbersome human labor. Moreover, perception performance often drops significantly in new scenarios, due to the poor generalization capability of models trained on limited and biased annotations. In this work, we transfer knowledge from automatically rendered scene annotations in the virtual world to facilitate real-world visual tasks. Although virtual-world annotations can ideally be diverse and unlimited, the discrepancy between virtual and real-world data distributions makes knowledge transfer challenging. We thus propose a novel Semantic-aware Grad-GAN (SG-GAN) to perform virtual-to-real domain adaption while retaining vital semantic information. Beyond the simple holistic color/texture transformation achieved by prior works, SG-GAN personalizes the appearance adaption for each semantic region in order to preserve its key characteristics for better recognition. It makes two main contributions over traditional GANs: 1) a soft gradient-sensitive objective for keeping semantic boundaries; 2) a semantic-aware discriminator for validating the fidelity of personalized adaptions with respect to each semantic region. Qualitative and quantitative experiments demonstrate the superiority of SG-GAN in scene adaption over state-of-the-art GANs. Further evaluations of semantic segmentation on Cityscapes show that using virtual images adapted by SG-GAN dramatically improves segmentation performance over using the original virtual data. We release our code at https://github.com/Peilun-Li/SG-GAN.

* In proceedings of BMVC 2018

On Unifying Deep Generative Models

Jul 11, 2018
Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Eric P. Xing

Deep generative models have achieved impressive success in recent years. Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), as emerging families for generative model learning, have largely been considered two distinct paradigms and have each received extensive independent study. This paper aims to establish formal connections between GANs and VAEs through a new formulation of them. We interpret sample generation in GANs as performing posterior inference, and show that GANs and VAEs involve minimizing KL divergences of respective posterior and inference distributions with opposite directions, extending the two learning phases of the classic wake-sleep algorithm, respectively. The unified view provides a powerful tool to analyze a diverse set of existing model variants, and enables techniques to be transferred across research lines in a principled way. For example, we apply the importance weighting method from the VAE literature for improved GAN learning, and enhance VAEs with an adversarial mechanism that leverages generated samples. Experiments show the generality and effectiveness of the transferred techniques.

* Polished and extended content over the ICLR conference version: https://openreview.net/pdf?id=rylSzl-R-
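
The statement that GANs and VAEs minimize KL divergences "with opposite directions" relies on the KL divergence being asymmetric. The following is a minimal sketch of that asymmetry for two small discrete distributions. It does not reproduce the paper's formulation; the function name `kl_divergence` and the example distributions `p` and `q` are assumptions chosen purely for illustration.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Return KL(p || q) for two discrete distributions on the same support."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # terms with p_i = 0 contribute nothing
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


# Illustrative distributions (hypothetical values).
p = [0.9, 0.1]
q = [0.5, 0.5]

forward = kl_divergence(p, q)  # KL(p || q)
reverse = kl_divergence(q, p)  # KL(q || p)
print(f"KL(p||q) = {forward:.4f}, KL(q||p) = {reverse:.4f}")
```

Swapping the arguments changes the value, so which distribution sits on which side of the divergence is a substantive modeling choice rather than a notational one.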

Unsupervised Domain Adaptation for Automatic Estimation of Cardiothoracic Ratio

Jul 10, 2018
Nanqing Dong, Michael Kampffmeyer, Xiaodan Liang, Zeya Wang, Wei Dai, Eric P. Xing

The cardiothoracic ratio (CTR), a clinical metric of heart size in chest X-rays (CXRs), is a key indicator of cardiomegaly. Manual measurement of CTR is time-consuming and can be affected by human subjectivity, making it desirable to design computer-aided systems that assist clinicians in the diagnosis process. Automatic CTR estimation through chest organ segmentation, however, requires large amounts of pixel-level annotated data, which is often unavailable. To alleviate this problem, we propose an unsupervised domain adaptation framework based on adversarial networks. The framework learns domain invariant feature representations from openly available data sources to produce accurate chest organ segmentation for unlabeled datasets. Specifically, we propose a model that enforces our intuition that prediction masks should be domain independent. Hence, we introduce a discriminator that distinguishes segmentation predictions from ground truth masks. We evaluate our system's prediction based on the assessment of radiologists and demonstrate the clinical practicability for the diagnosis of cardiomegaly. We finally illustrate on the JSRT dataset that the semi-supervised performance of our model is also very promising.

* Accepted by MICCAI 2018
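
Once organ segmentation masks are available, the CTR itself is a simple ratio of the maximal horizontal cardiac width to the maximal horizontal thoracic width. The sketch below computes it from binary masks given as nested lists. It is an illustration only, not the authors' pipeline: the helper names are hypothetical, and using the horizontal extent of the lung mask as a stand-in for the thoracic width is an assumption made here for simplicity.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # rows of 0/1 pixels


def horizontal_extent(mask: Mask) -> int:
    """Pixels between the leftmost and rightmost foreground columns, inclusive."""
    cols = [x for row in mask for x, value in enumerate(row) if value]
    if not cols:
        return 0
    return max(cols) - min(cols) + 1


def cardiothoracic_ratio(heart: Mask, lungs: Mask) -> float:
    """Cardiac width divided by thoracic width (assumption: lung-mask extent)."""
    thoracic = horizontal_extent(lungs)
    if thoracic == 0:
        raise ValueError("lung mask is empty")
    return horizontal_extent(heart) / thoracic


# Toy 2x10 masks (hypothetical data).
heart_mask = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
]
lung_mask = [
    [0, 1, 1, 0, 0, 0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 1, 0],
]
ctr = cardiothoracic_ratio(heart_mask, lung_mask)
print(ctr)
```

In this toy case the heart spans 4 columns and the lungs span 8, so the ratio is 0.5.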

Techniques for proving Asynchronous Convergence results for Markov Chain Monte Carlo methods

Jun 03, 2018
Alexander Terenin, Eric P. Xing

Markov Chain Monte Carlo (MCMC) methods such as Gibbs sampling are finding widespread use in applied statistics and machine learning. These often lead to difficult computational problems, which are increasingly being solved on parallel and distributed systems such as compute clusters. Recent work has proposed running iterative algorithms such as gradient descent and MCMC in parallel asynchronously for increased performance, with good empirical results in certain problems. Unfortunately, for MCMC this parallelization technique requires new convergence theory, as it has been explicitly demonstrated to lead to divergence on some examples. Recent theory on Asynchronous Gibbs sampling describes why these algorithms can fail, and provides a way to alter them to make them converge. In this article, we describe how to apply this theory in a generic setting, to understand the asynchronous behavior of any MCMC algorithm, including those implemented using parameter servers, and those not based on Gibbs sampling.

* Workshop on Advances in Approximate Bayesian Inference, 31st Conference on Neural Information Processing Systems, 2017

Rethinking Knowledge Graph Propagation for Zero-Shot Learning

May 31, 2018
Michael Kampffmeyer, Yinbo Chen, Xiaodan Liang, Hao Wang, Yujia Zhang, Eric P. Xing

The potential of graph convolutional neural networks for the task of zero-shot learning has been demonstrated recently. These models are highly sample efficient as related concepts in the graph structure share statistical strength allowing generalization to new classes when faced with a lack of data. However, knowledge from distant nodes can get diluted when propagating through intermediate nodes, because current approaches to zero-shot learning use graph propagation schemes that perform Laplacian smoothing at each layer. We show that extensive smoothing does not help the task of regressing classifier weights in zero-shot learning. In order to still incorporate information from distant nodes and utilize the graph structure, we propose an Attentive Dense Graph Propagation Module (ADGPM). ADGPM allows us to exploit the hierarchical graph structure of the knowledge graph through additional connections. These connections are added based on a node's relationship to its ancestors and descendants and an attention scheme is further used to weigh their contribution depending on the distance to the node. Finally, we illustrate that finetuning of the feature representation after training the ADGPM leads to considerable improvements. Our method achieves competitive results, outperforming previous zero-shot learning approaches.

* The first two authors contributed equally. Code at https://github.com/cyvius96/adgpm
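
The idea of connecting a node directly to its ancestors and weighting each connection by distance can be sketched as follows. This is a simplified illustration under several assumptions not taken from the abstract: the hierarchy is a tree stored as a child-to-parent dictionary, only ancestors (not descendants) are aggregated, the per-distance scores `theta` are fixed numbers rather than learned parameters, and the weights are a softmax over those scores. All names are hypothetical; see the linked repository for the actual implementation.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ancestors_by_distance(parent: Dict[str, str], node: str) -> Dict[int, str]:
    """Walk up the tree and map hop distance -> ancestor."""
    found: Dict[int, str] = {}
    current, distance = node, 0
    while current in parent:
        current = parent[current]
        distance += 1
        found[distance] = current
    return found


def propagate_from_ancestors(
    features: Dict[str, List[float]],
    parent: Dict[str, str],
    node: str,
    theta: List[float],
) -> List[float]:
    """Distance-weighted sum of the node's feature and its ancestors' features."""
    reachable = {0: node, **ancestors_by_distance(parent, node)}
    distances = [d for d in sorted(reachable) if d < len(theta)]
    weights = softmax([theta[d] for d in distances])
    out = [0.0] * len(features[node])
    for weight, d in zip(weights, distances):
        for i, value in enumerate(features[reachable[d]]):
            out[i] += weight * value
    return out


# Toy hierarchy and features (hypothetical data).
parent = {"husky": "dog", "dog": "animal"}
features = {"husky": [1.0, 0.0], "dog": [0.0, 1.0], "animal": [1.0, 1.0]}

uniform = propagate_from_ancestors(features, parent, "husky", [0.0, 0.0, 0.0])
self_only = propagate_from_ancestors(features, parent, "husky", [50.0, 0.0, 0.0])
print(uniform, self_only)
```

Because the grandparent is reached through a direct weighted connection rather than through repeated smoothing layers, its contribution is controlled by a single weight instead of being diluted along the path.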

Unsupervised Text Style Transfer using Language Models as Discriminators

May 31, 2018
Zichao Yang, Zhiting Hu, Chris Dyer, Eric P. Xing, Taylor Berg-Kirkpatrick

Binary classifiers are often employed as discriminators in GAN-based unsupervised style transfer systems to ensure that transferred sentences are similar to sentences in the target domain. One difficulty with this approach is that the error signal provided by the discriminator can be unstable and is sometimes insufficient to train the generator to produce fluent language. In this paper, we propose a new technique that uses a target domain language model as the discriminator, providing richer and more stable token-level feedback during the learning process. We train the generator to minimize the negative log likelihood (NLL) of generated sentences, evaluated by the language model. By using a continuous approximation of discrete sampling under the generator, our model can be trained using back-propagation in an end-to-end fashion. Moreover, our empirical results show that when using a language model as a structured discriminator, it is possible to forgo adversarial steps during training, making the process more stable. We compare our model with previous work using convolutional neural networks (CNNs) as discriminators and show that our approach leads to improved performance on three tasks: word substitution decipherment, sentiment modification, and related language translation.
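
To make the "language model as discriminator" idea concrete, the sketch below scores sentences by their negative log likelihood under a language model fitted to target-domain text, yielding one loss value per token. It is a toy stand-in, not the paper's model: it assumes an add-one-smoothed bigram model instead of a neural language model, operates on discrete tokens rather than the continuous approximation mentioned above, and all function names and the tiny corpus are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Set

Prob = Callable[[str, str], float]


def train_bigram_lm(corpus: List[List[str]], vocab: Set[str]) -> Prob:
    """Fit an add-one-smoothed bigram model; returns P(current | previous)."""
    context_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    size = len(vocab)

    def prob(prev: str, cur: str) -> float:
        return (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + size)

    return prob


def token_nlls(prob: Prob, sentence: List[str]) -> List[float]:
    """Per-token negative log likelihood: the token-level feedback signal."""
    tokens = ["<s>"] + sentence
    return [-math.log(prob(prev, cur)) for prev, cur in zip(tokens, tokens[1:])]


def sentence_nll(prob: Prob, sentence: List[str]) -> float:
    """Total NLL of a sentence under the language model."""
    return sum(token_nlls(prob, sentence))


# Tiny target-domain corpus (hypothetical data).
vocab = {"the", "food", "service", "was", "great", "awful"}
target_corpus = [
    ["the", "food", "was", "great"],
    ["the", "service", "was", "great"],
]
lm = train_bigram_lm(target_corpus, vocab)

fluent = ["the", "food", "was", "great"]
scrambled = ["great", "was", "food", "the"]
print(sentence_nll(lm, fluent), sentence_nll(lm, scrambled))
```

A sentence that resembles the target domain receives a lower NLL than a scrambled one, and the per-token values indicate which positions the language model finds unlikely, which a single binary real/fake score cannot do.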

Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation

May 21, 2018
Christy Y. Li, Xiaodan Liang, Zhiting Hu, Eric P. Xing

Generating long and coherent reports to describe medical images poses the challenge of bridging visual patterns with informative human linguistic descriptions. We propose a novel Hybrid Retrieval-Generation Reinforced Agent (HRGR-Agent) which reconciles traditional retrieval-based approaches populated with human prior knowledge with modern learning-based approaches, to achieve structured, robust, and diverse report generation. HRGR-Agent employs a hierarchical decision-making procedure. For each sentence, a high-level retrieval policy module chooses to either retrieve a template sentence from an off-the-shelf template database, or invoke a low-level generation module to generate a new sentence. HRGR-Agent is updated via reinforcement learning, guided by sentence-level and word-level rewards. Experiments show that our approach achieves state-of-the-art results on two medical report datasets, generating well-balanced structured sentences with robust coverage of heterogeneous medical report contents. In addition, our model achieves the highest detection accuracy of medical terminologies, and improved human evaluation performance.

Image-derived generative modeling of pseudo-macromolecular structures - towards the statistical assessment of Electron CryoTomography template matching

May 12, 2018
Kai Wen Wang, Xiangrui Zeng, Xiaodan Liang, Zhiguang Huo, Eric P. Xing, Min Xu

Cellular Electron CryoTomography (CECT) is a 3D imaging technique that captures information about the structure and spatial organization of macromolecular complexes within single cells, in near-native state and at sub-molecular resolution. Although template matching is often used to locate macromolecules in a CECT image, it is insufficient as it only measures the relative structural similarity. Therefore, it is preferable to assess the statistical credibility of the decision through hypothesis testing, requiring many templates derived from a diverse population of macromolecular structures. Due to the very limited number of known structures, we need a generative model to efficiently and reliably sample pseudo-structures from the complex distribution of macromolecular structures. To address this challenge, we propose a novel image-derived approach for performing hypothesis testing for template matching by constructing generative models using the generative adversarial network. Finally, we conducted hypothesis testing experiments for template matching on both simulated and experimental subtomograms, allowing us to conclude the identity of subtomograms with high statistical credibility and significantly reducing false positives.

* British Machine Vision Conference (BMVC) 2018
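
One common way to attach statistical credibility to a template-matching score is an empirical p-value computed against scores from a null population, imagined here as matching scores obtained with sampled pseudo-structures. The sketch below shows that calculation only. The abstract does not specify the test statistic or procedure, so the function name, the add-one correction, and the numbers are all assumptions for illustration.

```python
from typing import Sequence


def empirical_p_value(observed: float, null_scores: Sequence[float]) -> float:
    """Fraction of null scores at least as large as the observed score.

    The +1 in numerator and denominator counts the observed score itself,
    so the estimate is never exactly zero.
    """
    at_least_as_large = sum(1 for score in null_scores if score >= observed)
    return (at_least_as_large + 1) / (len(null_scores) + 1)


# Hypothetical matching scores against four pseudo-structure templates.
null_scores = [0.1, 0.2, 0.3, 0.4]

p_strong = empirical_p_value(0.9, null_scores)   # exceeds every null score
p_weak = empirical_p_value(0.35, null_scores)    # exceeded by one null score
print(p_strong, p_weak)
```

With only four null samples the smallest attainable p-value is 0.2, which illustrates why such a test needs many templates from a diverse population of structures.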

DTR-GAN: Dilated Temporal Relational Adversarial Network for Video Summarization

Apr 30, 2018
Yujia Zhang, Michael Kampffmeyer, Xiaodan Liang, Dingwen Zhang, Min Tan, Eric P. Xing

The large number of videos appearing every day makes it increasingly critical that key information within videos can be extracted and understood in a very short time. Video summarization, the task of finding the smallest subset of frames that still conveys the whole story of a given video, is thus of great significance for improving the efficiency of video understanding. In this paper, we propose a novel Dilated Temporal Relational Generative Adversarial Network (DTR-GAN) to achieve frame-level video summarization. Given a video, it selects a set of key frames that contains the most meaningful and compact information. Specifically, DTR-GAN learns a dilated temporal relational generator and a discriminator with a three-player loss in an adversarial manner. A new dilated temporal relation (DTR) unit is introduced to enhance the capture of temporal representations. The generator aims to select key frames by using DTR units to effectively exploit global multi-scale temporal context and to complement the commonly used Bi-LSTM. To ensure that the summaries capture enough key video representation from a global perspective rather than a trivial randomly shortened sequence, we present a discriminator that learns to enforce both the information completeness and compactness of summaries via a three-player loss. The three-player loss includes the generated summary loss, the random summary loss, and the real summary (ground-truth) loss, which play important roles in regularizing the learned model to obtain useful summaries. Comprehensive experiments on two public datasets, SumMe and TVSum, show the superiority of our DTR-GAN over state-of-the-art approaches.

Identifiability of Nonparametric Mixture Models and Bayes Optimal Clustering

Apr 22, 2018
Bryon Aragam, Chen Dan, Pradeep Ravikumar, Eric P. Xing

Motivated by problems in data clustering, we establish general conditions under which families of nonparametric mixture models are identifiable by introducing a novel framework for clustering overfitted parametric (i.e. misspecified) mixture models. These conditions generalize existing conditions in the literature, and are flexible enough to include, for example, mixtures of Gaussian mixtures. In contrast to the recent literature on estimating nonparametric mixtures, we allow for general nonparametric mixture components, and instead impose regularity assumptions on the underlying mixing measure. As our primary application, we apply these results to partition-based clustering, generalizing the well-known notion of a Bayes optimal partition from classical model-based clustering to nonparametric settings. Furthermore, this framework is constructive in that it yields a practical algorithm for learning identified mixtures, which is illustrated through several examples. The key conceptual device in the analysis is the convex, metric geometry of probability distributions on metric spaces and its connection to optimal transport and the Wasserstein convergence of mixing measures. The result is a flexible framework for nonparametric clustering with formal consistency guarantees.

* 25 pages, 8 figures, 1 table. Added more experiments
