Lei Li

Generating Sentences from Disentangled Syntactic and Semantic Spaces

Jul 06, 2019
Yu Bao, Hao Zhou, Shujian Huang, Lei Li, Lili Mou, Olga Vechtomova, Xinyu Dai, Jiajun Chen

Variational auto-encoders (VAEs) are widely used in natural language generation due to the regularization of the latent space. However, generating sentences from a continuous latent space does not explicitly model syntactic information. In this paper, we propose to generate sentences from disentangled syntactic and semantic spaces. Our method explicitly models syntactic information in the VAE's latent space using a linearized parse-tree sequence, leading to better language generation. Moreover, sampling separately from the disentangled syntactic and semantic latent spaces enables novel applications such as unsupervised paraphrase generation and syntax-transfer generation. Experimental results show that our model achieves comparable or better performance than state-of-the-art related work across these tasks.

* 11 pages, accepted in ACL-2019 
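
The two-latent-space idea can be sketched in a few lines of PyTorch: the sentence is reconstructed from both latents, while the syntactic latent alone must decode the linearized parse sequence. This is a minimal illustration under assumed module names (e.g. `DisentangledVAE`), not the authors' implementation; the KL terms and training loop are omitted.

```python
import torch
import torch.nn as nn

class DisentangledVAE(nn.Module):
    def __init__(self, vocab, syn_vocab, emb=256, hid=512, lat=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.enc = nn.GRU(emb, hid, batch_first=True)
        # separate heads produce the semantic and syntactic latent variables
        self.to_sem = nn.Linear(hid, 2 * lat)        # -> (mu, logvar)
        self.to_syn = nn.Linear(hid, 2 * lat)
        self.dec_sent = nn.GRU(emb + 2 * lat, hid, batch_first=True)
        self.out_sent = nn.Linear(hid, vocab)
        self.dec_tree = nn.GRU(lat, hid, batch_first=True)
        self.out_tree = nn.Linear(hid, syn_vocab)    # linearized parse tokens

    def reparam(self, stats):
        mu, logvar = stats.chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

    def forward(self, tokens, tree_len):
        h = self.enc(self.embed(tokens))[1][-1]      # final encoder state
        z_sem = self.reparam(self.to_sem(h))
        z_syn = self.reparam(self.to_syn(h))
        z = torch.cat([z_sem, z_syn], dim=-1)
        # the sentence is reconstructed from both latents (teacher forcing)
        dec_in = torch.cat([self.embed(tokens),
                            z.unsqueeze(1).expand(-1, tokens.size(1), -1)], dim=-1)
        sent_logits = self.out_sent(self.dec_sent(dec_in)[0])
        # the syntactic latent alone must predict the linearized parse sequence
        tree_in = z_syn.unsqueeze(1).expand(-1, tree_len, -1)
        tree_logits = self.out_tree(self.dec_tree(tree_in)[0])
        return sent_logits, tree_logits
```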

Deep Active Learning for Anchor User Prediction

Jun 25, 2019
Anfeng Cheng, Chuan Zhou, Hong Yang, Jia Wu, Lei Li, Jianlong Tan, Li Guo

Predicting pairs of anchor users plays an important role in cross-network analysis. Because labeling anchor users for training prediction models is expensive, we consider the problem of minimizing the number of user pairs across multiple networks that must be labeled while improving prediction accuracy. To this end, we present a deep active learning model for anchor user prediction (DALAUP). Active learning for anchor user sampling, however, faces two challenges: user-pair data are non-i.i.d. due to network structure, and anchor and non-anchor user pairs are correlated. To address these challenges, DALAUP uses a pair of neural networks with shared parameters to obtain vector representations of user pairs, and ensembles three query strategies to select the most informative user pairs for labeling and model training. Experiments on real-world social network data demonstrate that DALAUP outperforms state-of-the-art approaches.

* 7 pages 
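
As a rough illustration of the query-strategy ensemble described above, the sketch below scores unlabeled user pairs with three generic strategies (prediction entropy, margin to the decision boundary, and diversity in embedding space) and aggregates their ranks. These particular strategies and function names are illustrative assumptions, not necessarily the ones used in DALAUP.

```python
import numpy as np

def entropy_score(probs):
    # higher entropy of p(anchor | pair) means a more uncertain, more informative pair
    p = np.clip(probs, 1e-8, 1 - 1e-8)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def margin_score(probs):
    return -np.abs(probs - 0.5)          # closest to the decision boundary

def diversity_score(embeddings, labeled_embeddings):
    # distance to the nearest already-labeled pair in representation space
    d = np.linalg.norm(embeddings[:, None] - labeled_embeddings[None], axis=-1)
    return d.min(axis=1)

def select_queries(probs, embeddings, labeled_embeddings, k=10):
    scores = [entropy_score(probs), margin_score(probs),
              diversity_score(embeddings, labeled_embeddings)]
    # simple rank aggregation across the strategies (0 = most informative)
    ranks = sum(np.argsort(np.argsort(-s)) for s in scores)
    return np.argsort(ranks)[:k]         # indices of pairs to send for labeling
```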

Fixing Gaussian Mixture VAEs for Interpretable Text Generation

Jun 16, 2019
Wenxian Shi, Hao Zhou, Ning Miao, Shenjian Zhao, Lei Li

Variational auto-encoders (VAEs) with Gaussian priors are effective for text generation. To improve controllability and interpretability, we propose to use a Gaussian mixture distribution as the VAE prior (GMVAE), since it adds a discrete latent variable alongside the continuous one. Unfortunately, training GMVAE with the standard variational approximation often leads to mode collapse. We theoretically analyze the root cause: maximizing the evidence lower bound of GMVAE implicitly aggregates the means of the mixture's Gaussian components. We propose Dispersed-GMVAE (DGMVAE), an improved model for text generation that introduces two extra terms to alleviate mode collapse and to induce a better-structured latent space. Experimental results show that DGMVAE outperforms strong baselines on several language modeling and text generation benchmarks.
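
To make the mode-collapse symptom concrete, the sketch below implements a learnable Gaussian-mixture prior and a diagnostic that tracks the average pairwise distance between component means: when training aggregates the means, this quantity shrinks toward zero. The dispersion penalty shown is only an illustrative stand-in; the two extra terms actually introduced by DGMVAE are defined in the paper.

```python
import torch
import torch.nn as nn

class MixturePrior(nn.Module):
    def __init__(self, n_components=10, latent_dim=32):
        super().__init__()
        self.means = nn.Parameter(torch.randn(n_components, latent_dim))
        self.logvars = nn.Parameter(torch.zeros(n_components, latent_dim))

    def sample(self, batch_size):
        # draw the discrete component first, then the continuous code
        c = torch.randint(0, self.means.size(0), (batch_size,))
        eps = torch.randn(batch_size, self.means.size(1))
        return self.means[c] + eps * (0.5 * self.logvars[c]).exp(), c

    def mean_dispersion(self):
        # average pairwise distance between component means; collapsing
        # components drive this toward zero
        d = (self.means.unsqueeze(0) - self.means.unsqueeze(1)).norm(dim=-1)
        n = self.means.size(0)
        return d.sum() / (n * (n - 1))

prior = MixturePrior()
z, c = prior.sample(16)
loss_extra = -prior.mean_dispersion()   # illustrative dispersion penalty
```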

Dynamically Fused Graph Network for Multi-hop Reasoning

Jun 06, 2019
Yunxuan Xiao, Yanru Qu, Lin Qiu, Hao Zhou, Lei Li, Weinan Zhang, Yong Yu

Text-based question answering (TBQA) has been studied extensively in recent years. Most existing approaches focus on finding the answer to a question within a single paragraph. However, many difficult questions require multiple pieces of supporting evidence scattered across two or more documents. In this paper, we propose the Dynamically Fused Graph Network (DFGN), a novel method for answering questions that require gathering and reasoning over such scattered evidence. Inspired by humans' step-by-step reasoning behavior, DFGN includes a dynamic fusion layer that starts from the entities mentioned in the query, explores an entity graph dynamically built from the text, and gradually finds the relevant supporting entities in the given documents. We evaluate DFGN on HotpotQA, a public TBQA dataset requiring multi-hop reasoning. DFGN achieves competitive results on the public leaderboard, and our analysis shows that it produces interpretable reasoning chains.

* Accepted by ACL 19 
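
A rough sketch of one dynamic-fusion step as described above: a query-conditioned soft mask selects the entity nodes relevant to the current reasoning step, and information then propagates one hop along the entity graph. Module and tensor names are assumptions for illustration, not the released DFGN code.

```python
import torch
import torch.nn as nn

class FusionStep(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.mask = nn.Linear(2 * dim, 1)     # query-aware entity mask
        self.proj = nn.Linear(dim, dim)

    def forward(self, entity_h, query_h, adj):
        # entity_h: (n_entities, dim), query_h: (dim,), adj: (n, n) 0/1 matrix
        q = query_h.unsqueeze(0).expand_as(entity_h)
        gate = torch.sigmoid(self.mask(torch.cat([entity_h, q], dim=-1)))
        h = entity_h * gate                          # soft-mask irrelevant entities
        # one hop of mean aggregation over graph neighbours
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        agg = adj @ h / deg
        return torch.relu(self.proj(agg)) + entity_h  # residual update

step = FusionStep()
# trivial self-loop graph, just to check shapes
h = step(torch.randn(5, 128), torch.randn(128), torch.eye(5))
```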

UniVSE: Robust Visual Semantic Embeddings via Structured Semantic Representations

Apr 28, 2019
Hao Wu, Jiayuan Mao, Yufeng Zhang, Yuning Jiang, Lei Li, Weiwei Sun, Wei-Ying Ma

We propose Unified Visual-Semantic Embeddings (UniVSE) for learning a joint space of visual and textual concepts. The space unifies concepts at different levels: objects, attributes, relations, and full scenes. A contrastive learning approach is proposed to learn this fine-grained alignment from image-caption pairs alone. Moreover, we present an effective approach for enforcing that caption embeddings cover the semantic components appearing in the sentence. We demonstrate the robustness of UniVSE in defending against text-domain adversarial attacks on cross-modal retrieval tasks. This robustness also enables the use of visual cues to resolve word dependencies in novel sentences.

* v1 is the full version which is accepted by CVPR 2019. v2 is the short version accepted by NAACL 2019 SpLU-RoboNLP workshop (in non-archival proceedings) 
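
The contrastive objective mentioned above is, in spirit, a max-margin loss over matched and mismatched image-caption pairs. The sketch below is the standard VSE-style hinge loss, written here for illustration; the exact loss used by UniVSE (including its component-level terms) is given in the paper.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, cap_emb, margin=0.2):
    # img_emb, cap_emb: (batch, dim), L2-normalized; matching pairs share an index
    scores = img_emb @ cap_emb.t()                    # cosine similarities
    pos = scores.diag().unsqueeze(1)                  # matched-pair scores, (batch, 1)
    cost_cap = (margin + scores - pos).clamp(min=0)   # mismatched captions as negatives
    cost_img = (margin + scores - pos.t()).clamp(min=0)  # mismatched images as negatives
    mask = torch.eye(scores.size(0), dtype=torch.bool)
    return cost_cap.masked_fill(mask, 0).sum() + cost_img.masked_fill(mask, 0).sum()

# toy usage with random, normalized embeddings
loss = contrastive_loss(F.normalize(torch.randn(8, 64), dim=-1),
                        F.normalize(torch.randn(8, 64), dim=-1))
```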

Unified Visual-Semantic Embeddings: Bridging Vision and Language with Structured Meaning Representations

Apr 11, 2019
Hao Wu, Jiayuan Mao, Yufeng Zhang, Yuning Jiang, Lei Li, Weiwei Sun, Wei-Ying Ma

We propose Unified Visual-Semantic Embeddings (Unified VSE) for learning a joint space of visual representations and textual semantics. The model unifies the embeddings of concepts at different levels: objects, attributes, relations, and full scenes. We view sentence semantics as a combination of semantic components such as objects and relations, whose embeddings are aligned with different image regions. A contrastive learning approach is proposed for effectively learning this fine-grained alignment from image-caption pairs alone. We also present a simple yet effective approach that enforces the coverage of caption embeddings over the semantic components appearing in the sentence. We demonstrate that Unified VSE outperforms baselines on cross-modal retrieval tasks, and that enforcing semantic coverage improves the model's robustness against text-domain adversarial attacks. Moreover, our model enables the use of visual cues to accurately resolve word dependencies in novel sentences.

* Accepted by CVPR 2019 
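
The semantic-coverage idea can be illustrated with a small hinge term that asks every component embedding parsed from the caption (objects, attributes, relations) to align with the image as well. This assumes the components are already extracted by some parser and embedded; the function below is a hypothetical simplification, not the paper's exact formulation.

```python
import torch

def coverage_loss(img_emb, component_embs, margin=0.2):
    # img_emb: (dim,); component_embs: (n_components, dim); all L2-normalized
    sims = component_embs @ img_emb              # similarity of each component to the image
    return (margin - sims).clamp(min=0).mean()   # every component must be covered
```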

A stochastic version of Stein Variational Gradient Descent for efficient sampling

Apr 11, 2019
Lei Li, Yingzhou Li, Jian-Guo Liu, Zibu Liu, Jianfeng Lu

We propose RBM-SVGD, a stochastic version of the Stein Variational Gradient Descent (SVGD) method for efficiently sampling from a given probability measure, which makes it useful for Bayesian inference. The method applies the Random Batch Method (RBM) for interacting particle systems, proposed by Jin et al., to the interacting particle system in SVGD. While preserving the behavior of SVGD, it reduces the computational cost, especially when the interaction kernel is long-range. Numerical examples verify the efficiency of this new version of SVGD.
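
The sketch below shows the idea in NumPy: the plain SVGD update sums kernel interactions over all particle pairs (O(n^2) per step), whereas the random-batch variant reshuffles the particles each iteration and sums only within small batches. This follows the generic RBM recipe with a standard RBF-kernel SVGD update; constants such as the bandwidth and the batch normalization are illustrative choices, not necessarily those of RBM-SVGD.

```python
import numpy as np

def rbf(x, h):
    d = x[:, None, :] - x[None, :, :]            # d[i, j] = x_i - x_j
    k = np.exp(-np.sum(d ** 2, axis=-1) / h)     # RBF kernel matrix
    # grad_k[i, j] = gradient of k(x_j, x_i) with respect to x_j
    grad_k = 2.0 / h * d * k[..., None]
    return k, grad_k

def svgd_step(x, grad_logp, h=1.0, eps=1e-2, batch=None):
    n = x.shape[0]
    batch = batch or n                           # batch == n recovers plain SVGD
    idx = np.random.permutation(n)               # reshuffle particles every step
    phi = np.zeros_like(x)
    for s in range(0, n, batch):
        b = idx[s:s + batch]
        k, grad_k = rbf(x[b], h)
        # attraction toward high-density regions + repulsion between particles
        phi[b] = (k @ grad_logp(x[b]) + grad_k.sum(axis=1)) / len(b)
    return x + eps * phi

# Example: sample a standard 2-D Gaussian, where grad log p(x) = -x.
x = np.random.randn(50, 2) * 3.0
for _ in range(500):
    x = svgd_step(x, lambda z: -z, batch=10)
```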

VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research

Apr 06, 2019
Xin Wang, Jiawei Wu, Junkun Chen, Lei Li, Yuan-Fang Wang, William Yang Wang

We present a new large-scale multilingual video description dataset, VATEX, which contains over 41,250 videos and 825,000 captions in English and Chinese. Among the captions, there are over 206,000 English-Chinese parallel translation pairs. Compared to the widely used MSR-VTT dataset, VATEX is multilingual, larger, linguistically more complex, and more diverse in terms of both videos and natural language descriptions. We also introduce two tasks for video-and-language research based on VATEX: (1) Multilingual Video Captioning, which aims to describe a video in multiple languages with a compact unified captioning model, and (2) Video-guided Machine Translation, which translates a source-language description into the target language using video information as additional spatiotemporal context. Extensive experiments on VATEX show that the unified multilingual model not only produces English and Chinese descriptions for a video more efficiently, but also offers improved performance over monolingual models. Furthermore, we demonstrate that spatiotemporal video context can be effectively utilized to align the source and target languages and thus assist machine translation. Finally, we discuss the potential of using VATEX for other video-and-language research.

* Technical Report. 16 pages, 14 figures, 6 tables 
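
As a toy illustration of the video-guided machine translation task, the sketch below fuses mean-pooled video features into a vanilla sequence-to-sequence translator. All dimensions and module names are assumptions; this is not the VATEX reference model, which the report describes in detail.

```python
import torch
import torch.nn as nn

class VideoGuidedMT(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, video_dim=1024, emb=256, hid=512):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, emb)
        self.tgt_embed = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hid, batch_first=True)
        self.video_proj = nn.Linear(video_dim, hid)   # project pre-extracted video features
        self.decoder = nn.LSTM(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, tgt_vocab)

    def forward(self, src, video_feats, tgt_in):
        _, (h, c) = self.encoder(self.src_embed(src))
        # fuse the text state with mean-pooled video context before decoding
        video_ctx = self.video_proj(video_feats.mean(dim=1)).unsqueeze(0)
        dec_out, _ = self.decoder(self.tgt_embed(tgt_in), (h + video_ctx, c))
        return self.out(dec_out)

# toy usage: batch of 2 source sentences, 32 video frames, teacher-forced targets
model = VideoGuidedMT(src_vocab=8000, tgt_vocab=8000)
logits = model(torch.randint(0, 8000, (2, 12)),
               torch.randn(2, 32, 1024),
               torch.randint(0, 8000, (2, 15)))
```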