Yanshuai Cao

Ensemble Distillation for Unsupervised Constituency Parsing

Oct 03, 2023
Behzad Shayegh, Yanshuai Cao, Xiaodan Zhu, Jackie C. K. Cheung, Lili Mou

We investigate the unsupervised constituency parsing task, which organizes the words and phrases of a sentence into a hierarchical structure without using linguistically annotated data. We observe that existing unsupervised parsers capture differing aspects of parse structures, which can be leveraged to enhance unsupervised parsing performance. To this end, we propose a notion of "tree averaging," based on which we further propose a novel ensemble method for unsupervised parsing. To improve inference efficiency, we then distill the ensemble knowledge into a student model; such an ensemble-then-distill process effectively mitigates the over-smoothing problem found in common multi-teacher distillation methods. Experiments show that our method surpasses all previous approaches, consistently demonstrating its effectiveness and robustness across runs, with different ensemble components, and under domain-shift conditions.
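
As a toy illustration of the "tree averaging" idea, the sketch below picks, from several unsupervised parsers' outputs, the parse whose constituent spans agree best on average with all the others. This is only a simplified restriction of tree averaging to the teachers' own trees (the paper searches over all binary trees); the teachers and spans here are hypothetical.

```python
def spans_f1(pred, gold):
    """F1 between two sets of (start, end) constituent spans."""
    if not pred or not gold:
        return 0.0
    overlap = len(pred & gold)
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def average_tree(teacher_spans):
    """teacher_spans: one set of constituent spans per teacher parser.
    Returns the candidate whose average F1 against all teachers is highest."""
    best, best_score = None, -1.0
    for candidate in teacher_spans:
        score = sum(spans_f1(candidate, other) for other in teacher_spans)
        score /= len(teacher_spans)
        if score > best_score:
            best, best_score = candidate, score
    return best


# Hypothetical parses of a 5-word sentence from three teachers, as span sets.
teachers = [
    {(0, 5), (0, 2), (2, 5), (3, 5)},
    {(0, 5), (0, 2), (2, 5), (2, 4)},
    {(0, 5), (1, 5), (2, 5), (3, 5)},
]
print(average_tree(teachers))
```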

An Equal-Size Hard EM Algorithm for Diverse Dialogue Generation

Sep 29, 2022
Yuqiao Wen, Yongchang Hao, Yanshuai Cao, Lili Mou

Open-domain dialogue systems aim to interact with humans through natural language texts in an open-ended fashion. However, widely successful neural models may not work well for dialogue systems, as they tend to generate generic responses. In this work, we propose an Equal-size Hard Expectation-Maximization (EqHard-EM) algorithm to train a multi-decoder model for diverse dialogue generation. Our algorithm assigns each sample to a decoder in a hard manner and additionally imposes an equal-assignment constraint to ensure that all decoders are well trained. We provide a detailed theoretical analysis to justify our approach. Further, experiments on two large-scale open-domain dialogue datasets verify that our EqHard-EM algorithm generates high-quality, diverse responses.
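
A minimal sketch of what one equal-size hard assignment step might look like, assuming per-sample negative log-likelihoods are already computed for every decoder. The greedy balancing rule below is an illustrative choice, not necessarily the exact assignment procedure used by EqHard-EM.

```python
import numpy as np

def equal_size_hard_assign(nll):
    """nll: (num_samples, num_decoders) negative log-likelihoods.
    Returns a decoder index per sample, with an equal number per decoder."""
    n, k = nll.shape
    assert n % k == 0, "assume the batch size divides evenly across decoders"
    capacity = [n // k] * k
    assignment = np.full(n, -1, dtype=int)
    sorted_scores = np.sort(nll, axis=1)
    # Assign the most decisive samples first (largest margin between the best
    # and second-best decoder), so the capacity constraint hurts them least.
    margins = sorted_scores[:, 1] - sorted_scores[:, 0]
    for i in np.argsort(-margins):
        for d in np.argsort(nll[i]):            # prefer the lowest-loss decoder
            if capacity[d] > 0:
                assignment[i] = d
                capacity[d] -= 1
                break
    return assignment

rng = np.random.default_rng(0)
nll = rng.random((8, 4))                         # toy batch: 8 samples, 4 decoders
print(equal_size_hard_assign(nll))               # each decoder appears exactly twice
```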

Hierarchical Neural Data Synthesis for Semantic Parsing

Dec 04, 2021
Wei Yang, Peng Xu, Yanshuai Cao

Semantic parsing datasets are expensive to collect. Moreover, even the questions pertinent to a given domain, which serve as the input to a semantic parsing system, might not be readily available, especially in cross-domain semantic parsing; this makes data augmentation all the more challenging. Existing methods for synthesizing new data use hand-crafted or induced rules, requiring substantial engineering effort and linguistic expertise to achieve good coverage and precision, which limits scalability. In this work, we propose a purely neural approach to data augmentation for semantic parsing that completely removes the need for grammar engineering while achieving higher semantic parsing accuracy. Furthermore, our method can synthesize data in the zero-shot setting, where only the schema of a new domain is available without any input-output examples from that domain. On the Spider cross-domain text-to-SQL semantic parsing benchmark, we achieve state-of-the-art performance on the development set (77.2% accuracy) using our zero-shot augmentation.
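
The skeleton below only shows the general shape of such a zero-shot augmentation loop: synthesize SQL queries for an unseen schema, generate natural-language questions for them, and add the resulting pairs to the parser's training data. The helper functions are hypothetical placeholders, and the paper's hierarchical generation procedure is not reproduced here.

```python
# Hypothetical helpers standing in for trained neural models; only the overall
# data flow of schema-only ("zero-shot") augmentation is illustrated.

def synthesize_sql(schema, n):
    """Pretend neural sampler: return n SQL queries executable on `schema`."""
    return [f"SELECT count(*) FROM {table}" for table in list(schema)[:n]]

def generate_question(sql):
    """Pretend neural question generator conditioned on a SQL query."""
    return f"How many rows are in {sql.split('FROM')[-1].strip()}?"

def augment(schema, n_samples):
    return [(generate_question(sql), sql) for sql in synthesize_sql(schema, n_samples)]

new_domain_schema = {"concert": ["concert_id", "year"], "singer": ["singer_id", "name"]}
synthetic_data = augment(new_domain_schema, n_samples=2)
print(synthetic_data)
# A semantic parser would then be trained on the original data plus synthetic_data.
```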

Turing: an Accurate and Interpretable Multi-Hypothesis Cross-Domain Natural Language Database Interface

Jun 08, 2021
Peng Xu, Wenjie Zi, Hamidreza Shahidi, Ákos Kádár, Keyi Tang, Wei Yang, Jawad Ateeq, Harsh Barot, Meidan Alon, Yanshuai Cao

A natural language database interface (NLDB) can democratize data-driven insights for non-technical users. However, existing Text-to-SQL semantic parsers cannot achieve high enough accuracy in the cross-database setting to be truly usable in practice. This work presents Turing, an NLDB system that aims to bridge this gap. Turing's cross-domain semantic parser, together with our novel value prediction method, achieves 75.1% execution accuracy and 78.3% top-5 beam execution accuracy on the Spider validation set. To benefit from the higher beam accuracy, we design an interactive system in which the SQL hypotheses in the beam are explained step by step in natural language, with their differences highlighted. The user can then compare the hypotheses and select the one that reflects their intention, if any. The English explanations of SQL queries in Turing are produced by our high-precision natural language generation system based on synchronous grammars.

* ACL 2021 demonstration track 
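
As a sketch of the interaction pattern described in the abstract, the snippet below lists the beam's SQL hypotheses with natural-language explanations and lets the user pick one. The explain_sql helper is a hypothetical stand-in for Turing's synchronous-grammar-based explanation generator, and the example beam is made up.

```python
def explain_sql(sql):
    # Placeholder: the real system produces step-by-step English explanations.
    return f"(explanation of: {sql})"

def choose_hypothesis(beam):
    """beam: list of (sql, score) pairs, highest score first."""
    for i, (sql, score) in enumerate(beam, start=1):
        print(f"[{i}] score={score:.3f}")
        print(f"    SQL: {sql}")
        print(f"    Meaning: {explain_sql(sql)}")
    choice = input("Which hypothesis matches your intent (number, or 'none')? ")
    return None if choice.strip().lower() == "none" else beam[int(choice) - 1][0]

beam = [
    ("SELECT name FROM singer WHERE age > 30", -0.21),
    ("SELECT name FROM singer WHERE age >= 30", -0.35),
]
# chosen = choose_hypothesis(beam)   # interactive; uncomment to try
```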

A Globally Normalized Neural Model for Semantic Parsing

Jun 07, 2021
Chenyang Huang, Wei Yang, Yanshuai Cao, Osmar Zaïane, Lili Mou

In this paper, we propose a globally normalized model for context-free grammar (CFG)-based semantic parsing. Instead of predicting a probability, our model predicts a real-valued score at each step and does not suffer from the label bias problem. Experiments show that our approach outperforms locally normalized models on small datasets, but it does not yield improvement on a large dataset.
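
A small numpy sketch of the difference between local and global normalization for step-wise decoding: the locally normalized model turns each step's scores into probabilities, while the globally normalized model simply sums raw real-valued scores over the whole candidate sequence and compares candidates at the sequence level. The step scores and candidate action sequences below are made up; in the paper the candidates come from CFG-constrained decoding.

```python
import numpy as np

step_scores = np.array([          # rows: decoding steps, cols: candidate actions
    [2.0, 0.5, -1.0],
    [0.1, 1.5,  0.3],
])

def local_log_prob(seq):
    """Locally normalized: softmax at every step, then sum of log-probs."""
    logp = step_scores - np.log(np.exp(step_scores).sum(axis=1, keepdims=True))
    return sum(logp[t, a] for t, a in enumerate(seq))

def global_score(seq):
    """Globally normalized: sum the raw scores; any normalization happens once
    over whole candidate sequences, which avoids the label bias problem."""
    return sum(step_scores[t, a] for t, a in enumerate(seq))

candidates = [(0, 1), (0, 0), (2, 1)]
# Rank candidates by sequence-level score rather than per-step probabilities.
print(sorted(candidates, key=global_score, reverse=True))
print(sorted(candidates, key=local_log_prob, reverse=True))
```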

Semantic Parsing with Less Prior and More Monolingual Data

Jan 01, 2021
Sajad Norouzi, Yanshuai Cao

Semantic parsing is the task of converting natural language utterances into machine-understandable meaning representations, such as logical forms or programming languages. Training datasets for semantic parsing are typically small because annotation requires more expertise than most other NLP tasks. As a result, models for this application usually need additional prior knowledge built into the architecture or algorithm. The increased dependency on human experts hinders automation and raises development and maintenance costs in practice. This work investigates whether a generic transformer-based seq2seq model can achieve competitive performance with minimal semantic-parsing-specific inductive bias. By exploiting a relatively large monolingual corpus of the target programming language, which is cheap to mine from the web unlike a parallel corpus, we achieve 80.75% exact-match accuracy on Django and a 32.57 BLEU score on CoNaLa, both state of the art to the best of our knowledge. This positive evidence highlights a potentially easier path toward building accurate semantic parsers in the wild.
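
One common way to exploit a monolingual corpus of the target programming language is to first pre-train the decoder side of the seq2seq model with a language-modeling objective on that corpus and then fine-tune on the small parallel dataset. The skeleton below illustrates that generic two-phase recipe with hypothetical placeholders; it is not necessarily the exact procedure used in the paper.

```python
# Hypothetical two-phase training skeleton. `Seq2SeqParser`, `lm_pretrain`,
# and `supervised_finetune` are placeholders, not a real library API.

class Seq2SeqParser:
    def __init__(self):
        self.decoder_state = "random init"

    def lm_pretrain(self, code_corpus):
        # Phase 1: train the decoder as a language model over target code.
        self.decoder_state = f"pretrained on {len(code_corpus)} code snippets"

    def supervised_finetune(self, parallel_data):
        # Phase 2: fine-tune encoder + decoder on (utterance, code) pairs.
        return f"fine-tuned ({self.decoder_state}) on {len(parallel_data)} pairs"

monolingual_code = ["x = 1", "for i in range(10): print(i)", "def f(a): return a"]
parallel_pairs = [("assign one to x", "x = 1")]

parser = Seq2SeqParser()
parser.lm_pretrain(monolingual_code)
print(parser.supervised_finetune(parallel_pairs))
```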

Optimizing Deeper Transformers on Small Datasets: An Application on Text-to-SQL Semantic Parsing

Dec 30, 2020
Peng Xu, Wei Yang, Wenjie Zi, Keyi Tang, Chenyang Huang, Jackie Chi Kit Cheung, Yanshuai Cao

Due to the common belief that training deep transformers from scratch requires large datasets, people usually add only shallow and simple layers on top of pre-trained models when fine-tuning on small datasets. We provide evidence that this does not always need to be the case: with proper initialization and training techniques, the benefits of very deep transformers carry over to hard structured prediction tasks, even with small datasets. In particular, we successfully train 48 transformer layers for a semantic parsing task, comprising 24 fine-tuned layers from pre-trained RoBERTa and 24 relation-aware layers trained from scratch. With fewer training steps and no task-specific pre-training, we obtain state-of-the-art performance on the challenging cross-domain Text-to-SQL semantic parsing benchmark Spider. We achieve this by deriving a novel Data-dependent Transformer Fixed-update initialization scheme (DT-Fixup), inspired by the prior T-Fixup work. Further error analysis demonstrates that increasing the depth of the transformer model helps improve generalization on cases requiring reasoning and structural understanding.

* Work in progress 
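
DT-Fixup's exact scaling factors are not reproduced here; the sketch below only illustrates the general Fixup-style idea the abstract builds on: scale the weights of the newly added (non-pre-trained) layers by a factor that shrinks with depth, combined with a placeholder data-dependent term estimated from the pre-trained encoder's output magnitude. The constants and formula are assumptions for illustration, not the paper's derivation.

```python
import numpy as np

# Placeholder scaling rule: shrink new layers' weights as the number of layers
# grows and as the (estimated) input magnitude grows. NOT the DT-Fixup formula.

def init_new_layer_weights(shape, num_new_layers, input_norm_estimate, rng):
    std = 0.02                                   # typical transformer init scale
    scale = (num_new_layers ** -0.25) / max(input_norm_estimate, 1.0)
    return rng.normal(0.0, std, size=shape) * scale

rng = np.random.default_rng(0)
# Hypothetical estimate of the pre-trained encoder's output norm over a batch.
encoder_outputs = rng.normal(0.0, 1.0, size=(32, 128))
input_norm_estimate = float(np.linalg.norm(encoder_outputs, axis=-1).mean())

w = init_new_layer_weights((128, 128), num_new_layers=24,
                           input_norm_estimate=input_norm_estimate, rng=rng)
print(w.std())
```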

Evaluating Lossy Compression Rates of Deep Generative Models

Aug 15, 2020
Sicong Huang, Alireza Makhzani, Yanshuai Cao, Roger Grosse

The field of deep generative modeling has succeeded in producing astonishingly realistic-seeming images and audio, but quantitative evaluation remains a challenge. Log-likelihood is an appealing metric due to its grounding in statistics and information theory, but it can be challenging to estimate for implicit generative models, and scalar-valued metrics give an incomplete picture of a model's quality. In this work, we propose to use rate-distortion (RD) curves to evaluate and compare deep generative models. While estimating RD curves is seemingly even more computationally demanding than log-likelihood estimation, we show that we can approximate the entire RD curve using nearly the same computations as were previously used to obtain a single log-likelihood estimate. We evaluate lossy compression rates of VAEs, GANs, and adversarial autoencoders (AAEs) on the MNIST and CIFAR-10 datasets. Measuring the entire RD curve gives a more complete picture than scalar-valued metrics, and we arrive at a number of insights not obtainable from log-likelihoods alone.
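
As a worked toy example of tracing a rate-distortion curve, the sketch below uses a one-dimensional "decoder" p(x|z) = N(z, sigma^2) with prior p(z) = N(0, 1) and, for each beta, grid-searches a Gaussian variational posterior q(z) = N(m, s^2) that minimizes distortion + beta * rate; sweeping beta traces out (rate, distortion) points. The paper's actual evaluation relies on annealed importance sampling for deep models; this toy only makes the rate and distortion quantities concrete.

```python
import numpy as np

# Toy 1-D model: prior p(z)=N(0,1), "decoder" p(x|z)=N(z, sigma2), one datum x.
x, sigma2 = 2.0, 0.25

def rate(m, s2):
    # KL( N(m, s2) || N(0, 1) )
    return 0.5 * (m**2 + s2 - 1.0 - np.log(s2))

def distortion(m, s2):
    # E_q[-log p(x|z)] under q = N(m, s2)
    return ((x - m) ** 2 + s2) / (2 * sigma2) + 0.5 * np.log(2 * np.pi * sigma2)

means = np.linspace(-1.0, 3.0, 201)
variances = np.linspace(0.01, 2.0, 200)

curve = []
for beta in [0.1, 0.5, 1.0, 2.0, 8.0]:
    # Grid-search the Gaussian q minimizing distortion + beta * rate.
    _, m, s2 = min((distortion(m, s2) + beta * rate(m, s2), m, s2)
                   for m in means for s2 in variances)
    curve.append((rate(m, s2), distortion(m, s2)))

for r, d in curve:
    print(f"rate={r:.3f} nats, distortion={d:.3f}")
```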

Variational Hyper RNN for Sequence Modeling

Feb 24, 2020
Ruizhi Deng, Yanshuai Cao, Bo Chang, Leonid Sigal, Greg Mori, Marcus A. Brubaker

In this work, we propose a novel probabilistic sequence model that excels at capturing high variability in time series data, both across sequences and within an individual sequence. Our method uses temporal latent variables to capture information about the underlying data pattern and dynamically decodes the latent information into modifications of the weights of the base decoder and recurrent model. The efficacy of the proposed method is demonstrated on a range of synthetic and real-world sequential data that exhibit large-scale variations, regime shifts, and complex dynamics.
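
A minimal numpy sketch of the hypernetwork idea described above: a temporal latent variable z_t is decoded into multiplicative modifications of the base recurrent weights at each step. The shapes, the single-layer RNN, and the way z_t modulates the weights are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_h, d_z = 3, 4, 2

# Base recurrent weights shared across time.
W_h = rng.normal(0, 0.3, (d_h, d_h))
W_x = rng.normal(0, 0.3, (d_h, d_x))
# Hypernetwork mapping a latent z_t to per-row scales of the recurrent weights.
W_hyper = rng.normal(0, 0.3, (d_h, d_z))

def step(h, x, z):
    scale = 1.0 + np.tanh(W_hyper @ z)        # latent-dependent weight modulation
    W_h_t = W_h * scale[:, None]              # modified recurrent weights at time t
    return np.tanh(W_h_t @ h + W_x @ x)

h = np.zeros(d_h)
for t in range(5):
    x_t = rng.normal(size=d_x)                # observation at time t
    z_t = rng.normal(size=d_z)                # temporal latent (sampled from q or p)
    h = step(h, x_t, z_t)
print(h)
```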

Preventing Posterior Collapse in Sequence VAEs with Pooling

Nov 10, 2019
Teng Long, Yanshuai Cao, Jackie Chi Kit Cheung

Variational Autoencoders (VAEs) hold great potential for modelling text, as they could in theory separate high-level semantic and syntactic properties from local regularities of natural language. In practice, however, VAEs with autoregressive decoders often suffer from posterior collapse, a phenomenon where the model learns to ignore the latent variables, causing the sequence VAE to degenerate into a language model. Previous work attempts to solve this problem with complex architectural changes or costly optimization schemes. In this paper, we argue that posterior collapse is caused in part by the encoder network failing to capture the input variability. We verify this hypothesis empirically and propose a straightforward fix using pooling. This simple technique effectively prevents posterior collapse, allowing the model to achieve significantly better data log-likelihood than standard sequence VAEs. Compared to the previous state of the art in preventing posterior collapse, we achieve comparable performance while being significantly faster.
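
A minimal sketch of the pooling fix, assuming an encoder that already produces one hidden state per token: instead of computing the approximate posterior from only the final hidden state, pool (here, mean- and max-pool) over all hidden states before projecting to the posterior mean and log-variance. The specific pooling variant and layer sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_h, d_z = 7, 16, 8

hidden_states = rng.normal(size=(seq_len, d_h))   # encoder outputs, one per token

# Projections to the posterior parameters (randomly initialized for the sketch).
W_mu = rng.normal(0, 0.1, (d_z, 2 * d_h))
W_logvar = rng.normal(0, 0.1, (d_z, 2 * d_h))

# Final-state posterior (the standard setup that tends to collapse).
last = np.concatenate([hidden_states[-1], hidden_states[-1]])
mu_last = W_mu @ last

# Pooled posterior: summarize the whole input sequence, not just the last state.
pooled = np.concatenate([hidden_states.mean(axis=0), hidden_states.max(axis=0)])
mu, logvar = W_mu @ pooled, W_logvar @ pooled

z = mu + np.exp(0.5 * logvar) * rng.normal(size=d_z)   # reparameterized sample
print(mu_last.shape, mu.shape, z.shape)
```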
