Yongyi Mao

On the Generalization of Models Trained with SGD: Information-Theoretic Bounds and Implications

Oct 07, 2021

Ziqiao Wang, Yongyi Mao

Abstract: This paper follows up on recent work by Neu (2021) and presents new and tighter information-theoretic upper bounds for the generalization error of machine learning models, such as neural networks, trained with SGD. We apply these bounds to analyze the generalization behaviour of linear and two-layer ReLU networks. An experimental study based on these bounds provides some insights into the SGD training of neural networks. The bounds also point to a new and simple regularization scheme, which we show performs comparably to the current state of the art.

Robust Regularization with Adversarial Labelling of Perturbed Samples

May 28, 2021

Xiaohui Guo, Richong Zhang, Yaowei Zheng, Yongyi Mao

Abstract: Recent research has suggested that the predictive accuracy of neural networks may be at odds with their adversarial robustness. This makes it challenging to design effective regularization schemes that also provide strong adversarial robustness. Revisiting Vicinal Risk Minimization (VRM) as a unifying regularization principle, we propose Adversarial Labelling of Perturbed Samples (ALPS), a regularization scheme that aims to improve both the generalization ability and the adversarial robustness of the trained model. ALPS trains neural networks with synthetic samples, each formed by perturbing an authentic input sample towards another one and pairing it with an adversarially assigned label. The ALPS regularization objective is formulated as a min-max problem, in which the outer problem minimizes an upper bound of the VRM loss and the inner problem is an $L_1$-ball constrained adversarial labelling of the perturbed sample. The induced inner maximization problem admits an analytic solution, which makes the scheme computationally efficient. Experiments on the SVHN, CIFAR-10, CIFAR-100 and Tiny-ImageNet datasets show that ALPS achieves state-of-the-art regularization performance while also serving as an effective adversarial training scheme.

* Accepted to IJCAI 2021
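A rough sketch of the two ingredients named above, under assumptions of our own: inputs are plain vectors, the synthetic sample is a linear interpolation towards another sample, the loss is cross-entropy, and labels live in the probability simplex. Because cross-entropy is linear in the label $q$, maximizing it over an $L_1$ ball of radius $r$ around the reference label only requires moving at most $r/2$ of probability mass onto the class with the lowest predicted probability. This greedy closed form is an illustration and is not necessarily the analytic solution derived in the paper; `perturb_towards`, `adversarial_label` and `cross_entropy` are hypothetical helpers.

```python
import math


def perturb_towards(x_i, x_j, lam):
    """Hypothetical helper: move sample x_i a fraction lam of the way towards x_j."""
    return [a + lam * (b - a) for a, b in zip(x_i, x_j)]


def adversarial_label(y, probs, radius):
    """Maximize the cross-entropy -sum(q_k * log p_k) over labels q in the simplex
    with ||q - y||_1 <= radius (illustrative assumption, not the paper's derivation).

    The objective is linear in q, so the optimum shifts mass onto the class with
    the largest -log p_k, taking it first from the classes with the smallest -log p_k.
    """
    costs = [-math.log(p) for p in probs]
    target = max(range(len(y)), key=lambda k: costs[k])
    q = list(y)
    budget = radius / 2.0  # moving m units of mass changes the L1 distance by 2m
    # Visit classes from highest to lowest model probability (cheapest to drain first).
    for k in sorted(range(len(y)), key=lambda k: costs[k]):
        if k == target or budget <= 0:
            continue
        moved = min(q[k], budget)
        q[k] -= moved
        q[target] += moved
        budget -= moved
    return q


def cross_entropy(q, probs):
    """Cross-entropy of (possibly soft) label q against predicted probabilities."""
    return -sum(qk * math.log(p) for qk, p in zip(q, probs))
```

For instance, with predicted probabilities `[0.7, 0.2, 0.1]`, reference label `[1.0, 0.0, 0.0]` and radius `0.4`, the adversarial label becomes `[0.8, 0.0, 0.2]`, which raises the cross-entropy.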

On the Dynamics of Training Attention Models

Nov 19, 2020

Haoye Lu, Yongyi Mao, Amiya Nayak

Abstract: The attention mechanism has been widely used in deep neural networks as a model component. By now, it has become a critical building block in many state-of-the-art natural language models. Despite its well-established empirical success, the working mechanism of attention has not yet been investigated in sufficient theoretical depth. In this paper, we set up a simple text classification task and study the dynamics of training a simple attention-based classification model using gradient descent. In this setting, we show that, for each discriminative word that the model should attend to, a persistent identity relates its embedding to the inner product of its key and the query. This allows us to prove that training must converge to attending to the discriminative words when the attention output is classified by a linear classifier. Experiments validate our theoretical analysis and provide further insights.
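A minimal sketch of the kind of model described above, assuming single-query dot-product attention: each word has an embedding and a key, its attention weight is the softmax of the key-query inner product, and a logistic classifier reads the attention-weighted sum of embeddings. The toy vectors and the function name `attention_classify` are illustrative; the gradient-descent training studied in the paper is not shown.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_classify(embeddings, keys, query, w, b=0.0):
    """Attend over the words of one sentence and classify the pooled vector.

    embeddings[i] and keys[i] belong to word i; the attention score of word i
    is the inner product of its key with the shared query vector.
    """
    weights = softmax([dot(k, query) for k in keys])
    dim = len(embeddings[0])
    pooled = [sum(a * e[d] for a, e in zip(weights, embeddings)) for d in range(dim)]
    logit = dot(w, pooled) + b  # linear classifier on the attention output
    return weights, 1.0 / (1.0 + math.exp(-logit))
```

With two words where only the second word's key is aligned with the query, that word receives most of the attention weight, as one would expect for a discriminative word.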

Regularizing Neural Networks via Adversarial Model Perturbation

Oct 10, 2020

Yaowei Zheng, Richong Zhang, Yongyi Mao

Abstract: Recent research has suggested that, when training neural networks, flat local minima of the empirical risk may help the model generalize better. Motivated by this understanding, we propose a new regularization scheme, referred to as adversarial model perturbation (AMP), in which an alternative "AMP loss" function is minimized instead of the empirical risk itself. Specifically, the AMP loss is obtained from the empirical risk by applying the "worst" norm-bounded perturbation at each point in the parameter space. We theoretically justify that minimizing the AMP loss favours flat local minima of the empirical risk and thereby improves generalization. Extensive experiments establish AMP as a new state of the art among regularization schemes.

* 14 pages, 11 figures, submitted to AAAI 2021
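The AMP loss described above can be written as $\max_{\|\delta\| \le \epsilon} L(\theta + \delta)$. The sketch below approximates the inner maximization with a single gradient step scaled to the boundary of an L2 ball of radius `epsilon`, then descends along the gradient taken at the perturbed point. The one-step approximation, the L2 norm, the toy quadratic loss and the name `amp_step` are all assumptions made for illustration, not details taken from the paper.

```python
import math


def amp_step(theta, grad_fn, epsilon, lr):
    """One training step on an approximation of the AMP loss
    max_{||delta||_2 <= epsilon} L(theta + delta).

    Assumption of this sketch: the inner maximization is approximated by a
    single gradient-ascent step scaled to the boundary of the L2 ball.
    """
    g = grad_fn(theta)
    norm = math.sqrt(sum(x * x for x in g)) or 1.0  # avoid dividing by zero at a stationary point
    delta = [epsilon * x / norm for x in g]  # approximate "worst" perturbation
    perturbed = [t + d for t, d in zip(theta, delta)]
    g_adv = grad_fn(perturbed)  # gradient of the loss at theta + delta
    return [t - lr * x for t, x in zip(theta, g_adv)]
```

For example, on a toy quadratic with gradient `lambda t: [2 * t[0], 20 * t[1]]`, repeated calls to `amp_step(theta, grad, epsilon=0.05, lr=0.01)` drive `theta` towards the minimum at the origin.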

Neural Dialogue State Tracking with Temporally Expressive Networks

Oct 03, 2020

Junfan Chen, Richong Zhang, Yongyi Mao, Jie Xu

Abstract: Dialogue state tracking (DST) is an important part of a spoken dialogue system. Existing DST models either ignore temporal feature dependencies across dialogue turns or fail to explicitly model temporal state dependencies in a dialogue. In this work, we propose Temporally Expressive Networks (TEN) to jointly model these two types of temporal dependencies in DST. The TEN model combines the strengths of recurrent networks and probabilistic graphical models. Evaluations on standard datasets demonstrate that TEN is effective in improving the accuracy of turn-level state prediction and state aggregation.

* Accepted by Findings of EMNLP 2020

Parallel Interactive Networks for Multi-Domain Dialogue State Generation

Oct 03, 2020

Junfan Chen, Richong Zhang, Yongyi Mao, Jie Xu

Abstract: Existing multi-domain dialogue state tracking (MDST) models do not fully consider the dependencies between system and user utterances in the same turn and across different turns. In this study, we argue that incorporating these dependencies is crucial for the design of MDST and propose Parallel Interactive Networks (PIN) to model them. Specifically, we integrate an interactive encoder to jointly model the in-turn and cross-turn dependencies. A slot-level context is introduced to extract more expressive features for different slots, and a distributed copy mechanism is used to selectively copy words from historical system utterances or historical user utterances. Empirical studies demonstrate the superiority of the proposed PIN model.

* Accepted by EMNLP 2020

On SkipGram Word Embedding Models with Negative Sampling: Unified Framework and Impact of Noise Distributions

Sep 02, 2020

Ziqiao Wang, Yongyi Mao, Hongyu Guo, Richong Zhang

Abstract: SkipGram word embedding models with negative sampling, or SGN for short, form an elegant family of word embedding models. In this paper, we formulate a framework for word embedding, referred to as Word-Context Classification (WCC), that generalizes SGN to a wide family of models. The framework, which utilizes some "noise examples", is justified through a theoretical analysis. We study experimentally the impact of the noise distribution on the learning of WCC embedding models; the results suggest that the best noise distribution is in fact the data distribution, in terms of both embedding performance and speed of convergence during training. Along the way, we discover several novel embedding models that outperform the existing WCC models.
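For orientation, here is a minimal sketch of the standard SGN loss for a single (word, context) pair: the true context is scored with $\log\sigma(w \cdot c)$ and each of $k$ noise contexts with $\log\sigma(-w \cdot n_j)$. The default noise distribution shown is the smoothed unigram distribution (counts raised to the power 0.75) popularized by word2vec; the `power` parameter and all function names are illustrative assumptions, and the sketch does not reproduce the WCC framework itself.

```python
import math
import random


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def noise_distribution(counts, power=0.75):
    """Unigram counts raised to `power` and normalized (word2vec-style smoothing)."""
    weights = {w: c ** power for w, c in counts.items()}
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}


def sample_noise(dist, k, rng):
    """Draw k noise words from the noise distribution."""
    words = list(dist)
    return rng.choices(words, weights=[dist[w] for w in words], k=k)


def sgn_loss(word_vec, context_vec, noise_vecs):
    """Negative-sampling loss for one (word, context) pair:
    -log sigma(w . c) - sum_j log sigma(-w . n_j)."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    loss = -log_sigmoid(dot(word_vec, context_vec))
    loss -= sum(log_sigmoid(-dot(word_vec, n)) for n in noise_vecs)
    return loss
```

Setting `power=1.0` uses the raw unigram frequencies instead of the smoothed ones.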

Recurrent Interaction Network for Jointly Extracting Entities and Classifying Relations

May 01, 2020

Kai Sun, Richong Zhang, Samuel Mensah, Yongyi Mao, Xudong Liu

Abstract: Named entity recognition (NER) and relation extraction (RE) are two fundamental tasks in natural language processing applications. In practice, these two tasks often need to be solved simultaneously. Traditional multi-task learning models capture the correlations between NER and RE only implicitly, yet there are intrinsic connections between the outputs of NER and RE. In this study, we argue that an explicit interaction between the NER model and the RE model will better guide the training of both models. Building on the traditional multi-task learning framework, we design an interactive feature encoding method to capture the intrinsic connections between NER and RE tasks. In addition, we propose a recurrent interaction network to progressively capture the correlation between the two models. Empirical studies on two real-world datasets confirm the superiority of the proposed model.

Aggregated Learning: A Vector-Quantization Approach to Learning Neural Network Classifiers

Jan 12, 2020

Masoumeh Soflaei, Hongyu Guo, Ali Al-Bashabsheh, Yongyi Mao, Richong Zhang

Abstract: We consider the problem of learning a neural network classifier. Under the information bottleneck (IB) principle, we associate with this classification problem a representation learning problem, which we call "IB learning". We show that IB learning is, in fact, equivalent to a special class of quantization problems. Classical results in rate-distortion theory then suggest that IB learning can benefit from a "vector quantization" approach, namely, simultaneously learning the representations of multiple input objects. Such an approach, assisted by some variational techniques, results in a novel learning framework, "Aggregated Learning", for classification with neural network models. In this framework, several objects are jointly classified by a single neural network. The effectiveness of this framework is verified through extensive experiments on standard image recognition and text classification tasks.
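To make the idea of jointly classifying several objects with one network concrete, the toy sketch below concatenates $n$ inputs, maps them with a single linear layer to $n \times C$ logits, and splits the result into $n$ class distributions. The linear architecture, the random initialization and the class name `AggregatedLinear` are assumptions for illustration; the variational training objective described in the paper is not shown.

```python
import math
import random


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class AggregatedLinear:
    """Hypothetical toy classifier: n inputs of dimension d are concatenated and
    mapped by ONE weight matrix to n * num_classes logits, which are split into
    n per-object class distributions."""

    def __init__(self, n, d, num_classes, seed=0):
        rng = random.Random(seed)
        self.n, self.d, self.c = n, d, num_classes
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(n * d)]
                  for _ in range(n * num_classes)]

    def forward(self, xs):
        assert len(xs) == self.n and all(len(x) == self.d for x in xs)
        flat = [v for x in xs for v in x]  # concatenate the n objects
        logits = [sum(w * v for w, v in zip(row, flat)) for row in self.W]
        # Split the joint output into one distribution per object.
        return [softmax(logits[i * self.c:(i + 1) * self.c]) for i in range(self.n)]
```

Because every output row sees all $n$ inputs, the prediction for one object can depend on the others in the same group, which is what distinguishes this setup from classifying each object separately.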

Uncover the Ground-Truth Relations in Distant Supervision: A Neural Expectation-Maximization Framework

Sep 12, 2019

Junfan Chen, Richong Zhang, Yongyi Mao, Hongyu Guo, Jie Xu

Abstract: Distant supervision for relation extraction enables one to effectively acquire structured relations from very large text corpora with less human effort. Nevertheless, most prior models for such tasks assume that the given text may be noisy but that the corresponding labels are clean. This unrealistic assumption contradicts the fact that the given labels are often noisy as well, which leads to significant performance degradation of those models on real-world data. To cope with this challenge, we propose a novel label-denoising framework that combines neural networks with probabilistic modelling and naturally takes the noisy labels into account during learning. We empirically demonstrate that our approach significantly improves on the current state of the art in uncovering the ground-truth relation labels.

* To appear in 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing
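One common way to combine a neural network with a probabilistic label-noise model is expectation-maximization. In the simplified single-label sketch below (our assumption; the abstract does not specify the model), the network supplies $p(z \mid x)$ over the latent true label $z$, a transition table supplies $p(y \mid z)$ for the observed distant label $y$, the E-step computes $p(z \mid x, y) \propto p(z \mid x)\,p(y \mid z)$, and the M-step re-estimates the transition table from soft counts. Retraining the network on the posteriors as soft targets is omitted, and `e_step` and `m_step_noise` are hypothetical names.

```python
def e_step(prior, noise, observed):
    """Posterior over the latent true label z given the observed (noisy) label.

    prior[z]: p(z | x) from the neural network
    noise[z][y]: p(y | z), probability that true label z is observed as y
    observed: index of the observed label y
    Returns p(z | x, y), proportional to p(z | x) * p(y | z).
    """
    joint = [prior[z] * noise[z][observed] for z in range(len(prior))]
    total = sum(joint)
    return [j / total for j in joint]


def m_step_noise(posteriors, observed_labels, num_labels, smoothing=1e-3):
    """Re-estimate p(y | z) from soft counts of (true z, observed y) pairs."""
    counts = [[smoothing] * num_labels for _ in range(num_labels)]
    for post, y in zip(posteriors, observed_labels):
        for z, p in enumerate(post):
            counts[z][y] += p
    return [[c / sum(row) for c in row] for row in counts]
```

For example, if the network is confident in label 0 (`[0.9, 0.1]`) but the distant label is 1, a noise table `[[0.7, 0.3], [0.1, 0.9]]` yields the posterior `[0.75, 0.25]`: the noisy label shifts belief without fully overriding the network.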
