Hidden Markov Models (HMMs) and Probabilistic Context-Free Grammars (PCFGs) are widely used structured models. Both can be represented as factor graph grammars (FGGs), a powerful formalism capable of describing a wide range of models. Recent research has found it beneficial to use large state spaces for HMMs and PCFGs. However, inference with large state spaces is computationally demanding, especially for PCFGs. To tackle this challenge, we leverage tensor rank decomposition (aka.\ CPD) to reduce the computational complexity of inference for a subset of FGGs that subsumes HMMs and PCFGs. We apply CPD to the factors of an FGG and then construct a new FGG defined in the rank space. Inference with the new FGG produces the same result but has a lower time complexity when the rank size is smaller than the state size. We conduct experiments on HMM language modeling and unsupervised PCFG parsing, showing better performance than previous work. Our code is publicly available at \url{https://github.com/VPeterV/RankSpace-Models}.
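For an HMM, the relevant factor is the $n \times n$ transition matrix, and a rank-$r$ CPD of a matrix is simply a factorization $A_{ij} = \sum_k U_{ik} V_{jk}$. The Python sketch below is illustrative only and is not the released implementation. It shows why forward inference in the rank space gives the same likelihood at lower cost: each step projects the $n$-dimensional state vector into the $r$-dimensional rank space and back, which costs $O(nr)$ instead of $O(n^2)$. The helper names (`make_low_rank_hmm`, `forward_rank_space`, etc.) are hypothetical. The sketch also assumes that $U$ is row-stochastic and that each column of $V$ is a distribution over next states, which keeps $A$ row-stochastic.

```python
import random
from typing import List

Matrix = List[List[float]]


def random_stochastic(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Random matrix whose rows are probability distributions."""
    m = [[rng.random() + 1e-3 for _ in range(cols)] for _ in range(rows)]
    return [[x / sum(row) for x in row] for row in m]


def make_low_rank_hmm(n: int, r: int, vocab: int, seed: int = 0):
    """Hypothetical toy HMM whose transition matrix A = U V^T has rank <= r."""
    rng = random.Random(seed)
    pi = random_stochastic(1, n, rng)[0]      # initial state distribution
    U = random_stochastic(n, r, rng)          # state i -> rank component k
    Vt = random_stochastic(r, n, rng)         # rank component k -> next state j
    V = [[Vt[k][j] for k in range(r)] for j in range(n)]
    A = [[sum(U[i][k] * V[j][k] for k in range(r)) for j in range(n)] for i in range(n)]
    B = random_stochastic(n, vocab, rng)      # emission probabilities
    return pi, U, V, A, B


def forward_full(pi, A, B, obs) -> float:
    """Standard forward algorithm over the explicit n x n matrix: O(T * n^2)."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return sum(alpha)


def forward_rank_space(pi, U, V, B, obs) -> float:
    """Forward algorithm that never forms A: O(T * n * r)."""
    n, r = len(pi), len(U[0])
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        # Project the state vector into rank space: beta_k = sum_i alpha_i * U_ik.
        beta = [sum(alpha[i] * U[i][k] for i in range(n)) for k in range(r)]
        # Map back to state space and apply the emission probability.
        alpha = [sum(beta[k] * V[j][k] for k in range(r)) * B[j][o] for j in range(n)]
    return sum(alpha)


pi, U, V, A, B = make_low_rank_hmm(n=8, r=3, vocab=5)
obs = [0, 3, 1, 4, 2]
print(forward_full(pi, A, B, obs), forward_rank_space(pi, U, V, B, obs))
```

The two printed likelihoods should agree up to floating-point rounding. The rank-space recursion is only cheaper when $r < n$. The abstract describes a generalization of this idea from a single transition matrix to the factors of a broader class of FGGs.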
Second-order semantic parsing with end-to-end mean-field inference has been shown to perform well. In this work, we aim to improve this method by modeling label correlations between adjacent arcs. However, direct modeling leads to memory explosion because second-order score tensors have size $O(n^3L^2)$ ($n$ is the sentence length and $L$ is the number of labels), which is prohibitive. To tackle this computational challenge, we leverage tensor decomposition techniques. Interestingly, we show that the large second-order score tensors need not be materialized during mean-field inference, which reduces the computational complexity from cubic to quadratic. We conduct experiments on the SemEval 2015 Task 18 English datasets, showing the effectiveness of modeling label correlations. Our code is publicly available at https://github.com/sustcsonglin/mean-field-dep-parsing.
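The following is a simplified, label-free illustration of why the score tensor need not be materialized. Suppose the second-order score for arc pairs $(i,j)$ and $(i,k)$ that share head $i$ has a rank-$r$ CP form $S_{ijk} = \sum_q F^{(1)}_{iq} F^{(2)}_{jq} F^{(3)}_{kq}$, and a mean-field update needs the message $M_{ij} = \sum_k S_{ijk} Q_{ik}$, where $Q$ holds the current arc marginals. Contracting over $k$ inside the rank dimension first means the $n^3$ tensor is never built. Several parts of this sketch are our own assumptions made for brevity: the factor names, the sibling-style indexing, and the omission of labels. The paper's actual decomposition and update equations may differ.

```python
import random
from typing import List

Matrix = List[List[float]]
Tensor3 = List[List[List[float]]]


def messages_materialized(S: Tensor3, Q: Matrix) -> Matrix:
    """M[i][j] = sum_k S[i][j][k] * Q[i][k] with S stored explicitly: O(n^3)."""
    n = len(Q)
    return [[sum(S[i][j][k] * Q[i][k] for k in range(n)) for j in range(n)] for i in range(n)]


def messages_decomposed(F1: Matrix, F2: Matrix, F3: Matrix, Q: Matrix) -> Matrix:
    """Same messages when S[i][j][k] = sum_q F1[i][q] F2[j][q] F3[k][q]: O(n^2 r)."""
    n, r = len(Q), len(F1[0])
    # P[i][q] = sum_k F3[k][q] * Q[i][k]: contract the third index before anything else.
    P = [[sum(F3[k][q] * Q[i][k] for k in range(n)) for q in range(r)] for i in range(n)]
    return [[sum(F1[i][q] * F2[j][q] * P[i][q] for q in range(r)) for j in range(n)]
            for i in range(n)]


rng = random.Random(0)
n, r = 6, 2
F1, F2, F3 = [[[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(n)] for _ in range(3)]
Q = [[rng.random() for _ in range(n)] for _ in range(n)]  # stand-in arc marginals
S = [[[sum(F1[i][q] * F2[j][q] * F3[k][q] for q in range(r)) for k in range(n)]
      for j in range(n)] for i in range(n)]  # built only to check the fast path
M_full, M_fast = messages_materialized(S, Q), messages_decomposed(F1, F2, F3, Q)
print(max(abs(a - b) for ra, rb in zip(M_full, M_fast) for a, b in zip(ra, rb)))
```

The printed maximum difference should be on the order of floating-point rounding. The fast path only ever stores $O(n^2 + nr)$ numbers.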
Nested named entity recognition (NER) has been receiving increasing attention. Recently, Fu et al. (2021) adapted a span-based constituency parser to tackle nested NER. They treat nested entities as partially-observed constituency trees and propose the masked inside algorithm for partial marginalization. However, their method cannot leverage entity heads, which have been shown to be useful in entity mention detection and entity typing. In this work, we resort to more expressive structures, lexicalized constituency trees in which constituents are annotated by headwords, to model nested entities. We leverage the Eisner-Satta algorithm to perform partial marginalization and inference efficiently. In addition, we propose (1) a two-stage strategy, (2) a head regularization loss, and (3) a head-aware labeling loss to enhance performance. We conduct a thorough ablation study to investigate the contribution of each component. Experimentally, our method achieves state-of-the-art performance on ACE2004, ACE2005, and NNE and competitive performance on GENIA, while also offering fast inference.
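Partial marginalization is easiest to see in the simpler, unlexicalized setting. The goal there is to sum over all binary bracketings of a sentence that contain a given set of observed entity spans. In a full binary tree, any span that crosses no bracket is itself a bracket. It is therefore enough to zero out every span that crosses an observed span, which, as we understand it, is the idea behind masked inside. The following Python sketch is a hypothetical, minimal version that represents spans as fencepost pairs `(i, j)` and uses a multiplicative span score. It is not the Eisner-Satta algorithm used in this work, which additionally tracks headwords.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]


def crosses(a: Span, b: Span) -> bool:
    """True if spans a and b overlap without either containing the other."""
    (i, j), (k, l) = a, b
    return i < k < j < l or k < i < l < j


def masked_inside(n: int, observed: List[Span],
                  score: Callable[[int, int], float] = lambda i, j: 1.0) -> float:
    """Sum of tree scores over binary trees of n words containing every observed span."""
    inside: Dict[Span, float] = {}
    for width in range(1, n + 1):
        for i in range(n - width + 1):
            j = i + width
            if any(crosses((i, j), s) for s in observed):
                inside[i, j] = 0.0  # masked: incompatible with the partial tree
            elif width == 1:
                inside[i, j] = score(i, j)
            else:
                split_sum = sum(inside[i, k] * inside[k, j] for k in range(i + 1, j))
                inside[i, j] = score(i, j) * split_sum
    return inside[0, n]


# With unit scores, the result counts compatible trees.
print(masked_inside(4, []))        # 5.0: all binary trees over 4 words
print(masked_inside(4, [(1, 3)]))  # 2.0: only trees containing span (1, 3)
```

If the observed spans cross each other, no tree is compatible and the result is 0.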
Constituency parsing and nested named entity recognition (NER) are typical \textit{nested structured prediction} tasks, since both aim to predict a collection of nested and non-crossing spans. Many previous studies have adapted constituency parsing methods to tackle nested NER. In this work, we propose a novel global pointing mechanism for bottom-up parsing with pointer networks to handle both tasks, requiring only a linear number of steps to parse. In constituency parsing, our method obtains state-of-the-art performance on PTB among all BERT-based models (96.01 F1 score) and competitive performance on CTB7. It also achieves comparable performance on three nested NER benchmark datasets: ACE2004, ACE2005, and GENIA. Our code is publicly available at \url{https://github.com/sustcsonglin/pointer-net-for-nested}.
Graph-based methods have been popular in dependency parsing for decades. Recently, \citet{yang2021headed} proposed a headed span-based method. Both kinds of methods score all possible trees and globally find the highest-scoring tree. In this paper, we combine these two kinds of methods, designing several dynamic programming algorithms for joint inference. Experiments show the effectiveness of our proposed methods\footnote{Our code is publicly available at \url{https://github.com/sustcsonglin/span-based-dependency-parsing}.}.
We propose a headed span-based method for projective dependency parsing. In a projective tree, the subtree rooted at each word occupies a contiguous sequence (i.e., span) in the surface order, and we call the span-headword pair a \textit{headed span}. In this view, a projective tree can be regarded as a collection of headed spans. This is similar to constituency parsing, where a constituency tree can be regarded as a collection of constituent spans. Span-based methods decompose the score of a constituency tree solely into the scores of constituent spans and use the CYK algorithm for global training and exact inference, obtaining state-of-the-art results in constituency parsing. Inspired by them, we decompose the score of a dependency tree into the scores of headed spans. We use neural networks to score headed spans and design a novel $O(n^3)$ dynamic programming algorithm to enable global training and exact inference. We evaluate our method on PTB, CTB, and UD, achieving state-of-the-art or comparable results.
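The following is a minimal Python sketch of an $O(n^3)$ dynamic program over headed spans. It is written under our own assumptions and is not necessarily the paper's exact algorithm. The hypothetical callable `score(h, i, j)` returns the score of a headed span that covers words `i..j` (inclusive, 0-indexed) with headword `h`. A complete span headed by `h` consists of `h` plus the subtrees of its left and right dependents, and those subtrees tile `i..h-1` and `h+1..j`. A second chart stores the best way to tile a range with consecutive complete spans. Each chart has $O(n^2)$ cells, and each cell is filled in $O(n)$ time.

```python
from typing import Callable, List

ScoreFn = Callable[[int, int, int], float]


def best_tree_score(n: int, score: ScoreFn) -> float:
    """Max total headed-span score over projective trees of n words."""
    NEG = float("-inf")
    S: List[List[float]] = [[NEG] * n for _ in range(n)]  # complete headed spans
    C: List[List[float]] = [[NEG] * n for _ in range(n)]  # tilings by complete spans

    def tile(i: int, j: int) -> float:
        # An empty range (i > j) is tiled trivially with score 0.
        return 0.0 if i > j else C[i][j]

    for width in range(1, n + 1):
        for i in range(n - width + 1):
            j = i + width - 1
            # Choose the headword h; the dependents' subtrees tile each side of h.
            S[i][j] = max(score(h, i, j) + tile(i, h - 1) + tile(h + 1, j)
                          for h in range(i, j + 1))
            # The first complete span covers i..k; the rest is tiled recursively.
            C[i][j] = max(S[i][k] + tile(k + 1, j) for k in range(i, j + 1))
    return S[0][n - 1]


# Toy check: reward exactly the headed spans (h, i, j) of one tree in which
# word1 is the root, word0 and word3 attach to word1, and word2 attaches to word3.
gold = {(0, 0, 0), (1, 0, 3), (2, 2, 2), (3, 2, 3)}
print(best_tree_score(4, lambda h, i, j: 1.0 if (h, i, j) in gold else 0.0))  # 4.0
```

Every projective tree over $n$ words has exactly $n$ headed spans, so the toy maximum is 4.0. Storing backpointers alongside `S` and `C` would recover the argmax tree. Replacing `max` with log-sum-exp would give the partition function needed for global training.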
As the identity information in face data becomes more closely tied to personal credit and property security, people are paying increasing attention to protecting the privacy of face data. Different tasks impose different requirements on face de-identification (De-ID), so we propose a systematic solution compatible with these De-ID operations. First, we construct an attribute disentanglement and generative network that encodes two parts of the face: the identity (facial features such as mouth, nose, and eyes) and the expression (including expression, pose, and illumination). Through face swapping, we can remove the original ID completely. Second, unlike previous traditional adversarial methods, we add an adversarial vector mapping network that perturbs the latent code of the face image. This allows us to construct unrestricted adversarial images that decrease the ID similarity recognized by the model. Our method can flexibly de-identify face data in various ways, and the processed images retain high image quality.
Neural lexicalized PCFGs (L-PCFGs) have been shown to be effective in grammar induction. However, to reduce computational complexity, they make a strong independence assumption on the generation of the child word, so bilexical dependencies are ignored. In this paper, we propose an approach to parameterize L-PCFGs without making implausible independence assumptions. Our approach directly models bilexical dependencies while reducing both the learning and representation complexities of L-PCFGs. Experimental results on the English WSJ dataset confirm that our approach improves both running speed and unsupervised parsing performance.
Probabilistic context-free grammars (PCFGs) with neural parameterization have been shown to be effective in unsupervised phrase-structure grammar induction. However, due to the cubic computational complexity of PCFG representation and parsing, previous approaches cannot scale up to a relatively large number of (nonterminal and preterminal) symbols. In this work, we present a new parameterization form of PCFGs based on tensor decomposition. It has at most quadratic computational complexity in the number of symbols and therefore allows us to use a much larger number of symbols. We further use neural parameterization for the new form to improve unsupervised parsing performance. We evaluate our model across ten languages and empirically demonstrate the effectiveness of using more symbols. Our code is available at https://github.com/sustcsonglin/TN-PCFG.
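The effect of the decomposition on the inside algorithm can be sketched under the following simplifying assumptions, which are ours:

- one shared set of $m$ symbols for nonterminals and preterminals;
- precomputed per-word preterminal score vectors;
- a binary rule tensor $T_{ABC} = \sum_q U_{Aq} V_{Bq} W_{Cq}$ of rank $r$.

Each span's inside vector is projected into the rank space once. The sum over split points can then run over $r$-dimensional vectors, so no step touches an $m \times m \times m$ tensor. The function names are hypothetical, and the neural parameterization and rule normalization are not shown.

```python
import random
from typing import Dict, List, Tuple

Vec = List[float]
Matrix = List[List[float]]


def project(vec: Vec, F: Matrix) -> Vec:
    """Rank-space projection: out[q] = sum_b F[b][q] * vec[b]."""
    return [sum(F[b][q] * vec[b] for b in range(len(vec))) for q in range(len(F[0]))]


def inside_naive(T, leaves: List[Vec], root: Vec) -> float:
    """Inside algorithm with an explicit m x m x m rule tensor: O(n^3 m^3)."""
    n, m = len(leaves), len(root)
    chart: Dict[Tuple[int, int], Vec] = {(i, i + 1): leaves[i] for i in range(n)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            s = [0.0] * m
            for k in range(i + 1, j):
                x, y = chart[i, k], chart[k, j]
                for a in range(m):
                    s[a] += sum(T[a][b][c] * x[b] * y[c] for b in range(m) for c in range(m))
            chart[i, j] = s
    return sum(root[a] * chart[0, n][a] for a in range(m))


def inside_rank_space(U: Matrix, V: Matrix, W: Matrix, leaves: List[Vec], root: Vec) -> float:
    """Same quantity without ever forming T: O(n^3 r + n^2 m r)."""
    n, m, r = len(leaves), len(root), len(U[0])
    chart = {(i, i + 1): leaves[i] for i in range(n)}
    left = {(i, i + 1): project(leaves[i], V) for i in range(n)}   # V^T x for left children
    right = {(i, i + 1): project(leaves[i], W) for i in range(n)}  # W^T y for right children
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            # Sum over split points entirely in rank space.
            z = [sum(left[i, k][q] * right[k, j][q] for k in range(i + 1, j)) for q in range(r)]
            chart[i, j] = [sum(U[a][q] * z[q] for q in range(r)) for a in range(m)]
            left[i, j], right[i, j] = project(chart[i, j], V), project(chart[i, j], W)
    return sum(root[a] * chart[0, n][a] for a in range(m))


rng = random.Random(0)
n, m, r = 4, 5, 2
U, V, W = [[[rng.random() for _ in range(r)] for _ in range(m)] for _ in range(3)]
T = [[[sum(U[a][q] * V[b][q] * W[c][q] for q in range(r)) for c in range(m)]
      for b in range(m)] for a in range(m)]
leaves = [[rng.random() for _ in range(m)] for _ in range(n)]
root = [rng.random() for _ in range(m)]
print(inside_naive(T, leaves, root), inside_rank_space(U, V, W, leaves, root))
```

The two printed values should match up to rounding. In the rank-space version, the symbol count $m$ appears only in the $O(mr)$ projections.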
Most unsupervised dependency parsers are based on first-order probabilistic generative models that only consider local parent-child information. Inspired by second-order supervised dependency parsing, we propose a second-order extension of unsupervised neural dependency models that incorporates grandparent-child or sibling information. We also propose a novel design of the neural parameterization and optimization methods of the dependency models. In second-order models, the number of grammar rules grows cubically with vocabulary size, making it difficult to train lexicalized models that may contain thousands of words. To circumvent this problem while still benefiting from both second-order parsing and lexicalization, we use the agreement-based learning framework to jointly train a second-order unlexicalized model and a first-order lexicalized model. Experiments on multiple datasets show the effectiveness of our second-order models compared with recent state-of-the-art methods. Our joint model achieves a 10% improvement over the previous state-of-the-art parser on the full WSJ test set.