We present a novel neural network model that learns POS tagging and graph-based dependency parsing jointly. Our model uses bidirectional LSTMs to learn feature representations shared between the POS tagging and dependency parsing tasks, thus avoiding the need for manual feature engineering. Our extensive experiments on 19 languages from the Universal Dependencies project show that our model outperforms the state-of-the-art neural network-based Stack-propagation model for joint POS tagging and transition-based dependency parsing, establishing a new state of the art. Our code is open-source and available together with pre-trained models at: https://github.com/datquocnguyen/jPTDP
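To make the shared-representation idea concrete, here is a minimal PyTorch sketch (illustrative only, not the authors' released jPTDP code): a single BiLSTM encoder feeds both a POS tagging head and a biaffine-style arc scorer for graph-based parsing. The class name, layer sizes, and scorer form are all assumptions.

```python
import torch
import torch.nn as nn

class JointTaggerParser(nn.Module):
    """Hypothetical sketch: one shared BiLSTM encoder feeding two heads,
    a POS tagging head and a graph-based (head-selection) arc scorer."""
    def __init__(self, vocab_size, n_tags, emb_dim=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Shared feature extractor: replaces hand-engineered features.
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.tag_head = nn.Linear(2 * hidden, n_tags)
        # Biaffine-style arc scorer: score every (head, dependent) pair.
        self.arc_dep = nn.Linear(2 * hidden, hidden)
        self.arc_head = nn.Linear(2 * hidden, hidden)
        self.arc_bilinear = nn.Parameter(torch.randn(hidden, hidden))

    def forward(self, tokens):
        h, _ = self.bilstm(self.embed(tokens))  # (batch, seq, 2*hidden)
        tag_scores = self.tag_head(h)           # POS logits per token
        dep = torch.tanh(self.arc_dep(h))
        head = torch.tanh(self.arc_head(h))
        # arc_scores[b, i, j] = score of token j heading token i
        arc_scores = dep @ self.arc_bilinear @ head.transpose(1, 2)
        return tag_scores, arc_scores
```

Training would sum a tagging loss and an arc loss over the two outputs, so the shared encoder is optimised for both tasks at once.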
Ensemble methods using multiple classifiers have proven to be the most successful approach for the task of Native Language Identification (NLI), achieving the current state of the art. However, a systematic examination of ensemble methods for NLI has yet to be conducted. Additionally, deeper ensemble architectures such as classifier stacking have not been closely evaluated. We present a set of experiments using three ensemble-based models, testing each with multiple configurations and algorithms. This includes a rigorous application of meta-classification models for NLI, achieving state-of-the-art results on three datasets from different languages. We also present the first use of statistical significance testing for comparing NLI systems, showing that our results are significantly better than the previous state of the art. We make available a collection of test set predictions to facilitate future statistical tests.
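As an illustration of classifier stacking (meta-classification), the following scikit-learn sketch trains base classifiers whose held-out predictions become features for a meta-classifier; the particular features and learners here are assumptions for illustration, not the paper's configuration.

```python
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Base classifiers whose cross-validated predictions become the
# input features of a meta-classifier (classifier stacking).
base = [
    ("svm", LinearSVC()),
    ("lr", LogisticRegression(max_iter=1000)),
]
stack = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    StackingClassifier(estimators=base,
                       final_estimator=LogisticRegression(max_iter=1000)),
)
# Usage: stack.fit(train_texts, train_labels); stack.predict(test_texts)
```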
This paper presents an empirical comparison of different dependency parsers for Vietnamese, a language with some unusual characteristics such as copula drop and verb serialization. Experimental results show that the neural network-based parsers perform significantly better than the traditional parsers. We report the highest parsing scores published to date for Vietnamese, with a labeled attachment score (LAS) of 73.53% and an unlabeled attachment score (UAS) of 80.66%.
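For reference, the two metrics can be computed as follows; the per-token (head, label) representation is an assumed format for illustration.

```python
def attachment_scores(gold, pred):
    """Hypothetical helper: gold and pred are lists of (head, label)
    tuples, one per token. UAS counts correct heads; LAS additionally
    requires the dependency label to match."""
    total = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / total
    las = sum(g == p for g, p in zip(gold, pred)) / total
    return uas, las

# Toy example: one head error and one label error over three tokens.
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(3, "nsubj"), (0, "root"), (2, "nmod")]
print(attachment_scores(gold, pred))  # (0.666..., 0.333...)
```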
This paper presents a number of experiments modelling change in a historical Portuguese corpus of literary texts, for the purpose of temporal text classification. Algorithms were trained to classify texts with respect to their publication date, taking into account lexical variation represented as word n-grams and morphosyntactic variation represented by part-of-speech (POS) distributions. We report 99.8% accuracy using word unigram features with a Support Vector Machine classifier to predict the publication date of documents in time intervals of both one century and half a century. A feature analysis is performed to investigate the most informative features for this task and how they are linked to language change.
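A minimal sketch of the unigram-plus-SVM setup, assuming scikit-learn; the corpus preparation and the POS-based features of the paper are not reproduced here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Word unigram counts feeding a linear SVM; labels would be the
# publication-date intervals (e.g. "1800-1849").
model = make_pipeline(CountVectorizer(ngram_range=(1, 1)), LinearSVC())
# Usage: model.fit(texts, period_labels); model.predict(new_texts)
# Informative features per period can be inspected via the learned
# weights, e.g. model.named_steps["linearsvc"].coef_
```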
This paper develops a computational model of paraphrase under which text modification is carried out reluctantly; that is, there are external constraints, such as length or readability, on an otherwise ideal text, and modifications to the text are necessary to ensure conformance to these constraints. This problem is analogous to a mathematical optimisation problem: the textual constraints can be described as a set of constraint equations, and the requirement for minimal change to the text can be expressed as a function to be minimised, so techniques from this domain can be used to solve the problem. The work is done as part of a computational paraphrase system using the XTAG system as a base. The paper presents a theoretical computational framework for working within the Reluctant Paraphrase paradigm: three types of textual constraints are specified, effects of paraphrase on the text are described, and a model incorporating mathematical optimisation techniques is outlined.
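One way to write the formulation down, with notation assumed rather than taken from the paper: the paraphrased text should stay as close as possible to the original while satisfying the external constraints.

```latex
% A constrained-optimisation reading of reluctant paraphrase
% (notation assumed): T_0 is the original (otherwise ideal) text,
% T a candidate paraphrase, d a measure of the amount of modification.
\[
\min_{T} \; d(T, T_0)
\quad \text{subject to} \quad
\mathrm{length}(T) \le L, \qquad
\mathrm{readability}(T) \ge R
\]
```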
Some verbs have a particular kind of binary ambiguity: they can carry their normal, full meaning, or they can act merely as a prop for the nominal object. It has been suggested that there is a detectable pattern in the relationship between a verb acting as a prop (a "support verb") and the noun it supports. The task this paper undertakes is to develop a model which identifies the support verb for a particular noun and, by extension, given an enumeration of nouns, a model which disambiguates a verb with respect to its support status. The paper sets up a basic model as a standard for comparison; it then proposes a more complex model and gives some results to support the model's validity, comparing it with other similar approaches.
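A sketch of the kind of frequency baseline the paper's "basic model" suggests, with an assumed input format (verb-object pairs extracted from a corpus); the paper's more complex model is not reproduced.

```python
from collections import Counter, defaultdict

def support_verb_baseline(verb_object_pairs):
    """Hypothetical frequency baseline: for each noun, guess the verb
    that most often takes it as an object as its support verb."""
    counts = defaultdict(Counter)
    for verb, noun in verb_object_pairs:
        counts[noun][verb] += 1
    return {noun: c.most_common(1)[0][0] for noun, c in counts.items()}

pairs = [("make", "decision"), ("make", "decision"), ("take", "decision"),
         ("take", "walk"), ("take", "walk")]
print(support_verb_baseline(pairs))  # {'decision': 'make', 'walk': 'take'}
```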
Current definitions of the notions of lexical density and semantic weight are based on the division of words into closed and open classes, and on intuition. This paper develops a computationally tractable definition of semantic weight, concentrating on what it means for a word to be semantically light; the definition involves looking at the frequency of a word in particular syntactic constructions that are indicative of lightness. Verbs such as "make" and "take", when they function as support verbs, are often considered to be semantically light. To test our definition, we carried out an experiment based on that of Grefenstette and Teufel (1995), automatically identifying light instances of these words in a corpus by incorporating our frequency-based definition of semantic weight into a similar statistical approach. The results show that this is a plausible definition of semantic lightness for verbs, one which could potentially be extended to other classes of words.
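A minimal sketch of the frequency-based idea, under the assumption that light-indicative constructions have already been identified in the corpus; the scoring function is an illustrative operationalisation, not the paper's exact formula.

```python
from collections import Counter

def lightness(observations):
    """Assumed operationalisation: a verb's 'lightness' is the
    proportion of its corpus occurrences that appear in constructions
    indicative of light usage (e.g. 'take a walk', 'make a decision'),
    rather than a class-based or purely intuitive label."""
    total, light = Counter(), Counter()
    for verb, in_light_construction in observations:
        total[verb] += 1
        light[verb] += in_light_construction
    return {v: light[v] / total[v] for v in total}

obs = [("take", True), ("take", False), ("take", True), ("make", False)]
print(lightness(obs))  # {'take': 0.666..., 'make': 0.0}
```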
Compound nouns such as "example noun compound" are becoming more common in natural language and pose a number of difficult problems for NLP systems, notably increasing the complexity of parsing. In this paper we develop a probabilistic model for syntactically analysing such compounds. The model predicts compound noun structures based on knowledge of affinities between nouns, which can be acquired from a corpus. Problems inherent in this corpus-based approach are addressed: data sparseness is overcome by the use of semantically motivated word classes, and sense ambiguity is explicitly handled in the model. An implementation based on this model is described in Lauer (1994) and correctly parses 77% of the test set.
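To illustrate the affinity-based analysis of a three-noun compound, here is a schematic sketch; the decision rule below is one simple dependency-style instantiation, and the paper's probabilistic model with class-based smoothing and sense handling is richer. The affinity function and toy scores are assumptions.

```python
def bracket(n1, n2, n3, affinity):
    """Schematic bracketing rule for a three-noun compound:
    left-branching [[n1 n2] n3] if n1 associates more strongly with n2
    than with n3, else right-branching [n1 [n2 n3]]. affinity(a, b)
    would be estimated from corpus co-occurrence, smoothed with
    semantically motivated word classes to combat data sparseness."""
    if affinity(n1, n2) >= affinity(n1, n3):
        return ((n1, n2), n3)
    return (n1, (n2, n3))

# Toy affinity scores (made up for illustration).
toy = {("hydrogen", "ion"): 0.8, ("hydrogen", "exchange"): 0.1}
aff = lambda a, b: toy.get((a, b), 0.0)
print(bracket("hydrogen", "ion", "exchange", aff))
# (('hydrogen', 'ion'), 'exchange')  -- left-branching
```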