This paper describes technology developed to automatically grade Italian students (ages 9-16) on their English and German spoken language proficiency. The students' spoken answers are first transcribed by an automatic speech recognition (ASR) system and then scored using a feedforward neural network (NN) that processes features extracted from the automatic transcriptions. In-domain acoustic models, employing deep neural networks (DNNs), are derived by adapting the parameters of an original out-of-domain DNN.
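A minimal sketch of the scoring step, not the authors' system: a small feedforward network maps per-answer features derived from ASR transcriptions to a proficiency score. The feature names and weights below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-answer features extracted from the transcription:
# words per second, type/token ratio, mean ASR confidence, silence fraction.
features = np.array([[2.1, 0.62, 0.87, 0.15],
                     [1.4, 0.48, 0.73, 0.30]])

# One hidden layer with tanh, linear output; in practice the weights would
# be trained on answers scored by human raters.
W1 = rng.normal(scale=0.5, size=(4, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)

hidden = np.tanh(features @ W1 + b1)
scores = hidden @ W2 + b2
print(scores.ravel())
```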
In this work, we propose a deep neural network architecture motivated by primal-dual splitting methods from convex optimization. We show theoretically that there exists a close relation between the derived architecture and residual networks, and further investigate this connection in numerical experiments. Moreover, we demonstrate how our approach can be used to unroll optimization algorithms for certain problems with hard constraints. Using the example of speech dequantization, we show that our method can outperform classical splitting methods when both are applied to the same task.
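A sketch of the unrolling idea under stated assumptions: each iteration of a Chambolle-Pock-style primal-dual scheme becomes one network layer, with the linear operator and the proximal maps replaced by learned modules. This is an illustration of the general construction, not the paper's exact architecture; note the residual-style form of the primal update.

```python
import torch
import torch.nn as nn

class PrimalDualLayer(nn.Module):
    def __init__(self, dim, tau=0.1, sigma=0.1):
        super().__init__()
        self.K = nn.Linear(dim, dim, bias=False)      # learned linear operator
        self.prox_primal = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.prox_dual = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.tau, self.sigma = tau, sigma

    def forward(self, x, y):
        # Primal update: residual-style step x - tau * K^T y, then learned "prox".
        x_new = self.prox_primal(x - self.tau * (y @ self.K.weight))
        # Dual update on the extrapolated primal point 2*x_new - x.
        y_new = self.prox_dual(y + self.sigma * self.K(2 * x_new - x))
        return x_new, y_new

layers = nn.ModuleList(PrimalDualLayer(16) for _ in range(5))
x = y = torch.zeros(1, 16)
for layer in layers:          # each unrolled iteration is one network layer
    x, y = layer(x, y)
```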
In this paper we describe the use of text classification methods to investigate genre and method variation in an English-German translation corpus. For this purpose we use linguistically motivated features that represent texts as combinations of part-of-speech bigrams, trigrams, and 4-grams. The classification method used in this paper is a Bayesian classifier with Laplace smoothing. We use the output of the classifiers to carry out an extensive analysis of the main feature differences between genres and methods of translation.
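A minimal sketch of this setup: POS-tag n-gram counts fed to a naive Bayes classifier with Laplace (add-one) smoothing. The toy POS sequences and labels below are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Texts represented as space-separated sequences of POS tags.
pos_sequences = ["DET NOUN VERB DET NOUN", "PRON VERB ADV ADJ",
                 "DET ADJ NOUN VERB", "PRON AUX VERB DET NOUN"]
labels = ["original", "translation", "original", "translation"]

# Bigrams through 4-grams over POS tags, as in the feature set described.
vectorizer = CountVectorizer(analyzer="word", ngram_range=(2, 4))
X = vectorizer.fit_transform(pos_sequences)

clf = MultinomialNB(alpha=1.0)     # alpha=1.0 is Laplace smoothing
clf.fit(X, labels)
print(clf.predict(vectorizer.transform(["DET NOUN VERB ADV"])))
```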
Systems now exist that can compile unification grammars into language models for inclusion in a speech recognizer, but it is so far unclear whether non-trivial, linguistically principled grammars can be used for this purpose. We describe a series of experiments that investigate the question empirically, by incrementally constructing a grammar and discovering what problems emerge when successively larger versions are compiled into finite state graph representations and used as language models for a medium-vocabulary recognition task.
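A toy illustration, not the compilation system itself: a grammar compiled into a finite state graph acts as a hard language-model constraint, so the recognizer only accepts word sequences with an accepting path. The tiny graph below is invented for illustration.

```python
fsg = {                       # state -> word -> next state
    0: {"show": 1},
    1: {"me": 2},
    2: {"flights": 3, "fares": 3},
    3: {"to": 4},
    4: {"boston": 5, "denver": 5},
}
accepting = {5}

def accepts(words):
    state = 0
    for w in words:
        if w not in fsg.get(state, {}):
            return False
        state = fsg[state][w]
    return state in accepting

print(accepts("show me flights to boston".split()))  # True
print(accepts("show flights to boston".split()))     # False
```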
This paper describes some of the recent work of project AMALGAM (automatic mapping among lexico-grammatical annotation models). We are investigating ways to map between the leading corpus annotation schemes in order to improve their reusability. Collating all of the included corpora into a single large annotated corpus will allow a more detailed language model to be developed for tasks such as speech and handwriting recognition. In particular, we focus here on a method of extracting mappings from corpora that have been annotated according to more than one annotation scheme.
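A sketch of one simple way such a mapping can be extracted, assuming token-aligned double annotation: count how often each tag from scheme A co-occurs with each tag from scheme B on the same token, then map each A-tag to its most frequent B-tag. The tag pairs below are invented for illustration.

```python
from collections import Counter, defaultdict

# (scheme_A_tag, scheme_B_tag) for each token in a doubly annotated corpus.
aligned_tags = [("NN", "NOUN"), ("NN", "NOUN"), ("NNS", "NOUN"),
                ("VBD", "VERB"), ("VBZ", "VERB"), ("NN", "PROPN")]

cooc = defaultdict(Counter)
for a_tag, b_tag in aligned_tags:
    cooc[a_tag][b_tag] += 1

# Map each scheme-A tag to its most frequent scheme-B counterpart.
mapping = {a: counts.most_common(1)[0][0] for a, counts in cooc.items()}
print(mapping)   # {'NN': 'NOUN', 'NNS': 'NOUN', 'VBD': 'VERB', 'VBZ': 'VERB'}
```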
We present PhoBERT in two versions, "base" and "large", the first public large-scale monolingual language models pre-trained for Vietnamese. We show that PhoBERT improves the state-of-the-art in multiple Vietnamese-specific NLP tasks including part-of-speech tagging, named-entity recognition, and natural language inference. We release PhoBERT to facilitate future research and downstream applications for Vietnamese NLP. Our PhoBERT is released at: https://github.com/VinAIResearch/PhoBERT
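A typical loading pattern via the Hugging Face transformers library, following the model IDs used in the linked repository; consult the repository for the authoritative usage. Note that PhoBERT expects word-segmented Vietnamese input, with underscores joining the syllables of multi-syllable words.

```python
import torch
from transformers import AutoModel, AutoTokenizer

phobert = AutoModel.from_pretrained("vinai/phobert-base")
tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")

# Input must already be word-segmented (e.g., with a Vietnamese segmenter).
sentence = "Chúng_tôi là những nghiên_cứu_viên ."

input_ids = torch.tensor([tokenizer.encode(sentence)])
with torch.no_grad():
    features = phobert(input_ids).last_hidden_state  # contextual embeddings
```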
In this paper I present a new approach to regressing a time series on its own past samples, the celebrated problem known as auto-regression. Outliers and missing samples make the estimation problem difficult, so the estimator should be robust against them. Moreover, for coding purposes, I show that it is desirable for the auto-regression residual to be sparse. To these ends, I first place a multivariate Gaussian prior on the residual and then derive the estimate. Two simple simulations are presented, on spectrum estimation and speech coding.
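For reference, a minimal sketch of ordinary (non-robust) auto-regression: stack lagged samples into a design matrix and solve least squares for the AR coefficients. A robust or sparse-residual method would replace this squared-error fit; the synthetic AR(2) process below is for illustration only.

```python
import numpy as np

def fit_ar(x, order):
    # Row t has [x[t-1], ..., x[t-order]]; the target is x[t].
    X = np.column_stack([x[order - k - 1 : len(x) - k - 1]
                         for k in range(order)])
    y = x[order:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(2, 500):                       # synthetic AR(2) process
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + 0.1 * rng.normal()

print(fit_ar(x, order=2))   # approximately [0.6, -0.3]
```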
This paper extends a polynomial-time parsing algorithm that resolves structural ambiguity in input to a speech-based user interface by calculating and comparing the denotations of rival constituents, given some model of the interfaced application environment (Schuler 2001). The algorithm is extended to incorporate a full set of logical operators, including quantifiers and conjunctions, into this calculation without increasing the complexity of the overall algorithm beyond polynomial time, both in terms of the length of the input and the number of entities in the environment model.
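A toy illustration of the underlying idea, not Schuler's algorithm: each constituent denotes a set of entities in the environment model, set intersection implements modification/conjunction, and a rival parse whose constituent denotes the empty set can be pruned. The environment and word denotations below are invented.

```python
entities = {"block1", "block2", "ball1"}
denotation = {
    "red":   {"block1", "ball1"},
    "large": {"block2", "ball1"},
    "block": {"block1", "block2"},
    "ball":  {"ball1"},
}

def denote_conjunction(words):
    # Intersect entity sets: the denotation of the combined constituent.
    result = set(entities)
    for w in words:
        result &= denotation[w]
    return result

print(denote_conjunction(["red", "block"]))            # {'block1'}: viable
print(denote_conjunction(["large", "ball", "block"]))  # set(): prunable

# A universal quantifier over these sets: "every A is B" as a subset test.
def every(a, b):
    return denotation[a] <= denotation[b]

print(every("ball", "red"))   # True: the only ball is red
```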
Existing approaches to mitigating demographic biases are evaluated on monolingual data; multilingual data has not been examined. In this work, we treat gender as a domain (e.g., male vs. female) and present a standard domain adaptation model to reduce gender bias and improve the performance of text classifiers in multilingual settings. We evaluate our approach on two text classification tasks, hate speech detection and rating prediction, and demonstrate its effectiveness against three fairness-aware baselines.
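A sketch of one standard domain adaptation scheme with gender treated as the domain, here feature augmentation in the style of Daume III (2007); this is an illustration of the general idea, not necessarily the paper's exact model.

```python
import numpy as np

def augment(X, domains):
    # Each row becomes [shared copy | male-specific copy | female-specific copy],
    # so a linear classifier can learn shared and per-domain weights.
    n, d = X.shape
    out = np.zeros((n, 3 * d))
    out[:, :d] = X
    for i, dom in enumerate(domains):
        block = 1 if dom == "male" else 2
        out[i, block * d : (block + 1) * d] = X[i]
    return out

X = np.array([[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]])
domains = ["male", "female", "male"]
print(augment(X, domains).shape)   # (3, 6)
```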
In this paper we present our approach for detecting signs of depression from social media text. Our model relies on word unigrams, part-of-speech tags, readability measures, the use of the first, second, or third person, and word count. Our best model obtained a macro F1-score of 0.439 and ranked 25th out of 31 teams. We further take advantage of the interpretability of the logistic regression model and attempt to interpret its coefficients, in the hope that these will be useful for further research on the topic.
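A minimal sketch of this interpretability step: fit a logistic regression on unigram counts and inspect the largest coefficients, whose signs indicate which terms push predictions toward each class. The toy posts and labels below are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

posts = ["i feel hopeless and tired", "great day with friends",
         "cannot sleep, everything is heavy", "excited about the weekend"]
labels = [1, 0, 1, 0]     # 1 = signs of depression (toy labels)

vec = CountVectorizer()
X = vec.fit_transform(posts)
clf = LogisticRegression().fit(X, labels)

# Terms with the most positive weights push predictions toward label 1.
terms = vec.get_feature_names_out()
top = sorted(zip(clf.coef_[0], terms), reverse=True)[:5]
for weight, term in top:
    print(f"{term}: {weight:+.2f}")
```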