This research is dedicated to improving text-independent Emirati-accented speaker identification performance in stressful talking conditions using three distinct classifiers: First-Order Hidden Markov Models (HMM1s), Second-Order Hidden Markov Models (HMM2s), and Third-Order Hidden Markov Models (HMM3s). The database that has been used in this work was collected from 25 per gender Emirati native speakers uttering eight widespread Emirati sentences in each of neutral, shouted, slow, loud, soft, and fast talking conditions. The extracted features of the captured database are called Mel-Frequency Cepstral Coefficients (MFCCs). Based on HMM1s, HMM2s, and HMM3s, average Emirati-accented speaker identification accuracy in stressful conditions is 58.6%, 61.1%, and 65.0%, respectively. The achieved average speaker identification accuracy in stressful conditions based on HMM3s is so similar to that attained in subjective assessment by human listeners.
This study compares the effectiveness and robustness of multi-class categorization of Amazon product data using transfer learning on pre-trained contextualized language models. Specifically, we fine-tuned BERT and XLNet, two bidirectional models that have achieved state-of-the-art performance on many natural language tasks and benchmarks, including text classification. While existing classification studies and benchmarks focus on binary targets, with the exception of ordinal ranking tasks, here we examine the robustness of such models as the number of classes grows from 1 to 20. Our experiments demonstrate an approximately linear decrease in performance metrics (i.e., precision, recall, $F_1$ score, and accuracy) with the number of class labels. BERT consistently outperforms XLNet using identical hyperparameters on the entire range of class label quantities for categorizing products based on their textual descriptions. BERT is also more affordable than XLNet in terms of the computational cost (i.e., time and memory) required for training.
Distributed representations of words have shown to be useful to improve the effectiveness of IR systems in many sub-tasks like query expansion, retrieval and ranking. Algorithms like word2vec, GloVe and others are also key factors in many improvements in different NLP tasks. One common issue with such embedding models is that words like happy and sad appear in similar contexts and hence are wrongly clustered close in the embedding space. In this paper we leverage Aff2Vec, a set of word embeddings models which include affect information, in order to better capture the affect aspect in news text to achieve better results in information retrieval tasks, also such embeddings are less hit by the synonym/antonym issue. We evaluate their effectiveness on two IR related tasks (query expansion and ranking) over the New York Times dataset (TREC-core '17) comparing them against other word embeddings based models and classic ranking models.
Neural generation methods for task-oriented dialogue typically generate from a meaning representation that is populated using a database of domain information, such as a table of data describing a restaurant. While earlier work focused solely on the semantic fidelity of outputs, recent work has started to explore methods for controlling the style of the generated text while simultaneously achieving semantic accuracy. Here we experiment with two stylistic benchmark tasks, generating language that exhibits variation in personality, and generating discourse contrast. We report a huge performance improvement in both stylistic control and semantic accuracy over the state of the art on both of these benchmarks. We test several different models and show that putting stylistic conditioning in the decoder and eliminating the semantic re-ranker used in earlier models results in more than 15 points higher BLEU for Personality, with a reduction of semantic error to near zero. We also report an improvement from .75 to .81 in controlling contrast and a reduction in semantic error from 16% to 2%.
Arguments, counter-arguments, facts, and evidence obtained via documents related to previous court cases are of essential need for legal professionals. Therefore, the process of automatic information extraction from documents containing legal opinions related to court cases can be considered to be of significant importance. This study is focused on the identification of sentences in legal opinion texts which convey different perspectives on a certain topic or entity. We combined several approaches based on semantic analysis, open information extraction, and sentiment analysis to achieve our objective. Then, our methodology was evaluated with the help of human judges. The outcomes of the evaluation demonstrate that our system is successful in detecting situations where two sentences deliver different opinions on the same topic or entity. The proposed methodology can be used to facilitate other information extraction tasks related to the legal domain. One such task is the automated detection of counter arguments for a given argument. Another is the identification of opponent parties in a court case.
Temporal word embeddings have been proposed to support the analysis of word meaning shifts during time and to study the evolution of languages. Different approaches have been proposed to generate vector representations of words that embed their meaning during a specific time interval. However, the training process used in these approaches is complex, may be inefficient or it may require large text corpora. As a consequence, these approaches may be difficult to apply in resource-scarce domains or by scientists with limited in-depth knowledge of embedding models. In this paper, we propose a new heuristic to train temporal word embeddings based on the Word2vec model. The heuristic consists in using atemporal vectors as a reference, i.e., as a compass, when training the representations specific to a given time interval. The use of the compass simplifies the training process and makes it more efficient. Experiments conducted using state-of-the-art datasets and methodologies suggest that our approach outperforms or equals comparable approaches while being more robust in terms of the required corpus size.
Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for discovery of hidden semantic architecture of text datasets, and plays a fundamental role in many machine learning applications. However, like many other machine learning algorithms, the process of training a LDA model may leak the sensitive information of the training datasets and bring significant privacy risks. To mitigate the privacy issues in LDA, we focus on studying privacy-preserving algorithms of LDA model training in this paper. In particular, we first develop a privacy monitoring algorithm to investigate the privacy guarantee obtained from the inherent randomness of the Collapsed Gibbs Sampling (CGS) process in a typical LDA training algorithm on centralized curated datasets. Then, we further propose a locally private LDA training algorithm on crowdsourced data to provide local differential privacy for individual data contributors. The experimental results on real-world datasets demonstrate the effectiveness of our proposed algorithms.
The use of deep networks to extract embeddings for speaker recognition has proven successfully. However, such embeddings are susceptible to performance degradation due to the mismatches among the training, enrollment, and test conditions. In this work, we propose an adversarial speaker verification (ASV) scheme to learn the condition-invariant deep embedding via adversarial multi-task training. In ASV, a speaker classification network and a condition identification network are jointly optimized to minimize the speaker classification loss and simultaneously mini-maximize the condition loss. The target labels of the condition network can be categorical (environment types) and continuous (SNR values). We further propose multi-factorial ASV to simultaneously suppress multiple factors that constitute the condition variability. Evaluated on a Microsoft Cortana text-dependent speaker verification task, the ASV achieves 8.8% and 14.5% relative improvements in equal error rates (EER) for known and unknown conditions, respectively.