Rajiv Ratn Shah

Span Classification with Structured Information for Disfluency Detection in Spoken Utterances

Mar 30, 2022

Sreyan Ghosh, Sonal Kumar, Yaman Kumar Singla, Rajiv Ratn Shah, S. Umesh

Abstract: Existing approaches in disfluency detection focus on solving a token-level classification task for identifying and removing disfluencies in text. Moreover, most works focus on leveraging only contextual information captured by the linear sequences in text, thus ignoring the structured information in text, which is efficiently captured by dependency trees. In this paper, building on the span classification paradigm of entity recognition, we propose a novel architecture for detecting disfluencies in transcripts from spoken utterances, incorporating both contextual information through transformers and long-distance structured information captured by dependency trees, through graph convolutional networks (GCNs). Experimental results show that our proposed model achieves state-of-the-art results on the widely used English Switchboard for disfluency detection and outperforms prior art by a significant margin. We make all our code publicly available on GitHub (https://github.com/Sreyan88/Disfluency-Detection-with-Span-Classification).
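
As a rough illustration of the span classification paradigm mentioned in the abstract, the sketch below enumerates candidate token spans that a classifier could then label as disfluent or fluent. The function name `enumerate_spans`, the `max_width` cap, and the inclusive `(start, end)` convention are assumptions made for this example; the abstract does not state how the authors generate or score spans.

```python
from typing import List, Sequence, Tuple


def enumerate_spans(tokens: Sequence[str], max_width: int) -> List[Tuple[int, int]]:
    """Return all (start, end) token spans, end inclusive, up to max_width tokens.

    Hypothetical helper: a span classifier would score each returned span.
    """
    if max_width < 1:
        raise ValueError("max_width must be at least 1")
    spans = []
    for start in range(len(tokens)):
        # The last valid end index is limited by the width cap and the sentence length.
        last_end = min(start + max_width, len(tokens))
        for end in range(start, last_end):
            spans.append((start, end))
    return spans


if __name__ == "__main__":
    utterance = "i want a flight to boston uh to denver".split()
    for start, end in enumerate_spans(utterance, max_width=3):
        print(start, end, " ".join(utterance[start:end + 1]))
```

Capping the span width keeps the number of candidates linear in sentence length rather than quadratic, which is a common design choice in span-based taggers.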

Leveraging Transformers for Hate Speech Detection in Conversational Code-Mixed Tweets

Dec 18, 2021

Zaki Mustafa Farooqi, Sreyan Ghosh, Rajiv Ratn Shah

Abstract: In the current era of the internet, where social media platforms are easily accessible for everyone, people often have to deal with threats, identity attacks, hate, and bullying due to their association with a caste, creed, gender, religion, or even acceptance or rejection of a notion. Existing works in hate speech detection primarily focus on individual comment classification as a sequence labeling task and often fail to consider the context of the conversation. The context of a conversation often plays a substantial role when determining the author's intent and sentiment behind the tweet. This paper describes the system proposed by team MIDAS-IIITD for HASOC 2021 subtask 2, one of the first shared tasks focusing on detecting hate speech from Hindi-English code-mixed conversations on Twitter. We approach this problem using neural networks, leveraging the transformer's cross-lingual embeddings and further finetuning them for low-resource hate-speech classification in transliterated Hindi text. Our best performing system, a hard voting ensemble of Indic-BERT, XLM-RoBERTa, and Multilingual BERT, achieved a macro F1 score of 0.7253, placing us first on the overall leaderboard standings.

* Accepted at FIRE 2021 - Hate Speech and offensive content detection (HASOC) Track
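
The hard voting ensemble named in the abstract can be sketched as a per-example majority vote over the labels predicted by each model. The function name `hard_vote`, the example labels, and the tie-breaking rule (smallest label wins) are assumptions for illustration only; the abstract does not describe how the authors resolve ties or encode labels.

```python
from collections import Counter
from typing import List, Sequence


def hard_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Majority vote, where predictions[m][i] is model m's label for example i."""
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_examples = len(predictions[0])
    if any(len(p) != n_examples for p in predictions):
        raise ValueError("every model must label the same number of examples")
    voted = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        top = max(counts.values())
        # Assumed tie-break: pick the smallest label among the most frequent ones.
        voted.append(min(label for label, count in counts.items() if count == top))
    return voted


if __name__ == "__main__":
    # Made-up labels from three hypothetical models (1 = hateful, 0 = not hateful).
    model_a = [1, 0, 1]
    model_b = [1, 1, 0]
    model_c = [0, 1, 0]
    print(hard_vote([model_a, model_b, model_c]))
```

With an odd number of models and binary labels, as in a three-model ensemble, ties cannot occur, so the tie-break only matters in other configurations.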

Automated Speech Scoring System Under The Lens: Evaluating and interpreting the linguistic cues for language proficiency

Nov 30, 2021

Pakhi Bamdev, Manraj Singh Grover, Yaman Kumar Singla, Payman Vafaee, Mika Hama, Rajiv Ratn Shah

Abstract: English proficiency assessments have become a necessary metric for filtering and selecting prospective candidates for both academia and industry. With the rise in demand for such assessments, it has become increasingly necessary to have automated, human-interpretable results to prevent inconsistencies and ensure meaningful feedback to second language learners. Feature-based classical approaches have been more interpretable in understanding what the scoring model learns. Therefore, in this work, we utilize classical machine learning models to formulate a speech scoring task as both a classification and a regression problem, followed by a thorough study to interpret and study the relation between the linguistic cues and the English proficiency level of the speaker. First, we extract linguistic features under five categories (fluency, pronunciation, content, grammar and vocabulary, and acoustic) and train models to grade responses. In comparison, we find that the regression-based models perform on par with or better than the classification approach. Second, we perform ablation studies to understand the impact of each feature and feature category on the performance of proficiency grading. Further, to understand individual feature contributions, we present the importance of top features on the best performing algorithm for the grading task. Third, we make use of Partial Dependence Plots and Shapley values to explore feature importance and conclude that the best performing trained model learns the underlying rubrics used for grading the dataset used in this study.

* Accepted for publication in the International Journal of Artificial Intelligence in Education (IJAIED)

Using Sampling to Estimate and Improve Performance of Automated Scoring Systems with Guarantees

Nov 17, 2021

Yaman Kumar Singla, Sriram Krishna, Rajiv Ratn Shah, Changyou Chen

Abstract: Automated Scoring (AS), the natural language processing task of scoring essays and speeches in an educational testing setting, is growing in popularity and being deployed across contexts from government examinations to companies providing language proficiency services. However, existing systems either forgo human raters entirely, thus harming the reliability of the test, or score every response by both human and machine, thereby increasing costs. We target the spectrum of possible solutions in between, making use of both humans and machines to provide a higher quality test while keeping costs reasonable to democratize access to AS. In this work, we propose a combination of the existing paradigms, sampling responses to be scored by humans intelligently. We propose reward sampling and observe significant gains in accuracy (19.80% increase on average) and quadratic weighted kappa (QWK) (25.60% on average) with a relatively small human budget (30% samples) using our proposed sampling. The accuracy increases observed using standard random and importance sampling baselines are 8.6% and 12.2% respectively. Furthermore, we demonstrate the system's model agnostic nature by measuring its performance on a variety of models currently deployed in an AS setting as well as pseudo models. Finally, we propose an algorithm to estimate the accuracy/QWK with statistical guarantees (our code is available at https://git.io/J1IOy).
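
For readers unfamiliar with the quadratic weighted kappa (QWK) metric reported in the abstract, the sketch below computes it from two lists of integer scores using the standard definition: one minus the ratio of weighted observed disagreement to weighted expected disagreement, with weights (i - j)^2 / (K - 1)^2. The function name, the 0-based label encoding, and the convention of returning 1.0 when expected disagreement is zero are assumptions of this example and are not taken from the authors' code.

```python
from typing import Sequence


def quadratic_weighted_kappa(rater_a: Sequence[int], rater_b: Sequence[int], num_labels: int) -> float:
    """QWK for two raters whose scores are integers in 0..num_labels-1."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of responses")
    if num_labels < 2:
        raise ValueError("need at least two distinct labels")
    n = len(rater_a)
    # Confusion matrix of observed score pairs.
    observed = [[0] * num_labels for _ in range(num_labels)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(num_labels)) for j in range(num_labels)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_labels):
        for j in range(num_labels):
            weight = (i - j) ** 2 / (num_labels - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count if the two raters were independent.
            weighted_expected += weight * hist_a[i] * hist_b[j] / n
    if weighted_expected == 0.0:
        # Assumed convention: both raters gave one identical score throughout.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    human = [0, 1, 2, 1]    # made-up human scores
    machine = [0, 1, 2, 2]  # made-up machine scores
    print(quadratic_weighted_kappa(human, machine, num_labels=3))
```

Because the weights grow with the squared distance between scores, a machine score that is off by two grade levels is penalized four times as heavily as one that is off by one.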

Speech Toxicity Analysis: A New Spoken Language Processing Task

Nov 06, 2021

Sreyan Ghosh, Samden Lepcha, S Sakshi, Rajiv Ratn Shah

Abstract: Toxic speech, also known as hate speech, is regarded as one of the crucial issues plaguing online social media today. Most recent work on toxic speech detection is constrained to the modality of text, with no existing work on toxicity detection from spoken utterances. In this paper, we propose a new Spoken Language Processing task of detecting toxicity from speech. We introduce DeToxy, the first publicly available toxicity-annotated dataset for English speech, sourced from various openly available speech databases, consisting of over 2 million utterances. Finally, we also provide analysis on how a speech corpus annotated for toxicity can help facilitate the development of E2E models which better capture various prosodic cues in speech, thereby boosting toxicity classification on spoken utterances.

* 5 pages, submitted to ICASSP 2022

Intent Classification Using Pre-Trained Embeddings For Low Resource Languages

Oct 18, 2021

Hemant Yadav, Akshat Gupta, Sai Krishna Rallabandi, Alan W Black, Rajiv Ratn Shah

Abstract: Building Spoken Language Understanding (SLU) systems that do not rely on language specific Automatic Speech Recognition (ASR) is an important yet less explored problem in language processing. In this paper, we present a comparative study aimed at employing a pre-trained acoustic model to perform SLU in low resource scenarios. Specifically, we use three different embeddings extracted using Allosaurus, a pre-trained universal phone decoder: (1) Phone, (2) Panphone, and (3) Allo embeddings. These embeddings are then used in identifying the spoken intent. We perform experiments across three different languages: English, Sinhala, and Tamil, each with different data sizes to simulate high, medium, and low resource scenarios. Our system improves on the state-of-the-art (SOTA) intent classification accuracy by approximately 2.11% for Sinhala and 7.00% for Tamil and achieves competitive results on English. Furthermore, we present a quantitative analysis of how the performance scales with the number of training examples used per intent.

AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses

Oct 14, 2021

Yaman Kumar Singla, Swapnil Parekh, Somesh Singh, Junyi Jessy Li, Rajiv Ratn Shah, Changyou Chen

Abstract: Deep-learning based Automatic Essay Scoring (AES) systems are being actively used by states and language testing agencies alike to evaluate millions of candidates for life-changing decisions ranging from college applications to visa approvals. However, little research has been devoted to understanding and interpreting the black-box nature of deep-learning based scoring algorithms. Previous studies indicate that scoring models can be easily fooled. In this paper, we explore the reason behind their surprising adversarial brittleness. We utilize recent advances in interpretability to find the extent to which features such as coherence, content, vocabulary, and relevance are important for automated scoring mechanisms. We use this to investigate the oversensitivity (i.e., large change in output score with a little change in input essay content) and overstability (i.e., little change in output scores with large changes in input essay content) of AES. Our results indicate that autoscoring models, despite getting trained as "end-to-end" models with rich contextual embeddings such as BERT, behave like bag-of-words models. A few words determine the essay score without the requirement of any context, making the model largely overstable. This is in stark contrast to recent probing studies on pre-trained representation learning models, which show that rich linguistic features such as parts-of-speech and morphology are encoded by them. Further, we also find that the models have learnt dataset biases, making them oversensitive. To deal with these issues, we propose detection-based protection models that can detect samples causing oversensitivity and overstability with high accuracy. We find that our proposed models are able to detect unusual attribution patterns and flag adversarial samples successfully.

* arXiv admin note: text overlap with arXiv:2012.13872

NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels

Oct 13, 2021

Mohit Sharma, Raj Patra, Harshal Desai, Shruti Vyas, Yogesh Rawat, Rajiv Ratn Shah

Abstract: Deep learning has shown remarkable progress in a wide range of problems. However, efficient training of such models requires large-scale datasets, and getting annotations for such datasets can be challenging and costly. In this work, we explore the use of user-generated freely available labels from web videos for video understanding. We create a benchmark dataset consisting of around 2 million videos with associated user-generated annotations and other meta information. We utilize the collected dataset for action classification and demonstrate its usefulness with existing small-scale annotated datasets, UCF101 and HMDB51. We study different loss functions and two pretraining strategies, simple and self-supervised learning. We also show how a network pretrained on the proposed dataset can help against video corruption and label noise in downstream datasets. We present this as a benchmark dataset in noisy learning for video understanding. The dataset, code, and trained models will be publicly available for future research.

* Accepted at ACM Multimedia Asia 2021

Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks

Oct 13, 2021

Anuj Saraswat, Mehar Bhatia, Yaman Kumar Singla, Changyou Chen, Rajiv Ratn Shah

Abstract: Recent studies in speech perception have been closely linked to fields of cognitive psychology, phonology, and phonetics in linguistics. During perceptual attunement, a critical and sensitive developmental trajectory has been examined in bilingual and monolingual infants where they can best discriminate common phonemes. In this paper, we compare and identify these cognitive aspects on deep neural-based visual lip-reading models. We conduct experiments on the two most extensive public visual speech recognition datasets for English and Mandarin. Through our experimental results, we observe a strong correlation between these theories in cognitive psychology and our unique modeling. We inspect how these computational models develop similar phases in speech perception and acquisition.

* 9 pages, 6 figures, 2 tables

MINIMAL: Mining Models for Data Free Universal Adversarial Triggers

Sep 25, 2021

Swapnil Parekh, Yaman Singla Kumar, Somesh Singh, Changyou Chen, Balaji Krishnamurthy, Rajiv Ratn Shah

Abstract: It is well known that natural language models are vulnerable to adversarial attacks, which are mostly input-specific in nature. Recently, it has been shown that there also exist input-agnostic attacks in NLP models, called universal adversarial triggers. However, existing methods to craft universal triggers are data intensive. They require large amounts of data samples to generate adversarial triggers, which are typically inaccessible to attackers. For instance, previous works take 3000 data samples per class for the SNLI dataset to generate adversarial triggers. In this paper, we present a novel data-free approach, MINIMAL, to mine input-agnostic adversarial triggers from models. Using the triggers produced with our data-free algorithm, we reduce the accuracy of Stanford Sentiment Treebank's positive class from 93.6% to 9.6%. Similarly, for the Stanford Natural Language Inference (SNLI), our single-word trigger reduces the accuracy of the entailment class from 90.95% to less than 0.6%. Despite being completely data-free, we obtain accuracy drops equivalent to those of data-dependent methods.
