Yaman Kumar Singla

Synthesizing Human Gaze Feedback for Improved NLP Performance

Feb 11, 2023
Varun Khurana, Yaman Kumar Singla, Nora Hollenstein, Rajesh Kumar, Balaji Krishnamurthy

Integrating human feedback can improve the performance of natural language processing (NLP) models. Feedback can be either explicit (e.g., the rankings used to train language models) or implicit (e.g., human cognitive signals such as eye tracking). Prior research at the intersection of eye tracking and NLP shows that cognitive signals gleaned from human gaze patterns, such as scanpaths, aid the understanding and performance of NLP models. However, collecting real eye-tracking data for NLP tasks is challenging: it requires expensive, precise equipment and raises privacy concerns. To address this challenge, we propose ScanTextGAN, a novel model for generating human scanpaths over text. We show that ScanTextGAN-generated scanpaths can approximate the meaningful cognitive signals present in human gaze patterns. As proof of concept, we add synthetically generated scanpaths to four popular NLP tasks spanning six datasets and show that models augmented with the generated scanpaths improve on all downstream tasks.
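
As a loose illustration of the approach, the sketch below pairs a generator that maps token embeddings plus noise to a per-token gaze signal with a discriminator that judges (text, scanpath) pairs. All names, dimensions, and the GRU-based architecture are illustrative assumptions, not the paper's actual ScanTextGAN design.

```python
import torch
import torch.nn as nn

class ScanpathGenerator(nn.Module):
    """Maps token embeddings plus noise to a per-token gaze signal
    (e.g., fixation duration) -- a simplified stand-in for a scanpath."""
    def __init__(self, emb_dim=128, noise_dim=16, hidden=256):
        super().__init__()
        self.noise_dim = noise_dim
        self.rnn = nn.GRU(emb_dim + noise_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, token_emb):                       # token_emb: (B, T, emb_dim)
        z = torch.randn(token_emb.size(0), token_emb.size(1), self.noise_dim)
        h, _ = self.rnn(torch.cat([token_emb, z], dim=-1))
        return self.head(h).squeeze(-1)                 # (B, T) gaze signal

class ScanpathDiscriminator(nn.Module):
    """Scores a (text, scanpath) pair as real or generated."""
    def __init__(self, emb_dim=128, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(emb_dim + 1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, token_emb, gaze):                 # gaze: (B, T)
        x = torch.cat([token_emb, gaze.unsqueeze(-1)], dim=-1)
        _, h_n = self.rnn(x)
        return self.head(h_n.squeeze(0))                # (B, 1) real/fake logit
```

Trained adversarially in the usual GAN fashion, the generator's outputs could then be appended as extra input features to a downstream NLP model.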

* Accepted at the European Chapter of the Association for Computational Linguistics (EACL)

Persuasion Strategies in Advertisements: Dataset, Modeling, and Baselines

Aug 20, 2022
Yaman Kumar Singla, Rajat Jha, Arunim Gupta, Milan Aggarwal, Aditya Garg, Ayush Bhardwaj, Tushar, Balaji Krishnamurthy, Rajiv Ratn Shah, Changyou Chen

Modeling what makes an advertisement persuasive, i.e., what elicits the desired response from the consumer, is critical to the study of propaganda, social psychology, and marketing. Despite its importance, computational modeling of persuasion in computer vision is still in its infancy, primarily due to the lack of benchmark datasets providing persuasion-strategy labels for ads. Motivated by the persuasion literature in social psychology and marketing, we introduce an extensive vocabulary of persuasion strategies and build the first ad image corpus annotated with them. We then formulate persuasion strategy prediction as a multi-modal learning task and design a multi-task attention fusion model that leverages other ad-understanding tasks to predict persuasion strategies. Further, we conduct a real-world case study on 1600 advertising campaigns of 30 Fortune 500 companies, using our model's predictions to analyze which strategies work with different demographics (age and gender). The dataset also provides image segmentation masks that label the persuasion strategies in the corresponding ad images of the test split. We publicly release our code and dataset at https://midas-research.github.io/persuasion-advertisements/.
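
A hypothetical sketch of the kind of multi-task attention fusion the abstract describes: text features attend over image-region features, and the shared representation feeds both the persuasion-strategy head and an auxiliary ad-understanding head. Class counts, dimensions, and pooling are invented for illustration.

```python
import torch
import torch.nn as nn

class MultiTaskAttentionFusion(nn.Module):
    """Cross-attention fusion of text and image features with two task heads."""
    def __init__(self, dim=512, n_strategies=20, n_aux_labels=38):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.strategy_head = nn.Linear(dim, n_strategies)   # main task
        self.aux_head = nn.Linear(dim, n_aux_labels)        # auxiliary ad-understanding task

    def forward(self, text_feats, image_feats):
        # text_feats: (B, Lt, dim); image_feats: (B, Li, dim)
        fused, _ = self.cross_attn(text_feats, image_feats, image_feats)
        pooled = fused.mean(dim=1)                          # simple mean pooling
        return self.strategy_head(pooled), self.aux_head(pooled)
```

Training would sum the losses of both heads so that the auxiliary task regularizes the shared fused representation.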

LDKP: A Dataset for Identifying Keyphrases from Long Scientific Documents

Apr 01, 2022
Debanjan Mahata, Navneet Agarwal, Dibya Gautam, Amardeep Kumar, Swapnil Parekh, Yaman Kumar Singla, Anish Acharya, Rajiv Ratn Shah

Identifying keyphrases (KPs) from text documents is a fundamental task in natural language processing and information retrieval. The vast majority of benchmark datasets for this task come from the scientific domain and contain only the document title and abstract. This limits keyphrase extraction (KPE) and keyphrase generation (KPG) algorithms to identifying keyphrases from human-written summaries that are often very short (approximately eight sentences). This presents three challenges for real-world applications: human-written summaries are unavailable for most documents, the documents are almost always long, and a high percentage of KPs are found beyond the limited context of the title and abstract. Therefore, we release two extensive corpora mapping the KPs of ~1.3M and ~100K scientific articles to their fully extracted text and additional metadata, including publication venue, year, authors, field of study, and citations, to facilitate research on this real-world problem.
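
The gap the abstract points to can be made concrete with a small helper that checks how many gold keyphrases are recoverable from the title and abstract alone versus the full text. This is a hypothetical illustration of the motivation, not code from the released corpora; plain substring matching is a simplification.

```python
def keyphrase_coverage(title: str, abstract: str, body: str,
                       keyphrases: list[str]) -> dict:
    """Fraction of gold keyphrases found in the short context vs. the full text."""
    short_ctx = f"{title} {abstract}".lower()
    full_ctx = f"{short_ctx} {body}".lower()
    n = len(keyphrases)
    return {
        "title+abstract": sum(kp.lower() in short_ctx for kp in keyphrases) / n,
        "full_text": sum(kp.lower() in full_ctx for kp in keyphrases) / n,
    }
```

On long-document corpora, the "full_text" coverage should substantially exceed the "title+abstract" coverage, which is exactly why full-text KP datasets are needed.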

Span Classification with Structured Information for Disfluency Detection in Spoken Utterances

Mar 30, 2022
Sreyan Ghosh, Sonal Kumar, Yaman Kumar Singla, Rajiv Ratn Shah, S. Umesh

Existing approaches to disfluency detection solve a token-level classification task to identify and remove disfluencies from text. Moreover, most works leverage only the contextual information captured by the linear sequence of the text, ignoring the structured information that is efficiently captured by dependency trees. In this paper, building on the span classification paradigm of entity recognition, we propose a novel architecture for detecting disfluencies in transcripts of spoken utterances that incorporates both contextual information, through transformers, and long-distance structured information captured by dependency trees, through graph convolutional networks (GCNs). Experimental results show that our proposed model achieves state-of-the-art results on the widely used English Switchboard corpus for disfluency detection, outperforming prior art by a significant margin. We make all our code publicly available on GitHub (https://github.com/Sreyan88/Disfluency-Detection-with-Span-Classification).
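
A rough sketch of the architecture described: transformer token states are combined with one graph-convolution pass over the dependency-tree adjacency matrix, and candidate spans are classified from their endpoint representations. The dimensions, the single unnormalized GCN layer, and the endpoint-based span representation are simplifying assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SpanDisfluencyModel(nn.Module):
    """Fuses transformer states with dependency-graph convolutions,
    then classifies candidate spans as disfluent or fluent."""
    def __init__(self, dim=768, n_labels=2):
        super().__init__()
        self.gcn_w = nn.Linear(dim, dim)
        self.span_head = nn.Linear(2 * dim, n_labels)

    def forward(self, token_states, adj, spans):
        # token_states: (B, T, dim) from a transformer encoder
        # adj: (B, T, T) dependency adjacency matrix (with self-loops)
        # spans: list of (batch_idx, start, end) candidate spans
        graph_states = torch.relu(self.gcn_w(adj @ token_states))  # one GCN pass
        fused = token_states + graph_states                        # residual fusion
        logits = []
        for b, s, e in spans:
            span_repr = torch.cat([fused[b, s], fused[b, e]], dim=-1)
            logits.append(self.span_head(span_repr))
        return torch.stack(logits)                                 # (n_spans, n_labels)
```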

Automated Speech Scoring System Under The Lens: Evaluating and interpreting the linguistic cues for language proficiency

Nov 30, 2021
Pakhi Bamdev, Manraj Singh Grover, Yaman Kumar Singla, Payman Vafaee, Mika Hama, Rajiv Ratn Shah

English proficiency assessments have become a necessary metric for filtering and selecting prospective candidates in both academia and industry. With the rising demand for such assessments, it has become increasingly important that their automated results be human-interpretable, to prevent inconsistencies and provide meaningful feedback to second language learners. Feature-based classical approaches are more interpretable, making it easier to understand what a scoring model learns. Therefore, in this work, we use classical machine learning models to formulate speech scoring as both a classification and a regression problem, followed by a thorough study interpreting the relation between linguistic cues and the English proficiency level of the speaker. First, we extract linguistic features in five categories (fluency, pronunciation, content, grammar and vocabulary, and acoustic) and train models to grade responses. In comparison, we find that the regression-based models perform as well as or better than the classification approach. Second, we perform ablation studies to understand the impact of each feature and feature category on proficiency grading performance. Further, to understand individual feature contributions, we present the importance of the top features in the best-performing algorithm for the grading task. Third, we use Partial Dependence Plots and Shapley values to explore feature importance and conclude that the best-performing trained model learns the underlying rubrics used to grade the dataset used in this study.
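
A minimal sketch of the feature-based regression setup, assuming synthetic data and invented feature names; permutation importance stands in here for the paper's ablation, Shapley-value, and partial-dependence analyses.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Illustrative feature names, one per linguistic category (hypothetical).
features = ["speech_rate", "pause_ratio", "pron_score",
            "grammar_errors", "vocab_diversity"]
rng = np.random.default_rng(0)
X = rng.normal(size=(500, len(features)))                  # placeholder feature matrix
y = X @ rng.normal(size=len(features)) + rng.normal(scale=0.1, size=500)  # placeholder grades

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor().fit(X_tr, y_tr)        # regression formulation

# Rank features by how much shuffling each one hurts held-out performance.
imp = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
for name, score in sorted(zip(features, imp.importances_mean), key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")
```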

* Accepted for publication in the International Journal of Artificial Intelligence in Education (IJAIED) 

Using Sampling to Estimate and Improve Performance of Automated Scoring Systems with Guarantees

Nov 17, 2021
Yaman Kumar Singla, Sriram Krishna, Rajiv Ratn Shah, Changyou Chen

Automated Scoring (AS), the natural language processing task of scoring essays and speeches in an educational testing setting, is growing in popularity and being deployed across contexts from government examinations to companies providing language proficiency services. However, existing systems either forgo human raters entirely, harming the reliability of the test, or have every response scored by both human and machine, increasing costs. We target the spectrum of solutions in between, using both humans and machines to provide a higher-quality test while keeping costs reasonable, so as to democratize access to AS. In this work, we propose a combination of the existing paradigms that intelligently samples which responses should be scored by humans. We propose reward sampling and observe significant gains in accuracy (19.80% increase on average) and quadratic weighted kappa (QWK) (25.60% on average) with a relatively small human budget (30% of samples). The accuracy increases observed with standard random and importance sampling baselines are 8.6% and 12.2%, respectively. Furthermore, we demonstrate the system's model-agnostic nature by measuring its performance on a variety of models currently deployed in AS settings as well as on pseudo models. Finally, we propose an algorithm to estimate the accuracy/QWK with statistical guarantees (our code is available at https://git.io/J1IOy).
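
The evaluation quantities are easy to reproduce in miniature. The sketch below computes quadratic weighted kappa (QWK) between human and machine scores and estimates it from a 30% random sample, the baseline the proposed reward sampling is compared against; the scores themselves are synthetic placeholders.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
human = rng.integers(0, 6, size=1000)                       # placeholder human scores, 0..5
machine = np.clip(human + rng.integers(-1, 2, size=1000), 0, 5)  # noisy machine scores

# QWK over the full (normally unaffordable) human-scored set.
full_qwk = cohen_kappa_score(human, machine, weights="quadratic")

# Random-sampling baseline: estimate QWK from a 30% human budget.
budget = int(0.3 * len(human))
idx = rng.choice(len(human), size=budget, replace=False)
sampled_qwk = cohen_kappa_score(human[idx], machine[idx], weights="quadratic")

print(f"full QWK={full_qwk:.3f}, estimated from 30% sample={sampled_qwk:.3f}")
```

Reward sampling replaces the uniform `rng.choice` above with an informed selection of which responses go to human raters, which is where the reported accuracy and QWK gains come from.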

AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses

Oct 14, 2021
Yaman Kumar Singla, Swapnil Parekh, Somesh Singh, Junyi Jessy Li, Rajiv Ratn Shah, Changyou Chen

Deep-learning based Automatic Essay Scoring (AES) systems are actively used by states and language testing agencies alike to evaluate millions of candidates for life-changing decisions ranging from college applications to visa approvals. However, little research has gone into understanding and interpreting the black-box nature of deep-learning based scoring algorithms. Previous studies indicate that scoring models can be easily fooled. In this paper, we explore the reason behind their surprising adversarial brittleness. We utilize recent advances in interpretability to measure the extent to which features such as coherence, content, vocabulary, and relevance matter to automated scoring mechanisms. We use this to investigate the oversensitivity (i.e., a large change in output score for a small change in input essay content) and overstability (i.e., little change in output score for large changes in input essay content) of AES. Our results indicate that autoscoring models, despite being trained "end-to-end" with rich contextual embeddings such as BERT, behave like bag-of-words models: a few words determine the essay score without requiring any context, making the models largely overstable. This stands in stark contrast to recent probing studies of pre-trained representation learning models, which show that they encode rich linguistic features such as parts of speech and morphology. Further, we find that the models have learnt dataset biases, making them oversensitive. To address these issues, we propose detection-based protection models that can detect oversensitivity- and overstability-causing samples with high accuracy. We find that our proposed models are able to detect unusual attribution patterns and flag adversarial samples successfully.
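
Overstability of the kind described here can be probed in a model-agnostic way by deleting growing fractions of an essay and watching how little the score moves. The helper below is a hypothetical probe, not the paper's method; `score_fn` stands in for any trained scorer mapping text to a number.

```python
import random

def overstability_curve(score_fn, essay: str,
                        fractions=(0.1, 0.3, 0.5), seed=0):
    """Delete a growing fraction of words and record the score change.
    Near-zero deltas at large fractions indicate overstability."""
    words = essay.split()
    base = score_fn(essay)
    rng = random.Random(seed)
    curve = []
    for frac in fractions:
        keep = rng.sample(range(len(words)), k=int(len(words) * (1 - frac)))
        reduced = " ".join(words[i] for i in sorted(keep))
        curve.append((frac, score_fn(reduced) - base))
    return curve
```

A bag-of-words-like scorer keeps nearly the same output even with half the essay removed, which is the behavior the abstract reports for BERT-based AES models.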

* arXiv admin note: text overlap with arXiv:2012.13872 

Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks

Oct 13, 2021
Anuj Saraswat, Mehar Bhatia, Yaman Kumar Singla, Changyou Chen, Rajiv Ratn Shah

Recent studies in speech perception have been closely linked to the fields of cognitive psychology, phonology, and phonetics in linguistics. During perceptual attunement, bilingual and monolingual infants pass through a critical and sensitive developmental trajectory in which they can best discriminate common phonemes. In this paper, we compare and identify these cognitive aspects in deep neural network-based visual lip-reading models. We conduct experiments on the two most extensive public visual speech recognition datasets, for English and Mandarin. Our experimental results show a strong correlation between these theories in cognitive psychology and our modeling, and we inspect how these computational models develop similar phases of speech perception and acquisition.

* 9 pages, 6 figures, 2 tables 

Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring

Aug 30, 2021
Yaman Kumar Singla, Avykat Gupta, Shaurya Bagga, Changyou Chen, Balaji Krishnamurthy, Rajiv Ratn Shah

Automatic Speech Scoring (ASS) is the computer-assisted evaluation of a candidate's speaking proficiency in a language. ASS systems face many challenges, such as open grammar, variable pronunciations, and unstructured or semi-structured content. Recent deep learning approaches have shown some promise in this domain. However, most of these approaches focus on extracting features from a single audio response, so they lack the speaker-specific context required to model such a complex task. We propose a novel deep learning technique for non-native ASS, called speaker-conditioned hierarchical modeling. Our technique takes advantage of the fact that oral proficiency tests rate multiple responses per candidate. We extract context vectors from these responses and feed them as additional speaker-specific context to our network when scoring a particular response. We compare our technique with strong baselines and find that such modeling improves the model's average performance by 6.92% (maximum = 12.86%, minimum = 4.51%). We further provide both quantitative and qualitative insights into the importance of this additional context for solving the problem of ASS.
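
A minimal sketch of the conditioning step, assuming each response has already been encoded into a fixed-size vector: the speaker's other responses are pooled into a context vector and concatenated with the target response before scoring. Mean pooling and the dimensions are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SpeakerConditionedScorer(nn.Module):
    """Scores one response conditioned on the same speaker's other responses."""
    def __init__(self, dim=256):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, target_vec, other_response_vecs):
        # target_vec: (B, dim) encoding of the response being scored
        # other_response_vecs: (B, R, dim) encodings of the speaker's other responses
        speaker_ctx = other_response_vecs.mean(dim=1)   # pooled speaker-specific context
        return self.scorer(torch.cat([target_vec, speaker_ctx], dim=-1))  # (B, 1)
```

Without the `speaker_ctx` branch this reduces to the single-response baseline the paper compares against; the speaker context is what supplies the missing cross-response signal.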

* Published in CIKM 2021 