Saurav Sahay
Learning from Red Teaming: Gender Bias Provocation and Mitigation in Large Language Models

Oct 17, 2023
Hsuan Su, Cheng-Chu Cheng, Hua Farn, Shachi H Kumar, Saurav Sahay, Shang-Tse Chen, Hung-yi Lee

Recent progress in large language models (LLMs) such as ChatGPT and GPT-4 has led to considerable improvements in dialogue systems. However, these LLM-based chatbots encode potential biases and retain disparities that can harm users during interactions. Traditional bias-investigation methods often rely on human-written test cases, which are expensive to produce and limited in coverage. In this work, we propose a first-of-its-kind method that automatically generates test cases to detect LLMs' potential gender bias. Applying our method to three well-known LLMs, we find that the generated test cases effectively identify the presence of bias. To address the biases identified, we propose a mitigation strategy that uses the generated test cases as demonstrations for in-context learning, circumventing the need for parameter fine-tuning. Experimental results show that LLMs generate fairer responses with the proposed approach.
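The mitigation step above, reusing generated test cases as in-context demonstrations rather than fine-tuning, can be sketched as follows. The function name and prompt template are illustrative assumptions, not the authors' exact format:

```python
def build_debiased_prompt(test_cases, fair_responses, user_query):
    """Assemble an in-context-learning prompt: each bias-provoking test
    case is paired with a fair reference response and prepended as a
    demonstration, so the chatbot sees the desired behavior before the
    real query. No model parameters are updated."""
    demos = []
    for case, response in zip(test_cases, fair_responses):
        demos.append(f"User: {case}\nAssistant: {response}")
    return "\n\n".join(demos) + f"\n\nUser: {user_query}\nAssistant:"
```

The prompt is then sent to the target LLM as-is, which is what makes the mitigation applicable to closed models whose weights cannot be touched.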


Inspecting Spoken Language Understanding from Kids for Basic Math Learning at Home

Jun 01, 2023
Eda Okur, Roddy Fuentes Alba, Saurav Sahay, Lama Nachman


Enriching the quality of early childhood education with interactive at-home math learning systems, empowered by recent advances in conversational AI technologies, is slowly becoming a reality. With this motivation, we implement a multimodal dialogue system to support play-based learning experiences at home, guiding kids to master basic math concepts. This work explores the Spoken Language Understanding (SLU) pipeline within a task-oriented dialogue system developed for Kid Space, with cascading Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU) components evaluated on our home-deployment data of kids going through gamified math learning activities. We validate the advantages of a multi-task architecture for NLU and experiment with a diverse set of pretrained language representations for the Intent Recognition and Entity Extraction tasks in the math learning domain. To recognize kids' speech in realistic home environments, we investigate several ASR systems, including the commercial Google Cloud and the latest open-source Whisper solutions with varying model sizes. We evaluate the SLU pipeline by testing our best-performing NLU models on noisy ASR output to inspect the challenges of understanding children for math learning in authentic homes.
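The cascading ASR-to-NLU design is, at its core, a function composition in which ASR noise propagates into the NLU stage. A minimal sketch, with toy components standing in for the real Whisper/Google Cloud ASR and the multi-task NLU model:

```python
from typing import Callable


def slu_pipeline(audio, asr: Callable, nlu: Callable):
    """Cascade: the ASR hypothesis feeds the NLU component, so any
    transcription error propagates downstream."""
    transcript = asr(audio)
    return nlu(transcript)


def toy_asr(audio):
    # Stand-in "transcription": just lowercases the input string.
    return audio.lower()


def toy_nlu(text):
    # Keyword-spotting stand-in for the multi-task intent/entity model.
    if "count" in text:
        return {"intent": "counting", "entities": []}
    return {"intent": "out_of_scope", "entities": []}
```

Evaluating the real pipeline on noisy ASR output, as the paper does, amounts to feeding deployment transcripts through `nlu` and comparing against gold labels.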

* Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA) at ACL 2023 

Sample Efficient Multimodal Semantic Augmentation for Incremental Summarization

Mar 08, 2023
Sumanta Bhattacharyya, Ramesh Manuvinakurike, Sahisnu Mazumder, Saurav Sahay


In this work, we develop a prompting approach for incremental summarization of task videos. As an intermediate step, we develop a sample-efficient few-shot approach for extracting semantic concepts. We leverage an existing model to extract concepts from images, extend it to videos, and introduce a clustering and querying approach for sample efficiency, motivated by recent advances in perceiver-based architectures. Our work provides further evidence that enriching the input context with relevant entities and actions from the videos, and supplying these as prompts, can enhance the summaries generated by the model. We report results on a relevant dataset and discuss possible directions for future work.
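The clustering-and-querying idea for sample efficiency can be illustrated with a greedy farthest-point selection over frame embeddings. This pure-Python stand-in conveys the general principle of querying only diverse frames; it is an assumption for illustration, not the paper's perceiver-based implementation:

```python
def select_representative_frames(embeddings, k):
    """Greedy farthest-point selection: pick k mutually diverse frames
    whose concepts are then extracted and used as prompt context,
    instead of querying the concept extractor on every frame."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    chosen = [0]  # seed with the first frame
    while len(chosen) < k:
        best, best_d = None, -1.0
        for i in range(len(embeddings)):
            if i in chosen:
                continue
            d = min(dist(embeddings[i], embeddings[j]) for j in chosen)
            if d > best_d:
                best, best_d = i, d
        chosen.append(best)
    return sorted(chosen)
```

Querying the concept extractor only on the selected indices is what buys the sample efficiency: near-duplicate frames contribute no new entities or actions.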


Position Matters! Empirical Study of Order Effect in Knowledge-grounded Dialogue

Feb 12, 2023
Hsuan Su, Shachi H Kumar, Sahisnu Mazumder, Wenda Chen, Ramesh Manuvinakurike, Eda Okur, Saurav Sahay, Lama Nachman, Shang-Tse Chen, Hung-yi Lee


With the power of large pretrained language models, various research works have integrated knowledge into dialogue systems. Traditional techniques treat knowledge as part of the input sequence for the dialogue system, prepending a set of knowledge statements in front of the dialogue history. However, such a mechanism forces the knowledge sets to be concatenated in a fixed order, making models implicitly pay imbalanced attention to the sets during training. In this paper, we first investigate how the order of the knowledge set can influence autoregressive dialogue systems' responses. We conduct experiments on two commonly used dialogue datasets with two types of transformer-based models and find that models attend to the input knowledge unequally. To address this, we propose a simple and novel technique to alleviate the order effect by modifying the position embeddings of the knowledge input in these models. The experimental results show that with the proposed position embedding method, each knowledge statement is considered uniformly when generating responses.
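In spirit, the position-embedding fix removes the ordering signal by giving every knowledge statement the same starting position. The indexing scheme below is illustrative of that idea, not the paper's exact embedding arithmetic:

```python
def knowledge_position_ids(statement_lengths, history_length):
    """Assign position ids so that every knowledge statement restarts
    at position 0 (no statement is privileged by its slot in the
    concatenation), then continue normal increasing positions for the
    dialogue history tokens."""
    ids = []
    for n in statement_lengths:
        ids.extend(range(0, n))  # each statement restarts at 0
    offset = max(statement_lengths, default=0)
    ids.extend(range(offset, offset + history_length))
    return ids
```

With standard position ids the first statement would always occupy the earliest positions; sharing positions makes the statements order-invariant from the model's point of view.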


General Framework for Self-Supervised Model Priming for Parameter-Efficient Fine-tuning

Dec 02, 2022
Shih-Cheng Huang, Shih-Heng Wang, Min-Han Shih, Saurav Sahay, Hung-yi Lee


Parameter-efficient methods (such as prompts or adapters) for adapting pre-trained language models to downstream tasks have become popular recently. However, hindrances still prevent these methods from reaching their full potential; two significant challenges are few-shot adaptation and cross-task generalization. To tackle these issues, we propose a general framework to enhance the few-shot adaptation and cross-domain generalization ability of parameter-efficient methods. In our framework, we prime the self-supervised model so that parameter-efficient methods can rapidly adapt to various downstream few-shot tasks. To evaluate the authentic generalization ability of these parameter-efficient methods, we conduct experiments on a few-shot cross-domain benchmark containing 160 diverse NLP tasks. The results reveal that priming by tuning the PLM only with extra training tasks leads to the best performance. We also perform a comprehensive analysis of various parameter-efficient methods under few-shot cross-domain scenarios.
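The two-stage schedule, priming the backbone on extra training tasks and then tuning only the small parameter-efficient module on the few-shot target, can be sketched abstractly. Here `update` stands in for any optimizer step and all names and the step count are hypothetical:

```python
def prime_then_peft(model_params, prompt_params, priming_tasks,
                    target_task, update, few_shot_steps=3):
    """Stage 1: prime the backbone on extra training tasks.
    Stage 2: freeze the backbone and tune only the small
    parameter-efficient module (e.g. a prompt) on the target task."""
    for task in priming_tasks:                 # stage 1: priming
        model_params = update(model_params, task)
    for _ in range(few_shot_steps):            # stage 2: few-shot PEFT
        prompt_params = update(prompt_params, target_task)
    return model_params, prompt_params         # backbone untouched in stage 2
```

The framework's best-performing variant tunes the PLM itself only during stage 1, which is exactly the split this sketch encodes.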


End-to-End Evaluation of a Spoken Dialogue System for Learning Basic Mathematics

Nov 07, 2022
Eda Okur, Saurav Sahay, Roddy Fuentes Alba, Lama Nachman


The advances in language-based Artificial Intelligence (AI) technologies applied to build educational applications can create AI-for-social-good opportunities with a broader positive impact. Across many disciplines, enhancing the quality of mathematics education is crucial for building critical thinking and problem-solving skills at younger ages. Conversational AI systems have started maturing to a point where they could play a significant role in helping students learn fundamental math concepts. This work presents a task-oriented Spoken Dialogue System (SDS) built to support play-based learning of basic math concepts for early childhood education. The system has been evaluated via real-world deployments at school while students practice early math concepts through multimodal interactions. We discuss our efforts to improve the SDS pipeline built for math learning, exploring MathBERT representations as a potential enhancement to the Natural Language Understanding (NLU) module. We perform an end-to-end evaluation using real-world deployment outputs from the Automatic Speech Recognition (ASR), Intent Recognition, and Dialogue Manager (DM) components to understand how error propagation affects the overall performance in real-world scenarios.
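An end-to-end metric of this kind counts a turn as correct only when every cascaded stage succeeds, which is what surfaces error propagation: a perfect DM cannot rescue a bad ASR hypothesis. The field names below are hypothetical, chosen only to make the sketch concrete:

```python
def end_to_end_accuracy(samples):
    """Each sample carries gold labels plus the ASR -> Intent -> DM
    outputs logged from deployment. A sample is correct only if every
    stage matched, so stage-level errors compound in the final score."""
    correct = 0
    for s in samples:
        if (s["asr_ok"]
                and s["pred_intent"] == s["gold_intent"]
                and s["pred_action"] == s["gold_action"]):
            correct += 1
    return correct / len(samples)
```

Comparing this conjunctive score against each component's individual accuracy shows how much each stage contributes to the overall degradation.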

* Proceedings of the 1st Workshop on Mathematical Natural Language Processing (MathNLP) at EMNLP 2022 

Human in the loop approaches in multi-modal conversational task guidance system development

Nov 03, 2022
Ramesh Manuvinakurike, Sovan Biswas, Giuseppe Raffa, Richard Beckwith, Anthony Rhodes, Meng Shi, Gesem Gudino Mejia, Saurav Sahay, Lama Nachman


Developing task guidance systems that aid humans in a situated task remains a challenging problem. Search (information retrieval) and conversational systems have immense potential to help task performers achieve their goals. However, several technical challenges must be addressed to deliver such conversational systems, where common supervised approaches fail to deliver the expected results in terms of overall performance, user experience, and adaptation to realistic conditions. In this preliminary work, we first highlight some of the challenges involved in developing such systems. We then provide an overview of the available datasets and highlight their limitations. Finally, we develop a model-in-the-loop, Wizard-of-Oz-based data collection tool and perform a pilot experiment.

* SCAI @ SIGIR 

Few-shot Prompting Towards Controllable Response Generation

Jun 09, 2022
Hsuan Su, Pohan Chi, Shih-Cheng Huang, Chung Ho Lam, Saurav Sahay, Shang-Tse Chen, Hung-yi Lee


Much literature has shown that prompt-based learning is an efficient way to make use of large pre-trained language models. Recent works also exhibit the possibility of steering a chatbot's output by plugging in an appropriate prompt. Gradient-based methods are often used to perturb the prompts; however, some language models are not even available to the public. In this work, we first explore the combination of prompting and reinforcement learning (RL) to steer models' generation without accessing any of the models' parameters. Second, to reduce the training effort and enhance generalizability to unseen tasks, we apply multi-task learning to make the model generalize to new tasks better. Experimental results show that our proposed method can successfully control several state-of-the-art (SOTA) dialogue models without accessing their parameters. Furthermore, the model demonstrates a strong ability to quickly adapt to an unseen task in fewer steps than the baseline model.
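The black-box setting, where only the model's outputs (and a reward computed over them) are observable, can be sketched as follows. The paper trains the prompt with RL; this sketch substitutes simple random search so the example stays self-contained, but the interface is the same: the search only ever calls `reward_fn`, never the model's gradients or parameters:

```python
import random


def black_box_prompt_search(reward_fn, vocab, prompt_len=3,
                            iters=200, seed=0):
    """Search for a steering prompt using only reward queries.
    reward_fn scores a candidate prompt (e.g. by generating with the
    frozen chatbot and measuring the desired attribute)."""
    rng = random.Random(seed)
    best = [rng.choice(vocab) for _ in range(prompt_len)]
    best_r = reward_fn(best)
    for _ in range(iters):
        cand = list(best)
        cand[rng.randrange(prompt_len)] = rng.choice(vocab)  # mutate one slot
        r = reward_fn(cand)
        if r > best_r:  # keep only improving mutations
            best, best_r = cand, r
    return best, best_r
```

Swapping the mutation loop for a policy-gradient update over the prompt distribution recovers the RL flavor of the paper while keeping the same parameter-free access pattern.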


NLU for Game-based Learning in Real: Initial Evaluations

May 27, 2022
Eda Okur, Saurav Sahay, Lama Nachman


Intelligent systems designed for play-based interactions should be contextually aware of the users and their surroundings. Spoken Dialogue Systems (SDS) are critical for these interactive agents to carry out effective goal-oriented communication with users in real-time. For the real-world (i.e., in-the-wild) deployment of such conversational agents, improving the Natural Language Understanding (NLU) module of the goal-oriented SDS pipeline is crucial, especially with limited task-specific datasets. This study explores the potential benefits of a recently proposed transformer-based multi-task NLU architecture, mainly to perform Intent Recognition on small domain-specific educational game datasets. The evaluation datasets were collected from children practicing basic math concepts via play-based interactions in game-based learning settings. We compare the NLU performance on the initial proof-of-concept game datasets versus the real-world deployment datasets and observe the anticipated performance drops in the wild. We show that, compared to more straightforward baseline approaches, the Dual Intent and Entity Transformer (DIET) architecture is robust enough to handle real-world data to a large extent for the Intent Recognition task on these domain-specific in-the-wild game datasets.

* Proceedings of the Games and Natural Language Processing Workshop at LREC 2022 

Data Augmentation with Paraphrase Generation and Entity Extraction for Multimodal Dialogue System

May 09, 2022
Eda Okur, Saurav Sahay, Lama Nachman


Contextually aware intelligent agents are often required to understand the users and their surroundings in real-time. Our goal is to build Artificial Intelligence (AI) systems that can assist children in their learning process. Within such complex frameworks, Spoken Dialogue Systems (SDS) are crucial building blocks for handling efficient task-oriented communication with children in game-based learning settings. We are working towards a multimodal dialogue system for younger kids learning basic math concepts. Our focus is on improving the Natural Language Understanding (NLU) module of the task-oriented SDS pipeline with limited datasets. This work explores the potential benefits of data augmentation with paraphrase generation for NLU models trained on small task-specific datasets. We also investigate the effects of extracting entities for potential further data expansion. We show that paraphrasing with model-in-the-loop (MITL) strategies using small seed data is a promising approach that yields improved performance on the Intent Recognition task.
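The paraphrase-augmentation loop is, in outline, a generate-and-filter cycle over the small seed set. The callable interface and the simple length filter below are assumptions for illustration; in the paper the generator is a trained paraphrasing model with human checks in the loop:

```python
def augment_with_paraphrases(seed_examples, paraphraser, min_length=2):
    """Expand (utterance, intent) pairs with generated paraphrases,
    dropping degenerate outputs before they join the training set.
    The paraphrase inherits the intent label of its seed utterance."""
    augmented = list(seed_examples)
    for utterance, intent in seed_examples:
        for para in paraphraser(utterance):
            if len(para.split()) >= min_length and para != utterance:
                augmented.append((para, intent))
    return augmented
```

Because every paraphrase keeps its seed's intent label, the augmented set grows without any new annotation cost, which is the appeal for small task-specific datasets.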

* Proceedings of the 13th International Conference on Language Resources and Evaluation (LREC 2022) 