Ramesh Manuvinakurike

Sample Efficient Multimodal Semantic Augmentation for Incremental Summarization

Mar 08, 2023
Sumanta Bhattacharyya, Ramesh Manuvinakurike, Sahisnu Mazumder, Saurav Sahay

In this work, we develop a prompting approach for incremental summarization of task videos. As an intermediate step, we extract semantic concepts with a sample-efficient few-shot method: we leverage an existing image-based concept extraction model, extend it to videos, and introduce a clustering and querying approach for sample efficiency, motivated by recent advances in perceiver-based architectures. Our work provides further evidence that enriching the input context with relevant entities and actions from the videos, and using these as prompts, can enhance the summaries the model generates. We report results on a relevant dataset and discuss directions for future work.
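The clustering-and-querying step lends itself to a compact illustration. Below is a minimal sketch, not the paper's exact recipe: per-frame embeddings are clustered, only one representative frame per cluster is sent to the (expensive) concept extractor, and the extracted concepts are spliced into the summarization prompt. The embedding source, cluster count, and prompt template are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def representative_frames(frame_embeddings: np.ndarray, n_clusters: int = 8):
    """Pick the frame nearest each cluster centroid as the cluster's query."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(frame_embeddings)
    reps = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(
            frame_embeddings[members] - km.cluster_centers_[c], axis=1)
        reps.append(int(members[dists.argmin()]))
    return sorted(reps)

def build_prompt(concepts, segment_transcript):
    """Splice the extracted entities/actions into the summary prompt."""
    return (f"Concepts: {', '.join(concepts)}\n"
            f"Segment: {segment_transcript}\nSummary:")

# Only the representative frames go to the concept extractor, which is
# what makes the querying sample efficient.
frames = np.random.rand(120, 512)  # stand-in for per-frame features
print(representative_frames(frames, n_clusters=4))
print(build_prompt(["whisk", "eggs", "bowl"], "the person whisks the eggs"))
```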

Position Matters! Empirical Study of Order Effect in Knowledge-grounded Dialogue

Feb 12, 2023
Hsuan Su, Shachi H Kumar, Sahisnu Mazumder, Wenda Chen, Ramesh Manuvinakurike, Eda Okur, Saurav Sahay, Lama Nachman, Shang-Tse Chen, Hung-yi Lee

Building on the power of large pretrained language models, various research works have integrated knowledge into dialogue systems. Traditional techniques treat knowledge as part of the input sequence, prepending a set of knowledge statements to the dialogue history. However, this mechanism forces the knowledge statements to be concatenated in a fixed order, making models implicitly pay imbalanced attention to them during training. In this paper, we first investigate how the order of the knowledge set influences the responses of autoregressive dialogue systems. Experiments on two commonly used dialogue datasets with two types of transformer-based models show that models weigh the input knowledge unequally. To address this, we propose a simple and novel technique that alleviates the order effect by modifying the position embeddings of the knowledge input. With the proposed position embedding method, experimental results show that each knowledge statement is considered uniformly when generating responses.
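One plausible way to realize such a position embedding modification, sketched below as an assumption rather than the paper's exact implementation: give every knowledge statement the same positional indices, so the model receives no ordering signal over the knowledge set.

```python
import torch

def uniform_knowledge_position_ids(knowledge_lens, history_len):
    """Every knowledge statement restarts at position 0; the dialogue
    history continues after the longest statement, so no statement is
    'earlier' than another from the model's point of view."""
    max_len = max(knowledge_lens)
    pos = [torch.arange(n) for n in knowledge_lens]
    pos.append(torch.arange(max_len, max_len + history_len))
    return torch.cat(pos)

# Three knowledge statements (5, 3 and 4 tokens) plus a 6-token history:
print(uniform_knowledge_position_ids([5, 3, 4], history_len=6))
# -> tensor([0, 1, 2, 3, 4, 0, 1, 2, 0, 1, 2, 3, 5, 6, 7, 8, 9, 10])
```

Many transformer implementations (e.g., Hugging Face's GPT-2) accept such a tensor through the `position_ids` argument of their forward pass, so the change requires no architectural surgery.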

Distill and Collect for Semi-Supervised Temporal Action Segmentation

Nov 03, 2022
Sovan Biswas, Anthony Rhodes, Ramesh Manuvinakurike, Giuseppe Raffa, Richard Beckwith

Recent temporal action segmentation approaches need frame annotations during training to be effective. These annotations are very expensive and time-consuming to obtain, which limits performance when only limited annotated data is available. In contrast, a large corpus of in-domain unannotated videos can easily be collected from the internet. This paper therefore proposes an approach to the temporal action segmentation task that simultaneously leverages knowledge from annotated and unannotated video sequences. Our approach uses multi-stream distillation that repeatedly refines and finally combines the streams' frame predictions. The model also predicts the action order, which is later used as a temporal constraint while estimating frame labels, countering the lack of supervision for unannotated videos. Our evaluation of the proposed approach on two different datasets demonstrates that it achieves performance comparable to full supervision despite limited annotation.
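The shape of the semi-supervised objective can be conveyed with a small sketch. The following is an illustrative assumption of how annotated and unannotated clips might share one loss: cross-entropy on annotated frames, and a distillation term pulling each stream toward the detached ensemble prediction on unannotated frames. The action-order constraint and the streams' exact architecture are omitted.

```python
import torch
import torch.nn.functional as F

def multi_stream_loss(stream_logits, labels=None, unsup_weight=0.5):
    """stream_logits: list of (T, C) per-frame logits, one per stream."""
    if labels is not None:  # annotated video: ordinary cross-entropy
        return sum(F.cross_entropy(s, labels) for s in stream_logits)
    # Unannotated video: distill each stream toward the frozen ensemble.
    target = torch.stack(stream_logits).mean(0).softmax(-1).detach()
    return unsup_weight * sum(
        F.kl_div(s.log_softmax(-1), target, reduction="batchmean")
        for s in stream_logits)

# Two streams, a 100-frame clip, 10 action classes:
logits = [torch.randn(100, 10, requires_grad=True) for _ in range(2)]
print(multi_stream_loss(logits, labels=torch.randint(0, 10, (100,))))
print(multi_stream_loss(logits))  # unsupervised branch
```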

Human in the loop approaches in multi-modal conversational task guidance system development

Nov 03, 2022
Ramesh Manuvinakurike, Sovan Biswas, Giuseppe Raffa, Richard Beckwith, Anthony Rhodes, Meng Shi, Gesem Gudino Mejia, Saurav Sahay, Lama Nachman

Developing task guidance systems that aid humans in a situated task remains a challenging problem. Search (information retrieval) and conversational systems have immense potential to help task performers achieve their goals. However, several technical challenges must be addressed to deliver such conversational systems, and common supervised approaches fail to deliver the expected results in terms of overall performance, user experience, and adaptation to realistic conditions. In this preliminary work, we first highlight some of the challenges involved in developing such systems. We then provide an overview of existing datasets and highlight their limitations. Finally, we develop a model-in-the-loop, Wizard-of-Oz data collection tool and perform a pilot experiment.

* SCAI @ SIGIR 

Controllable Response Generation for Assistive Use-cases

Dec 04, 2021
Shachi H Kumar, Hsuan Su, Ramesh Manuvinakurike, Saurav Sahay, Lama Nachman

Conversational agents have become an integral part of everyday life for simple task-enabling situations. However, these systems have yet to have any social impact on diverse and minority populations, for example, people with neurological disorders such as ALS, or with speech, language, and social communication disorders. Language model technology can play a huge role in helping these users carry out daily communication and social interactions. To enable this population, we build a dialog system that users can control with cues or keywords. We build models that suggest relevant cues in the dialog response context, which are used to control response generation and can speed up communication. We also introduce a keyword loss to lexically constrain the model output. We show both qualitatively and quantitatively that our models can effectively induce the keyword into the model's response without degrading its quality. In the context of such systems being used by people with degenerative disorders, we present a human evaluation of our cue/keyword predictor and of the controllable dialog system, and show that our models perform significantly better than models without control. Our study shows that keyword control over end-to-end response generation models is powerful and can enable and empower users with degenerative disorders in their day-to-day communication.
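A keyword loss of this kind admits a short sketch. The max-over-positions form below is one plausible formulation, assumed for illustration (the paper's exact loss may differ): alongside the usual LM cross-entropy, the model is rewarded for placing high probability on the keyword token at some position in the response.

```python
import torch
import torch.nn.functional as F

def lm_with_keyword_loss(logits, targets, keyword_id, kw_weight=1.0):
    """logits: (T, V) next-token logits; targets: (T,) gold token ids."""
    lm_loss = F.cross_entropy(logits, targets)
    log_probs = logits.log_softmax(dim=-1)
    # Reward the single position where the keyword is most probable,
    # i.e. ask the keyword to appear *somewhere* in the response.
    keyword_loss = -log_probs[:, keyword_id].max()
    return lm_loss + kw_weight * keyword_loss

logits = torch.randn(20, 1000, requires_grad=True)  # toy response logits
targets = torch.randint(0, 1000, (20,))
print(lm_with_keyword_loss(logits, targets, keyword_id=42))
```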

Estimating Subjective Crowd-Evaluations as an Additional Objective to Improve Natural Language Generation

Apr 12, 2021
Jakob Nyberg, Ramesh Manuvinakurike, Maike Paetzel-Prüsmann

Human ratings are one of the most prevalent methods of evaluating the performance of natural language processing algorithms, and it is likewise common to measure the quality of sentences generated by a natural language generation model using human raters. In this paper, we argue for exploring the use of subjective evaluations within the process of training language generation models in a multi-task learning setting. As a case study, we use a crowd-authored dialogue corpus to fine-tune six different language generation models. Two of these models incorporate multi-task learning and use subjective ratings of lines as part of an explicit learning goal. A human evaluation of the generated dialogue lines reveals that utterances generated by the multi-tasking models were subjectively rated as the most typical, the best at moving the conversation forward, and the least offensive. Based on these promising first results, we discuss future research directions for incorporating subjective human evaluations into language model training, thereby keeping the human user in the loop during the development process.
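The multi-task setup can be pictured as a shared encoder feeding two heads: next-token prediction and crowd-rating prediction. The toy model below is a sketch under that assumption; the model sizes, loss weighting, and rating scale are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RatedGenerator(nn.Module):
    """Shared encoder feeding a next-token head and a rating head."""
    def __init__(self, vocab=5000, d=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.encoder = nn.GRU(d, d, batch_first=True)
        self.lm_head = nn.Linear(d, vocab)   # generation objective
        self.rating_head = nn.Linear(d, 1)   # subjective-rating objective

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))
        return self.lm_head(h), self.rating_head(h[:, -1]).squeeze(-1)

model = RatedGenerator()
tokens = torch.randint(0, 5000, (4, 12))  # a batch of 4 toy lines
ratings = torch.rand(4)                   # their crowd ratings in [0, 1]
lm_logits, pred_ratings = model(tokens)
lm_loss = F.cross_entropy(lm_logits[:, :-1].reshape(-1, 5000),
                          tokens[:, 1:].reshape(-1))
loss = lm_loss + 0.5 * F.mse_loss(pred_ratings, ratings)  # joint objective
print(float(loss))
```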

* To appear at Workshop on Human Evaluation of NLP Systems (EACL 2021) 

"Can you say more about the location?" The Development of a Pedagogical Reference Resolution Agent

Sep 03, 2019
Maike Paetzel, Ramesh Manuvinakurike

Figure 1 for "Can you say more about the location?" The Development of a Pedagogical Reference Resolution Agent
Figure 2 for "Can you say more about the location?" The Development of a Pedagogical Reference Resolution Agent
Figure 3 for "Can you say more about the location?" The Development of a Pedagogical Reference Resolution Agent
Figure 4 for "Can you say more about the location?" The Development of a Pedagogical Reference Resolution Agent

In an increasingly globalized world, geographic literacy is crucial. In this paper, we present a collaborative two-player game to improve people's ability to locate countries on the world map. We discuss two implementations of the game: first, a web-based version played with the remote-controlled agent Nellie; then, drawing on the knowledge gained from a large online data collection, a re-implementation that can be played face-to-face with the Furhat robot Neil. Our analysis shows that participants not only found the game engaging to play, they also believe they gained lasting knowledge about the world map.

* Accepted at 1st workshop on dialogue for social good 

A System for Automated Image Editing from Natural Language Commands

Dec 03, 2018
Jacqueline Brixey, Ramesh Manuvinakurike, Nham Le, Tuan Lai, Walter Chang, Trung Bui

This work presents the task of modifying images in an image editing program using natural language written commands. We utilize a corpus, collected via crowdsourcing, of over 6000 text requests to edit real-world images. We describe a novel framework composed of actions and entities that maps a user's natural language request to executable commands in an image editing program. We resolve previously labeled annotator disagreement through a voting process and complete the annotation of the corpus. We experimented with different machine learning models and found that the LSTM, the SVM, and the bidirectional LSTM-CRF joint models are the best at detecting image editing actions and the associated entities in a given utterance.
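The action/entity framing implies a simple dispatch layer between the tagger and the editor. The sketch below illustrates that layer under stated assumptions; the action names and entity slots are hypothetical, not the paper's actual inventory.

```python
from dataclasses import dataclass, field

@dataclass
class EditRequest:
    action: str                                   # detected editing action
    entities: dict = field(default_factory=dict)  # associated entities

def to_command(req: EditRequest) -> str:
    """Dispatch a parsed request to an executable editor command."""
    commands = {
        "crop": lambda e: f"crop(region={e.get('region')!r})",
        "adjust_brightness": lambda e: f"brightness(delta={e.get('delta')!r})",
    }
    handler = commands.get(req.action)
    return handler(req.entities) if handler else "noop()"

# "make the photo a bit brighter" -> tagger output -> executable command
print(to_command(EditRequest("adjust_brightness", {"delta": "+20"})))
```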

Towards Understanding End-of-trip Instructions in a Taxi Ride Scenario

Jul 11, 2018
Deepthi Karkada, Ramesh Manuvinakurike, Kallirroi Georgila

We introduce a dataset containing human-authored descriptions of target locations in an "end-of-trip in a taxi ride" scenario. We describe our data collection method and a novel annotation scheme that supports understanding of such descriptions of target locations. Our dataset contains target location descriptions for both synthetic and real-world images, as well as visual annotations (ground truth labels, dimensions of vehicles and objects, coordinates of the target location, and the distance and direction of the target location from vehicles and objects) that can be used in various visual and language tasks. We also perform a pilot experiment on how the corpus could be applied to visual reference resolution in this domain.
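To make the annotation scheme concrete, here is one way a single record might be organized, directly following the fields the abstract lists. The field names and units are assumptions for illustration, not the dataset's actual schema.

```python
from dataclasses import dataclass

@dataclass
class TargetLocationAnnotation:
    image_id: str
    description: str                  # human-authored end-of-trip instruction
    target_xy: tuple                  # coordinates of the target location
    distances: dict                   # distance from each vehicle/object
    directions: dict                  # e.g. {"red_car": "behind"}
    object_dims: dict                 # per-object (width, length)

record = TargetLocationAnnotation(
    image_id="synthetic_0042",
    description="drop me off right behind the red car",
    target_xy=(312.0, 188.5),
    distances={"red_car": 2.5},
    directions={"red_car": "behind"},
    object_dims={"red_car": (1.8, 4.3)},
)
print(record.directions["red_car"])
```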

* to appear in Fourteenth Joint ACL - ISO Workshop on Interoperable Semantic Annotation, Corresponding author: Ramesh Manuvinakurike 

A Dialogue Annotation Scheme for Weight Management Chat using the Trans-Theoretical Model of Health Behavior Change

Jul 11, 2018
Ramesh Manuvinakurike, Sumanth Bharadwaj, Kallirroi Georgila

In this study we collect and annotate human-human role-play dialogues in the domain of weight management. There are two roles in the conversation: the "seeker", who is looking for ways to lose weight, and the "helper", who provides suggestions to support the "seeker" in their weight loss journey. The collected chat dialogues are then annotated with a novel annotation scheme inspired by a popular health behavior change theory, the "trans-theoretical model of health behavior change". We also build classifiers to automatically predict the annotation labels used in our corpus. We find that classification accuracy improves when oracle segmentations of the interlocutors' sentences are provided, compared to directly classifying unsegmented sentences.
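As a point of reference, the trans-theoretical model's five classic stages of change suggest a natural label set for such classifiers. The sketch below is purely illustrative: the paper's actual label inventory, features, and training data are richer, and the five toy utterances are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

STAGES = ["precontemplation", "contemplation", "preparation",
          "action", "maintenance"]

train_utts = [
    "I don't think my weight is a problem.",    # precontemplation
    "I've been thinking I should eat better.",  # contemplation
    "I signed up for a gym starting Monday.",   # preparation
    "I walked 5km every day this week.",        # action
    "I've kept the weight off for a year now.", # maintenance
]
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(train_utts, STAGES)
print(clf.predict(["Maybe I should start cutting down on sugar."]))
```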

* to appear in Fourteenth Joint ACL - ISO Workshop on Interoperable Semantic Annotation 