Bishal Santra

Frugal Prompting for Dialog Models

May 24, 2023
Bishal Santra, Sakya Basak, Abhinandan De, Manish Gupta, Pawan Goyal

The use of large language models (LLMs) in natural language processing (NLP) tasks is rapidly increasing, leading to changes in how researchers approach problems in the field. To fully utilize these models' abilities, a better understanding of their behavior for different input protocols is required. With LLMs, users can directly interact with the models through a text-based interface to define and solve various tasks. Hence, understanding the conversational abilities of these LLMs, which may not have been specifically trained for dialog modeling, is also important. This study examines different approaches for building dialog systems using LLMs by considering various aspects of the prompt. As part of prompt tuning, we experiment with various ways of providing instructions, exemplars, the current query, and additional context. The research also analyzes the representations of dialog history that have the optimal usable-information density. Based on the findings, the paper suggests more compact ways of providing dialog-history information while ensuring good performance and reducing the model's inference-API costs. The research contributes to a better understanding of how LLMs can be effectively used for building interactive systems.

* First two authors contributed equally
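
To illustrate the kind of history compression studied here, the following is a minimal, hypothetical sketch of a prompt builder that keeps only the most recent turns verbatim. The function name, template strings, and truncation policy are our own assumptions for illustration, not the paper's actual prompts.

```python
def build_frugal_prompt(instruction, exemplars, history, query, max_history_turns=2):
    """Compose a prompt from an instruction, exemplars, a truncated dialog
    history, and the current query. Older turns are dropped so that only
    the most recent `max_history_turns` utterances are sent verbatim,
    reducing the token count (and hence inference-API cost) per call."""
    parts = [instruction]
    for ex in exemplars:
        parts.append("Example:\n" + ex)
    kept = history[-max_history_turns:]
    if kept:
        parts.append("Dialog history:\n" + "\n".join(kept))
    parts.append("User: " + query + "\nAssistant:")
    return "\n\n".join(parts)
```

More aggressive variants along the same lines would replace the dropped turns with a short summary rather than omitting them outright.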

CORAL: Contextual Response Retrievability Loss Function for Training Dialog Generation Models

May 21, 2022
Bishal Santra, Ravi Ghadia, Arpit Dwivedi, Manish Gupta, Pawan Goyal

Natural Language Generation (NLG) represents a large collection of tasks in the field of NLP. While many of these tasks have been tackled well by the cross-entropy (CE) loss, the task of dialog generation poses a few unique challenges for this loss function. First, CE loss assumes that for any given input, the only possible output is the one available as the ground truth in the training dataset. In general, this is not true for any task, as there can be multiple semantically equivalent sentences, each with a different surface form. This problem gets exacerbated further for the dialog generation task, as there can be multiple valid responses (for a given context) that not only have different surface forms but are also not semantically equivalent. Second, CE loss does not take the context into consideration while processing the response and, hence, treats all ground truths with equal importance irrespective of the context. But we may want our final agent to avoid certain classes of responses (e.g., bland, non-informative or biased responses) and assign relatively higher weight to more context-specific responses. To circumvent these shortcomings of the CE loss, in this paper, we propose a novel loss function, CORAL, that directly optimizes recently proposed estimates of human preference for generated responses. Using CORAL, we can train dialog generation models without assuming that no valid responses exist other than the ground truth. Also, the CORAL loss is computed based on both the context and the response. Extensive comparisons on two benchmark datasets show that the proposed methods outperform strong state-of-the-art baseline models of different sizes.

* 15 pages, 3 figures 
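
The core idea of weighting a response's likelihood by a context-aware preference estimate can be sketched as below. This is a simplified, hypothetical rendering (the function name, the token-level averaging, and the sign convention are our assumptions, not the paper's exact formulation of CORAL).

```python
def coral_style_loss(token_log_probs, preference_score):
    """Preference-weighted sequence loss sketch: scale the average negative
    log-likelihood of a response by an estimate of how preferable that
    response is given the context. A positive score pushes the model to
    raise the response's probability; a negative score (e.g. for a bland
    or biased response) pushes it down."""
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return preference_score * avg_nll
```

Unlike plain cross-entropy, the weight here depends on the (context, response) pair, so different ground truths are no longer treated with equal importance.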

A Study on Prompt-based Few-Shot Learning Methods for Belief State Tracking in Task-oriented Dialog Systems

Apr 18, 2022
Debjoy Saha, Bishal Santra, Pawan Goyal

We tackle the Dialogue Belief State Tracking (DST) problem of task-oriented conversational systems. Recent approaches to this problem leveraging Transformer-based models have yielded great results. However, training these models is expensive, both in terms of computational resources and time. Additionally, collecting high-quality annotated dialogue datasets remains a challenge for researchers because of the extensive annotation required for training these models. Driven by the recent success of pre-trained language models and prompt-based learning, we explore prompt-based few-shot learning for Dialogue Belief State Tracking. We formulate the DST problem as a two-stage prompt-based language modelling task, train language models for both stages, and present a comprehensive empirical analysis of their separate and joint performance. We demonstrate the potential of prompt-based methods in few-shot learning for DST and provide directions for future improvement.

* 9 pages, 12 figures 
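
A hypothetical sketch of such a two-stage prompting scheme is shown below, with made-up prompt templates, slot names, and a stand-in `lm` callable; the paper's actual prompts, slots, and decoding will differ.

```python
def stage1_prompt(dialog, slot):
    # Stage 1: does the dialog mention this slot at all?
    return dialog + "\nDoes the user mention " + slot + "? Answer yes or no:"

def stage2_prompt(dialog, slot):
    # Stage 2: extract the value for a slot known to be present.
    return dialog + "\nThe value of " + slot + " is:"

def track_belief_state(dialog, slots, lm):
    """Run the two prompting stages per slot and assemble a belief state.
    `lm` is any callable mapping a prompt string to a completion string."""
    state = {}
    for slot in slots:
        if lm(stage1_prompt(dialog, slot)).strip().lower() == "yes":
            state[slot] = lm(stage2_prompt(dialog, slot)).strip()
    return state
```

Separating slot detection (stage 1) from value extraction (stage 2) lets each stage be prompted, and analyzed, independently.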

Representation Learning for Conversational Data using Discourse Mutual Information Maximization

Dec 04, 2021
Bishal Santra, Sumegh Roychowdhury, Aishik Mandal, Vasu Gurram, Atharva Naik, Manish Gupta, Pawan Goyal

Although many pretrained models exist for text or images, there have been relatively few attempts to train representations specifically for dialog understanding. Prior works usually relied on finetuned representations based on generic text representation models like BERT or GPT-2. However, existing pretraining objectives do not take the structural information of text into consideration. Although generative dialog models can learn structural features too, we argue that structure-unaware word-by-word generation is not suitable for effective conversation modeling. We empirically demonstrate that such representations do not perform consistently across various dialog understanding tasks. Hence, we propose a structure-aware mutual-information-based loss function, DMI (Discourse Mutual Information), for training dialog-representation models that additionally captures the inherent uncertainty in response prediction. Extensive evaluation on nine diverse dialog modeling tasks shows that our proposed DMI-based models outperform strong baselines by significant margins, even with small-scale pretraining. Our models show the most promising performance on the dialog evaluation task DailyDialog++, in both random and adversarial negative scenarios.

* Preprint, 15 pages 
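
A standard contrastive (InfoNCE-style) estimator illustrates the general family of mutual-information objectives this work belongs to: score the true response for a context against in-batch negatives. This is a generic sketch of that family, not the paper's exact DMI loss.

```python
import math

def info_nce_loss(context_vec, response_vecs, positive_index, temperature=0.1):
    """Contrastive loss over a context embedding and a batch of candidate
    response embeddings. Minimizing it maximizes a lower bound on the
    mutual information between context and response representations."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    logits = [dot(context_vec, r) / temperature for r in response_vecs]
    m = max(logits)  # max-shifted log-sum-exp for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-probability of the true (positive) response.
    return log_z - logits[positive_index]
```

The loss is small when the context embedding aligns with its true response and not with the negatives, which is exactly the discriminative signal a structure-aware objective needs.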

Exploring Effects of Random Walk Based Minibatch Selection Policy on Knowledge Graph Completion

Apr 12, 2020
Bishal Santra, Prakhar Sharma, Sumegh Roychowdhury, Pawan Goyal

In this paper, we explore the effects of different minibatch sampling techniques on Knowledge Graph Completion. Knowledge Graph Completion (KGC), or Link Prediction, is the task of predicting missing facts in a knowledge graph. KGC models are usually trained using a margin, soft-margin or cross-entropy loss function that promotes assigning a higher score or probability to true fact triplets. Minibatch gradient descent is used to optimize these loss functions for training the KGC models. But, as each minibatch consists of only a few randomly sampled triplets from a large knowledge graph, any entity that occurs in a minibatch usually occurs only once in it. Because of this, these loss functions ignore all other neighbors of an entity whose embedding is being updated at a given minibatch step. In this paper, we propose a new random-walk based minibatch sampling technique for training KGC models that optimizes the loss incurred by a minibatch of closely connected triplets forming a subgraph, instead of randomly selected ones. We report experimental results for different models and datasets with our sampling technique and find that the proposed sampling algorithm has varying effects on these datasets/models. Specifically, we find that our proposed method achieves state-of-the-art performance on the DB100K dataset.

* 7 pages, 3 figures 
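
The sampling idea can be sketched roughly as follows. This is a simplified, hypothetical implementation (restart policy and neighbor selection are our assumptions; the paper's walk policy may differ in detail).

```python
import random

def random_walk_minibatch(triples, batch_size, seed=None):
    """Sample a minibatch of KG triples by walking the graph, so that the
    batch forms a closely connected subgraph instead of independent
    random triples. Each triple is (head, relation, tail)."""
    rng = random.Random(seed)
    # Index triples by the entities they touch.
    by_entity = {}
    for t in triples:
        h, _, tl = t
        by_entity.setdefault(h, []).append(t)
        by_entity.setdefault(tl, []).append(t)

    batch, seen = [], set()
    current = rng.choice(triples)
    while len(batch) < batch_size:
        if current not in seen:
            seen.add(current)
            batch.append(current)
        h, _, tl = current
        pivot = rng.choice([h, tl])
        candidates = [t for t in by_entity[pivot] if t not in seen]
        if candidates:
            current = rng.choice(candidates)
        else:
            # Walk is stuck: restart from any unvisited triple.
            remaining = [t for t in triples if t not in seen]
            if not remaining:
                break
            current = rng.choice(remaining)
    return batch
```

Because consecutive triples in the batch share entities, each gradient step sees an entity together with several of its neighbors rather than in isolation.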

Incorporating Domain Knowledge into Medical NLI using Knowledge Graphs

Aug 31, 2019
Soumya Sharma, Bishal Santra, Abhik Jana, T. Y. S. S. Santosh, Niloy Ganguly, Pawan Goyal

Recently, biomedical versions of embeddings obtained from language models, such as BioELMo, have shown state-of-the-art results for the textual inference task in the medical domain. In this paper, we explore how to incorporate structured domain knowledge, available in the form of a knowledge graph (UMLS), for the Medical NLI task. Specifically, we experiment with fusing embeddings obtained from the knowledge graph with a state-of-the-art approach for the NLI task (the ESIM model). We also experiment with fusing domain-specific sentiment information for the task. Experiments conducted on the MedNLI dataset clearly show that this strategy improves the baseline BioELMo architecture for the Medical NLI task.

* EMNLP 2019 accepted short paper 
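
One simple way to fuse the two signals, shown purely for illustration, is concatenation before the downstream classifier. The function name and the scalar weighting are hypothetical; the paper's ESIM-based fusion is more involved.

```python
def fuse_embeddings(text_vec, kg_vec, kg_weight=1.0):
    """Late-fusion sketch: append a (scaled) KG-derived entity embedding
    to the contextual text embedding, letting the classifier learn how
    to combine the two sources of evidence."""
    return list(text_vec) + [kg_weight * v for v in kg_vec]
```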

Free as in Free Word Order: An Energy Based Model for Word Segmentation and Morphological Tagging in Sanskrit

Oct 25, 2018
Amrith Krishna, Bishal Santra, Sasi Prasanth Bandaru, Gaurav Sahu, Vishnu Dutt Sharma, Pavankumar Satuluri, Pawan Goyal

The configurational information in sentences of a free word order language such as Sanskrit is of limited use. Thus, the context of the entire sentence is desirable even for basic processing tasks such as word segmentation. We propose a structured prediction framework that jointly solves the word segmentation and morphological tagging tasks in Sanskrit. We build an energy based model where we adopt approaches generally employed in graph based parsing techniques (McDonald et al., 2005a; Carreras, 2007). Our model outperforms the state of the art with an F-Score of 96.92 (a percentage improvement of 7.06%) while using less than one-tenth of the task-specific training data. We find that the use of a graph based approach instead of a traditional lattice-based sequential labelling approach leads to a percentage gain of 12.6% in F-Score for the segmentation task.

* version 2: Corrected typo in Table1, page7 | Accepted in EMNLP 2018. Supplementary material can be found at - http://cse.iitkgp.ac.in/~amrithk/1080_supp.pdf 
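
The graph-based decoding idea, scoring candidate structures by summed edge energies and picking the minimum, can be sketched generically as below. The names and the toy energy function are ours; the paper's actual energy function is learned over linguistic features.

```python
def structure_energy(edges, edge_energy):
    """Total energy of a candidate structure, given as a set of edges
    between analysis nodes (e.g. segment/tag candidates)."""
    return sum(edge_energy(e) for e in edges)

def best_structure(candidates, edge_energy):
    """Graph-based structured prediction sketch: among candidate
    segmentations/taggings encoded as edge sets, return the one with
    the minimum total energy."""
    return min(candidates, key=lambda edges: structure_energy(edges, edge_energy))
```

Jointly scoring whole edge sets, rather than labelling a lattice left to right, is what lets the model use sentence-wide context in a free word order language.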