Indian Institute of Technology Kharagpur
Abstract:The explosion of textual data has made manual document classification increasingly challenging. To address this, we introduce a robust, efficient domain-agnostic generative model framework for multi-label text classification. Instead of treating labels as mere atomic symbols, our approach utilizes predefined label descriptions and is trained to generate these descriptions based on the input text. During inference, the generated descriptions are matched to the pre-defined labels using a finetuned sentence transformer. We integrate this with a dual-objective loss function, combining cross-entropy loss and cosine similarity of the generated sentences with the predefined target descriptions, ensuring both semantic alignment and accuracy. Our proposed model LAGAMC stands out for its parameter efficiency and versatility across diverse datasets, making it well-suited for practical applications. We demonstrate the effectiveness of our proposed model by achieving new state-of-the-art performances across all evaluated datasets, surpassing several strong baselines. We achieve improvements of 13.94% in Micro-F1 and 24.85% in Macro-F1 compared to the closest baseline across all datasets.
Abstract:The study presents a comprehensive benchmark for retrieving Sanskrit documents using English queries, focusing on the chapters of the Srimadbhagavatam. It employs a tripartite approach: Direct Retrieval (DR), Translation-based Retrieval (DT), and Query Translation (QT), utilizing shared embedding spaces and advanced translation methods to enhance retrieval systems in a RAG framework. The study fine-tunes state-of-the-art models for Sanskrit's linguistic nuances, evaluating models such as BM25, REPLUG, mDPR, ColBERT, Contriever, and GPT-2. It adapts summarization techniques for Sanskrit documents to improve QA processing. Evaluation shows DT methods outperform DR and QT in handling the cross-lingual challenges of ancient texts, improving accessibility and understanding. A dataset of 3,400 English-Sanskrit query-document pairs underpins the study, aiming to preserve Sanskrit scriptures and share their philosophical importance widely. Our dataset is publicly available at https://huggingface.co/datasets/manojbalaji1/anveshana
Abstract:Instruction-based Large Language Models (LLMs) have proven effective in numerous few-shot or zero-shot Natural Language Processing (NLP) tasks. However, creating human-annotated instruction data is time-consuming, expensive, and often limited in quantity and task diversity. Previous research endeavors have attempted to address this challenge by proposing frameworks capable of generating instructions in a semi-automated and task-agnostic manner directly from the model itself. Many of these efforts have relied on large API-only parameter-based models such as GPT-3.5 (175B), which are expensive, and subject to limits on a number of queries. This paper explores the performance of three open-source small LLMs such as LLaMA 2-7B, LLama 2-13B, and Mistral 7B, using a semi-automated framework, thereby reducing human intervention, effort, and cost required to generate an instruction dataset for fine-tuning LLMs. Furthermore, we demonstrate that incorporating a Reinforcement Learning (RL) based training algorithm into this LLMs-based framework leads to further enhancements. Our evaluation of the dataset reveals that these RL-based frameworks achieve a substantial improvements in 63-66% of the tasks compared to previous approaches.
Abstract:In task-oriented dialogue systems, intent detection is crucial for interpreting user queries and providing appropriate responses. Existing research primarily addresses simple queries with a single intent, lacking effective systems for handling complex queries with multiple intents and extracting different intent spans. Additionally, there is a notable absence of multilingual, multi-intent datasets. This study addresses three critical tasks: extracting multiple intent spans from queries, detecting multiple intents, and developing a multi-lingual multi-label intent dataset. We introduce a novel multi-label multi-class intent detection dataset (MLMCID-dataset) curated from existing benchmark datasets. We also propose a pointer network-based architecture (MLMCID) to extract intent spans and detect multiple intents with coarse and fine-grained labels in the form of sextuplets. Comprehensive analysis demonstrates the superiority of our pointer network-based system over baseline approaches in terms of accuracy and F1-score across various datasets.
Abstract:Neural dependency parsing has achieved remarkable performance for low resource morphologically rich languages. It has also been well-studied that morphologically rich languages exhibit relatively free word order. This prompts a fundamental investigation: Is there a way to enhance dependency parsing performance, making the model robust to word order variations utilizing the relatively free word order nature of morphologically rich languages? In this work, we examine the robustness of graph-based parsing architectures on 7 relatively free word order languages. We focus on scrutinizing essential modifications such as data augmentation and the removal of position encoding required to adapt these architectures accordingly. To this end, we propose a contrastive self-supervised learning method to make the model robust to word order variations. Furthermore, our proposed modification demonstrates a substantial average gain of 3.03/2.95 points in 7 relatively free word order languages, as measured by the UAS/LAS Score metric when compared to the best performing baseline.
Abstract:The global shortage of healthcare workers has demanded the development of smart healthcare assistants, which can help monitor and alert healthcare workers when necessary. We examine the healthcare knowledge of existing Large Vision Language Models (LVLMs) via the Visual Question Answering (VQA) task in hospital settings through expert annotated open-ended questions. We introduce the Emergency Room Visual Question Answering (ERVQA) dataset, consisting of <image, question, answer> triplets covering diverse emergency room scenarios, a seminal benchmark for LVLMs. By developing a detailed error taxonomy and analyzing answer trends, we reveal the nuanced nature of the task. We benchmark state-of-the-art open-source and closed LVLMs using traditional and adapted VQA metrics: Entailment Score and CLIPScore Confidence. Analyzing errors across models, we infer trends based on properties like decoder type, model size, and in-context examples. Our findings suggest the ERVQA dataset presents a highly complex task, highlighting the need for specialized, domain-specific solutions.
Abstract:This paper presents an energy-preserving machine learning method for inferring reduced-order models (ROMs) by exploiting the multi-symplectic form of partial differential equations (PDEs). The vast majority of energy-preserving reduced-order methods use symplectic Galerkin projection to construct reduced-order Hamiltonian models by projecting the full models onto a symplectic subspace. However, symplectic projection requires the existence of fully discrete operators, and in many cases, such as black-box PDE solvers, these operators are inaccessible. In this work, we propose an energy-preserving machine learning method that can infer the dynamics of the given PDE using data only, so that the proposed framework does not depend on the fully discrete operators. In this context, the proposed method is non-intrusive. The proposed method is grey box in the sense that it requires only some basic knowledge of the multi-symplectic model at the partial differential equation level. We prove that the proposed method satisfies spatially discrete local energy conservation and preserves the multi-symplectic conservation laws. We test our method on the linear wave equation, the Korteweg-de Vries equation, and the Zakharov-Kuznetsov equation. We test the generalization of our learned models by testing them far outside the training time interval.
Abstract:Model reduction is an active research field to construct low-dimensional surrogate models of high fidelity to accelerate engineering design cycles. In this work, we investigate model reduction for linear structured systems using dominant reachable and observable subspaces. When the training set $-$ containing all possible interpolation points $-$ is large, then these subspaces can be determined by solving many large-scale linear systems. However, for high-fidelity models, this easily becomes computationally intractable. To circumvent this issue, in this work, we propose an active sampling strategy to sample only a few points from the given training set, which can allow us to estimate those subspaces accurately. To this end, we formulate the identification of the subspaces as the solution of the generalized Sylvester equations, guiding us to select the most relevant samples from the training set to achieve our goals. Consequently, we construct solutions of the matrix equations in low-rank forms, which encode subspace information. We extensively discuss computational aspects and efficient usage of the low-rank factors in the process of obtaining reduced-order models. We illustrate the proposed active sampling scheme to obtain reduced-order models via dominant reachable and observable subspaces and present its comparison with the method where all the points from the training set are taken into account. It is shown that the active sample strategy can provide us $17$x speed-up without sacrificing any noticeable accuracy.
Abstract:The purpose of the current work is the development and comparison of Fourier neural operators (FNOs) for surrogate modeling of the quasi-static mechanical response of polycrystalline materials. Three types of such FNOs are considered here: a physics-guided FNO (PgFNO), a physics-informed FNO (PiFNO), and a physics-encoded FNO (PeFNO). These are trained and compared with the help of stress field data from a reference model for heterogeneous elastic materials with a periodic grain microstructure. Whereas PgFNO training is based solely on these data, that of the PiFNO and PeFNO is in addition constrained by the requirement that stress fields satisfy mechanical equilibrium, i.e., be divergence-free. The difference between the PiFNO and PeFNO lies in how this constraint is taken into account; in the PiFNO, it is included in the loss function, whereas in the PeFNO, it is "encoded" in the operator architecture. In the current work, this encoding is based on a stress potential and Fourier transforms. As a result, only the training of the PiFNO is constrained by mechanical equilibrium; in contrast, mechanical equilibrium constrains both the training and output of the PeFNO. Due in particular to this, stress fields calculated by the trained PeFNO are significantly more accurate than those calculated by the trained PiFNO in the example cases considered.
Abstract:The ever-increasing volume of digital information necessitates efficient methods for users to extract key insights from lengthy documents. Aspect-based summarization offers a targeted approach, generating summaries focused on specific aspects within a document. Despite advancements in aspect-based summarization research, there is a continuous quest for improved model performance. Given that large language models (LLMs) have demonstrated the potential to revolutionize diverse tasks within natural language processing, particularly in the problem of summarization, this paper explores the potential of fine-tuning LLMs for the aspect-based summarization task. We evaluate the impact of fine-tuning open-source foundation LLMs, including Llama2, Mistral, Gemma and Aya, on a publicly available domain-specific aspect based summary dataset. We hypothesize that this approach will enable these models to effectively identify and extract aspect-related information, leading to superior quality aspect-based summaries compared to the state-of-the-art. We establish a comprehensive evaluation framework to compare the performance of fine-tuned LLMs against competing aspect-based summarization methods and vanilla counterparts of the fine-tuned LLMs. Our work contributes to the field of aspect-based summarization by demonstrating the efficacy of fine-tuning LLMs for generating high-quality aspect-based summaries. Furthermore, it opens doors for further exploration of using LLMs for targeted information extraction tasks across various NLP domains.