Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Soujanya Poria

PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis

Aug 18, 2024

Meng Luo, Hao Fei, Bobo Li, Shengqiong Wu, Qian Liu, Soujanya Poria, Erik Cambria, Mong-Li Lee, Wynne Hsu

Figure 1 for PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis

Figure 2 for PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis

Figure 3 for PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis

Figure 4 for PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis

Abstract:While existing Aspect-based Sentiment Analysis (ABSA) has received extensive effort and advancement, there are still gaps in defining a more holistic research target seamlessly integrating multimodality, conversation context, fine-granularity, and also covering the changing sentiment dynamics as well as cognitive causal rationales. This paper bridges the gaps by introducing a multimodal conversational ABSA, where two novel subtasks are proposed: 1) Panoptic Sentiment Sextuple Extraction, panoramically recognizing holder, target, aspect, opinion, sentiment, rationale from multi-turn multi-party multimodal dialogue. 2) Sentiment Flipping Analysis, detecting the dynamic sentiment transformation throughout the conversation with the causal reasons. To benchmark the tasks, we construct PanoSent, a dataset annotated both manually and automatically, featuring high quality, large scale, multimodality, multilingualism, multi-scenarios, and covering both implicit and explicit sentiment elements. To effectively address the tasks, we devise a novel Chain-of-Sentiment reasoning framework, together with a novel multimodal large language model (namely Sentica) and a paraphrase-based verification mechanism. Extensive evaluations demonstrate the superiority of our methods over strong baselines, validating the efficacy of all our proposed methods. The work is expected to open up a new era for the ABSA community, and thus all our codes and data are open at https://PanoSent.github.io/

* Accepted by ACM MM 2024 (Oral)

Via

Access Paper or Ask Questions

WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models

Aug 07, 2024

Prannaya Gupta, Le Qi Yau, Hao Han Low, I-Shiang Lee, Hugo Maximus Lim, Yu Xin Teoh, Jia Hng Koh, Dar Win Liew, Rishabh Bhardwaj, Rajat Bhardwaj(+1 more)

Figure 1 for WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models

Figure 2 for WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models

Figure 3 for WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models

Figure 4 for WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models

Abstract:WalledEval is a comprehensive AI safety testing toolkit designed to evaluate large language models (LLMs). It accommodates a diverse range of models, including both open-weight and API-based ones, and features over 35 safety benchmarks covering areas such as multilingual safety, exaggerated safety, and prompt injections. The framework supports both LLM and judge benchmarking, and incorporates custom mutators to test safety against various text-style mutations such as future tense and paraphrasing. Additionally, WalledEval introduces WalledGuard, a new, small and performant content moderation tool, and SGXSTest, a benchmark for assessing exaggerated safety in cultural contexts. We make WalledEval publicly available at https://github.com/walledai/walledevalA.

* Under review

Via

Access Paper or Ask Questions

Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual Text-to-Speech Adaptation

Jun 25, 2024

Yingting Li, Ambuj Mehrish, Bryan Chew, Bo Cheng, Soujanya Poria

Figure 1 for Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual Text-to-Speech Adaptation

Figure 2 for Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual Text-to-Speech Adaptation

Figure 3 for Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual Text-to-Speech Adaptation

Figure 4 for Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual Text-to-Speech Adaptation

Abstract:Different languages have distinct phonetic systems and vary in their prosodic features making it challenging to develop a Text-to-Speech (TTS) model that can effectively synthesise speech in multilingual settings. Furthermore, TTS architecture needs to be both efficient enough to capture nuances in multiple languages and efficient enough to be practical for deployment. The standard approach is to build transformer based model such as SpeechT5 and train it on large multilingual dataset. As the size of these models grow the conventional fine-tuning for adapting these model becomes impractical due to heavy computational cost. In this paper, we proposes to integrate parameter-efficient transfer learning (PETL) methods such as adapters and hypernetwork with TTS architecture for multilingual speech synthesis. Notably, in our experiments PETL methods able to achieve comparable or even better performance compared to full fine-tuning with only $\sim$2.5\% tunable parameters.The code and samples are available at: https://anonymous.4open.science/r/multilingualTTS-BA4C.

Via

Access Paper or Ask Questions

Reward Steering with Evolutionary Heuristics for Decoding-time Alignment

Jun 25, 2024

Chia-Yu Hung, Navonil Majumder, Ambuj Mehrish, Soujanya Poria

Figure 1 for Reward Steering with Evolutionary Heuristics for Decoding-time Alignment

Figure 2 for Reward Steering with Evolutionary Heuristics for Decoding-time Alignment

Figure 3 for Reward Steering with Evolutionary Heuristics for Decoding-time Alignment

Figure 4 for Reward Steering with Evolutionary Heuristics for Decoding-time Alignment

Abstract:The widespread applicability and increasing omnipresence of LLMs have instigated a need to align LLM responses to user and stakeholder preferences. Many preference optimization approaches have been proposed that fine-tune LLM parameters to achieve good alignment. However, such parameter tuning is known to interfere with model performance on many tasks. Moreover, keeping up with shifting user preferences is tricky in such a situation. Decoding-time alignment with reward model guidance solves these issues at the cost of increased inference time. However, most of such methods fail to strike the right balance between exploration and exploitation of reward -- often due to the conflated formulation of these two aspects - to give well-aligned responses. To remedy this we decouple these two aspects and implement them in an evolutionary fashion: exploration is enforced by decoding from mutated instructions and exploitation is represented as the periodic replacement of poorly-rewarded generations with well-rewarded ones. Empirical evidences indicate that this strategy outperforms many preference optimization and decode-time alignment approaches on two widely accepted alignment benchmarks AlpacaEval 2 and MT-Bench. Our implementation will be available at: https://darwin-alignment.github.io.

Via

Access Paper or Ask Questions

Improving Text-To-Audio Models with Synthetic Captions

Jun 18, 2024

Zhifeng Kong, Sang-gil Lee, Deepanway Ghosal, Navonil Majumder, Ambuj Mehrish, Rafael Valle, Soujanya Poria, Bryan Catanzaro

Figure 1 for Improving Text-To-Audio Models with Synthetic Captions

Figure 2 for Improving Text-To-Audio Models with Synthetic Captions

Figure 3 for Improving Text-To-Audio Models with Synthetic Captions

Figure 4 for Improving Text-To-Audio Models with Synthetic Captions

Abstract:It is an open challenge to obtain high quality training data, especially captions, for text-to-audio models. Although prior methods have leveraged \textit{text-only language models} to augment and improve captions, such methods have limitations related to scale and coherence between audio and captions. In this work, we propose an audio captioning pipeline that uses an \textit{audio language model} to synthesize accurate and diverse captions for audio at scale. We leverage this pipeline to produce a dataset of synthetic captions for AudioSet, named \texttt{AF-AudioSet}, and then evaluate the benefit of pre-training text-to-audio models on these synthetic captions. Through systematic evaluations on AudioCaps and MusicCaps, we find leveraging our pipeline and synthetic captions leads to significant improvements on audio generation quality, achieving a new \textit{state-of-the-art}.

Via

Access Paper or Ask Questions

DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling

Jun 17, 2024

Pala Tej Deep, Rishabh Bhardwaj, Soujanya Poria

Figure 1 for DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling

Figure 2 for DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling

Figure 3 for DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling

Figure 4 for DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling

Abstract:With the proliferation of domain-specific models, model merging has emerged as a set of techniques that combine the capabilities of multiple models into one that can multitask without the cost of additional training. In this paper, we propose a new model merging technique, Drop and rEscaLe via sampLing with mAgnitude (DELLA-Merging), that employs a novel pruning technique, MAGPRUNE, which shows significant advantages over DARE and TIES. MAGPRUNE first ranks the parameters in order of their magnitude and assigns higher dropout probabilities (p) to parameters with lower ranks corresponding to lower magnitudes. To approximate the original embeddings, MAGPRUNE employs a rescaling operation on the parameters that survive the random dropping by 1/(1 - p). On three different expert models considered for merging (LM, Math, Code) and corresponding benchmark datasets (AlpacaEval, GSM8K, MBPP), DELLA shows an average improvement of 2.4 points over baseline methods employing delta parameter pruning (an improvement of 3.6 points over TIES, 1.2 points over DARE), and 11.1 points over the no-pruning baseline (TA). We release the source code at: https://github.com/declare-lab/della.

Via

Access Paper or Ask Questions

Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations

Jun 17, 2024

Rima Hazra, Sayan Layek, Somnath Banerjee, Soujanya Poria

Abstract:Ensuring the safe alignment of large language models (LLMs) with human values is critical as they become integral to applications like translation and question answering. Current alignment methods struggle with dynamic user intentions and complex objectives, making models vulnerable to generating harmful content. We propose Safety Arithmetic, a training-free framework enhancing LLM safety across different scenarios: Base models, Supervised fine-tuned models (SFT), and Edited models. Safety Arithmetic involves Harm Direction Removal to avoid harmful content and Safety Alignment to promote safe responses. Additionally, we present NoIntentEdit, a dataset highlighting edit instances that could compromise model safety if used unintentionally. Our experiments show that Safety Arithmetic significantly improves safety measures, reduces over-safety, and maintains model utility, outperforming existing methods in ensuring safe content generation.

* Under Review. Codes are available at: https://github.com/declare-lab/safety-arithmetic

Via

Access Paper or Ask Questions

Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming

Jun 17, 2024

Vernon Toh Yan Han, Rishabh Bhardwaj, Soujanya Poria

Figure 1 for Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming

Figure 2 for Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming

Figure 3 for Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming

Figure 4 for Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming

Abstract:We propose Ruby Teaming, a method that improves on Rainbow Teaming by including a memory cache as its third dimension. The memory dimension provides cues to the mutator to yield better-quality prompts, both in terms of attack success rate (ASR) and quality diversity. The prompt archive generated by Ruby Teaming has an ASR of 74%, which is 20% higher than the baseline. In terms of quality diversity, Ruby Teaming outperforms Rainbow Teaming by 6% and 3% on Shannon's Evenness Index (SEI) and Simpson's Diversity Index (SDI), respectively.

Via

Access Paper or Ask Questions

Lightweight Spatial Modeling for Combinatorial Information Extraction From Documents

May 08, 2024

Yanfei Dong, Lambert Deng, Jiazheng Zhang, Xiaodong Yu, Ting Lin, Francesco Gelli, Soujanya Poria, Wee Sun Lee

Figure 1 for Lightweight Spatial Modeling for Combinatorial Information Extraction From Documents

Figure 2 for Lightweight Spatial Modeling for Combinatorial Information Extraction From Documents

Figure 3 for Lightweight Spatial Modeling for Combinatorial Information Extraction From Documents

Figure 4 for Lightweight Spatial Modeling for Combinatorial Information Extraction From Documents

Abstract:Documents that consist of diverse templates and exhibit complex spatial structures pose a challenge for document entity classification. We propose KNN-former, which incorporates a new kind of spatial bias in attention calculation based on the K-nearest-neighbor (KNN) graph of document entities. We limit entities' attention only to their local radius defined by the KNN graph. We also use combinatorial matching to address the one-to-one mapping property that exists in many documents, where one field has only one corresponding entity. Moreover, our method is highly parameter-efficient compared to existing approaches in terms of the number of trainable parameters. Despite this, experiments across various datasets show our method outperforms baselines in most entity types. Many real-world documents exhibit combinatorial properties which can be leveraged as inductive biases to improve extraction accuracy, but existing datasets do not cover these documents. To facilitate future research into these types of documents, we release a new ID document dataset that covers diverse templates and languages. We also release enhanced annotations for an existing dataset.

Via

Access Paper or Ask Questions

Understanding the Capabilities and Limitations of Large Language Models for Cultural Commonsense

May 07, 2024

Siqi Shen, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Soujanya Poria, Rada Mihalcea

Figure 1 for Understanding the Capabilities and Limitations of Large Language Models for Cultural Commonsense

Figure 2 for Understanding the Capabilities and Limitations of Large Language Models for Cultural Commonsense

Figure 3 for Understanding the Capabilities and Limitations of Large Language Models for Cultural Commonsense

Figure 4 for Understanding the Capabilities and Limitations of Large Language Models for Cultural Commonsense

Abstract:Large language models (LLMs) have demonstrated substantial commonsense understanding through numerous benchmark evaluations. However, their understanding of cultural commonsense remains largely unexamined. In this paper, we conduct a comprehensive examination of the capabilities and limitations of several state-of-the-art LLMs in the context of cultural commonsense tasks. Using several general and cultural commonsense benchmarks, we find that (1) LLMs have a significant discrepancy in performance when tested on culture-specific commonsense knowledge for different cultures; (2) LLMs' general commonsense capability is affected by cultural context; and (3) The language used to query the LLMs can impact their performance on cultural-related tasks. Our study points to the inherent bias in the cultural understanding of LLMs and provides insights that can help develop culturally aware language models.

Via

Access Paper or Ask Questions