Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Lidong Bing

Semantic-Aware Contrastive Sentence Representation Learning with Large Language Models

Oct 17, 2023

Huiming Wang, Liying Cheng, Zhaodonghui Li, De Wen Soh, Lidong Bing

Figure 1 for Semantic-Aware Contrastive Sentence Representation Learning with Large Language Models

Figure 2 for Semantic-Aware Contrastive Sentence Representation Learning with Large Language Models

Figure 3 for Semantic-Aware Contrastive Sentence Representation Learning with Large Language Models

Figure 4 for Semantic-Aware Contrastive Sentence Representation Learning with Large Language Models

Abstract:Contrastive learning has been proven to be effective in learning better sentence representations. However, to train a contrastive learning model, large numbers of labeled sentences are required to construct positive and negative pairs explicitly, such as those in natural language inference (NLI) datasets. Unfortunately, acquiring sufficient high-quality labeled data can be both time-consuming and resource-intensive, leading researchers to focus on developing methods for learning unsupervised sentence representations. As there is no clear relationship between these unstructured randomly-sampled sentences, building positive and negative pairs over them is tricky and problematic. To tackle these challenges, in this paper, we propose SemCSR, a semantic-aware contrastive sentence representation framework. By leveraging the generation and evaluation capabilities of large language models (LLMs), we can automatically construct a high-quality NLI-style corpus without any human annotation, and further incorporate the generated sentence pairs into learning a contrastive sentence representation model. Extensive experiments and comprehensive analyses demonstrate the effectiveness of our proposed framework for learning a better sentence representation with LLMs.

Via

Access Paper or Ask Questions

Multilingual Jailbreak Challenges in Large Language Models

Oct 10, 2023

Yue Deng, Wenxuan Zhang, Sinno Jialin Pan, Lidong Bing

Abstract:While large language models (LLMs) exhibit remarkable capabilities across a wide range of tasks, they pose potential safety concerns, such as the ``jailbreak'' problem, wherein malicious instructions can manipulate LLMs to exhibit undesirable behavior. Although several preventive measures have been developed to mitigate the potential risks associated with LLMs, they have primarily focused on English data. In this study, we reveal the presence of multilingual jailbreak challenges within LLMs and consider two potential risk scenarios: unintentional and intentional. The unintentional scenario involves users querying LLMs using non-English prompts and inadvertently bypassing the safety mechanisms, while the intentional scenario concerns malicious users combining malicious instructions with multilingual prompts to deliberately attack LLMs. The experimental results reveal that in the unintentional scenario, the rate of unsafe content increases as the availability of languages decreases. Specifically, low-resource languages exhibit three times the likelihood of encountering harmful content compared to high-resource languages, with both ChatGPT and GPT-4. In the intentional scenario, multilingual prompts can exacerbate the negative impact of malicious instructions, with astonishingly high rates of unsafe output: 80.92\% for ChatGPT and 40.71\% for GPT-4. To handle such a challenge in the multilingual context, we propose a novel \textsc{Self-Defense} framework that automatically generates multilingual training data for safety fine-tuning. Experimental results show that ChatGPT fine-tuned with such data can achieve a substantial reduction in unsafe content generation. Data is available at https://github.com/DAMO-NLP-SG/multilingual-safety-for-LLMs. Warning: This paper contains examples with potentially harmful content.

Via

Access Paper or Ask Questions

Towards Benchmarking and Improving the Temporal Reasoning Capability of Large Language Models

Jun 27, 2023

Qingyu Tan, Hwee Tou Ng, Lidong Bing

Figure 1 for Towards Benchmarking and Improving the Temporal Reasoning Capability of Large Language Models

Figure 2 for Towards Benchmarking and Improving the Temporal Reasoning Capability of Large Language Models

Figure 3 for Towards Benchmarking and Improving the Temporal Reasoning Capability of Large Language Models

Figure 4 for Towards Benchmarking and Improving the Temporal Reasoning Capability of Large Language Models

Abstract:Reasoning about time is of fundamental importance. Many facts are time-dependent. For example, athletes change teams from time to time, and different government officials are elected periodically. Previous time-dependent question answering (QA) datasets tend to be biased in either their coverage of time spans or question types. In this paper, we introduce a comprehensive probing dataset \tempreason to evaluate the temporal reasoning capability of large language models. Our dataset includes questions of three temporal reasoning levels. In addition, we also propose a novel learning framework to improve the temporal reasoning capability of large language models, based on temporal span extraction and time-sensitive reinforcement learning. We conducted experiments in closed book QA, open book QA, and reasoning QA settings and demonstrated the effectiveness of our approach. Our code and data are released on https://github.com/DAMO-NLP-SG/TempReason.

* ACL 2023

Via

Access Paper or Ask Questions

Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts

Jun 20, 2023

Xuan-Phi Nguyen, Sharifah Mahani Aljunied, Shafiq Joty, Lidong Bing

Figure 1 for Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts

Figure 2 for Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts

Figure 3 for Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts

Figure 4 for Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts

Abstract:Large language models (LLMs) are known to effectively perform tasks by simply observing few exemplars. However, in low-resource languages, obtaining such hand-picked exemplars can still be challenging, where unsupervised techniques may be necessary. Moreover, competent generative capabilities of LLMs are observed only in high-resource languages, while their performances among under-represented languages fall behind due to pre-training data imbalance. To elicit LLMs' ability onto low-resource languages without any supervised data, we propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English. These prompts are then used to create intra-lingual exemplars to perform tasks in the target languages. Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages. We also show that fine-tuning a 7B model on data generated from our method helps it perform competitively with a 175B model. In non-English translation tasks, our method even outperforms supervised prompting by up to 3 chrF++ in many low-resource languages. When evaluated on zero-shot multilingual summarization, our method surpasses other English-pivoting baselines by up to 4 ROUGE-L and is also favored by GPT-4.

* Pre-print

Via

Access Paper or Ask Questions

Class-Adaptive Self-Training for Relation Extraction with Incompletely Annotated Training Data

Jun 16, 2023

Qingyu Tan, Lu Xu, Lidong Bing, Hwee Tou Ng

Figure 1 for Class-Adaptive Self-Training for Relation Extraction with Incompletely Annotated Training Data

Figure 2 for Class-Adaptive Self-Training for Relation Extraction with Incompletely Annotated Training Data

Figure 3 for Class-Adaptive Self-Training for Relation Extraction with Incompletely Annotated Training Data

Figure 4 for Class-Adaptive Self-Training for Relation Extraction with Incompletely Annotated Training Data

Abstract:Relation extraction (RE) aims to extract relations from sentences and documents. Existing relation extraction models typically rely on supervised machine learning. However, recent studies showed that many RE datasets are incompletely annotated. This is known as the false negative problem in which valid relations are falsely annotated as 'no_relation'. Models trained with such data inevitably make similar mistakes during the inference stage. Self-training has been proven effective in alleviating the false negative problem. However, traditional self-training is vulnerable to confirmation bias and exhibits poor performance in minority classes. To overcome this limitation, we proposed a novel class-adaptive re-sampling self-training framework. Specifically, we re-sampled the pseudo-labels for each class by precision and recall scores. Our re-sampling strategy favored the pseudo-labels of classes with high precision and low recall, which improved the overall recall without significantly compromising precision. We conducted experiments on document-level and biomedical relation extraction datasets, and the results showed that our proposed self-training framework consistently outperforms existing competitive methods on the Re-DocRED and ChemDisgene datasets when the training data are incompletely annotated. Our code is released at https://github.com/DAMO-NLP-SG/CAST.

* ACL 2023 Findings

Via

Access Paper or Ask Questions

INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models

Jun 15, 2023

Yew Ken Chia, Pengfei Hong, Lidong Bing, Soujanya Poria

Figure 1 for INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models

Figure 2 for INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models

Figure 3 for INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models

Figure 4 for INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models

Abstract:Instruction-tuned large language models have revolutionized natural language processing and have shown great potential in applications such as conversational agents. These models, such as GPT-4, can not only master language but also solve complex tasks in areas like mathematics, coding, medicine, and law. Despite their impressive capabilities, there is still a lack of comprehensive understanding regarding their full potential, primarily due to the black-box nature of many models and the absence of holistic evaluation studies. To address these challenges, we present INSTRUCTEVAL, a more comprehensive evaluation suite designed specifically for instruction-tuned large language models. Unlike previous works, our evaluation involves a rigorous assessment of models based on problem-solving, writing ability, and alignment to human values. We take a holistic approach to analyze various factors affecting model performance, including the pretraining foundation, instruction-tuning data, and training methods. Our findings reveal that the quality of instruction data is the most crucial factor in scaling model performance. While open-source models demonstrate impressive writing abilities, there is substantial room for improvement in problem-solving and alignment. We are encouraged by the rapid development of models by the open-source community, but we also highlight the need for rigorous evaluation to support claims made about these models. Through INSTRUCTEVAL, we aim to foster a deeper understanding of instruction-tuned models and advancements in their capabilities. INSTRUCTEVAL is publicly available at https://github.com/declare-lab/instruct-eval.

* Github: https://github.com/declare-lab/instruct-eval Leaderboard: https://declare-lab.github.io/instruct-eval/

Via

Access Paper or Ask Questions

Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Jun 12, 2023

Hang Zhang, Xin Li, Lidong Bing

Figure 1 for Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Figure 2 for Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Figure 3 for Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Figure 4 for Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Abstract:We present Video-LLaMA, a multi-modal framework that empowers Large Language Models (LLMs) with the capability of understanding both visual and auditory content in the video. Video-LLaMA bootstraps cross-modal training from the frozen pre-trained visual & audio encoders and the frozen LLMs. Unlike previous vision-LLMs that focus on static image comprehensions such as MiniGPT-4 and LLaVA, Video-LLaMA mainly tackles two challenges in video understanding: (1) capturing the temporal changes in visual scenes, (2) integrating audio-visual signals. To counter the first challenge, we propose a Video Q-former to assemble the pre-trained image encoder into our video encoder and introduce a video-to-text generation task to learn video-language correspondence. For the second challenge, we leverage ImageBind, a universal embedding model aligning multiple modalities as the pre-trained audio encoder, and introduce an Audio Q-former on top of ImageBind to learn reasonable auditory query embeddings for the LLM module. To align the output of both visual & audio encoders with LLM's embedding space, we train Video-LLaMA on massive video/image-caption pairs as well as visual-instruction-tuning datasets of moderate amount but higher quality. We found Video-LLaMA showcases the ability to perceive and comprehend video content, generating meaningful responses that are grounded in the visual and auditory information presented in the videos. This highlights the potential of Video-LLaMA as a promising prototype for audio-visual AI assistants.

* Technical Report; Code, Pretrained Model, and Dataset: https://github.com/DAMO-NLP-SG/Video-LLaMA

Via

Access Paper or Ask Questions

M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models

Jun 08, 2023

Wenxuan Zhang, Sharifah Mahani Aljunied, Chang Gao, Yew Ken Chia, Lidong Bing

Figure 1 for M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models

Figure 2 for M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models

Figure 3 for M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models

Figure 4 for M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models

Abstract:Despite the existence of various benchmarks for evaluating natural language processing models, we argue that human exams are a more suitable means of evaluating general intelligence for large language models (LLMs), as they inherently demand a much wider range of abilities such as language understanding, domain knowledge, and problem-solving skills. To this end, we introduce M3Exam, a novel benchmark sourced from real and official human exam questions for evaluating LLMs in a multilingual, multimodal, and multilevel context. M3Exam exhibits three unique characteristics: (1) multilingualism, encompassing questions from multiple countries that require strong multilingual proficiency and cultural knowledge; (2) multimodality, accounting for the multimodal nature of many exam questions to test the model's multimodal understanding capability; and (3) multilevel structure, featuring exams from three critical educational periods to comprehensively assess a model's proficiency at different levels. In total, M3Exam contains 12,317 questions in 9 diverse languages with three educational levels, where about 23\% of the questions require processing images for successful solving. We assess the performance of top-performing LLMs on M3Exam and find that current models, including GPT-4, still struggle with multilingual text, particularly in low-resource and non-Latin script languages. Multimodal LLMs also perform poorly with complex multimodal questions. We believe that M3Exam can be a valuable resource for comprehensively evaluating LLMs by examining their multilingual and multimodal abilities and tracking their development. Data and evaluation code is available at \url{https://github.com/DAMO-NLP-SG/M3Exam}.

Via

Access Paper or Ask Questions

AQE: Argument Quadruplet Extraction via a Quad-Tagging Augmented Generative Approach

May 31, 2023

Jia Guo, Liying Cheng, Wenxuan Zhang, Stanley Kok, Xin Li, Lidong Bing

Figure 1 for AQE: Argument Quadruplet Extraction via a Quad-Tagging Augmented Generative Approach

Figure 2 for AQE: Argument Quadruplet Extraction via a Quad-Tagging Augmented Generative Approach

Figure 3 for AQE: Argument Quadruplet Extraction via a Quad-Tagging Augmented Generative Approach

Figure 4 for AQE: Argument Quadruplet Extraction via a Quad-Tagging Augmented Generative Approach

Abstract:Argument mining involves multiple sub-tasks that automatically identify argumentative elements, such as claim detection, evidence extraction, stance classification, etc. However, each subtask alone is insufficient for a thorough understanding of the argumentative structure and reasoning process. To learn a complete view of an argument essay and capture the interdependence among argumentative components, we need to know what opinions people hold (i.e., claims), why those opinions are valid (i.e., supporting evidence), which source the evidence comes from (i.e., evidence type), and how those claims react to the debating topic (i.e., stance). In this work, we for the first time propose a challenging argument quadruplet extraction task (AQE), which can provide an all-in-one extraction of four argumentative components, i.e., claims, evidence, evidence types, and stances. To support this task, we construct a large-scale and challenging dataset. However, there is no existing method that can solve the argument quadruplet extraction. To fill this gap, we propose a novel quad-tagging augmented generative approach, which leverages a quadruplet tagging module to augment the training of the generative framework. The experimental results on our dataset demonstrate the empirical superiority of our proposed approach over several strong baselines.

* Findings of ACL 2023

Via

Access Paper or Ask Questions

Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling

May 25, 2023

Shengqiong Wu, Hao Fei, Yixin Cao, Lidong Bing, Tat-Seng Chua

Abstract:Existing research on multimodal relation extraction (MRE) faces two co-existing challenges, internal-information over-utilization and external-information under-exploitation. To combat that, we propose a novel framework that simultaneously implements the idea of internal-information screening and external-information exploiting. First, we represent the fine-grained semantic structures of the input image and text with the visual and textual scene graphs, which are further fused into a unified cross-modal graph (CMG). Based on CMG, we perform structure refinement with the guidance of the graph information bottleneck principle, actively denoising the less-informative features. Next, we perform topic modeling over the input image and text, incorporating latent multimodal topic features to enrich the contexts. On the benchmark MRE dataset, our system outperforms the current best model significantly. With further in-depth analyses, we reveal the great potential of our method for the MRE task. Our codes are open at https://github.com/ChocoWu/MRE-ISE.

* ACL 2023

Via

Access Paper or Ask Questions