Gholamreza Haffari

Monash University

SADAS: A Dialogue Assistant System Towards Remediating Norm Violations in Bilingual Socio-Cultural Conversations

Jan 29, 2024

Yuncheng Hua, Zhuang Li, Linhao Luo, Kadek Ananta Satriadi, Tao Feng, Haolan Zhan, Lizhen Qu, Suraj Sharma, Ingrid Zukerman, Zhaleh Semnani-Azad(+1 more)

Abstract:In today's globalized world, bridging the cultural divide is more critical than ever for forging meaningful connections. The Socially-Aware Dialogue Assistant System (SADAS) is our answer to this global challenge, and it's designed to ensure that conversations between individuals from diverse cultural backgrounds unfold with respect and understanding. Our system's novel architecture includes: (1) identifying the categories of norms present in the dialogue, (2) detecting potential norm violations, (3) evaluating the severity of these violations, (4) implementing targeted remedies to rectify the breaches, and (5) articulating the rationale behind these corrective actions. We employ a series of State-Of-The-Art (SOTA) techniques to build different modules, and conduct numerous experiments to select the most suitable backbone model for each of the modules. We also design a human preference experiment to validate the overall performance of the system. We will open-source our system (including source code, tools and applications), hoping to advance future research. A demo video of our system can be found at: https://youtu.be/JqetWkfsejk. We have released our code and software at: https://github.com/AnonymousEACLDemo/SADAS.

* 8 pages, 2 figures

Importance-Aware Data Augmentation for Document-Level Neural Machine Translation

Jan 27, 2024

Minghao Wu, Yufei Wang, George Foster, Lizhen Qu, Gholamreza Haffari

Abstract:Document-level neural machine translation (DocNMT) aims to generate translations that are both coherent and cohesive, in contrast to its sentence-level counterpart. However, due to its longer input length and limited availability of training data, DocNMT often faces the challenge of data sparsity. To overcome this issue, we propose a novel Importance-Aware Data Augmentation (IADA) algorithm for DocNMT that augments the training data based on token importance information estimated by the norm of hidden states and training gradients. We conduct comprehensive experiments on three widely-used DocNMT benchmarks. Our empirical results show that our proposed IADA outperforms strong DocNMT baselines as well as several data augmentation approaches, with statistical significance on both sentence-level and document-level BLEU.

* 13 pages, 4 figures, 7 tables, accepted by EACL2024 main conference

Towards Event Extraction from Speech with Contextual Clues

Jan 27, 2024

Jingqi Kang, Tongtong Wu, Jinming Zhao, Guitao Wang, Guilin Qi, Yuan-Fang Li, Gholamreza Haffari

Abstract:While text-based event extraction has been an active research area and has seen successful application in many domains, extracting semantic events from speech directly is an under-explored problem. In this paper, we introduce the Speech Event Extraction (SpeechEE) task and construct three synthetic training sets and one human-spoken test set. Compared to event extraction from text, SpeechEE poses greater challenges mainly due to complex speech signals that are continuous and have no word boundaries. Additionally, unlike perceptible sound events, semantic events are more subtle and require a deeper understanding. To tackle these challenges, we introduce a sequence-to-structure generation paradigm that can produce events from speech signals in an end-to-end manner, together with a conditioned generation method that utilizes speech recognition transcripts as the contextual clue. We further propose to represent events with a flat format to make outputs more natural language-like. Our experimental results show that our method brings significant improvements on all datasets, achieving a maximum F1 gain of 10.7%. The code and datasets are released on https://github.com/jodie-kang/SpeechEE.

* Under Review

Adapting Large Language Models for Document-Level Machine Translation

Jan 12, 2024

Minghao Wu, Thuy-Trang Vu, Lizhen Qu, George Foster, Gholamreza Haffari

Abstract:Large language models (LLMs) have made significant strides in various natural language processing (NLP) tasks. Recent research shows that moderately-sized LLMs often outperform their larger counterparts after task-specific fine-tuning. In this work, we delve into the process of adapting LLMs to specialize in document-level machine translation (DocMT) for a specific language pair. Firstly, we explore how prompt strategies affect downstream translation performance. Then, we conduct extensive experiments with two fine-tuning methods, three LLM backbones, and 18 translation tasks across nine language pairs. Our findings indicate that in some cases, these specialized models even surpass GPT-4 in translation performance, while in others they still suffer significantly from the off-target translation issue, even if they are exclusively fine-tuned on bilingual parallel documents. Furthermore, we provide an in-depth analysis of these LLMs tailored for DocMT, exploring aspects such as translation errors, the scaling law of parallel documents, out-of-domain generalization, and the impact of zero-shot crosslingual transfer. The findings of this research not only shed light on the strengths and limitations of LLM-based DocMT models but also provide a foundation for future research in DocMT.

* work in progress; 21 pages, 14 tables, 7 figures

Natural Language Processing for Dialects of a Language: A Survey

Jan 11, 2024

Aditya Joshi, Raj Dabre, Diptesh Kanojia, Zhuang Li, Haolan Zhan, Gholamreza Haffari, Doris Dippold

Abstract:State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets. This survey delves into an important attribute of these datasets: the dialect of a language. Motivated by the performance degradation of NLP models for dialectic datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches. We describe a wide range of NLP tasks in terms of two categories: natural language understanding (NLU) (for tasks such as dialect classification, sentiment analysis, parsing, and NLU benchmarks) and natural language generation (NLG) (for summarisation, machine translation, and dialogue systems). The survey is also broad in its coverage of languages, which include English, Arabic, and German, among others. We observe that past work in NLP concerning dialects goes deeper than mere dialect classification. This includes early approaches that used sentence transduction, leading up to recent approaches that integrate hypernetworks into LoRA. We expect that this survey will be useful to NLP researchers interested in building equitable language technologies by rethinking LLM benchmarks and model architectures.

* The paper is under review at ACM Computing Surveys. Please reach out to the authors in the case of feedback

Systematic Assessment of Factual Knowledge in Large Language Models

Oct 30, 2023

Linhao Luo, Thuy-Trang Vu, Dinh Phung, Gholamreza Haffari

Abstract:Previous studies have relied on existing question-answering benchmarks to evaluate the knowledge stored in large language models (LLMs). However, this approach has limitations regarding factual knowledge coverage, as it mostly focuses on generic domains which may overlap with the pretraining data. This paper proposes a framework to systematically assess the factual knowledge of LLMs by leveraging knowledge graphs (KGs). Our framework automatically generates a set of questions and expected answers from the facts stored in a given KG, and then evaluates the accuracy of LLMs in answering these questions. We systematically evaluate the state-of-the-art LLMs with KGs in generic and specific domains. The experiment shows that ChatGPT is consistently the top performer across all domains. We also find that LLM performance depends on instruction finetuning, domain, and question complexity, and is prone to adversarial context.

* Accepted by EMNLP 2023 Findings

DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding

Oct 24, 2023

Xiao-Yu Guo, Yuan-Fang Li, Gholamreza Haffari

Abstract:Social intelligence is essential for understanding and reasoning about human expressions, intents and interactions. One representative benchmark for its study is Social Intelligence Queries (Social-IQ), a dataset of multiple-choice questions on videos of complex social interactions. We define a comprehensive methodology to study the soundness of Social-IQ, as the soundness of such benchmark datasets is crucial to the investigation of the underlying research problem. Our analysis reveals that Social-IQ contains substantial biases, which can be exploited by a moderately strong language model to learn spurious correlations to achieve perfect performance without being given the context or even the question. We introduce DeSIQ, a new challenging dataset, constructed by applying simple perturbations to Social-IQ. Our empirical analysis shows DeSIQ significantly reduces the biases in the original Social-IQ dataset. Furthermore, we examine and shed light on the effect of model size, model style, learning settings, commonsense knowledge, and multi-modality on the new benchmark performance. Our new dataset, observations and findings open up important research questions for the study of social intelligence.

* 12 pages, 5 figures, EMNLP 2023 Long Paper

Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning

Oct 02, 2023

Linhao Luo, Yuan-Fang Li, Gholamreza Haffari, Shirui Pan

Abstract:Large language models (LLMs) have demonstrated impressive reasoning abilities in complex tasks. However, they lack up-to-date knowledge and experience hallucinations during reasoning, which can lead to incorrect reasoning processes and diminish their performance and trustworthiness. Knowledge graphs (KGs), which capture vast amounts of facts in a structured format, offer a reliable source of knowledge for reasoning. Nevertheless, existing KG-based LLM reasoning methods only treat KGs as factual knowledge bases and overlook the importance of their structural information for reasoning. In this paper, we propose a novel method called reasoning on graphs (RoG) that synergizes LLMs with KGs to enable faithful and interpretable reasoning. Specifically, we present a planning-retrieval-reasoning framework, where RoG first generates relation paths grounded by KGs as faithful plans. These plans are then used to retrieve valid reasoning paths from the KGs for LLMs to conduct faithful reasoning. Furthermore, RoG not only distills knowledge from KGs to improve the reasoning ability of LLMs through training but also allows seamless integration with any arbitrary LLMs during inference. Extensive experiments on two benchmark KGQA datasets demonstrate that RoG achieves state-of-the-art performance on KG reasoning tasks and generates faithful and interpretable reasoning results.

* 22 pages, 4 figures

Reranking for Natural Language Generation from Logical Forms: A Study based on Large Language Models

Sep 21, 2023

Levon Haroutunian, Zhuang Li, Lucian Galescu, Philip Cohen, Raj Tumuluri, Gholamreza Haffari

Abstract:Large language models (LLMs) have demonstrated impressive capabilities in natural language generation. However, their output quality can be inconsistent, posing challenges for generating natural language from logical forms (LFs). This task requires the generated outputs to embody the exact semantics of LFs, without missing any LF semantics or creating any hallucinations. In this work, we tackle this issue by proposing a novel generate-and-rerank approach. Our approach involves initially generating a set of candidate outputs by prompting an LLM and subsequently reranking them using a task-specific reranker model. In addition, we curate a manually collected dataset to evaluate the alignment between different ranking metrics and human judgements. The chosen ranking metrics are utilized to enhance the training and evaluation of the reranker model. By conducting extensive experiments on three diverse datasets, we demonstrate that the candidates selected by our reranker outperform those selected by baseline methods in terms of semantic consistency and fluency, as measured by three comprehensive metrics. Our findings provide strong evidence for the effectiveness of our approach in improving the quality of generated outputs.

* IJCNLP-AACL 2023

ChatRule: Mining Logical Rules with Large Language Models for Knowledge Graph Reasoning

Sep 13, 2023

Linhao Luo, Jiaxin Ju, Bo Xiong, Yuan-Fang Li, Gholamreza Haffari, Shirui Pan

Abstract:Logical rules are essential for uncovering the logical connections between relations, which could improve the reasoning performance and provide interpretable results on knowledge graphs (KGs). Although there have been many efforts to mine meaningful logical rules over KGs, existing methods suffer from the computationally intensive searches over the rule space and a lack of scalability for large-scale KGs. Besides, they often ignore the semantics of relations which is crucial for uncovering logical connections. Recently, large language models (LLMs) have shown impressive performance in the field of natural language processing and various applications, owing to their emergent ability and generalizability. In this paper, we propose a novel framework, ChatRule, unleashing the power of large language models for mining logical rules over knowledge graphs. Specifically, the framework is initiated with an LLM-based rule generator, leveraging both the semantic and structural information of KGs to prompt LLMs to generate logical rules. To refine the generated rules, a rule ranking module estimates the rule quality by incorporating facts from existing KGs. Last, a rule validator harnesses the reasoning ability of LLMs to validate the logical correctness of ranked rules through chain-of-thought reasoning. ChatRule is evaluated on four large-scale KGs, w.r.t. different rule quality metrics and downstream tasks, showing the effectiveness and scalability of our method.

* 11 pages, 4 figures
