Wray Buntine

NICTA

LLM Reading Tea Leaves: Automatically Evaluating Topic Models with Large Language Models

Jun 13, 2024

Xiaohao Yang, He Zhao, Dinh Phung, Wray Buntine, Lan Du

Abstract: Topic modeling has been a widely used tool for unsupervised text analysis. However, comprehensive evaluations of a topic model remain challenging. Existing evaluation methods are either less comparable across different models (e.g., perplexity) or focus on only one specific aspect of a model (e.g., topic quality or document representation quality) at a time, which is insufficient to reflect the overall model performance. In this paper, we propose WALM (Words Agreement with Language Model), a new evaluation method for topic modeling that comprehensively considers the semantic quality of document representations and topics in a joint manner, leveraging the power of large language models (LLMs). With extensive experiments involving different types of topic models, WALM is shown to align with human judgment and can serve as a complementary evaluation method to the existing ones, bringing a new perspective to topic modeling. Our software package will be available at https://github.com/Xiaohao-Yang/Topic_Model_Evaluation, which can be integrated with many widely used topic models.

Navigating Conflicting Views: Harnessing Trust for Learning

Jun 03, 2024

Jueqing Lu, Lan Du, Wray Buntine, Myong Chol Jung, Joanna Dipnall, Belinda Gabbe

Abstract: Resolving conflicts is essential to make the decisions of multi-view classification more reliable. Much research has been conducted on learning consistent informative representations among different views, assuming that all views are identically important and strictly aligned. However, real-world multi-view data may not always conform to these assumptions, as some views may express distinct information. To address this issue, we develop a computational trust-based discounting method to enhance the existing trustworthy framework in scenarios where conflicts between different views may arise. Its belief fusion process considers the trustworthiness of predictions made by individual views via an instance-wise probability-sensitive trust discounting mechanism. We evaluate our method on six real-world datasets, using Top-1 Accuracy, AUC-ROC for Uncertainty-Aware Prediction, Fleiss' Kappa, and a new metric called Multi-View Agreement with Ground Truth that takes into consideration the ground truth labels. The experimental results show that computational trust can effectively resolve conflicts, paving the way for more reliable multi-view classification models in real-world applications.
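
As a rough illustration of what trust discounting can look like, the sketch below applies the textbook subjective-logic rule to one view's opinion: belief masses are scaled by a trust value in [0, 1], and the removed mass is moved to uncertainty. This is an assumption-laden sketch, not the paper's mechanism; the function name `discount_opinion` and the scalar `trust` input are hypothetical, and how the paper derives an instance-wise trust value is not stated in the abstract.

```python
from typing import List, Tuple


def discount_opinion(beliefs: List[float], trust: float) -> Tuple[List[float], float]:
    """Discount one view's opinion by a trust value (hypothetical helper).

    beliefs: per-class belief masses; together with the view's uncertainty
             they are assumed to sum to 1.
    trust:   value in [0, 1]; 1 keeps the opinion unchanged, 0 discards it.
    Returns the discounted belief masses and the new uncertainty mass.
    """
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must lie in [0, 1]")
    if any(b < 0.0 for b in beliefs) or sum(beliefs) > 1.0 + 1e-9:
        raise ValueError("beliefs must be non-negative and sum to at most 1")

    # Scale every belief mass by the trust value.
    discounted = [trust * b for b in beliefs]
    # Whatever mass was removed from the beliefs becomes uncertainty.
    uncertainty = 1.0 - sum(discounted)
    return discounted, uncertainty
```

With this rule a fully distrusted view (trust 0) contributes only uncertainty, so it cannot pull a later fusion step towards its prediction.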

A Survey on the Real Power of ChatGPT

Apr 22, 2024

Ming Liu, Ran Liu, Hua Wang, Wray Buntine

Abstract: ChatGPT has changed the AI community, and an active research line is the performance evaluation of ChatGPT. A key challenge for the evaluation is that ChatGPT is still closed-source and traditional benchmark datasets may have been used by ChatGPT as training data. In this paper, we (i) survey recent studies which uncover the real performance levels of ChatGPT in seven categories of NLP tasks, (ii) review the social implications and safety issues of ChatGPT, and (iii) emphasize key challenges and opportunities for its evaluation. We hope our survey can shed some light on its black-box nature, so that researchers are not misled by its surface generation.

* 9 pages, 2 tables

Improving Vietnamese-English Medical Machine Translation

Mar 28, 2024

Nhu Vo, Dat Quoc Nguyen, Dung D. Le, Massimo Piccardi, Wray Buntine

Abstract: Machine translation for Vietnamese-English in the medical domain is still an under-explored research area. In this paper, we introduce MedEV -- a high-quality Vietnamese-English parallel dataset constructed specifically for the medical domain, comprising approximately 360K sentence pairs. We conduct extensive experiments comparing Google Translate, ChatGPT (gpt-3.5-turbo), state-of-the-art Vietnamese-English neural machine translation models and pre-trained bilingual/multilingual sequence-to-sequence models on our new MedEV dataset. Experimental results show that the best performance is achieved by fine-tuning "vinai-translate" for each translation direction. We publicly release our dataset to promote further research.

* To appear in Proceedings of LREC-COLING 2024

Towards Uncertainty-Aware Language Agent

Feb 08, 2024

Jiuzhou Han, Wray Buntine, Ehsan Shareghi

Abstract: While Language Agents have achieved promising success by placing Large Language Models at the core of a more versatile design that dynamically interacts with the external world, the existing approaches neglect the notion of uncertainty during these interactions. We present the Uncertainty-Aware Language Agent (UALA), a framework that orchestrates the interaction between the agent and the external world using uncertainty quantification. Compared with other well-known counterparts like ReAct, our extensive experiments across 3 representative tasks (HotpotQA, StrategyQA, MMLU) and various LLM sizes demonstrate that UALA brings a significant improvement of performance, while having a substantially lower reliance on the external world (i.e., reduced number of tool calls and tokens). Our analyses provide various insights including the great potential of UALA compared with agent fine-tuning, and underscore the unreliability of verbalised confidence of LLMs as a proxy for uncertainty.

* The code and data are at https://uala-agent.github.io. (Updated the design for multi-inference setup to be comparable with single-inference experiments.). arXiv admin note: text overlap with arXiv:2310.05915
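
To make the idea of gating tool use on uncertainty concrete, here is a minimal sketch under stated assumptions: uncertainty is taken to be the entropy of several sampled answers, and a tool is called only when that entropy exceeds a threshold. The abstract does not say how UALA measures uncertainty or sets thresholds, so `answer_entropy`, `should_call_tool`, and the threshold value are hypothetical and may differ from the released code.

```python
import math
from collections import Counter
from typing import Sequence


def answer_entropy(samples: Sequence[str]) -> float:
    """Shannon entropy (in nats) of the empirical distribution of sampled answers."""
    if not samples:
        raise ValueError("need at least one sampled answer")
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def should_call_tool(samples: Sequence[str], threshold: float) -> bool:
    """Return True when the sampled answers disagree enough to justify a tool call.

    The threshold is an assumed, task-specific hyperparameter.
    """
    return answer_entropy(samples) > threshold
```

When all samples agree the entropy is zero and the agent would answer directly, which is one way a design like this could reduce the number of tool calls.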

OntoMedRec: Logically-Pretrained Model-Agnostic Ontology Encoders for Medication Recommendation

Jan 29, 2024

Weicong Tan, Weiqing Wang, Xin Zhou, Wray Buntine, Gordon Bingham

Abstract: Most existing medication recommendation models learn representations for medical concepts based on electronic health records (EHRs) and make recommendations with the learnt representations. However, most medications appear in the dataset only a limited number of times, resulting in insufficient learning of their representations. Medical ontologies are hierarchical classification systems for medical terms, in which similar terms fall in the same class at a certain level. In this paper, we propose OntoMedRec, the logically-pretrained and model-agnostic medical Ontology Encoders for Medication Recommendation, which address the data sparsity problem with medical ontologies. We conduct comprehensive experiments on benchmark datasets to evaluate the effectiveness of OntoMedRec, and the results show that integrating OntoMedRec improves the performance of various models on both the entire EHR datasets and the admissions with few-shot medications. The source code is available at https://anonymous.4open.science/r/OntoMedRec-D123

Harnessing the Power of Beta Scoring in Deep Active Learning for Multi-Label Text Classification

Jan 15, 2024

Wei Tan, Ngoc Dang Nguyen, Lan Du, Wray Buntine

Abstract: Within the scope of natural language processing, the domain of multi-label text classification is uniquely challenging due to its expansive and uneven label distribution. The complexity deepens due to the demand for an extensive set of annotated data for training an advanced deep learning model, especially in specialized fields where the labeling task can be labor-intensive and often requires domain-specific knowledge. Addressing these challenges, our study introduces a novel deep active learning strategy, capitalizing on the Beta family of proper scoring rules within the Expected Loss Reduction framework. It computes the expected increase in scores using the Beta Scoring Rules, which are then transformed into sample vector representations. These vector representations guide the diverse selection of informative samples, directly linking this process to the model's expected proper score. Comprehensive evaluations across both synthetic and real datasets reveal our method's capability to often outperform established acquisition techniques in multi-label text classification, presenting encouraging outcomes across various architectural and dataset scenarios.

* 7 pages AAAI 2024

Bayesian Estimate of Mean Proper Scores for Diversity-Enhanced Active Learning

Dec 15, 2023

Wei Tan, Lan Du, Wray Buntine

Abstract: The effectiveness of active learning largely depends on the sampling efficiency of the acquisition function. Expected Loss Reduction (ELR) focuses on a Bayesian estimate of the reduction in classification error, and more general costs fit in the same framework. We propose Bayesian Estimate of Mean Proper Scores (BEMPS) to estimate the increase in strictly proper scores such as log probability or negative mean square error within this framework. We also prove convergence results for this general class of costs. To facilitate better experimentation with the new acquisition functions, we develop a complementary batch AL algorithm that encourages diversity in the vector of expected changes in scores for unlabeled data. To allow high-performance classifiers, we combine deep ensembles and dynamic validation set construction on pretrained models, and further speed up the ensemble process with the idea of Monte Carlo Dropout. Extensive experiments on both texts and images show that the use of mean square error and log probability with BEMPS yields robust acquisition functions and well-calibrated classifiers, and consistently outperforms the others tested. The advantages of BEMPS over the others are further supported by a set of qualitative analyses, where we visualise their sampling behaviour using data maps and t-SNE plots.

* TPAMI, 2023
* 16 pages, TPAMI. arXiv admin note: text overlap with arXiv:2110.14171
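
The two strictly proper scores named in the abstract above can be written down in a few lines. The sketch below is only an illustration of the scores themselves, not of the BEMPS acquisition function; the helper names (`log_score`, `neg_squared_error`, `expected_score`) are hypothetical. It also shows the defining property of a proper score: when labels are drawn from a distribution `truth`, the expected score is highest when the reported distribution equals `truth`.

```python
import math
from typing import Callable, Sequence


def log_score(probs: Sequence[float], label: int) -> float:
    """Log probability assigned to the observed label (higher is better)."""
    return math.log(probs[label])


def neg_squared_error(probs: Sequence[float], label: int) -> float:
    """Negative squared error against the one-hot label (higher is better)."""
    return -sum(
        (p - (1.0 if k == label else 0.0)) ** 2 for k, p in enumerate(probs)
    )


def expected_score(
    truth: Sequence[float],
    reported: Sequence[float],
    score: Callable[[Sequence[float], int], float],
) -> float:
    """Expected score of `reported` when the label is drawn from `truth`."""
    return sum(q * score(reported, k) for k, q in enumerate(truth))


if __name__ == "__main__":
    truth = [0.7, 0.3]
    overconfident = [0.9, 0.1]
    for name, score in (("log", log_score), ("neg. squared error", neg_squared_error)):
        honest = expected_score(truth, truth, score)
        skewed = expected_score(truth, overconfident, score)
        # For a strictly proper score, the honest report scores higher.
        print(f"{name}: honest={honest:.4f} overconfident={skewed:.4f}")
```

An acquisition function in this family would estimate how much such an expected score is likely to increase after labelling a candidate; how that estimate is formed is specific to the paper and is not reproduced here.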

HOMOE: A Memory-Based and Composition-Aware Framework for Zero-Shot Learning with Hopfield Network and Soft Mixture of Experts

Nov 23, 2023

Do Huu Dat, Po Yuan Mao, Tien Hoang Nguyen, Wray Buntine, Mohammed Bennamoun

Abstract: Compositional Zero-Shot Learning (CZSL) has emerged as an essential paradigm in machine learning, aiming to overcome the constraints of traditional zero-shot learning by incorporating compositional thinking into its methodology. Conventional zero-shot learning has difficulty managing unfamiliar combinations of seen and unseen classes because it depends on pre-defined class embeddings. In contrast, Compositional Zero-Shot Learning uses the inherent hierarchies and structural connections among classes, creating new class representations by combining attributes, components, or other semantic elements. In our paper, we propose a novel framework that, for the first time, combines the Modern Hopfield Network with a Mixture of Experts (HOMOE) to classify the compositions of previously unseen objects. Specifically, the Modern Hopfield Network creates a memory that stores label prototypes and identifies relevant labels for a given input image. Following this, the Mixture of Experts model integrates the image with the fitting prototype to produce the final composition classification. Our approach achieves SOTA performance on several benchmarks, including MIT-States and UT-Zappos. We also examine how each component contributes to improved generalization.

Open-Set Graph Anomaly Detection via Normal Structure Regularisation

Nov 12, 2023

Qizhou Wang, Guansong Pang, Mahsa Salehi, Wray Buntine, Christopher Leckie

Abstract: This paper considers an under-explored Graph Anomaly Detection (GAD) task, namely open-set GAD, which aims to detect anomalous nodes using a small number of labelled training normal and anomaly nodes (known as seen anomalies) that cannot illustrate all possible inference-time abnormalities. The task has attracted growing attention due to the availability of anomaly prior knowledge from the label information that can help to substantially reduce detection errors. However, current methods tend to over-emphasise fitting the seen anomalies, leading to a weak generalisation ability to detect unseen anomalies, i.e., those that are not illustrated by the labelled anomaly nodes. Further, they were introduced to handle Euclidean data, failing to effectively capture important non-Euclidean features for GAD. In this work, we propose a novel open-set GAD approach, namely normal structure regularisation (NSReg), to leverage the rich normal graph structure embedded in the labelled nodes to tackle the aforementioned two issues. In particular, NSReg trains an anomaly-discriminative supervised graph anomaly detector, with a plug-and-play regularisation term to enforce compact, semantically-rich representations of normal nodes. To this end, the regularisation is designed to differentiate various types of normal nodes, including labelled normal nodes that are connected in their local neighbourhood, and those that are not connected. By doing so, it helps incorporate strong normality into the supervised anomaly detector learning, mitigating their overfitting to the seen anomalies. Extensive empirical results on real-world datasets demonstrate the superiority of our proposed NSReg for open-set GAD.
