Lidong Bing

Zero-Shot Text Classification via Self-Supervised Tuning

May 25, 2023

Chaoqun Liu, Wenxuan Zhang, Guizhen Chen, Xiaobao Wu, Anh Tuan Luu, Chip Hong Chang, Lidong Bing

Abstract: Existing solutions to zero-shot text classification either conduct prompting with pre-trained language models, which is sensitive to the choices of templates, or rely on large-scale annotated data of relevant tasks for meta-tuning. In this work, we propose a new paradigm based on self-supervised learning to solve zero-shot text classification tasks by tuning the language models with unlabeled data, called self-supervised tuning. By exploring the inherent structure of free texts, we propose a new learning objective called first sentence prediction to bridge the gap between unlabeled data and text classification tasks. After tuning the model to learn to predict the first sentence in a paragraph based on the rest, the model is able to conduct zero-shot inference on unseen tasks such as topic classification and sentiment analysis. Experimental results show that our model outperforms the state-of-the-art baselines on 7 out of 10 tasks. Moreover, the analysis reveals that our model is less sensitive to the prompt design. Our code and pre-trained models are publicly available at https://github.com/DAMO-NLP-SG/SSTuning .

* Accepted to the Findings of ACL 2023
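
The first sentence prediction objective described in the abstract can be pictured as turning each unlabeled paragraph into a multiple-choice example: the rest of the paragraph is the input text, and the model must pick the paragraph's true first sentence from a set of options. The following is only a minimal sketch of that data construction under stated assumptions. The function names, the naive sentence splitter, and the choice to draw distractor options from the first sentences of other paragraphs are all hypothetical and are not taken from the paper or its repository.

```python
import random
import re


def split_sentences(paragraph):
    """Naive sentence splitter (an assumption for illustration only)."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p.strip() for p in parts if p.strip()]


def build_fsp_example(paragraphs, index, num_options, rng):
    """Build one hypothetical first-sentence-prediction example.

    The input text is the paragraph without its first sentence; the options
    are the true first sentence plus distractors sampled from the first
    sentences of other paragraphs (an assumed negative-sampling scheme).
    """
    sentences = split_sentences(paragraphs[index])
    if len(sentences) < 2:
        return None  # nothing left to condition on

    first, rest = sentences[0], " ".join(sentences[1:])

    # Collect candidate distractors from the other paragraphs.
    candidates = []
    for i, paragraph in enumerate(paragraphs):
        if i == index:
            continue
        other = split_sentences(paragraph)
        if other and other[0] != first:
            candidates.append(other[0])

    negatives = rng.sample(candidates, min(num_options - 1, len(candidates)))
    options = negatives + [first]
    rng.shuffle(options)
    return {"text": rest, "options": options, "label": options.index(first)}


paragraphs = [
    "Stocks fell sharply on Monday. Investors reacted to new inflation data. Bond yields rose.",
    "The home team won the final. The striker scored twice. Fans celebrated all night.",
    "The new phone has a larger screen. Battery life is also improved. It ships next month.",
]
example = build_fsp_example(paragraphs, index=0, num_options=3, rng=random.Random(0))
```

At inference time, a model tuned this way would presumably be given label descriptions in place of candidate first sentences; how the paper formats those options is not stated in the abstract.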

Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling

May 25, 2023

Shengqiong Wu, Hao Fei, Yixin Cao, Lidong Bing, Tat-Seng Chua

Abstract: Existing research on multimodal relation extraction (MRE) faces two co-existing challenges, internal-information over-utilization and external-information under-exploitation. To combat that, we propose a novel framework that simultaneously implements the idea of internal-information screening and external-information exploiting. First, we represent the fine-grained semantic structures of the input image and text with the visual and textual scene graphs, which are further fused into a unified cross-modal graph (CMG). Based on CMG, we perform structure refinement with the guidance of the graph information bottleneck principle, actively denoising the less-informative features. Next, we perform topic modeling over the input image and text, incorporating latent multimodal topic features to enrich the contexts. On the benchmark MRE dataset, our system outperforms the current best model significantly. With further in-depth analyses, we reveal the great potential of our method for the MRE task. Our code is available at https://github.com/ChocoWu/MRE-ISE.

* ACL 2023

Gradient-Boosted Decision Tree for Listwise Context Model in Multimodal Review Helpfulness Prediction

May 25, 2023

Thong Nguyen, Xiaobao Wu, Xinshuai Dong, Anh Tuan Luu, Cong-Duy Nguyen, Zhen Hai, Lidong Bing

Abstract: Multimodal Review Helpfulness Prediction (MRHP) aims to rank product reviews based on predicted helpfulness scores and has been widely applied in e-commerce via presenting customers with useful reviews. Previous studies commonly employ fully-connected neural networks (FCNNs) as the final score predictor and pairwise loss as the training objective. However, FCNNs have been shown to perform inefficient splitting for review features, making it difficult for the model to clearly differentiate helpful from unhelpful reviews. Furthermore, the pairwise objective, which works on review pairs, may not completely capture the MRHP goal of producing the ranking for the entire review list, and possibly induces low generalization during testing. To address these issues, we propose a listwise attention network that clearly captures the MRHP ranking context and a listwise optimization objective that enhances model generalization. We further propose a gradient-boosted decision tree as the score predictor to efficaciously partition product reviews' representations. Extensive experiments demonstrate that our method achieves state-of-the-art results and polished generalization performance on two large-scale MRHP benchmark datasets.

* Published in ACL 2023 (Findings)
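
To make the pairwise versus listwise distinction in the abstract concrete, the sketch below contrasts two generic ranking losses on a single list of reviews. It is an illustration under stated assumptions, not the paper's objective: the pairwise loss is a standard margin-based hinge over review pairs, and the listwise loss is a ListNet-style softmax cross-entropy over the whole list. The function names and the example scores are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def pairwise_hinge_loss(scores, labels, margin=1.0):
    """Generic pairwise loss: each (more helpful, less helpful) pair is
    penalized independently when its score gap is below the margin."""
    losses = []
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                losses.append(max(0.0, margin - (scores[i] - scores[j])))
    return sum(losses) / len(losses) if losses else 0.0


def listwise_softmax_loss(scores, labels):
    """ListNet-style listwise loss: cross-entropy between the softmax of the
    helpfulness labels and the softmax of the predicted scores, computed
    over the entire review list at once."""
    target = softmax([float(label) for label in labels])
    predicted = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


labels = [3, 1, 0]                 # hypothetical helpfulness grades
good_scores = [2.0, 0.5, -1.0]     # ranks the reviews in the right order
bad_scores = [-1.0, 0.5, 2.0]      # ranks them in reverse

pairwise_good = pairwise_hinge_loss(good_scores, labels)
pairwise_bad = pairwise_hinge_loss(bad_scores, labels)
listwise_good = listwise_softmax_loss(good_scores, labels)
listwise_bad = listwise_softmax_loss(bad_scores, labels)
```

The pairwise loss only ever sees two reviews at a time, whereas the listwise loss normalizes over every review of the product, which is the property the abstract appeals to.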

Unlocking Temporal Question Answering for Large Language Models Using Code Execution

May 24, 2023

Xingxuan Li, Liying Cheng, Qingyu Tan, Hwee Tou Ng, Shafiq Joty, Lidong Bing

Abstract: Large language models (LLMs) have made significant progress in natural language processing (NLP), and are utilized extensively in various applications. Recent works, such as chain-of-thought (CoT), have shown that intermediate reasoning steps can improve the performance of LLMs for complex reasoning tasks, such as math problems and symbolic question-answering tasks. However, we notice the challenge that LLMs face when it comes to temporal reasoning. Our preliminary experiments show that generating intermediate reasoning steps does not always boost the performance of complex temporal question-answering tasks. Therefore, we propose a novel framework that combines the extraction capability of LLMs and the logical reasoning capability of a Python solver to tackle this issue. Extensive experiments and analysis demonstrate the effectiveness of our framework in handling intricate time-bound reasoning tasks.
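
The division of labor described in the abstract, where the LLM extracts information and a Python solver does the temporal logic, might look roughly like the sketch below. The abstract does not specify the framework's actual data format or solver, so everything here is an assumption: the `TimedFact` structure, the `answer_at` function, and the example facts are hypothetical, and the facts are hard-coded to stand in for what an LLM would extract from a passage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedFact:
    """A hypothetical extracted fact that holds over an inclusive year range."""
    subject: str
    relation: str
    obj: str
    start_year: int
    end_year: int


def answer_at(facts, subject, relation, year):
    """Deterministic solver step: return every object whose validity
    interval contains the queried year."""
    return [
        fact.obj
        for fact in facts
        if fact.subject == subject
        and fact.relation == relation
        and fact.start_year <= year <= fact.end_year
    ]


# Stand-in for the extraction step; in the described framework an LLM
# would produce structured facts like these from the input context.
facts = [
    TimedFact("Alex", "played_for", "Team A", 2005, 2009),
    TimedFact("Alex", "played_for", "Team B", 2010, 2014),
    TimedFact("Alex", "played_for", "Team C", 2015, 2018),
]

# Question: "Which team did Alex play for in 2012?"
answer = answer_at(facts, "Alex", "played_for", 2012)
```

The interval comparison is carried out by ordinary code rather than by generated reasoning text, which is the kind of step the abstract reports intermediate reasoning does not always handle well.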

Sentiment Analysis in the Era of Large Language Models: A Reality Check

May 24, 2023

Wenxuan Zhang, Yue Deng, Bing Liu, Sinno Jialin Pan, Lidong Bing

Abstract: Sentiment analysis (SA) has been a long-standing research area in natural language processing. It can offer rich insights into human sentiments and opinions and has thus seen considerable interest from both academia and industry. With the advent of large language models (LLMs) such as ChatGPT, there is great potential for their employment on SA problems. However, the extent to which existing LLMs can be leveraged for different sentiment analysis tasks remains unclear. This paper aims to provide a comprehensive investigation into the capabilities of LLMs in performing various sentiment analysis tasks, from conventional sentiment classification to aspect-based sentiment analysis and multifaceted analysis of subjective texts. We evaluate performance across 13 tasks on 26 datasets and compare the results against small language models (SLMs) trained on domain-specific datasets. Our study reveals that while LLMs demonstrate satisfactory performance in simpler tasks, they lag behind in more complex tasks requiring deeper understanding or structured sentiment information. However, LLMs significantly outperform SLMs in few-shot learning settings, suggesting their potential when annotation resources are limited. We also highlight the limitations of current evaluation practices in assessing LLMs' SA abilities and propose a novel benchmark, SentiEval, for a more comprehensive and realistic evaluation. Data and code from our investigations are available at https://github.com/DAMO-NLP-SG/LLM-Sentiment.

Is GPT-4 a Good Data Analyst?

May 24, 2023

Liying Cheng, Xingxuan Li, Lidong Bing

Abstract: As large language models (LLMs) have demonstrated their powerful capabilities in plenty of domains and tasks, including context understanding, code generation, language generation, data storytelling, etc., many data analysts may be concerned that their jobs will be replaced by AI. This controversial topic has drawn a lot of public attention. However, we are still at a stage of divergent opinions without any definitive conclusion. Motivated by this, we raise the research question of "is GPT-4 a good data analyst?" in this work and aim to answer it by conducting head-to-head comparative studies. In detail, we regard GPT-4 as a data analyst to perform end-to-end data analysis with databases from a wide range of domains. We propose a framework to tackle the problems by carefully designing the prompts for GPT-4 to conduct experiments. We also design several task-specific evaluation metrics to systematically compare the performance between several professional human data analysts and GPT-4. Experimental results show that GPT-4 can achieve comparable performance to humans. We also provide in-depth discussions about our results to shed light on further studies before we reach the conclusion that GPT-4 can replace data analysts.

* 11 pages, 2 figures

mPMR: A Multilingual Pre-trained Machine Reader at Scale

May 23, 2023

Weiwen Xu, Xin Li, Wai Lam, Lidong Bing

Abstract: We present multilingual Pre-trained Machine Reader (mPMR), a novel method for multilingual machine reading comprehension (MRC)-style pre-training. mPMR aims to guide multilingual pre-trained language models (mPLMs) to perform natural language understanding (NLU) including both sequence classification and span extraction in multiple languages. To achieve cross-lingual generalization when only source-language fine-tuning data is available, existing mPLMs solely transfer NLU capability from a source language to target languages. In contrast, mPMR allows the direct inheritance of multilingual NLU capability from the MRC-style pre-training to downstream tasks. Therefore, mPMR acquires better NLU capability for target languages. mPMR also provides a unified solver for tackling cross-lingual span extraction and sequence classification, thereby enabling the extraction of rationales to explain the sentence-pair classification process.

* To appear at ACL 2023 main conference

Domain-Expanded ASTE: Rethinking Generalization in Aspect Sentiment Triplet Extraction

May 23, 2023

Yew Ken Chia, Hui Chen, Wei Han, Guizhen Chen, Sharifah Mahani Aljunied, Soujanya Poria, Lidong Bing

Abstract: Aspect Sentiment Triplet Extraction (ASTE) is a subtask of Aspect-Based Sentiment Analysis (ABSA) that considers each opinion term, its expressed sentiment, and the corresponding aspect targets. However, existing methods are limited to the in-domain setting with two domains. Hence, we propose a domain-expanded benchmark to address the in-domain, out-of-domain and cross-domain settings. We support the new benchmark by annotating more than 4000 data samples for two new domains based on hotel and cosmetics reviews. Our analysis of five existing methods shows that while there is a significant gap between in-domain and out-of-domain performance, generative methods have a strong potential for domain generalization. Our datasets, code implementation and models are available at https://github.com/DAMO-NLP-SG/domain-expanded-aste .

Improving Self-training for Cross-lingual Named Entity Recognition with Contrastive and Prototype Learning

May 23, 2023

Ran Zhou, Xin Li, Lidong Bing, Erik Cambria, Chunyan Miao

Abstract: In cross-lingual named entity recognition (NER), self-training is commonly used to bridge the linguistic gap by training on pseudo-labeled target-language data. However, due to sub-optimal performance on target languages, the pseudo labels are often noisy and limit the overall performance. In this work, we aim to improve self-training for cross-lingual NER by combining representation learning and pseudo label refinement in one coherent framework. Our proposed method, ContProto, mainly comprises two components: (1) contrastive self-training and (2) prototype-based pseudo-labeling. Our contrastive self-training facilitates span classification by separating clusters of different classes, and enhances cross-lingual transferability by producing closely-aligned representations between the source and target language. Meanwhile, prototype-based pseudo-labeling effectively improves the accuracy of pseudo labels during training. We evaluate ContProto on multiple transfer pairs, and experimental results show our method brings in substantial improvements over current state-of-the-art methods.

* Accepted by ACL 2023

Are Large Language Models Good Evaluators for Abstractive Summarization?

May 22, 2023

Chenhui Shen, Liying Cheng, Yang You, Lidong Bing

Abstract: Human evaluations are often required for abstractive summary evaluations to give fairer judgments. However, they are often time-consuming, costly, inconsistent, and non-reproducible. To overcome these challenges, we explore the potential of using an out-of-the-box LLM (i.e., "gpt-3.5-turbo") for summarization evaluation without manually selecting demonstrations or complex prompt tuning. We compare different evaluation methods, including 2 methods for Likert-scale scoring and 1 method for head-to-head comparisons, to investigate the performance of the LLM as a zero-shot evaluator. We further propose a meta-correlation metric to measure the stability of the LLM's evaluation capability. With extensive experiments, we show that certain prompt formats can produce better results than others. We also bring attention to the LLM's deteriorating evaluation capability as the quality of the summaries rises. In addition, we find that the LLM's evaluation capability also depends on the evaluated dimensions. We discuss the pros and cons of each method, make recommendations, and suggest some future directions for improvement.

* 11 pages, 10 figures
