Lay-Ki Soon

SocialDial: A Benchmark for Socially-Aware Dialogue Systems

Apr 24, 2023
Haolan Zhan, Zhuang Li, Yufei Wang, Linhao Luo, Tao Feng, Xiaoxi Kang, Yuncheng Hua, Lizhen Qu, Lay-Ki Soon, Suraj Sharma, Ingrid Zukerman, Zhaleh Semnani-Azad, Gholamreza Haffari

Dialogue systems have been widely applied in many scenarios and are now more powerful and ubiquitous than ever before. With large neural models and massive available data, current dialogue systems have access to more knowledge than any person could acquire in a lifetime. However, current dialogue systems still do not perform at a human level. One major gap between conversational agents and humans lies in their ability to be aware of social norms. The development of socially-aware dialogue systems is impeded by a lack of resources. In this paper, we present the first socially-aware dialogue corpus, SocialDial, based on Chinese social culture. SocialDial consists of two parts: 1,563 multi-turn dialogues between two human speakers with fine-grained labels, and 4,870 synthetic conversations generated by ChatGPT. The human corpus covers five categories of social norms, which have 14 sub-categories in total. Specifically, it contains social factor annotations including social relation, context, social distance, and social norms. However, collecting sufficient socially-aware dialogues is costly. Thus, we harness the power of ChatGPT and devise an ontology-based synthetic data generation framework. This framework is able to generate synthetic data at scale. To ensure the quality of synthetic dialogues, we design several mechanisms for quality control during data collection. Finally, we evaluate our dataset using several pre-trained models, such as BERT and RoBERTa. Comprehensive empirical results based on state-of-the-art neural models demonstrate that modeling of social norms for dialogue systems is a promising research direction. To the best of our knowledge, SocialDial is the first socially-aware dialogue dataset that covers multiple social factors and has fine-grained labels.

* Accepted by SIGIR 2023 

Crude Oil-related Events Extraction and Processing: A Transfer Learning Approach

May 01, 2022
Meisin Lee, Lay-Ki Soon, Eu-Gene Siew

One of the challenges in event extraction via the traditional supervised learning paradigm is the need for a sizeable annotated dataset to achieve satisfactory model performance. It is even more challenging when it comes to event extraction in the finance and economics domain, a domain with considerably fewer resources. This paper presents a complete framework for extracting and processing crude oil-related events found in the CrudeOilNews corpus, addressing the issues of annotation scarcity and class imbalance by leveraging transfer learning. Apart from event extraction, we place special emphasis on the classification of event properties (Polarity, Modality, and Intensity) to determine the factual certainty of each event. We first build baseline models by supervised learning and then exploit transfer learning methods to boost event extraction model performance despite the limited amount of annotated data and severe class imbalance. This is done via methods within the transfer learning framework such as Domain Adaptive Pre-training, Multi-task Learning and Sequential Transfer Learning. Based on experimental results, we are able to improve all event extraction sub-task models in both F1 and MCC scores compared to baseline models trained via standard supervised learning. Accurate and holistic event extraction from crude oil news is very useful for downstream tasks such as understanding event chains and learning event-event relations, which can in turn be used for tasks such as commodity price prediction and summarisation, to support a wide range of business decision making.
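
The abstract reports both F1 and MCC (Matthews correlation coefficient) under severe class imbalance. The following is a minimal sketch, not taken from the paper, of why the two metrics can disagree on imbalanced data. The toy labels and the helper names (`confusion_counts`, `f1_score`, `mcc_score`) are assumptions made purely for illustration.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc_score(tp, fp, fn, tn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    By the usual convention, MCC is set to 0 when the denominator is 0.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical toy data: 9 positive examples, 1 negative example,
# and a degenerate classifier that always predicts the majority class.
y_true = [1] * 9 + [0]
y_pred = [1] * 10

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
f1 = f1_score(tp, fp, fn)
mcc = mcc_score(tp, fp, fn, tn)
print(f"F1={f1:.3f}  MCC={mcc:.3f}")
```

On this toy input the majority-class predictor reaches an F1 of about 0.95 while its MCC is 0, which is one reason to report MCC alongside F1 when classes are heavily skewed.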

CrudeOilNews: An Annotated Crude Oil News Corpus for Event Extraction

Apr 08, 2022
Meisin Lee, Lay-Ki Soon, Eu-Gene Siew, Ly Fie Sugianto

In this paper, we present CrudeOilNews, a corpus of English crude oil news for event extraction. It is the first of its kind for commodity news and serves to contribute towards resource building for economic and financial text mining. This paper describes the data collection process, the annotation methodology and the event typology used in producing the corpus. First, a seed set of 175 news articles was manually annotated, of which a subset of 25 articles was used as the adjudicated reference test set for inter-annotator and system evaluation. Agreement was generally substantial and annotator performance was adequate, indicating that the annotation scheme produces consistent event annotations of high quality. Subsequently, the dataset was expanded through (1) data augmentation and (2) human-in-the-loop active learning. The resulting corpus has 425 news articles with approximately 11k annotated events. As part of the active learning process, the corpus was used to train basic event extraction models for machine labeling. The resulting models also serve as a validation, or pilot study, demonstrating the use of the corpus for machine learning purposes. The annotated corpus is made available for academic research purposes at https://github.com/meisin/CrudeOilNews-Corpus.

* Accepted at LREC 2022. arXiv admin note: text overlap with arXiv:2105.08214 

Effective Use of Graph Convolution Network and Contextual Sub-Tree for Commodity News Event Extraction

Sep 27, 2021
Meisin Lee, Lay-Ki Soon, Eu-Gene Siew

Event extraction in commodity news is a less researched area as compared to generic event extraction. However, accurate event extraction from commodity news is useful in a broad range of applications such as understanding event chains and learning event-event relations, which can then be used for commodity price prediction. The events found in commodity news exhibit characteristics different from generic events, hence posing a unique challenge in event extraction using existing methods. This paper proposes an effective use of Graph Convolutional Networks (GCN) with a pruned dependency parse tree, termed contextual sub-tree, for better event extraction in commodity news. The event extraction model is trained using feature embeddings from ComBERT, a BERT-based masked language model that was produced through domain-adaptive pre-training on a commodity news corpus. Experimental results show the efficiency of the proposed solution, which outperforms existing methods with F1 scores as high as 0.90. Furthermore, our pre-trained language model outperforms GloVe by 23%, and BERT and RoBERTa by 7% in terms of argument roles classification. For the goal of reproducibility, the code and trained models are made publicly available.
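
As a rough illustration of the general idea of running a graph convolution over a pruned dependency tree, the sketch below keeps only the tokens on the dependency path between a trigger and a candidate argument, then applies one mean-aggregation GCN layer. This is not the paper's implementation: the pruning rule, the toy sentence and its parse, the two-dimensional features, and all function names are assumptions chosen for illustration.

```python
from collections import deque

def shortest_path(edges, n_tokens, src, dst):
    """Breadth-first search over the undirected dependency tree."""
    adj = {i: set() for i in range(n_tokens)}
    for head, dep in edges:
        adj[head].add(dep)
        adj[dep].add(head)
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nb in sorted(adj[node]):
            if nb not in prev:
                prev[nb] = node
                queue.append(nb)
    if dst not in prev:
        return []  # no path found (should not happen in a connected tree)
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]

def gcn_layer(features, edges, keep, weight):
    """One GCN layer, ReLU(D^-1 (A + I) H W), restricted to the kept tokens."""
    keep = sorted(keep)
    index = {tok: i for i, tok in enumerate(keep)}
    m = len(keep)
    # Adjacency with self-loops over the pruned sub-tree only.
    adj = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for head, dep in edges:
        if head in index and dep in index:
            adj[index[head]][index[dep]] = 1.0
            adj[index[dep]][index[head]] = 1.0
    dim_in, dim_out = len(weight), len(weight[0])
    hidden = []
    for i in range(m):
        degree = sum(adj[i])
        # Average the features of the node and its neighbours.
        agg = [sum(adj[i][j] * features[keep[j]][k] for j in range(m)) / degree
               for k in range(dim_in)]
        # Linear transform followed by ReLU.
        hidden.append([max(0.0, sum(agg[k] * weight[k][o] for k in range(dim_in)))
                       for o in range(dim_out)])
    return keep, hidden

# Hypothetical sentence and (head, dependent) edges of its dependency parse.
tokens = ["Oil", "prices", "fell", "sharply", "on", "Monday"]
edges = [(2, 1), (1, 0), (2, 3), (2, 5), (5, 4)]
trigger, argument = 2, 0  # "fell" and "Oil"

path = shortest_path(edges, len(tokens), trigger, argument)
features = [[float(i), 1.0] for i in range(len(tokens))]  # placeholder embeddings
weight = [[1.0, 0.0], [0.0, 1.0]]                         # identity, for readability
kept, hidden = gcn_layer(features, edges, path, weight)
print([tokens[i] for i in kept], hidden)
```

In a real system the placeholder features would be replaced by contextual embeddings (the abstract mentions ComBERT) and the weight matrix would be learned; the identity matrix here only keeps the arithmetic easy to follow.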

* Accepted in ECONLP workshop at EMNLP 2021 

The Commodities News Corpus: A Resource for Understanding Commodity News Better

May 23, 2021
Meisin Lee, Lay-Ki Soon, Eu-Gene Siew, Ly Fie Sugianto

Commodity news contains a wealth of information, such as summaries of recent commodity price movements and the notable events that led to those movements. Information obtained from commodity news through event extraction is extremely useful in mining for causal relations between events and commodity price movements, which can be used for commodity price prediction. To facilitate future research, we introduce a new dataset with the following information identified and annotated: (i) entities (both nominal and named), (ii) events (trigger words and argument roles), (iii) event metadata: modality, polarity and intensity, and (iv) event-event relations.

* Submitted to journal, currently under review 