Chenghua Lin

The Secret of Metaphor on Expressing Stronger Emotion

Jan 30, 2023
Yucheng Li, Frank Guerin, Chenghua Lin

Metaphors have been shown to have a stronger emotional impact than literal expressions. Although this finding promises to benefit various NLP applications, the reasons behind the phenomenon are not well studied. This paper presents the first study of how metaphors convey stronger emotion than their literal counterparts. We find that metaphors are generally more specific than literal expressions, and this greater specificity may be one reason for metaphors' superiority in emotion expression. When we compare metaphors with literal expressions at the same specificity level, the gap in emotion-expressing ability between the two narrows significantly. In addition, we observe that specificity is crucial in literal language as well: literal expressions can convey stronger emotion by being made more specific.

* FigLang@EMNLP2022 

CORGI-PM: A Chinese Corpus For Gender Bias Probing and Mitigation

Jan 01, 2023
Ge Zhang, Yizhi Li, Yaoyao Wu, Linyuan Zhang, Chenghua Lin, Jiayi Geng, Shi Wang, Jie Fu

As natural language processing (NLP) for gender bias becomes a significant interdisciplinary topic, prevalent data-driven techniques such as large-scale language models suffer from inadequate data and biased corpora, especially for under-resourced languages such as Chinese. To this end, we propose CORGI-PM, a Chinese cOrpus foR Gender bIas Probing and Mitigation, which contains 32.9k sentences with high-quality labels derived from an annotation scheme developed specifically for gender bias in the Chinese context. Moreover, we address three challenges for automatic textual gender bias mitigation, which require models to detect, classify, and mitigate textual gender bias. We also conduct experiments with state-of-the-art language models to provide baselines. To the best of our knowledge, CORGI-PM is the first sentence-level Chinese corpus for gender bias probing and mitigation.

Routine Outcome Monitoring in Psychotherapy Treatment using Sentiment-Topic Modelling Approach

Dec 08, 2022
Noor Fazilla Abd Yusof, Chenghua Lin

While identifying the right psychotherapy treatment for an individual patient is important, assessing the outcome of each therapy session is equally crucial. Evidence shows that continuously monitoring a patient's progress can significantly improve therapy outcomes. By monitoring the outcome, a patient's progress can be tracked closely, helping clinicians identify patients who are not progressing in treatment. Such monitoring helps clinicians consider any necessary actions for the patient's treatment as early as possible, e.g., recommending a different type of treatment or adjusting the style of approach. Currently, evaluation is based on clinician-rated and self-report questionnaires that measure patients' progress pre- and post-treatment. Although outcome monitoring tends to improve therapy outcomes, the current method faces many challenges, e.g., the time and financial burden of administering questionnaires and of scoring and analysing the results. Therefore, a computational method for measuring and monitoring patient progress over the course of treatment is needed in order to enhance the likelihood of positive treatment outcomes. Moreover, such a method could lead to an inexpensive monitoring tool for evaluating patients' progress in clinical care that could be administered by a wider range of health-care professionals.

MAP-Music2Vec: A Simple and Effective Baseline for Self-Supervised Music Audio Representation Learning

Dec 05, 2022
Yizhi Li, Ruibin Yuan, Ge Zhang, Yinghao Ma, Chenghua Lin, Xingran Chen, Anton Ragni, Hanzhi Yin, Zhijie Hu, Haoyu He, Emmanouil Benetos, Norbert Gyenge, Ruibo Liu, Jie Fu

The deep learning community has witnessed exponentially growing interest in self-supervised learning (SSL). However, how to build a framework for learning useful representations of raw music waveforms in a self-supervised manner remains unexplored. In this work, we design Music2Vec, a framework exploring different SSL algorithmic components and tricks for music audio recordings. Our model achieves results comparable to the state-of-the-art (SOTA) music SSL model Jukebox, despite being significantly smaller, with less than 2% of the latter's parameters. The model will be released on Hugging Face (https://huggingface.co/m-a-p/music2vec-v1).

HERB: Measuring Hierarchical Regional Bias in Pre-trained Language Models

Nov 05, 2022
Yizhi Li, Ge Zhang, Bohao Yang, Chenghua Lin, Shi Wang, Anton Ragni, Jie Fu

Fairness has become a trending topic in natural language processing (NLP), addressing biases targeting certain social groups such as genders and religions. However, regional bias in language models (LMs), a long-standing global discrimination problem, remains unexplored. This paper bridges the gap by analysing the regional bias learned by pre-trained language models that are broadly used in NLP tasks. In addition to verifying the existence of regional bias in LMs, we find that biases against regional groups can be strongly influenced by the geographical clustering of those groups. We accordingly propose HERB, a HiErarchical Regional Bias evaluation method that utilises information from sub-region clusters to quantify the bias in pre-trained LMs. Experiments show that our hierarchical metric can effectively evaluate regional bias across comprehensive topics and measure the potential regional bias that can propagate to downstream tasks. Our code is available at https://github.com/Bernard-Yang/HERB.

* Accepted at AACL 2022 as Long Findings 

Improving Variational Autoencoders with Density Gap-based Regularization

Nov 01, 2022
Jianfei Zhang, Jun Bai, Chenghua Lin, Yanmeng Wang, Wenge Rong

Variational autoencoders (VAEs) are among the most powerful unsupervised learning frameworks in NLP for latent representation learning and latent-directed generation. The classic optimization goal of VAEs is to maximize the Evidence Lower Bound (ELBo), which consists of a conditional likelihood term for generation and a negative Kullback-Leibler (KL) divergence term for regularization. In practice, optimizing the ELBo often leads the posterior distributions of all samples to converge to the same degenerate local optimum, a phenomenon known as posterior collapse or KL vanishing. Effective methods have been proposed to prevent posterior collapse in VAEs, but we observe that they essentially trade posterior collapse for the hole problem, i.e., a mismatch between the aggregated posterior distribution and the prior distribution. To this end, we introduce new training objectives that tackle both problems through a novel regularization based on the probabilistic density gap between the aggregated posterior distribution and the prior distribution. Through experiments on language modeling, latent space visualization, and interpolation, we show that our proposed method solves both problems effectively and thus outperforms existing methods in latent-directed generation. To the best of our knowledge, we are the first to jointly tackle the hole problem and posterior collapse.
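The objective and the two failure modes mentioned in the abstract can be stated explicitly. With encoder $q_\phi(z\mid x)$, decoder $p_\theta(x\mid z)$, prior $p(z)$, and data distribution $p_d(x)$:

```latex
% ELBo: conditional likelihood minus the KL regularizer
\mathcal{L}_{\mathrm{ELBo}}(\theta,\phi;x)
  = \mathbb{E}_{q_\phi(z\mid x)}\bigl[\log p_\theta(x\mid z)\bigr]
  - \mathrm{KL}\bigl(q_\phi(z\mid x)\,\|\,p(z)\bigr)

% Posterior collapse (KL vanishing): every per-sample posterior
% degenerates to (nearly) the prior, so the KL term goes to zero
\mathrm{KL}\bigl(q_\phi(z\mid x)\,\|\,p(z)\bigr) \to 0 \quad \text{for all } x

% Hole problem: the aggregated posterior mismatches the prior
q_\phi(z) = \mathbb{E}_{x \sim p_d(x)}\bigl[q_\phi(z\mid x)\bigr] \neq p(z)
```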

* Accepted to NeurIPS 2022 

Terminology-aware Medical Dialogue Generation

Oct 27, 2022
Chen Tang, Hongbo Zhang, Tyler Loakman, Chenghua Lin, Frank Guerin

Medical dialogue generation aims to generate responses according to a history of dialogue turns between doctors and patients. Unlike open-domain dialogue generation, this requires background knowledge specific to the medical domain. Existing generative frameworks for medical dialogue generation fall short of incorporating domain-specific knowledge, especially with regard to medical terminology. In this paper, we propose a novel framework that improves medical dialogue generation by considering features centred on domain-specific terminology. We leverage an attention mechanism to incorporate terminology-centred features, and bridge the semantic gap between medical background knowledge and common utterances by having language models learn terminology representations through an auxiliary terminology recognition task. Experimental results demonstrate the effectiveness of our approach, with our proposed framework outperforming SOTA language models. Additionally, we provide a new dataset with medical terminology annotations to support research on medical dialogue generation. Our dataset and code are available at https://github.com/tangg555/meddialog.
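The auxiliary-task idea above can be sketched as a weighted multi-task loss. This is an illustrative sketch only, not the authors' implementation; the function name and the `aux_weight` hyperparameter are hypothetical.

```python
# Illustrative sketch: fold an auxiliary terminology recognition
# objective into training alongside the main generation objective.
# Not the released meddialog code; names and weights are invented.

def multi_task_loss(gen_loss: float, term_loss: float,
                    aux_weight: float = 0.5) -> float:
    """Combine the dialogue generation loss with the auxiliary
    terminology recognition loss via a fixed weighting factor."""
    return gen_loss + aux_weight * term_loss

# Example: a generation loss of 2.0 and a terminology loss of 0.8
# give a total training loss of 2.0 + 0.5 * 0.8 = 2.4.
total = multi_task_loss(gen_loss=2.0, term_loss=0.8)
```

In practice both terms would be cross-entropy losses computed by the same encoder, so the terminology task shapes the shared representations rather than adding a separate model.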

* Submitted to ICASSP 2023 

Improving Chinese Story Generation via Awareness of Syntactic Dependencies and Semantics

Oct 19, 2022
Henglin Huang, Chen Tang, Tyler Loakman, Frank Guerin, Chenghua Lin

Story generation aims to generate a long narrative conditioned on a given input. Despite the success of prior works applying pre-trained models, current neural models for Chinese stories still struggle to generate high-quality long narratives. We hypothesise that this stems from ambiguity in syntactically parsing the Chinese language, which lacks explicit delimiters for word segmentation. Consequently, neural models capture features in Chinese narratives inefficiently. In this paper, we present a new generation framework that enhances feature capturing by informing the generation model of dependencies between words, and further augments semantic representation learning through synonym denoising training. We conduct a range of experiments, and the results show that our framework outperforms state-of-the-art Chinese generation models on all evaluation metrics, demonstrating the benefits of enhanced dependency and semantic representation learning.

* AACL 2022  

NGEP: A Graph-based Event Planning Framework for Story Generation

Oct 19, 2022
Chen Tang, Zhihao Zhang, Tyler Loakman, Chenghua Lin, Frank Guerin

To improve the performance of long text generation, recent studies have leveraged automatically planned event structures (i.e. storylines) to guide story generation. Such prior works mostly employ end-to-end neural generation models to predict the event sequence for a story. However, these models struggle to guarantee narrative coherence across separate events due to the hallucination problem, and the generated event sequences are often hard to control because of the models' end-to-end nature. To address these challenges, we propose NGEP, a novel event planning framework which generates an event sequence by performing inference on an automatically constructed event graph and enhances generalisation ability through a neural event advisor. We conduct a range of experiments on multiple criteria, and the results demonstrate that our graph-based neural framework outperforms state-of-the-art (SOTA) event planning approaches, considering both the performance of event sequence generation and the effectiveness on the downstream task of story generation.
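The core idea of planning by inference over an event graph can be illustrated with a minimal sketch. This is not the released NGEP implementation (which additionally uses a neural event advisor); the event names and transition weights here are invented for the example, and the traversal is a simple greedy walk.

```python
# Illustrative sketch: greedy event planning over a weighted event
# graph. Each node is an event; edge weights stand in for transition
# scores mined from a story corpus. Invented data, not NGEP's graph.

def plan_events(graph, start, max_len=4):
    """Follow the highest-weight outgoing edge at each step,
    skipping events already in the sequence to avoid loops."""
    sequence = [start]
    current = start
    for _ in range(max_len - 1):
        candidates = {e: w for e, w in graph.get(current, {}).items()
                      if e not in sequence}
        if not candidates:
            break  # no unvisited successor: stop planning early
        current = max(candidates, key=candidates.get)
        sequence.append(current)
    return sequence

graph = {
    "meet": {"argue": 0.3, "talk": 0.7},
    "talk": {"agree": 0.6, "argue": 0.4},
    "agree": {"celebrate": 1.0},
}
print(plan_events(graph, "meet"))  # → ['meet', 'talk', 'agree', 'celebrate']
```

Because the sequence comes from explicit graph edges rather than free-form decoding, every planned transition is guaranteed to exist in the constructed graph, which is what makes this style of planning controllable compared with end-to-end generation.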

* AACL 2022  

Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature

Oct 18, 2022
Tomas Goldsack, Zhihao Zhang, Chenghua Lin, Carolina Scarton

Lay summarisation aims to jointly summarise and simplify a given text, thus making its content more comprehensible to non-experts. Automatic approaches for lay summarisation can provide significant value in broadening access to scientific literature, enabling a greater degree of both interdisciplinary knowledge sharing and public understanding when it comes to research findings. However, current corpora for this task are limited in their size and scope, hindering the development of broadly applicable data-driven approaches. Aiming to rectify these issues, we present two novel lay summarisation datasets, PLOS (large-scale) and eLife (medium-scale), each of which contains biomedical journal articles alongside expert-written lay summaries. We provide a thorough characterisation of our lay summaries, highlighting differing levels of readability and abstractiveness between datasets that can be leveraged to support the needs of different applications. Finally, we benchmark our datasets using mainstream summarisation approaches and perform a manual evaluation with domain experts, demonstrating their utility and casting light on the key challenges of this task.

* 16 pages, 9 figures. Accepted to EMNLP 2022 