Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Chenghua Lin

Observing Micromotives and Macrobehavior of Large Language Models

Dec 10, 2024

Yuyang Cheng, Xingwei Qu, Tomas Goldsack, Chenghua Lin, Chung-Chi Chen

Abstract:Thomas C. Schelling, awarded the 2005 Nobel Memorial Prize in Economic Sciences, pointed out that ``individuals decisions (micromotives), while often personal and localized, can lead to societal outcomes (macrobehavior) that are far more complex and different from what the individuals intended.'' The current research related to large language models' (LLMs') micromotives, such as preferences or biases, assumes that users will make more appropriate decisions once LLMs are devoid of preferences or biases. Consequently, a series of studies has focused on removing bias from LLMs. In the NLP community, while there are many discussions on LLMs' micromotives, previous studies have seldom conducted a systematic examination of how LLMs may influence society's macrobehavior. In this paper, we follow the design of Schelling's model of segregation to observe the relationship between the micromotives and macrobehavior of LLMs. Our results indicate that, regardless of the level of bias in LLMs, a highly segregated society will emerge as more people follow LLMs' suggestions. We hope our discussion will spark further consideration of the fundamental assumption regarding the mitigation of LLMs' micromotives and encourage a reevaluation of how LLMs may influence users and society.

Via

Access Paper or Ask Questions

Lightweight Contenders: Navigating Semi-Supervised Text Mining through Peer Collaboration and Self Transcendence

Dec 01, 2024

Qianren Mao, Weifeng Jiang, Junnan Liu, Chenghua Lin, Qian Li, Xianqing Wen, Jianxin Li, Jinhu Lu

Figure 1 for Lightweight Contenders: Navigating Semi-Supervised Text Mining through Peer Collaboration and Self Transcendence

Figure 2 for Lightweight Contenders: Navigating Semi-Supervised Text Mining through Peer Collaboration and Self Transcendence

Figure 3 for Lightweight Contenders: Navigating Semi-Supervised Text Mining through Peer Collaboration and Self Transcendence

Figure 4 for Lightweight Contenders: Navigating Semi-Supervised Text Mining through Peer Collaboration and Self Transcendence

Abstract:The semi-supervised learning (SSL) strategy in lightweight models requires reducing annotated samples and facilitating cost-effective inference. However, the constraint on model parameters, imposed by the scarcity of training labels, limits the SSL performance. In this paper, we introduce PS-NET, a novel framework tailored for semi-supervised text mining with lightweight models. PS-NET incorporates online distillation to train lightweight student models by imitating the Teacher model. It also integrates an ensemble of student peers that collaboratively instruct each other. Additionally, PS-NET implements a constant adversarial perturbation schema to further self-augmentation by progressive generalizing. Our PS-NET, equipped with a 2-layer distilled BERT, exhibits notable performance enhancements over SOTA lightweight SSL frameworks of FLiText and DisCo in SSL text classification with extremely rare labelled data.

Via

Access Paper or Ask Questions

CAST: Corpus-Aware Self-similarity Enhanced Topic modelling

Oct 19, 2024

Yanan Ma, Chenghao Xiao, Chenhan Yuan, Sabine N van der Veer, Lamiece Hassan, Chenghua Lin, Goran Nenadic

Figure 1 for CAST: Corpus-Aware Self-similarity Enhanced Topic modelling

Figure 2 for CAST: Corpus-Aware Self-similarity Enhanced Topic modelling

Figure 3 for CAST: Corpus-Aware Self-similarity Enhanced Topic modelling

Figure 4 for CAST: Corpus-Aware Self-similarity Enhanced Topic modelling

Abstract:Topic modelling is a pivotal unsupervised machine learning technique for extracting valuable insights from large document collections. Existing neural topic modelling methods often encode contextual information of documents, while ignoring contextual details of candidate centroid words, leading to the inaccurate selection of topic words due to the contextualization gap. In parallel, it is found that functional words are frequently selected over topical words. To address these limitations, we introduce CAST: Corpus-Aware Self-similarity Enhanced Topic modelling, a novel topic modelling method that builds upon candidate centroid word embeddings contextualized on the dataset, and a novel self-similarity-based method to filter out less meaningful tokens. Inspired by findings in contrastive learning that self-similarities of functional token embeddings in different contexts are much lower than topical tokens, we find self-similarity to be an effective metric to prevent functional words from acting as candidate topic words. Our approach significantly enhances the coherence and diversity of generated topics, as well as the topic model's ability to handle noisy data. Experiments on news benchmark datasets and one Twitter dataset demonstrate the method's superiority in generating coherent, diverse topics, and handling noisy data, outperforming strong baselines.

Via

Access Paper or Ask Questions

Can MLLMs Understand the Deep Implication Behind Chinese Images?

Oct 17, 2024

Chenhao Zhang, Xi Feng, Yuelin Bai, Xinrun Du, Jinchang Hou, Kaixin Deng, Guangzeng Han, Qinrui Li, Bingli Wang, Jiaheng Liu(+11 more)

Figure 1 for Can MLLMs Understand the Deep Implication Behind Chinese Images?

Figure 2 for Can MLLMs Understand the Deep Implication Behind Chinese Images?

Figure 3 for Can MLLMs Understand the Deep Implication Behind Chinese Images?

Figure 4 for Can MLLMs Understand the Deep Implication Behind Chinese Images?

Abstract:As the capabilities of Multimodal Large Language Models (MLLMs) continue to improve, the need for higher-order capability evaluation of MLLMs is increasing. However, there is a lack of work evaluating MLLM for higher-order perception and understanding of Chinese visual content. To fill the gap, we introduce the **C**hinese **I**mage **I**mplication understanding **Bench**mark, **CII-Bench**, which aims to assess the higher-order perception and understanding capabilities of MLLMs for Chinese images. CII-Bench stands out in several ways compared to existing benchmarks. Firstly, to ensure the authenticity of the Chinese context, images in CII-Bench are sourced from the Chinese Internet and manually reviewed, with corresponding answers also manually crafted. Additionally, CII-Bench incorporates images that represent Chinese traditional culture, such as famous Chinese traditional paintings, which can deeply reflect the model's understanding of Chinese traditional culture. Through extensive experiments on CII-Bench across multiple MLLMs, we have made significant findings. Initially, a substantial gap is observed between the performance of MLLMs and humans on CII-Bench. The highest accuracy of MLLMs attains 64.4%, where as human accuracy averages 78.2%, peaking at an impressive 81.0%. Subsequently, MLLMs perform worse on Chinese traditional culture images, suggesting limitations in their ability to understand high-level semantics and lack a deep knowledge base of Chinese traditional culture. Finally, it is observed that most models exhibit enhanced accuracy when image emotion hints are incorporated into the prompts. We believe that CII-Bench will enable MLLMs to gain a better understanding of Chinese semantics and Chinese-specific images, advancing the journey towards expert artificial general intelligence (AGI). Our project is publicly available at https://cii-bench.github.io/.

* 32 pages,18 figures. Project Page: https://cii-bench.github.io/ Code: https://github.com/MING_X/CII-Bench Dataset: https://huggingface.co/datasets/m-a-p/CII-Bench

Via

Access Paper or Ask Questions

A Comparative Study on Reasoning Patterns of OpenAI's o1 Model

Oct 17, 2024

Siwei Wu, Zhongyuan Peng, Xinrun Du, Tuney Zheng, Minghao Liu, Jialong Wu, Jiachen Ma, Yizhi Li, Jian Yang, Wangchunshu Zhou(+7 more)

Figure 1 for A Comparative Study on Reasoning Patterns of OpenAI's o1 Model

Figure 2 for A Comparative Study on Reasoning Patterns of OpenAI's o1 Model

Figure 3 for A Comparative Study on Reasoning Patterns of OpenAI's o1 Model

Figure 4 for A Comparative Study on Reasoning Patterns of OpenAI's o1 Model

Abstract:Enabling Large Language Models (LLMs) to handle a wider range of complex tasks (e.g., coding, math) has drawn great attention from many researchers. As LLMs continue to evolve, merely increasing the number of model parameters yields diminishing performance improvements and heavy computational costs. Recently, OpenAI's o1 model has shown that inference strategies (i.e., Test-time Compute methods) can also significantly enhance the reasoning capabilities of LLMs. However, the mechanisms behind these methods are still unexplored. In our work, to investigate the reasoning patterns of o1, we compare o1 with existing Test-time Compute methods (BoN, Step-wise BoN, Agent Workflow, and Self-Refine) by using OpenAI's GPT-4o as a backbone on general reasoning benchmarks in three domains (i.e., math, coding, commonsense reasoning). Specifically, first, our experiments show that the o1 model has achieved the best performance on most datasets. Second, as for the methods of searching diverse responses (e.g., BoN), we find the reward models' capability and the search space both limit the upper boundary of these methods. Third, as for the methods that break the problem into many sub-problems, the Agent Workflow has achieved better performance than Step-wise BoN due to the domain-specific system prompt for planning better reasoning processes. Fourth, it is worth mentioning that we have summarized six reasoning patterns of o1, and provided a detailed analysis on several reasoning benchmarks.

Via

Access Paper or Ask Questions

Increasing the Difficulty of Automatically Generated Questions via Reinforcement Learning with Synthetic Preference

Oct 10, 2024

William Thorne, Ambrose Robinson, Bohua Peng, Chenghua Lin, Diana Maynard

Figure 1 for Increasing the Difficulty of Automatically Generated Questions via Reinforcement Learning with Synthetic Preference

Figure 2 for Increasing the Difficulty of Automatically Generated Questions via Reinforcement Learning with Synthetic Preference

Figure 3 for Increasing the Difficulty of Automatically Generated Questions via Reinforcement Learning with Synthetic Preference

Figure 4 for Increasing the Difficulty of Automatically Generated Questions via Reinforcement Learning with Synthetic Preference

Abstract:As the cultural heritage sector increasingly adopts technologies like Retrieval-Augmented Generation (RAG) to provide more personalised search experiences and enable conversations with collections data, the demand for specialised evaluation datasets has grown. While end-to-end system testing is essential, it's equally important to assess individual components. We target the final, answering task, which is well-suited to Machine Reading Comprehension (MRC). Although existing MRC datasets address general domains, they lack the specificity needed for cultural heritage information. Unfortunately, the manual creation of such datasets is prohibitively expensive for most heritage institutions. This paper presents a cost-effective approach for generating domain-specific MRC datasets with increased difficulty using Reinforcement Learning from Human Feedback (RLHF) from synthetic preference data. Our method leverages the performance of existing question-answering models on a subset of SQuAD to create a difficulty metric, assuming that more challenging questions are answered correctly less frequently. This research contributes: (1) A methodology for increasing question difficulty using PPO and synthetic data; (2) Empirical evidence of the method's effectiveness, including human evaluation; (3) An in-depth error analysis and study of emergent phenomena; and (4) An open-source codebase and set of three llama-2-chat adapters for reproducibility and adaptation.

* is to be published in NLP4DH 2024

Via

Access Paper or Ask Questions

On the Rigour of Scientific Writing: Criteria, Analysis, and Insights

Oct 07, 2024

Joseph James, Chenghao Xiao, Yucheng Li, Chenghua Lin

Abstract:Rigour is crucial for scientific research as it ensures the reproducibility and validity of results and findings. Despite its importance, little work exists on modelling rigour computationally, and there is a lack of analysis on whether these criteria can effectively signal or measure the rigour of scientific papers in practice. In this paper, we introduce a bottom-up, data-driven framework to automatically identify and define rigour criteria and assess their relevance in scientific writing. Our framework includes rigour keyword extraction, detailed rigour definition generation, and salient criteria identification. Furthermore, our framework is domain-agnostic and can be tailored to the evaluation of scientific rigour for different areas, accommodating the distinct salient criteria across fields. We conducted comprehensive experiments based on datasets collected from two high impact venues for Machine Learning and NLP (i.e., ICLR and ACL) to demonstrate the effectiveness of our framework in modelling rigour. In addition, we analyse linguistic patterns of rigour, revealing that framing certainty is crucial for enhancing the perception of scientific rigour, while suggestion certainty and probability uncertainty diminish it.

* Accepted Findings at EMNLP 2024

Via

Access Paper or Ask Questions

From Facts to Insights: A Study on the Generation and Evaluation of Analytical Reports for Deciphering Earnings Calls

Oct 01, 2024

Tomas Goldsack, Yang Wang, Chenghua Lin, Chung-Chi Chen

Abstract:This paper explores the use of Large Language Models (LLMs) in the generation and evaluation of analytical reports derived from Earnings Calls (ECs). Addressing a current gap in research, we explore the generation of analytical reports with LLMs in a multi-agent framework, designing specialized agents that introduce diverse viewpoints and desirable topics of analysis into the report generation process. Through multiple analyses, we examine the alignment between generated and human-written reports and the impact of both individual and collective agents. Our findings suggest that the introduction of additional agents results in more insightful reports, although reports generated by human experts remain preferred in the majority of cases. Finally, we address the challenging issue of report evaluation, we examine the limitations and strengths of LLMs in assessing the quality of generated reports in different settings, revealing a significant correlation with human experts across multiple dimensions.

* Pre-print

Via

Access Paper or Ask Questions

Leveraging Estimated Transferability Over Human Intuition for Model Selection in Text Ranking

Sep 24, 2024

Jun Bai, Zhuofan Chen, Zhenzi Li, Hanhua Hong, Jianfei Zhang, Chen Li, Chenghua Lin, Wenge Rong

Figure 1 for Leveraging Estimated Transferability Over Human Intuition for Model Selection in Text Ranking

Figure 2 for Leveraging Estimated Transferability Over Human Intuition for Model Selection in Text Ranking

Figure 3 for Leveraging Estimated Transferability Over Human Intuition for Model Selection in Text Ranking

Figure 4 for Leveraging Estimated Transferability Over Human Intuition for Model Selection in Text Ranking

Abstract:Text ranking has witnessed significant advancements, attributed to the utilization of dual-encoder enhanced by Pre-trained Language Models (PLMs). Given the proliferation of available PLMs, selecting the most effective one for a given dataset has become a non-trivial challenge. As a promising alternative to human intuition and brute-force fine-tuning, Transferability Estimation (TE) has emerged as an effective approach to model selection. However, current TE methods are primarily designed for classification tasks, and their estimated transferability may not align well with the objectives of text ranking. To address this challenge, we propose to compute the expected rank as transferability, explicitly reflecting the model's ranking capability. Furthermore, to mitigate anisotropy and incorporate training dynamics, we adaptively scale isotropic sentence embeddings to yield an accurate expected rank score. Our resulting method, Adaptive Ranking Transferability (AiRTran), can effectively capture subtle differences between models. On challenging model selection scenarios across various text ranking datasets, it demonstrates significant improvements over previous classification-oriented TE methods, human intuition, and ChatGPT with minor time consumption.

* Accepted by EMNLP 2024 main conference

Via

Access Paper or Ask Questions

SongTrans: An unified song transcription and alignment method for lyrics and notes

Sep 22, 2024

Siwei Wu, Jinzheng He, Ruibin Yuan, Haojie Wei, Xipin Wei, Chenghua Lin, Jin Xu, Junyang Lin

Figure 1 for SongTrans: An unified song transcription and alignment method for lyrics and notes

Figure 2 for SongTrans: An unified song transcription and alignment method for lyrics and notes

Figure 3 for SongTrans: An unified song transcription and alignment method for lyrics and notes

Figure 4 for SongTrans: An unified song transcription and alignment method for lyrics and notes

Abstract:The quantity of processed data is crucial for advancing the field of singing voice synthesis. While there are tools available for lyric or note transcription tasks, they all need pre-processed data which is relatively time-consuming (e.g., vocal and accompaniment separation). Besides, most of these tools are designed to address a single task and struggle with aligning lyrics and notes (i.e., identifying the corresponding notes of each word in lyrics). To address those challenges, we first design a pipeline by optimizing existing tools and annotating numerous lyric-note pairs of songs. Then, based on the annotated data, we train a unified SongTrans model that can directly transcribe lyrics and notes while aligning them simultaneously, without requiring pre-processing songs. Our SongTrans model consists of two modules: (1) the \textbf{Autoregressive module} predicts the lyrics, along with the duration and note number corresponding to each word in a lyric. (2) the \textbf{Non-autoregressive module} predicts the pitch and duration of the notes. Our experiments demonstrate that SongTrans achieves state-of-the-art (SOTA) results in both lyric and note transcription tasks. Furthermore, it is the first model capable of aligning lyrics with notes. Experimental results demonstrate that the SongTrans model can effectively adapt to different types of songs (e.g., songs with accompaniment), showcasing its versatility for real-world applications.

Via

Access Paper or Ask Questions