In the realm of social media, understanding and predicting post reach is a significant challenge. This paper presents a Crowd Reaction AssessMent (CReAM) task designed to estimate if a given social media post will receive more reaction than another, a particularly essential task for digital marketers and content writers. We introduce the Crowd Reaction Estimation Dataset (CRED), consisting of pairs of tweets from The White House with comparative measures of retweet count. The proposed Generator-Guided Estimation Approach (GGEA) leverages generative Large Language Models (LLMs), such as ChatGPT, FLAN-UL2, and Claude, to guide classification models for making better predictions. Our results reveal that a fine-tuned FLANG-RoBERTa model, utilizing a cross-encoder architecture with tweet content and responses generated by Claude, performs optimally. We further use a T5-based paraphraser to generate paraphrases of a given post and demonstrate GGEA's ability to predict which post will elicit the most reactions. We believe this novel application of LLMs provides a significant advancement in predicting social media post reach.
In this paper, we address the hallucination problem commonly found in natural language generation tasks. Language models often generate fluent and convincing content but can lack consistency with the provided source, resulting in potential inaccuracies. We propose a new decoding method called Fidelity-Enriched Contrastive Search (FECS), which augments the contrastive search framework with context-aware regularization terms. FECS promotes tokens that are semantically similar to the provided source while penalizing repetitiveness in the generated text. We demonstrate its effectiveness across two tasks prone to hallucination: abstractive summarization and dialogue generation. Results show that FECS consistently enhances faithfulness across various language model sizes while maintaining output diversity comparable to well-performing decoding algorithms.
Headline generation, a key task in abstractive summarization, strives to condense a full-length article into a succinct, single line of text. Notably, while contemporary encoder-decoder models excel based on the ROUGE metric, they often falter when it comes to the precise generation of numerals in headlines. We identify the lack of datasets providing fine-grained annotations for accurate numeral generation as a major roadblock. To address this, we introduce a new dataset, the NumHG, and provide over 27,000 annotated numeral-rich news articles for detailed investigation. Further, we evaluate five well-performing models from previous headline generation tasks using human evaluation in terms of numerical accuracy, reasonableness, and readability. Our study reveals a need for improvement in numerical accuracy, demonstrating the potential of the NumHG dataset to drive progress in number-focused headline generation and stimulate further discussions in numeral-focused text generation.
Fake news and misinformation spread rapidly on the Internet. How to identify it and how to interpret the identification results have become important issues. In this paper, we propose a Dual Co-Attention Network (Dual-CAN) for fake news detection, which takes news content, social media replies, and external knowledge into consideration. Our experimental results support that the proposed Dual-CAN outperforms current representative models in two benchmark datasets. We further make in-depth discussions by comparing how models work in both datasets with empirical analysis of attention weights.
Making our research results positively impact on society and environment is one of the goals our community has been pursuing recently. Although financial technology (FinTech) is one of the popular application fields, we notice that there is no discussion on how NLP can help in FinTech for the social good. When mentioning FinTech for social good, people are talking about financial inclusion and green finance. However, the role of NLP in these directions only gets limited discussions. To fill this gap, this paper shares our idea of how we can use NLP in FinTech for social good. We hope readers can rethink the relationship between finance and NLP based on our sharing, and further join us in improving the financial literacy of individual investors and improving the supports for impact investment.
Opinion mining is a prevalent research issue in many domains. In the financial domain, however, it is still in the early stages. Most of the researches on this topic only focus on the coarse-grained market sentiment analysis, i.e., 2-way classification for bullish/bearish. Thanks to the recent financial technology (FinTech) development, some interdisciplinary researchers start to involve in the in-depth analysis of investors' opinions. In this position paper, we first define the financial opinions from both coarse-grained and fine-grained points of views, and then provide an overview on the issues already tackled. In addition to listing research issues of the existing topics, we further propose a road map of fine-grained financial opinion mining for future researches, and point out several challenges yet to explore. Moreover, we provide possible directions to deal with the proposed research issues.
Financial Technology (FinTech) is one of the worldwide rapidly-rising topics in the past five years according to the statistics of FinTech from Google Trends. In this position paper, we focus on the researches applying natural language processing (NLP) technologies in the finance domain. Our goal is to indicate the position we are now and provide the blueprint for future researches. We go through the application scenarios from three aspects including Know Your Customer (KYC), Know Your Product (KYP), and Satisfy Your Customer (SYC). Both formal documents and informal textual data are analyzed to understand corporate customers and personal customers. Furthermore, we talk over how to dynamically update the features of products from the prospect and the risk points of view. Finally, we discuss satisfying the customers in both B2C and C2C business models. After summarizing the past and the recent challenges, we highlight several promising future research directions in the trend of FinTech and the open finance tendency.
Numerals that contain much information in financial documents are crucial for financial decision making. They play different roles in financial analysis processes. This paper is aimed at understanding the meanings of numerals in financial tweets for fine-grained crowd-based forecasting. We propose a taxonomy that classifies the numerals in financial tweets into 7 categories, and further extend some of these categories into several subcategories. Neural network-based models with word and character-level encoders are proposed for 7-way classification and 17-way classification. We perform backtest to confirm the effectiveness of the numeric opinions made by the crowd. This work is the first attempt to understand numerals in financial social media data, and we provide the first comparison of fine-grained opinion of individual investors and analysts based on their forecast price. The numeral corpus used in our experiments, called FinNum 1.0 , is available for research purposes.