Recommender systems play a crucial role in helping users discover information that aligns with their interests based on their past behaviors. However, developing personalized recommendation systems becomes challenging when historical records of user-item interactions are unavailable, leading to what is known as the system cold-start recommendation problem. This issue is particularly prominent in start-up businesses or platforms with insufficient user engagement history. Previous studies focus on user or item cold-start scenarios, where systems could make recommendations for new users or items but are still trained with historical user-item interactions in the same domain, which cannot solve our problem. To bridge the gap, our research introduces an innovative and effective approach, capitalizing on the capabilities of pre-trained language models. We transform the recommendation process into sentiment analysis of natural languages containing information of user profiles and item attributes, where the sentiment polarity is predicted with prompt learning. By harnessing the extensive knowledge housed within language models, the prediction can be made without historical user-item interaction records. A benchmark is also introduced to evaluate the proposed method under the cold-start setting, and the results demonstrate the effectiveness of our method. To the best of our knowledge, this is the first study to tackle the system cold-start recommendation problem. The benchmark and implementation of the method are available at https://github.com/JacksonWuxs/PromptRec.
On December 7, 2020, Ghanaians participated in the polls to determine their president for the next four years. To gain insights from this presidential election, we conducted stance analysis (which is not always equivalent to sentiment analysis) to understand how Twitter, a popular social media platform, reflected the opinions of its users regarding the two main presidential candidates. We collected a total of 99,356 tweets using the Twitter API (Tweepy) and manually annotated 3,090 tweets into three classes: Against, Neutral, and Support. We then performed preprocessing on the tweets. The resulting dataset was evaluated using two lexicon-based approaches, VADER and TextBlob, as well as five supervised machine learning-based approaches: Support Vector Machine (SVM), Logistic Regression (LR), Multinomial Na\"ive Bayes (MNB), Stochastic Gradient Descent (SGD), and Random Forest (RF), based on metrics such as accuracy, precision, recall, and F1-score. The best performance was achieved by Logistic Regression with an accuracy of 71.13%. We utilized Logistic Regression to classify all the extracted tweets and subsequently conducted an analysis and discussion of the results. For access to our data and code, please visit: https://github.com/ShesterG/Stance-Detection-Ghana-2020-Elections.git
Semantic role labeling (SRL) is the process of detecting the predicate-argument structure of each predicate in a sentence. SRL plays a crucial role as a pre-processing step in many NLP applications such as topic and concept extraction, question answering, summarization, machine translation, sentiment analysis, and text mining. Recently, in many languages, unified SRL dragged lots of attention due to its outstanding performance, which is the result of overcoming the error propagation problem. However, regarding the Persian language, all previous works have focused on traditional methods of SRL leading to a drop in accuracy and imposing expensive feature extraction steps in terms of financial resources, time and energy consumption. In this work, we present an end-to-end SRL method that not only eliminates the need for feature extraction but also outperforms existing methods in facing new samples in practical situations. The proposed method does not employ any auxiliary features and shows more than 16 (83.16) percent improvement in accuracy against previous methods in similar circumstances.
Sentiment analysis is a sub-discipline in the field of natural language processing and computational linguistics and can be used for automated or semi-automated analyses of text documents. One of the aims of these analyses is to recognize an expressed attitude as positive or negative as it can be contained in comments on social media platforms or political documents and speeches as well as fictional and nonfictional texts. Regarding analyses of comments on social media platforms, this is an extension of the previous tutorial on semi-automated screenings of social media network data. A longitudinal perspective regarding social media comments as well as cross-sectional perspectives regarding fictional and nonfictional texts, e.g. entire books and libraries, can lead to extensive text documents. Their analyses can be simplified and accelerated by using sentiment analysis with acceptable inter-rater reliability. Therefore, this tutorial introduces the basic functions for performing a sentiment analysis with R and explains how text documents can be analysed step by step - regardless of their underlying formatting. All prerequisites and steps are described in detail and associated codes are available on GitHub. A comparison of two political speeches illustrates a possible use case.
We examine the potential of ChatGPT, and other large language models, in predicting stock market returns using sentiment analysis of news headlines. We use ChatGPT to indicate whether a given headline is good, bad, or irrelevant news for firms' stock prices. We then compute a numerical score and document a positive correlation between these ``ChatGPT scores'' and subsequent daily stock market returns. Further, ChatGPT outperforms traditional sentiment analysis methods. We find that more basic models such as GPT-1, GPT-2, and BERT cannot accurately forecast returns, indicating return predictability is an emerging capacity of complex models. Our results suggest that incorporating advanced language models into the investment decision-making process can yield more accurate predictions and enhance the performance of quantitative trading strategies.
In the field of car evaluation, more and more netizens choose to express their opinions on the Internet platform, and these comments will affect the decision-making of buyers and the trend of car word-of-mouth. As an important branch of natural language processing (NLP), sentiment analysis provides an effective research method for analyzing the sentiment types of massive car review texts. However, due to the lexical professionalism and large text noise of review texts in the automotive field, when a general sentiment analysis model is applied to car reviews, the accuracy of the model will be poor. To overcome these above challenges, we aim at the sentiment analysis task of car review texts. From the perspective of word vectors, pre-training is carried out by means of whole word mask of proprietary vocabulary in the automotive field, and then training data is carried out through the strategy of an adversarial training set. Based on this, we propose a car review text sentiment analysis model based on adversarial training and whole word mask BERT(ATWWM-BERT).
As an extensive research in the field of Natural language processing (NLP), aspect-based sentiment analysis (ABSA) is the task of predicting the sentiment expressed in a text relative to the corresponding aspect. Unfortunately, most languages lack of sufficient annotation resources, thus more and more recent researchers focus on cross-lingual aspect-based sentiment analysis (XABSA). However, most recent researches only concentrate on cross-lingual data alignment instead of model alignment. To this end, we propose a novel framework, CL-XABSA: Contrastive Learning for Cross-lingual Aspect-Based Sentiment Analysis. Specifically, we design two contrastive strategies, token level contrastive learning of token embeddings (TL-CTE) and sentiment level contrastive learning of token embeddings (SL-CTE), to regularize the semantic space of source and target language to be more uniform. Since our framework can receive datasets in multiple languages during training, our framework can be adapted not only for XABSA task, but also for multilingual aspect-based sentiment analysis (MABSA). To further improve the performance of our model, we perform knowledge distillation technology leveraging data from unlabeled target language. In the distillation XABSA task, we further explore the comparative effectiveness of different data (source dataset, translated dataset, and code-switched dataset). The results demonstrate that the proposed method has a certain improvement in the three tasks of XABSA, distillation XABSA and MABSA. For reproducibility, our code for this paper is available at https://github.com/GKLMIP/CL-XABSA.
The exceptional performance of pre-trained large language models has revolutionised various applications, but their adoption in production environments is hindered by prohibitive costs and inefficiencies, particularly when utilising long prompts. This paper proposes OverPrompt, an in-context learning method aimed at improving LLM efficiency and performance by processing multiple inputs in parallel. Evaluated across diverse datasets, OverPrompt enhances task efficiency and integrates a diverse range of examples for improved performance. Particularly, it amplifies fact-checking and sentiment analysis tasks when supplemented with contextual information. Synthetic data grouping further enhances performance, suggesting a viable approach for data augmentation.
In recent years, personality has been regarded as a valuable personal factor being incorporated into numerous tasks such as sentiment analysis and product recommendation. This has led to widespread attention to text-based personality recognition task, which aims to identify an individual's personality based on given text. Considering that ChatGPT has recently exhibited remarkable abilities on various natural language processing tasks, we provide a preliminary evaluation of ChatGPT on text-based personality recognition task for generating effective personality data. Concretely, we employ a variety of prompting strategies to explore ChatGPT's ability in recognizing personality from given text, especially the level-oriented prompting strategy we designed for guiding ChatGPT in analyzing given text at a specified level. We compare the performance of ChatGPT on two representative real-world datasets with traditional neural network, fine-tuned RoBERTa, and corresponding state-of-the-art task-specific model. The experimental results show that ChatGPT with zero-shot chain-of-thought prompting exhibits impressive personality recognition ability. Triggered by zero-shot chain-of-thought prompting, ChatGPT outperforms fine-tuned RoBERTa on the two datasets and is capable to provide natural language explanations through text-based logical reasoning. Furthermore, relative to zero-shot chain-of-thought prompting, zero-shot level-oriented chain-of-thought prompting enhances the personality prediction ability of ChatGPT and reduces the performance gap between ChatGPT and corresponding state-of-the-art task-specific model. Besides, we also conduct experiments to observe the fairness of ChatGPT when identifying personality and discover that ChatGPT shows unfairness to some sensitive demographic attributes such as gender and age.
M-SENA is an open-sourced platform for Multimodal Sentiment Analysis. It aims to facilitate advanced research by providing flexible toolkits, reliable benchmarks, and intuitive demonstrations. The platform features a fully modular video sentiment analysis framework consisting of data management, feature extraction, model training, and result analysis modules. In this paper, we first illustrate the overall architecture of the M-SENA platform and then introduce features of the core modules. Reliable baseline results of different modality features and MSA benchmarks are also reported. Moreover, we use model evaluation and analysis tools provided by M-SENA to present intermediate representation visualization, on-the-fly instance test, and generalization ability test results. The source code of the platform is publicly available at https://github.com/thuiar/M-SENA.