Alert button
Picture for Lefteris Loukas

Lefteris Loukas

Alert button

Breaking the Bank with ChatGPT: Few-Shot Text Classification for Finance

Aug 28, 2023
Lefteris Loukas, Ilias Stogiannidis, Prodromos Malakasiotis, Stavros Vassos

Figure 1 for Breaking the Bank with ChatGPT: Few-Shot Text Classification for Finance
Figure 2 for Breaking the Bank with ChatGPT: Few-Shot Text Classification for Finance
Figure 3 for Breaking the Bank with ChatGPT: Few-Shot Text Classification for Finance
Figure 4 for Breaking the Bank with ChatGPT: Few-Shot Text Classification for Finance

We propose the use of conversational GPT models for easy and quick few-shot text classification in the financial domain using the Banking77 dataset. Our approach involves in-context learning with GPT-3.5 and GPT-4, which minimizes the technical expertise required and eliminates the need for expensive GPU computing while yielding quick and accurate results. Additionally, we fine-tune other pre-trained, masked language models with SetFit, a recent contrastive learning technique, to achieve state-of-the-art results both in full-data and few-shot settings. Our findings show that querying GPT-3.5 and GPT-4 can outperform fine-tuned, non-generative models even with fewer examples. However, subscription fees associated with these solutions may be considered costly for small organizations. Lastly, we find that generative models perform better on the given task when shown representative samples selected by a human expert rather than when shown random ones. We conclude that a) our proposed methods offer a practical solution for few-shot tasks in datasets with limited label availability, and b) our state-of-the-art results can inspire future work in the area.

* Early pre-print; Accepted at the 5th FinNLP workshop @ IJCAI-2023 
Viaarxiv icon

Financial misstatement detection: a realistic evaluation

May 27, 2023
Elias Zavitsanos, Dimitris Mavroeidis, Konstantinos Bougiatiotis, Eirini Spyropoulou, Lefteris Loukas, Georgios Paliouras

Figure 1 for Financial misstatement detection: a realistic evaluation
Figure 2 for Financial misstatement detection: a realistic evaluation
Figure 3 for Financial misstatement detection: a realistic evaluation
Figure 4 for Financial misstatement detection: a realistic evaluation

In this work, we examine the evaluation process for the task of detecting financial reports with a high risk of containing a misstatement. This task is often referred to, in the literature, as ``misstatement detection in financial reports''. We provide an extensive review of the related literature. We propose a new, realistic evaluation framework for the task which, unlike a large part of the previous work: (a) focuses on the misstatement class and its rarity, (b) considers the dimension of time when splitting data into training and test and (c) considers the fact that misstatements can take a long time to detect. Most importantly, we show that the evaluation process significantly affects system performance, and we analyze the performance of different models and feature types in the new realistic framework.

* Proceedings of the Second ACM International Conference on AI in Finance, no 34, 2021  
* 9 pages, ICAIF2021 
Viaarxiv icon

FiNER: Financial Numeric Entity Recognition for XBRL Tagging

Mar 12, 2022
Lefteris Loukas, Manos Fergadiotis, Ilias Chalkidis, Eirini Spyropoulou, Prodromos Malakasiotis, Ion Androutsopoulos, Georgios Paliouras

Figure 1 for FiNER: Financial Numeric Entity Recognition for XBRL Tagging
Figure 2 for FiNER: Financial Numeric Entity Recognition for XBRL Tagging
Figure 3 for FiNER: Financial Numeric Entity Recognition for XBRL Tagging
Figure 4 for FiNER: Financial Numeric Entity Recognition for XBRL Tagging

Publicly traded companies are required to submit periodic reports with eXtensive Business Reporting Language (XBRL) word-level tags. Manually tagging the reports is tedious and costly. We, therefore, introduce XBRL tagging as a new entity extraction task for the financial domain and release FiNER-139, a dataset of 1.1M sentences with gold XBRL tags. Unlike typical entity extraction datasets, FiNER-139 uses a much larger label set of 139 entity types. Most annotated tokens are numeric, with the correct tag per token depending mostly on context, rather than the token itself. We show that subword fragmentation of numeric expressions harms BERT's performance, allowing word-level BILSTMs to perform better. To improve BERT's performance, we propose two simple and effective solutions that replace numeric expressions with pseudo-tokens reflecting original token shapes and numeric magnitudes. We also experiment with FIN-BERT, an existing BERT model for the financial domain, and release our own BERT (SEC-BERT), pre-trained on financial filings, which performs best. Through data and error analysis, we finally identify possible limitations to inspire future work on XBRL tagging.

* 13 pages, long paper at ACL 2022 
Viaarxiv icon

EDGAR-CORPUS: Billions of Tokens Make The World Go Round

Oct 01, 2021
Lefteris Loukas, Manos Fergadiotis, Ion Androutsopoulos, Prodromos Malakasiotis

Figure 1 for EDGAR-CORPUS: Billions of Tokens Make The World Go Round
Figure 2 for EDGAR-CORPUS: Billions of Tokens Make The World Go Round
Figure 3 for EDGAR-CORPUS: Billions of Tokens Make The World Go Round
Figure 4 for EDGAR-CORPUS: Billions of Tokens Make The World Go Round

We release EDGAR-CORPUS, a novel corpus comprising annual reports from all the publicly traded companies in the US spanning a period of more than 25 years. To the best of our knowledge, EDGAR-CORPUS is the largest financial NLP corpus available to date. All the reports are downloaded, split into their corresponding items (sections), and provided in a clean, easy-to-use JSON format. We use EDGAR-CORPUS to train and release EDGAR-W2V, which are WORD2VEC embeddings for the financial domain. We employ these embeddings in a battery of financial NLP tasks and showcase their superiority over generic GloVe embeddings and other existing financial word embeddings. We also open-source EDGAR-CRAWLER, a toolkit that facilitates downloading and extracting future annual reports.

* 6 pages, short paper at ECONLP 2021 Workshop, in conjunction with EMNLP 2021 
Viaarxiv icon

DICoE@FinSim-3: Financial Hypernym Detection using Augmented Terms and Distance-based Features

Sep 30, 2021
Lefteris Loukas, Konstantinos Bougiatiotis, Manos Fergadiotis, Dimitris Mavroeidis, Elias Zavitsanos

Figure 1 for DICoE@FinSim-3: Financial Hypernym Detection using Augmented Terms and Distance-based Features
Figure 2 for DICoE@FinSim-3: Financial Hypernym Detection using Augmented Terms and Distance-based Features
Figure 3 for DICoE@FinSim-3: Financial Hypernym Detection using Augmented Terms and Distance-based Features
Figure 4 for DICoE@FinSim-3: Financial Hypernym Detection using Augmented Terms and Distance-based Features

We present the submission of team DICoE for FinSim-3, the 3rd Shared Task on Learning Semantic Similarities for the Financial Domain. The task provides a set of terms in the financial domain and requires to classify them into the most relevant hypernym from a financial ontology. After augmenting the terms with their Investopedia definitions, our system employs a Logistic Regression classifier over financial word embeddings and a mix of hand-crafted and distance-based features. Also, for the first time in this task, we employ different replacement methods for out-of-vocabulary terms, leading to improved performance. Finally, we have also experimented with word representations generated from various financial corpora. Our best-performing submission ranked 4th on the task's leaderboard.

* 6 pages, Proceedings of the Third Workshop on Financial Technology and Natural Language Processing (FinNLP@IJCAI-2021) 
Viaarxiv icon