Rian Dolphin

Taxonomy-Aligned Risk Extraction from 10-K Filings with Autonomous Improvement Using LLMs

Jan 21, 2026

Rian Dolphin, Joe Dursun, Jarrett Blankenship, Katie Adams, Quinton Pike

Abstract: We present a methodology for extracting structured risk factors from corporate 10-K filings while maintaining adherence to a predefined hierarchical taxonomy. Our three-stage pipeline combines LLM extraction with supporting quotes, embedding-based semantic mapping to taxonomy categories, and LLM-as-a-judge validation that filters spurious assignments. To evaluate our approach, we extract 10,688 risk factors from S&P 500 companies and examine risk profile similarity across industry clusters. Beyond extraction, we introduce autonomous taxonomy maintenance, in which an AI agent analyzes evaluation feedback to identify problematic categories, diagnose failure patterns, and propose refinements, achieving a 104.7% improvement in embedding separation in a case study. External validation confirms that the taxonomy captures economically meaningful structure: same-industry companies exhibit 63% higher risk profile similarity than cross-industry pairs (Cohen's d = 1.06, AUC 0.82, p < 0.001). The methodology generalizes to any domain requiring taxonomy-aligned extraction from unstructured text, with autonomous improvement enabling continuous quality maintenance and enhancement as systems process more documents.

* 4 figures, 9 pages
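
The following minimal Python sketch illustrates the embedding-based mapping stage. It assigns an extracted risk factor to its nearest taxonomy category by cosine similarity. It assumes that the risk factor and every category description have already been embedded as vectors. The function names, the toy vectors and the `min_sim` cut-off are hypothetical and do not come from the paper. The LLM-as-a-judge validation stage is not shown.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def map_to_taxonomy(
    risk_vec: List[float],
    category_vecs: Dict[str, List[float]],
    min_sim: float = 0.5,  # hypothetical acceptance threshold
) -> Optional[Tuple[str, float]]:
    """Return (category, similarity) for the closest category, or None if none clears min_sim."""
    if not category_vecs:
        return None
    best_name, best_sim = max(
        ((name, cosine(risk_vec, vec)) for name, vec in category_vecs.items()),
        key=lambda pair: pair[1],
    )
    return (best_name, best_sim) if best_sim >= min_sim else None
```

Take toy categories `{"Supply chain": [1.0, 0.0, 0.0], "Cybersecurity": [0.0, 1.0, 0.0]}`. The vector `[0.9, 0.1, 0.0]` maps to `"Supply chain"` with a similarity of about 0.994. The vector `[0.0, 0.0, 1.0]` clears no category and returns `None`.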

Contrastive Learning of Asset Embeddings from Financial Time Series

Jul 26, 2024

Rian Dolphin, Barry Smyth, Ruihai Dong

Abstract: Representation learning has emerged as a powerful paradigm for extracting valuable latent features from complex, high-dimensional data. In financial domains, informative asset representations can be used for tasks such as sector classification and risk management. However, the complex and stochastic nature of financial markets poses unique challenges. We propose a novel contrastive learning framework to generate asset embeddings from financial time series data. Our approach leverages the similarity of asset returns over many subwindows to generate informative positive and negative samples, using a statistical sampling strategy based on hypothesis testing to address the noisy nature of financial data. We explore various contrastive loss functions that capture the relationships between assets in different ways to learn a discriminative representation space. Experiments on real-world datasets demonstrate the effectiveness of the learned asset embeddings on benchmark industry classification and portfolio optimization tasks. In each case, our novel approaches significantly outperform existing baselines, highlighting the potential for contrastive learning to capture meaningful and actionable relationships in financial data.

* 9 pages, 4 figures, 4 tables
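
The abstract does not spell out the sampling test. What follows is only a sketch of one way hypothesis testing could turn subwindow similarity into positive and negative pairs.

The sketch rests on two assumptions:
- For two unrelated assets, the Pearson correlation of their returns within a subwindow is equally likely to be positive or negative (`p0 = 0.5`).
- Subwindows are independent, which is a simplification for financial data.

Under these assumptions, a pair is labelled as follows:
- positive when significantly more subwindows than expected show positive correlation;
- negative when significantly fewer do;
- unlabelled otherwise.

The function names, window length and significance level are hypothetical.

```python
import math
from typing import List, Optional


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def label_pair(
    returns_a: List[float],
    returns_b: List[float],
    window: int = 5,      # hypothetical subwindow length
    p0: float = 0.5,      # assumed null probability of positive correlation
    alpha: float = 0.05,  # hypothetical significance level
) -> Optional[str]:
    """Label a pair 'positive', 'negative', or None (inconclusive) via one-sided binomial tests."""
    n = len(returns_a) // window
    agree = sum(
        pearson(returns_a[i * window:(i + 1) * window],
                returns_b[i * window:(i + 1) * window]) > 0
        for i in range(n)
    )
    if upper_tail(agree, n, p0) < alpha:
        return "positive"  # significantly more positively correlated subwindows than chance
    if upper_tail(n - agree, n, 1 - p0) < alpha:
        return "negative"  # significantly fewer positively correlated subwindows than chance
    return None
```

The two one-sided tests keep the labelling conservative. Pairs whose subwindow behaviour is consistent with chance are not used as contrastive samples at all.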

Extracting Structured Insights from Financial News: An Augmented LLM Driven Approach

Jul 22, 2024

Rian Dolphin, Joe Dursun, Jonathan Chow, Jarrett Blankenship, Katie Adams, Quinton Pike

Abstract: Financial news plays a crucial role in decision-making processes across the financial sector, yet efficiently processing this information into a structured format remains challenging. This paper presents a novel approach to financial news processing that leverages Large Language Models (LLMs) to overcome limitations that previously prevented the extraction of structured data from unstructured financial news. We introduce a system that extracts relevant company tickers from raw news article content, performs sentiment analysis at the company level, and generates summaries, all without relying on pre-structured data feeds. Our methodology combines the generative capabilities of LLMs and recent prompting techniques with a robust validation framework that uses a tailored string similarity approach. Evaluation on a dataset of 5530 financial news articles demonstrates the effectiveness of our approach: 90% of articles miss no tickers compared with current data providers, and 22% of articles have additional relevant tickers. Beyond this paper, the methodology has been implemented at scale, with the resulting processed data made available through a live API endpoint that is updated in real time with the latest news. To the best of our knowledge, we are the first data provider to offer granular, per-company sentiment analysis from news articles, enhancing the depth of information available to market participants. We also release the evaluation dataset of 5530 processed articles as a static file, which we hope will facilitate further research leveraging financial news.

* 7 pages, 6 figures
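
The abstract mentions a tailored string similarity approach for validating extracted tickers but does not describe it. As a hedged illustration only, the sketch below checks an LLM-proposed ticker by comparing two names: the company name found in the article and a registered company name. Common corporate suffixes are stripped from both before comparison. The similarity measure is the standard library's `difflib.SequenceMatcher`, used here as a stand-in. The `REFERENCE` table, the suffix list and the 0.8 threshold are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical reference table: ticker -> registered company name.
REFERENCE = {"AAPL": "Apple Inc.", "MSFT": "Microsoft Corporation"}

# Hypothetical list of corporate suffixes ignored during comparison.
SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "plc", "company", "holdings"}


def normalize(name: str) -> str:
    """Lowercase, keep alphanumeric tokens, and drop common corporate suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in SUFFIXES)


def validate_ticker(ticker: str, extracted_name: str, threshold: float = 0.8) -> bool:
    """Accept the ticker only if the article's company name closely matches the registered name."""
    registered = REFERENCE.get(ticker.upper())
    if registered is None:
        return False  # unknown ticker: reject
    a, b = normalize(extracted_name), normalize(registered)
    if not a or not b:
        return False  # nothing left to compare after normalization
    return SequenceMatcher(None, a, b).ratio() >= threshold
```

For instance, `validate_ticker("AAPL", "Apple")` returns `True`, while `validate_ticker("AAPL", "Microsoft")` returns `False`.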

Industry Classification Using a Novel Financial Time-Series Case Representation

Apr 29, 2023

Rian Dolphin, Barry Smyth, Ruihai Dong

Abstract: The financial domain has proven to be a fertile source of challenging machine learning problems across a variety of tasks including prediction, clustering, and classification. Researchers can access an abundance of time-series data, and even modest performance improvements can be translated into significant additional value. In this work, we consider the use of case-based reasoning for an important task in this domain: industry sector classification using historical stock returns time-series data. We discuss why time-series data can present some significant representational challenges for conventional case-based reasoning approaches, and in response, we propose a novel representation based on stock returns embeddings, which can be readily calculated from raw stock returns data. We argue that this representation is well suited to case-based reasoning and evaluate our approach using a large-scale public dataset for the industry sector classification task, demonstrating substantial performance improvements over several baselines using more conventional representations.

* 15 pages
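
To make the case-based reasoning setup concrete, here is a minimal k-nearest-neighbour sketch:
- each case pairs a stock's returns embedding with its sector label;
- a query stock receives the majority sector among its `k` most similar cases.

The embeddings are assumed to be precomputed. Cosine similarity, majority voting and the name `classify_sector` are illustrative choices, not details confirmed by the abstract.

```python
import math
from collections import Counter
from typing import List, Tuple

# A case pairs a returns embedding with its known sector label.
Case = Tuple[List[float], str]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_sector(query: List[float], case_base: List[Case], k: int = 3) -> str:
    """Retrieve the k most similar cases and reuse their majority sector label."""
    ranked = sorted(case_base, key=lambda case: cosine(query, case[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Consider a toy case base of two "Tech" cases near `[1, 0]` and three "Energy" cases near `[0, 1]`. A query of `[0.95, 0.05]` retrieves two Tech cases among its three nearest neighbours and is labelled "Tech".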

Stock Embeddings: Learning Distributed Representations for Financial Assets

Feb 14, 2022

Rian Dolphin, Barry Smyth, Ruihai Dong

Abstract: Identifying meaningful relationships between the price movements of financial assets is a challenging but important problem in a variety of financial applications. However, because recent research, particularly work using machine learning and deep learning techniques, has focused mostly on price forecasting, the literature investigating the modelling of asset correlations has lagged somewhat. To address this, inspired by recent successes in natural language processing, we propose a neural model for training stock embeddings, which harnesses the dynamics of historical returns data to learn the nuanced relationships that exist between financial assets. We describe our approach in detail and discuss a number of ways in which it can be used in the financial domain. Furthermore, we present evaluation results demonstrating the utility of this approach, compared with several important benchmarks, in two real-world financial analytics tasks.

* Currently under review. 9 pages, 4 figures

Measuring Financial Time Series Similarity With a View to Identifying Profitable Stock Market Opportunities

Jul 07, 2021

Rian Dolphin, Barry Smyth, Yang Xu, Ruihai Dong

Abstract: Forecasting stock returns is a challenging problem due to the highly stochastic nature of the market and the vast array of factors and events that can influence trading volume and prices. Nevertheless, it has proven to be an attractive target for machine learning research because even modest levels of prediction accuracy can deliver significant benefits. In this paper, we describe a case-based reasoning approach to predicting stock market returns using only historical pricing data. We argue that one of the impediments to case-based stock prediction has been the lack of a suitable similarity metric for identifying similar pricing histories as the basis for a future prediction (traditional Euclidean and correlation-based approaches are not effective for a variety of reasons). In this regard, a key contribution of this work is the development of a novel similarity metric for comparing historical pricing data. We demonstrate the benefits of this metric and the case-based approach in a real-world application in comparison to a variety of conventional benchmarks.

* 15 pages. Accepted for presentation at the International Conference on Case-Based Reasoning 2021 (ICCBR)
