Effective processing of financial transactions is essential for banking data analysis. However, in this domain, most methods focus on specialized solutions to stand-alone problems instead of constructing universal representations suitable for many problems. We present a representation learning framework that addresses diverse business challenges. We also suggest novel generative models that account for data specifics, and a way to integrate external information into a client's representation, leveraging insights from other customers' actions. Finally, we offer a benchmark, describing representation quality globally, concerning the entire transaction history; locally, reflecting the client's current state; and dynamically, capturing representation evolution over time. Our generative approach demonstrates superior performance in local tasks, with an increase in ROC-AUC of up to 14\% for the next MCC prediction task and up to 46\% for downstream tasks from existing contrastive baselines. Incorporating external information improves the scores by an additional 20\%.
The change point is a moment of an abrupt alteration in the data distribution. Current methods for change point detection are based on recurrent neural methods suitable for sequential data. However, recent works show that transformers based on attention mechanisms perform better than standard recurrent models for many tasks. The most benefit is noticeable in the case of longer sequences. In this paper, we investigate different attentions for the change point detection task and proposed specific form of attention related to the task at hand. We show that using a special form of attention outperforms state-of-the-art results.
A change points detection aims to catch an abrupt disorder in data distribution. Common approaches assume that there are only two fixed distributions for data: one before and another after a change point. Real-world data are richer than this assumption. There can be multiple different distributions before and after a change. We propose an approach that works in the multiple-distributions scenario. Our approach learn representations for semi-structured data suitable for change point detection, while a common classifiers-based approach fails. Moreover, our model is more robust, when predicting change points. The datasets used for benchmarking are sequences of images with and without change points in them.
One of the first steps during the investigation of geological objects is the interwell correlation. It provides information on the structure of the objects under study, as it comprises the framework for constructing geological models and assessing hydrocarbon reserves. Today, the detailed interwell correlation relies on manual analysis of well-logging data. Thus, it is time-consuming and of a subjective nature. The essence of the interwell correlation constitutes an assessment of the similarities between geological profiles. There were many attempts to automate the process of interwell correlation by means of rule-based approaches, classic machine learning approaches, and deep learning approaches in the past. However, most approaches are of limited usage and inherent subjectivity of experts. We propose a novel framework to solve the geological profile similarity estimation based on a deep learning model. Our similarity model takes well-logging data as input and provides the similarity of wells as output. The developed framework enables (1) extracting patterns and essential characteristics of geological profiles within the wells and (2) model training following the unsupervised paradigm without the need for manual analysis and interpretation of well-logging data. For model testing, we used two open datasets originating in New Zealand and Norway. Our data-based similarity models provide high performance: the accuracy of our model is $0.926$ compared to $0.787$ for baselines based on the popular gradient boosting approach. With them, an oil\&gas practitioner can improve interwell correlation quality and reduce operation time.
Change points are abrupt alterations in the distribution of sequential data. A change-point detection (CPD) model aims at quick detection of such changes. Classic approaches perform poorly for semi-structured sequential data because of the absence of adequate data representation learning. To deal with it, we introduce a principled differentiable loss function that considers the specificity of the CPD task. The theoretical results suggest that this function approximates well classic rigorous solutions. For such loss function, we propose an end-to-end method for the training of deep representation learning CPD models. Our experiments provide evidence that the proposed approach improves baseline results of change point detection for various data types, including real-world videos and image sequences, and improve representations for them.