In this paper, we introduce an algorithm for data quantization based on the principles of Kashin representation. The approach decomposes any given vector, matrix, or tensor into two factors: the first has a small infinity norm, while the second has a similarly small norm after multiplication by an orthogonal matrix. Surprisingly, the entries of both factors are well concentrated around several peaks, which allows us to replace them efficiently with the corresponding centroids for quantization. We study the theoretical properties of the proposed approach and rigorously evaluate our compression algorithm on next-word prediction tasks and on a set of downstream text classification tasks. Our findings demonstrate that Kashin Quantization achieves competitive or superior model quality while compressing the data, marking a significant advancement in the field of data quantization.
We propose a novel deep learning architecture for reconstructing three-dimensional porous media structures from two-dimensional slices. The high-level idea is to fit a distribution over all possible three-dimensional structures of a specific type using a given dataset of samples. Then, given partial information (central slices), we recover the three-dimensional structure built around those slices. Technically, the method is implemented as a deep neural network with encoder, generator, and discriminator modules. Numerical experiments show that this method gives a good reconstruction in terms of Minkowski functionals.
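To make the evaluation criterion concrete, the following minimal sketch computes three of the four Minkowski functionals of a voxelized solid: volume, surface area, and Euler characteristic (integral mean curvature is omitted). It is an illustration, not the authors' evaluation code. It assumes the solid is a union of closed unit cubes given as integer coordinates, and the function name `minkowski_sketch` is hypothetical.

```python
from itertools import product


def minkowski_sketch(solid_voxels):
    """Return (volume, surface_area, euler_characteristic) of a voxel set.

    solid_voxels: iterable of (i, j, k) integer coordinates of unit cubes.
    Assumption: cubes are closed, so cubes touching at a face, an edge, or
    a corner count as connected.
    """
    cubes = set(solid_voxels)
    # Doubled coordinates: a cell with m odd coordinates has dimension m
    # (0 = vertex, 1 = edge, 2 = face, 3 = cube). Storing cells in a set
    # counts every shared vertex, edge, and face exactly once.
    cells = set()
    for i, j, k in cubes:
        for di, dj, dk in product(range(3), repeat=3):
            cells.add((2 * i + di, 2 * j + dj, 2 * k + dk))
    counts = [0, 0, 0, 0]
    for cell in cells:
        counts[sum(c % 2 for c in cell)] += 1
    n_vertices, n_edges, n_faces, n_cubes = counts
    volume = n_cubes
    # Each cube has 6 faces; a face shared by two cubes is interior.
    surface_area = 2 * n_faces - 6 * n_cubes
    euler = n_vertices - n_edges + n_faces - n_cubes
    return volume, surface_area, euler
```

A single voxel gives `(1, 6, 1)`. A one-layer 3×3 ring of eight voxels with the center removed gives `(8, 32, 0)`, the Euler characteristic of a solid torus. Comparing such values between a reconstructed sample and its reference is one way to read "good reconstruction in terms of Minkowski functionals".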
To bridge the gap of more than 15 m between the drilling bit and high-fidelity rock type sensors during directional drilling, we present a novel approach for identifying rock type at the drilling bit. The approach applies machine learning techniques to Measurements While Drilling (MWD) data. We demonstrate the capabilities of the developed approach in distinguishing between the rock types corresponding to (1) a target oil-bearing interval of a reservoir and (2) a non-productive shale layer, and compare it to more traditional physics-driven approaches. The dataset includes MWD data and lithology mapping along multiple wellbores, obtained by processing Logging While Drilling (LWD) measurements from a massive drilling effort on one of the major newly developed oilfields in the north of Western Siberia. We compare various machine-learning algorithms, examine extra features coming from physical modeling of drilling mechanics, and show that the classification error can be reduced from 13.5% to 9%.
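The reported improvement can be put in relative terms with a short sketch. The helper names `error_rate` and `relative_reduction` and the label encoding (1 for the oil-bearing interval, 0 for shale) are assumptions made for illustration. The source does not specify how the error was computed beyond calling it a classification error.

```python
def error_rate(true_labels, predicted_labels):
    """Fraction of samples whose predicted rock type differs from the truth."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)


def relative_reduction(baseline_error, improved_error):
    """Share of the baseline error that the improved model removes."""
    return (baseline_error - improved_error) / baseline_error
```

For the figures quoted above, `relative_reduction(13.5, 9.0)` is one third: a drop of 4.5 percentage points removes a third of the baseline errors.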