Igor Rivin

Probing Structural Mathematical Reasoning in Language Models with Algebraic Trapdoors

May 05, 2026

Igor Rivin

Abstract:We introduce a benchmark suite for evaluating structural mathematical reasoning in language models, built on subgroup-construction problems in SL(3, Z) with cryptographic-style verifier-prover asymmetry. Each instance presents a finitely generated subgroup as a list of integer matrices and asks for an arithmetic invariant -- index, surjection-at-prime, or membership -- that the construction-time information (N, K) pins down in O(1) closed form, but that the solver, lacking that information, must derive by either Aschbacher-classification analysis or by a membership query in SL(3, Z) of unknown decidability. The benchmark therefore distinguishes models with internalized algebraic priors (Aschbacher classes, McLaughlin's theorem, Property (T), the congruence subgroup property) from models that rely on general-purpose computation. We report empirical results across five representative reasoning traces from two state-of-the-art models. The headline result: on the index variant, one model spent 152 minutes of reasoning, explicitly identified the kernel-side membership question as the bottleneck, attempted constructive verification, and abstained with "DON'T KNOW" rather than commit to its computed cokernel candidate -- demonstrating calibrated meta-cognition on the open-decidability boundary that the benchmark was designed to probe. We argue that the benchmark exposes a four-way classification of model behavior (commit-correct, commit-wrong, abstain-correct, abstain-wrong) that standard answer-key scoring conflates.
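The surjection-at-prime variant mentioned in the abstract can be illustrated for very small primes by brute force. The following is a minimal sketch, not the paper's benchmark code: it reduces a list of integer generator matrices mod p, enumerates the subgroup they generate in SL(3, F_p), and compares its size with the known order p^3 (p^3 - 1)(p^2 - 1). All function names here (`elementary`, `generated_subgroup_size`, `surjects_at_prime`) are hypothetical, and the approach is only feasible when p is tiny.

```python
def det3(m):
    """Determinant of a 3x3 integer matrix given as nested tuples or lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def mat_mul_mod(a, b, p):
    """Product of two 3x3 matrices with entries reduced mod p."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) % p for j in range(3))
        for i in range(3)
    )


def sl3_order(p):
    """Order of SL(3, F_p) for a prime p."""
    return p**3 * (p**3 - 1) * (p**2 - 1)


def elementary(i, j):
    """Hypothetical helper: identity matrix with an extra 1 at position (i, j)."""
    return tuple(
        tuple(1 if (r == c or (r, c) == (i, j)) else 0 for c in range(3))
        for r in range(3)
    )


def generated_subgroup_size(gens, p):
    """Size of the subgroup of SL(3, F_p) generated by the reductions of gens.

    In a finite group, products of generators alone already reach inverses,
    so a breadth-first closure under right multiplication is enough.
    """
    if any(det3(g) != 1 for g in gens):
        raise ValueError("every generator must have determinant 1")
    identity = elementary(0, 0)  # (0, 0) is on the diagonal, so this is I
    reduced = [tuple(tuple(x % p for x in row) for row in g) for g in gens]
    seen = {identity}
    frontier = [identity]
    while frontier:
        next_frontier = []
        for m in frontier:
            for g in reduced:
                prod = mat_mul_mod(m, g, p)
                if prod not in seen:
                    seen.add(prod)
                    next_frontier.append(prod)
        frontier = next_frontier
    return len(seen)


def surjects_at_prime(gens, p):
    """True if the generators map onto all of SL(3, F_p)."""
    return generated_subgroup_size(gens, p) == sl3_order(p)
```

Because the group order grows like p^8, this enumeration does not scale. That is consistent with the abstract's point that a solver without the construction-time data needs structural arguments rather than general-purpose computation.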

DiRe-RAPIDS: Topology-faithful dimensionality reduction at scale

Apr 28, 2026

Alexander Kolpakov, Igor Rivin

Abstract:Dimensionality reduction methods such as UMAP and t-SNE are central tools for visualising high-dimensional data, but their local-neighborhood objectives can preserve sampling noise while distorting global topology. We show that standard local metrics reward this noise memorisation: top-performing embeddings invent cycles and disconnected islands absent from the data. We introduce a topology-faithfulness benchmark based on noisy manifolds with known homology, tune DiRe against it, and find Pareto-optimal configurations that match or beat GPU-accelerated UMAP on classification while recovering exact first Betti numbers on stress tests. On 723K arXiv paper embeddings, DiRe preserves 3-4 times more topological structure than UMAP at comparable wall-clock.

* 5 pages, 4 figures; GitHub repositories (https://github.com/sashakolpakov/dire-rapids) (https://github.com/igorrivin/dire-rapids-arxiv); HuggingFace dataset (https://huggingface.co/datasets/igriv/dire-arxiv-bge-small-embeddings)
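As a small illustration of the invariant the abstract refers to, the first Betti number of a graph (viewed as a 1-dimensional complex) is E - V + C, where C is the number of connected components. The sketch below is an assumption-laden toy, not the DiRe benchmark: the paper works with noisy manifolds of known homology, whereas this only counts independent cycles in an explicit edge list. The function name `first_betti_number` is hypothetical.

```python
def first_betti_number(num_vertices, edges):
    """First Betti number of a graph: edges - vertices + connected components."""
    parent = list(range(num_vertices))

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = num_vertices
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1
    return len(edges) - num_vertices + components


# A 4-cycle has exactly one independent loop.
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(first_betti_number(4, square))
```

An embedding that "invents cycles" in the abstract's sense would show up, in this simplified picture, as a neighborhood graph whose count exceeds that of the original data.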

Fast Geometric Embedding for Node Influence Maximization

Jun 09, 2025

Alexander Kolpakov, Igor Rivin

Abstract:Computing classical centrality measures such as betweenness and closeness is computationally expensive on large-scale graphs. In this work, we introduce an efficient force layout algorithm that embeds a graph into a low-dimensional space, where the radial distance from the origin serves as a proxy for various centrality measures. We evaluate our method on multiple graph families and demonstrate strong correlations with degree, PageRank, and path-based centralities. As an application, the proposed embedding makes it possible to find high-influence nodes in a network and provides a fast, scalable alternative to the standard greedy algorithm.

* 8 pages, 4 figures, 18 tables; GitHub repository available (https://github.com/sashakolpakov/graphem/); package available on PyPI (https://pypi.org/project/graphem-jax/)

DiRe-JAX: A JAX based Dimensionality Reduction Algorithm for Large-scale Data

Mar 06, 2025

Alexander Kolpakov, Igor Rivin

Abstract:DiRe-JAX is a new dimensionality reduction toolkit designed to address some of the challenges faced by traditional methods like UMAP and tSNE, such as loss of global structure and computational inefficiency. Built on the JAX framework, DiRe leverages modern hardware acceleration to provide an efficient, scalable, and interpretable solution for visualizing complex data structures, and for quantitative analysis of lower-dimensional embeddings. The toolkit shows considerable promise in preserving both local and global structures within the data as compared to state-of-the-art UMAP and tSNE implementations. This makes it suitable for a wide range of applications in machine learning, bioinformatics, and data science.

* 22 pages, 12 figures; GitHub repository available (https://github.com/sashakolpakov/dire-jax); package available on PyPI (https://pypi.org/project/dire-jax/)

A ripple in time: a discontinuity in American history

Dec 02, 2023

Alexander Kolpakov, Igor Rivin

Abstract:In this note we use the State of the Union Address dataset from Kaggle to make some surprising (and some not so surprising) observations pertaining to the general timeline of American history, and the character and nature of the addresses themselves. Our main approach is using vector embeddings, such as BERT (DistilBERT) and GPT-2. While it is widely believed that BERT (and its variations) is most suitable for NLP classification tasks, we find that GPT-2 in conjunction with nonlinear dimension reduction methods such as UMAP provides better separation and stronger clustering. This makes GPT-2 + UMAP an interesting alternative. In our case, no model fine-tuning is required, and the pre-trained out-of-the-box GPT-2 model is enough. We also used a fine-tuned DistilBERT model for classification (detecting which president delivered which address), with very good results (accuracy 93% - 95% depending on the run). All computations can be replicated by using the accompanying code on GitHub.

* 7 pages, 8 figures; GitHub repository https://github.com/sashakolpakov/ripple_in_time

The performance of the batch learner algorithm

Jan 14, 2002

Igor Rivin

Abstract:We analyze completely the convergence speed of the batch learning algorithm, and compare its speed to that of the memoryless learning algorithm and of learning with memory. We show that the batch learning algorithm is never worse than the memoryless learning algorithm (at least asymptotically). Its performance vis-a-vis learning with full memory is less clearcut, and depends on certain probabilistic assumptions.

* Supersedes a part of cs.LG/0107033

Mathematics of learning

Dec 03, 2001

Natalia Komarova, Igor Rivin

Abstract:We study the convergence properties of a pair of learning algorithms (learning with and without memory). This leads us to study the dominant eigenvalue of a class of random matrices. This turns out to be related to the roots of the derivative of random polynomials (generated by picking their roots uniformly at random in the interval [0, 1], although our results extend to other distributions). This, in turn, requires the study of the statistical behavior of the harmonic mean of random variables as above, which leads us to the delicate question of the rate of convergence to stable laws and tail estimates for stable laws. The reader can find the proofs of most of the results announced here in the paper entitled "Harmonic mean, random polynomials, and random matrices", by the same authors.

* Minor revisions

Harmonic mean, random polynomials and stochastic matrices

Dec 03, 2001

Natalia Komarova, Igor Rivin

Abstract:Motivated by a problem in learning theory, we are led to study the dominant eigenvalue of a class of random matrices. This turns out to be related to the roots of the derivative of random polynomials (generated by picking their roots uniformly at random in the interval [0, 1], although our results extend to other distributions). This, in turn, requires the study of the statistical behavior of the harmonic mean of random variables as above, and that, in turn, leads us to the delicate question of the rate of convergence to stable laws and tail estimates for stable laws.
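One way to see why harmonic-mean-type sums appear is the standard identity p'(x)/p(x) = sum of 1/(x - r_i) over the roots r_i of p, so the roots of the derivative are exactly the points where that sum vanishes. The sketch below is only an illustration under stated assumptions, not the authors' method: it samples roots uniformly from (0, 1], locates the derivative's root in each gap between consecutive roots by bisection, and computes a sample harmonic mean. The function names are hypothetical, and the roots are assumed to be distinct.

```python
import random


def harmonic_mean(samples):
    """Harmonic mean n / sum(1 / x_i) of strictly positive samples."""
    return len(samples) / sum(1.0 / x for x in samples)


def log_derivative(x, roots):
    """p'(x) / p(x) for the monic polynomial with the given roots."""
    return sum(1.0 / (x - r) for r in roots)


def critical_points(roots, iterations=100):
    """Roots of p', one in each gap between consecutive distinct real roots.

    On each gap the logarithmic derivative decreases strictly from +inf to
    -inf, so bisection on its sign finds the unique zero.
    """
    ordered = sorted(roots)
    points = []
    for left, right in zip(ordered, ordered[1:]):
        lo, hi = left, right
        for _ in range(iterations):
            mid = 0.5 * (lo + hi)
            if mid == lo or mid == hi:
                break  # floating-point resolution reached
            if log_derivative(mid, roots) > 0:
                lo = mid
            else:
                hi = mid
        points.append(0.5 * (lo + hi))
    return points


if __name__ == "__main__":
    rng = random.Random(0)
    # 1 - random() lies in (0, 1], which avoids division by zero below.
    roots = [1.0 - rng.random() for _ in range(8)]
    print(critical_points(roots))
    print(harmonic_mean(roots))
```

Since the reciprocal of a uniform variable on (0, 1] has infinite mean, sums of such reciprocals are heavy-tailed, which matches the abstract's remark that the analysis leads to stable laws.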

Yet another zeta function and learning

Jul 25, 2001

Igor Rivin

Abstract:We study the convergence speed of the batch learning algorithm, and compare its speed to that of the memoryless learning algorithm and of learning with memory (as analyzed in joint work with N. Komarova). We obtain precise results and show in particular that the batch learning algorithm is never worse than the memoryless learning algorithm (at least asymptotically). Its performance vis-a-vis learning with full memory is less clearcut, and depends on certain probabilistic assumptions. These results necessitate the introduction of the moment zeta function of a probability distribution and the study of some of its properties.
