Vasily Motolygin

Pipeline for Verifying LLM-Generated Mathematical Solutions

Feb 24, 2026

Varvara Sazonova, Dmitri Shmelkin, Stanislav Kikot, Vasily Motolygin

Abstract: As Large Reasoning Models grow in popularity and achieve stronger results on mathematical problems, measuring their capabilities accurately becomes crucial. We introduce a pipeline for both automatic and interactive verification as a more accurate alternative to checking only the final answer, which is currently the most common approach in benchmarks. The pipeline can also serve as a generator of correct solutions in both formal and informal languages. It comprises three AI agents, which can be selected to suit the benchmark. The key idea is to use prompts to obtain the solution in a specific form that makes verification with proof assistants easier and allows the use of small models ($\le 8$B). Experiments on several datasets suggest a low probability of false positives. The open-source implementation, with instructions for setting up a server, is available at https://github.com/LogicEnj/lean4_verification_pipeline.
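
To illustrate what "a form that a proof assistant can verify" might look like, here is a toy Lean 4 statement. It is a hypothetical example written for this listing and is not taken from the paper or its repository; the theorem name and the arithmetic claim are assumptions chosen only for illustration.

```lean
-- Hypothetical toy example, not taken from the paper or its repository.
-- Informal claim: "3 * 4 + 5 = 17".
theorem toy_arithmetic : 3 * 4 + 5 = 17 := by
  decide

-- If a solution asserted `3 * 4 + 5 = 18` instead, the same proof
-- would be rejected by Lean, so the error could not pass unnoticed.
```

The point of the sketch is only that a formally stated step either checks or fails, whereas comparing final answers cannot detect a flawed intermediate step.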


On The Expressive Power of Knowledge Graph Embedding Methods

Jul 26, 2024

Jiexing Gao, Dmitry Rodin, Vasily Motolygin, Denis Zaytsev

Abstract: Knowledge Graph Embedding (KGE) is a popular approach that represents the entities and relations of a knowledge graph in latent spaces. These representations are known as embeddings. To measure the plausibility of triplets, score functions are defined over the embedding spaces. Despite the wide use of KGE in various tasks, KGE methods are limited in their reasoning abilities. In this paper we propose a mathematical framework for comparing the reasoning abilities of KGE methods. We show that STransE has a higher capability than TransComplEx, and then present a new method, STransCoRe, which improves on STransE by combining it with insights from TransCoRe, which can reduce the space complexity of STransE.
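
As a concrete picture of a score function over an embedding space, the sketch below uses the well-known TransE-style translational score, the negative Euclidean distance between head + relation and tail. TransE is swapped in here only because it is the simplest member of the translational family; it is not one of the methods compared in the abstract, and the function name and toy vectors are assumptions made for illustration.

```python
import math

def transe_score(head, relation, tail):
    """TransE-style score: negative L2 norm of (head + relation - tail).

    A higher (closer to zero) score means the triplet is more plausible.
    Illustrative only; not the STransE or STransCoRe score from the paper.
    """
    return -math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )

# Toy 2-dimensional embeddings (made-up values).
head = [1.0, 0.0]
relation = [0.0, 1.0]
plausible_tail = [1.0, 1.0]     # equals head + relation exactly
implausible_tail = [-1.0, 3.0]  # far from head + relation

score_plausible = transe_score(head, relation, plausible_tail)
score_implausible = transe_score(head, relation, implausible_tail)
```

With these toy vectors the plausible triplet scores higher than the implausible one, which is the ordering a score function is meant to produce.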

* This paper may involve data that is not readily available to the public
