Yike Zhao

RIDE: Difficulty Evolving Perturbation with Item Response Theory for Mathematical Reasoning

Nov 06, 2025

Xinyuan Li, Murong Xu, Wenbiao Tao, Hanlun Zhu, Yike Zhao, Jipeng Zhang, Yunshi Lan

Abstract: Large language models (LLMs) achieve high performance on mathematical reasoning, but these results can be inflated by training data leakage or superficial pattern matching rather than genuine reasoning. Measuring true mathematical reasoning ability therefore calls for adversarial perturbation-based evaluation. Current rule-based perturbation methods often generate ill-posed questions and impede the systematic evaluation of question difficulty and the evolution of benchmarks. To bridge this gap, we propose RIDE, a novel adversarial question-rewriting framework that leverages Item Response Theory (IRT) to rigorously measure question difficulty and to generate intrinsically more challenging, well-posed variations of mathematical problems. We employ 35 LLMs to simulate students and build a difficulty ranker from their responses. This ranker provides a reward signal during reinforcement learning and guides a question-rewriting model to reformulate existing questions across difficulty levels. Applying RIDE to competition-level mathematical benchmarks yields perturbed versions that degrade advanced LLM performance: experiments show an average 21.73% drop across 26 models, exposing limited robustness in mathematical reasoning and confirming the validity of our evaluation approach.
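
To make the IRT idea in this abstract concrete, here is a minimal sketch of how question difficulty can be estimated from a matrix of correct/incorrect responses. It assumes the one-parameter (Rasch) model, in which the probability of a correct answer is sigmoid(ability - difficulty); the abstract does not say which IRT variant RIDE uses, and the function name `fit_rasch`, the regularization weight, and the toy response matrix are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_rasch(responses, steps=500, lr=0.1, l2=0.1):
    """Fit a Rasch (1PL) IRT model by gradient ascent on the log-likelihood.

    responses[s][q] is 1 if simulated student s answered question q
    correctly, else 0. Returns (ability, difficulty) lists.
    Assumption: a small L2 penalty keeps estimates finite when a student
    answers everything right or everything wrong.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    ability = [0.0] * n_students
    difficulty = [0.0] * n_items
    for _ in range(steps):
        grad_a = [-l2 * a for a in ability]
        grad_d = [-l2 * d for d in difficulty]
        for s in range(n_students):
            for q in range(n_items):
                # Residual between the observed answer and the predicted probability.
                resid = responses[s][q] - sigmoid(ability[s] - difficulty[q])
                grad_a[s] += resid   # higher ability raises P(correct)
                grad_d[q] -= resid   # higher difficulty lowers P(correct)
        ability = [a + lr * g for a, g in zip(ability, grad_a)]
        difficulty = [d + lr * g for d, g in zip(difficulty, grad_d)]
        # Shift both scales by the mean difficulty: ability - difficulty is
        # unchanged, but the difficulty scale is anchored at zero.
        mean_d = sum(difficulty) / n_items
        ability = [a - mean_d for a in ability]
        difficulty = [d - mean_d for d in difficulty]
    return ability, difficulty


# Toy data (hypothetical): 4 simulated students, 3 questions.
toy_responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
ability, difficulty = fit_rasch(toy_responses)
# Rank questions from easiest to hardest by estimated difficulty.
ranking = sorted(range(len(difficulty)), key=lambda q: difficulty[q])
```

In a setup like the one the abstract describes, each row would hold the answers of one LLM acting as a student, and the estimated difficulties could then be used to compare an original question with its rewritten version.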

More Data or Better Data? A Critical Analysis of Data Selection and Synthesis for Mathematical Reasoning

Oct 08, 2025

Yike Zhao, Simin Guo, Ziqing Yang, Shifan Han, Dahua Lin, Fei Tan

Abstract: The reasoning capabilities of Large Language Models (LLMs) play a critical role in many downstream tasks, yet depend strongly on the quality of training data. Despite various proposed data construction methods, their practical utility in real-world pipelines remains underexplored. In this work, we conduct a comprehensive analysis of open-source datasets and data synthesis techniques for mathematical reasoning, evaluating them under a unified pipeline designed to mirror training and deployment scenarios. We further distill effective data selection strategies and identify practical methods suitable for industrial applications. Our findings highlight that structuring data in more interpretable formats or distilling from stronger models often outweighs simply scaling up data volume. This study provides actionable guidance for integrating training data to enhance LLM capabilities, supporting both cost-effective data curation and scalable model enhancement. We hope this work will inspire further research on how to balance "more data" versus "better data" for real-world reasoning tasks.

* 12 pages, 3 figures, submitted to EMNLP 2025 Industry Track

Diffusion Stochastic Learning Over Adaptive Competing Networks

Apr 28, 2025

Yike Zhao, Haoyuan Cai, Ali H. Sayed

Abstract: This paper studies a stochastic dynamic game between two competing teams, each consisting of a network of collaborating agents. Unlike fully cooperative settings, where all agents share a common objective, each team in this game aims to minimize its own distinct objective. In the adversarial setting, the two objectives may conflict, as in zero-sum games. Throughout the competition, agents share strategic information within their own team while simultaneously inferring and adapting to the strategies of the opposing team. We propose diffusion learning algorithms to address two important classes of this network game: i) a zero-sum game characterized by weak cross-team subgraph interactions, and ii) a general non-zero-sum game exhibiting strong cross-team subgraph interactions. We analyze the stability of the proposed algorithms under reasonable assumptions and illustrate the theoretical results through experiments on Cournot team competition and decentralized GAN training.
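
As a rough illustration of the kind of update this abstract describes, the sketch below runs an adapt-then-combine diffusion step for two teams playing a toy zero-sum game. It is not the paper's algorithm. The scalar game J(x, y) = 0.5x^2 + xy - 0.5y^2, the ring topology with uniform weights, the pairing of each agent with one opposing agent, and the name `run_diffusion` are all assumptions made for this example.

```python
import random


def ring_average(values):
    """Combine step: each agent averages with its two ring neighbours."""
    n = len(values)
    return [(values[k - 1] + values[k] + values[(k + 1) % n]) / 3.0
            for k in range(n)]


def run_diffusion(num_agents=5, steps=2000, mu=0.05, noise_std=0.1, seed=0):
    """Adapt-then-combine diffusion on a toy zero-sum game (illustrative only).

    Team X minimizes J(x, y) = 0.5*x**2 + x*y - 0.5*y**2 and team Y
    minimizes -J, so the saddle point is (0, 0).
    Assumption: agent k of each team observes agent k of the other team.
    """
    rng = random.Random(seed)
    x = [rng.uniform(-2.0, 2.0) for _ in range(num_agents)]
    y = [rng.uniform(-2.0, 2.0) for _ in range(num_agents)]
    for _ in range(steps):
        # Adapt: noisy gradient step on each agent's own team objective.
        # dJ/dx = x + y for team X; d(-J)/dy = y - x for team Y.
        x_half = [x[k] - mu * ((x[k] + y[k]) + rng.gauss(0.0, noise_std))
                  for k in range(num_agents)]
        y_half = [y[k] - mu * ((y[k] - x[k]) + rng.gauss(0.0, noise_std))
                  for k in range(num_agents)]
        # Combine: share intermediate estimates within each team only.
        x = ring_average(x_half)
        y = ring_average(y_half)
    return x, y


x_final, y_final = run_diffusion()
```

With a small step size `mu`, the estimates of both teams hover near the saddle point at the origin, with fluctuations driven by the gradient noise.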
