Wen-Ding Li

Tony

Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning

Apr 15, 2025

M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

Apr 14, 2025

Challenges and Paths Towards AI for Software Engineering

Mar 28, 2025

MMTEB: Massive Multilingual Text Embedding Benchmark

Feb 19, 2025

Humanity's Last Exam

Jan 24, 2025

Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models

Dec 04, 2024

Combining Induction and Transduction for Abstract Reasoning

Nov 04, 2024

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Jun 26, 2024

Is Programming by Example solved by LLMs?

Jun 12, 2024

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

Mar 12, 2024