
Dailin Li

Mask-Proof: An LLM-based Automated Data Curation Pipeline on Mathematical Proofs

Jun 13, 2026
Via arXiv

WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models

Apr 20, 2026
Via arXiv

TRIP-Bench: A Benchmark for Long-Horizon Interactive Agents in Real-World Scenarios

Feb 02, 2026
Via arXiv

SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models

Nov 07, 2025
Via arXiv

MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook

Sep 17, 2025
Via arXiv

Efficient Tuning of Large Language Models for Knowledge-Grounded Dialogue Generation

Apr 10, 2025
Via arXiv