Yifan Yao

WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models

Apr 20, 2026

CodeTracer: Towards Traceable Agent States

Apr 14, 2026

Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

Dec 23, 2025

SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models

Nov 07, 2025

CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

Jul 08, 2025

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

May 21, 2025

IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs

Apr 21, 2025

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Feb 20, 2025

A Survey on Large Language Model Security and Privacy: The Good, the Bad, and the Ugly

Dec 04, 2023

Towards Few-shot Out-of-Distribution Detection

Nov 20, 2023