Yizhi Li

Encyclo-K: Evaluating LLMs with Dynamically Composed Knowledge Statements

Dec 31, 2025
Via arXiv

Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space

Dec 31, 2025
Via arXiv

Context as a Tool: Context Management for Long-Horizon SWE-Agents

Dec 26, 2025
Via arXiv

CodeSimpleQA: Scaling Factuality in Code Large Language Models

Dec 22, 2025
Via arXiv

$M^3-Verse$: A "Spot the Difference" Challenge for Large Multimodal Models

Dec 21, 2025
Via arXiv

AutoMV: An Automatic Multi-Agent System for Music Video Generation

Dec 13, 2025
Via arXiv

TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

Aug 24, 2025
Via arXiv

First Return, Entropy-Eliciting Explore

Jul 09, 2025
Via arXiv

Overview of the NLPCC 2025 Shared Task: Gender Bias Mitigation Challenge

Jun 14, 2025
Via arXiv

ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding

May 29, 2025
Via arXiv