Yining Zheng

Beyond Rating: A Comprehensive Evaluation and Benchmark for AI Reviews

Apr 22, 2026

The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping

Apr 13, 2026

AI Can Learn Scientific Taste

Mar 15, 2026

AgentLongBench: A Controllable Long Benchmark For Long-Contexts Agents via Environment Rollouts

Jan 29, 2026

ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development

Jan 16, 2026

Multi-hop Reasoning via Early Knowledge Alignment

Dec 23, 2025

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Nov 06, 2025

R3-RAG: Learning Step-by-Step Reasoning and Retrieval for LLMs via Reinforcement Learning

May 26, 2025

FamilyTool: A Multi-hop Personalized Tool Use Benchmark

Apr 09, 2025

How to Mitigate Overfitting in Weak-to-strong Generalization?

Mar 06, 2025