Wenqi Zhang

OmniEduBench: A Comprehensive Chinese Benchmark for Evaluating Large Language Models in Education

Oct 30, 2025 (via arXiv)

Active Confusion Expression in Large Language Models: Leveraging World Models toward Better Social Reasoning

Oct 09, 2025 (via arXiv)

Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models

Aug 07, 2025 (via arXiv)

OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks

Aug 07, 2025 (via arXiv)

TimeHC-RL: Temporal-aware Hierarchical Cognitive Reinforcement Learning for Enhancing LLMs' Social Intelligence

May 30, 2025 (via arXiv)

ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models

May 27, 2025 (via arXiv)

Evaluation Faking: Unveiling Observer Effects in Safety Evaluation of Frontier AI Systems

May 23, 2025 (via arXiv)

Let LLMs Break Free from Overthinking via Self-Braking Tuning

May 21, 2025 (via arXiv)

Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning

May 21, 2025 (via arXiv)

Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency

Apr 29, 2025 (via arXiv)