Picture for Jiaheng Liu

Jiaheng Liu

Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows

Add code
Dec 18, 2025
Viaarxiv icon

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

Add code
Dec 14, 2025
Viaarxiv icon

AutoMV: An Automatic Multi-Agent System for Music Video Generation

Add code
Dec 13, 2025
Viaarxiv icon

MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs

Add code
Nov 13, 2025
Figure 1 for MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
Figure 2 for MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
Figure 3 for MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
Figure 4 for MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
Viaarxiv icon

SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models

Add code
Nov 07, 2025
Viaarxiv icon

Scaling Latent Reasoning via Looped Language Models

Add code
Oct 29, 2025
Figure 1 for Scaling Latent Reasoning via Looped Language Models
Figure 2 for Scaling Latent Reasoning via Looped Language Models
Figure 3 for Scaling Latent Reasoning via Looped Language Models
Figure 4 for Scaling Latent Reasoning via Looped Language Models
Viaarxiv icon

Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?

Add code
Sep 04, 2025
Viaarxiv icon

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Add code
Aug 11, 2025
Viaarxiv icon

IFEvalCode: Controlled Code Generation

Add code
Jul 30, 2025
Viaarxiv icon

Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving

Add code
Jul 08, 2025
Viaarxiv icon