Jianpeng Jiao

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

Dec 14, 2025
Via arXiv

DiscoX: Benchmarking Discourse-Level Translation task in Expert Domains

Nov 14, 2025
Via arXiv

LPFQA: A Long-Tail Professional Forum-based Benchmark for LLM Evaluation

Nov 09, 2025
Via arXiv

FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning

Sep 16, 2025
Via arXiv

MARS-Bench: A Multi-turn Athletic Real-world Scenario Benchmark for Dialogue Evaluation

May 27, 2025
Via arXiv

Seed1.5-VL Technical Report

May 11, 2025
Via arXiv