Jinjie Gu

LiveAgentBench: Comprehensive Benchmarking of Agentic Systems Across 104 Real-World Challenges

Mar 03, 2026

LiveClin: A Live Clinical Benchmark without Leakage

Feb 18, 2026

WebClipper: Efficient Evolution of Web Agents with Graph-based Trajectory Pruning

Feb 13, 2026

ClinAlign: Scaling Healthcare Alignment from Clinician Preference

Feb 11, 2026

V2P: Visual Attention Calibration for GUI Grounding via Background Suppression and Center Peaking

Jan 11, 2026

MedDialogRubrics: A Comprehensive Benchmark and Evaluation Framework for Multi-turn Medical Consultations in Large Language Models

Jan 07, 2026

Perplexity-Aware Data Scaling Law: Perplexity Landscapes Predict Performance for Continual Pre-training

Dec 25, 2025

Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO

Nov 18, 2025

GroupRank: A Groupwise Reranking Paradigm Driven by Reinforcement Learning

Nov 10, 2025

HAD: HAllucination Detection Language Models Based on a Comprehensive Hallucination Taxonomy

Oct 22, 2025