Yujiong Shen

JFTA-Bench: Evaluate LLM's Ability of Tracking and Analyzing Malfunctions Using Fault Trees

Mar 24, 2026
Via arXiv

SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents

Feb 13, 2026
Via arXiv

CL-bench: A Benchmark for Context Learning

Feb 03, 2026
Via arXiv

Can Deep Research Agents Find and Organize? Evaluating the Synthesis Gap with Expert Taxonomies

Jan 18, 2026
Via arXiv

OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment

Jan 04, 2026
Via arXiv

LLMEval-3: A Large-Scale Longitudinal Study on Robust and Fair Evaluation of Large Language Models

Aug 07, 2025
Via arXiv

LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation

Jun 04, 2025
Via arXiv

Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations

Mar 19, 2025
Via arXiv

PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts

Mar 09, 2025
Via arXiv

Predicting Large Language Model Capabilities on Closed-Book QA Tasks Using Only Information Available Prior to Training

Feb 06, 2025
Via arXiv