Yujiong Shen

CL-bench: A Benchmark for Context Learning

Feb 03, 2026
Via arXiv

Can Deep Research Agents Find and Organize? Evaluating the Synthesis Gap with Expert Taxonomies

Jan 18, 2026
Via arXiv

OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment

Jan 04, 2026
Via arXiv

LLMEval-3: A Large-Scale Longitudinal Study on Robust and Fair Evaluation of Large Language Models

Aug 07, 2025
Via arXiv

LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation

Jun 04, 2025
Via arXiv

Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations

Mar 19, 2025
Via arXiv

PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts

Mar 09, 2025
Via arXiv

Predicting Large Language Model Capabilities on Closed-Book QA Tasks Using Only Information Available Prior to Training

Feb 06, 2025
Via arXiv

TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities

Jul 31, 2024
Via arXiv

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

Jan 29, 2024
Via arXiv