Yubo Ma

Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks

Apr 26, 2025

Synergistic Weak-Strong Collaboration by Aligning Preferences

Apr 22, 2025

InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model

Jan 21, 2025

Long Context vs. RAG for LLMs: An Evaluation and Revisits

Dec 27, 2024

AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge

Dec 18, 2024

Navigating the Nuances: A Fine-grained Evaluation of Vision-Language Navigation

Sep 25, 2024

TART: An Open-Source Tool-Augmented Framework for Explainable Table-based Reasoning

Sep 18, 2024

MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations

Jul 01, 2024

SciAgent: Tool-augmented Language Models for Scientific Reasoning

Feb 21, 2024

Learning To Teach Large Language Models Logical Reasoning

Oct 13, 2023