
Songyang Zhang

NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?

Jul 16, 2024
Via arXiv

CIBench: Evaluating Your LLMs with a Code Interpreter Plugin

Jul 15, 2024
Via arXiv

GTA: A Benchmark for General Tool Agents

Jul 11, 2024
Via arXiv

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Jul 03, 2024
Via arXiv

InternLM-Law: An Open Source Chinese Legal Large Language Model

Jun 21, 2024
Via arXiv

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs

Jun 20, 2024
Via arXiv

MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark

May 20, 2024
Via arXiv

FedSC: Provable Federated Self-supervised Learning with Spectral Contrastive Objective over Non-i.i.d. Data

May 07, 2024
Via arXiv

TiRE-GAN: Task-Incentivized Generative Learning Models for Radiomap Estimation with Radio Propagation Model

May 04, 2024
Via arXiv

FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models

Apr 29, 2024
Via arXiv