Ge Zhang

LongIns: A Challenging Long-context Instruction-based Exam for LLMs

Jun 26, 2024

VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

Jun 24, 2024

GIEBench: Towards Holistic Evaluation of Group Identity-based Empathy for Large Language Models

Jun 24, 2024

PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents

Jun 20, 2024

MMTE: Corpus and Metrics for Evaluating Machine Translation Quality of Metaphorical Language

Jun 19, 2024

II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

Jun 11, 2024

VCR: Visual Caption Restoration

Jun 10, 2024

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Jun 04, 2024

D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models

Jun 03, 2024

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

May 29, 2024