Picture for Wenhu Chen

Wenhu Chen

LongIns: A Challenging Long-context Instruction-based Exam for LLMs

Add code
Jun 26, 2024
Figure 1 for LongIns: A Challenging Long-context Instruction-based Exam for LLMs
Figure 2 for LongIns: A Challenging Long-context Instruction-based Exam for LLMs
Figure 3 for LongIns: A Challenging Long-context Instruction-based Exam for LLMs
Figure 4 for LongIns: A Challenging Long-context Instruction-based Exam for LLMs
Viaarxiv icon

VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

Add code
Jun 24, 2024
Figure 1 for VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
Figure 2 for VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
Figure 3 for VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
Figure 4 for VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
Viaarxiv icon

LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs

Add code
Jun 21, 2024
Viaarxiv icon

PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents

Add code
Jun 20, 2024
Figure 1 for PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents
Figure 2 for PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents
Figure 3 for PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents
Figure 4 for PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents
Viaarxiv icon

Unifying Multimodal Retrieval via Document Screenshot Embedding

Add code
Jun 17, 2024
Viaarxiv icon

WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences

Add code
Jun 16, 2024
Viaarxiv icon

TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation

Add code
Jun 12, 2024
Viaarxiv icon

GenAI Arena: An Open Evaluation Platform for Generative Models

Add code
Jun 06, 2024
Viaarxiv icon

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Add code
Jun 04, 2024
Figure 1 for MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
Figure 2 for MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
Figure 3 for MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
Figure 4 for MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
Viaarxiv icon

T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback

Add code
May 29, 2024
Viaarxiv icon