Yi-Fan Zhang

Kwai Keye-VL Technical Report

Jul 02, 2025

MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios

May 27, 2025

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning

May 05, 2025

MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models

Apr 07, 2025

Aligning Multimodal LLM with Human Preference: A Survey

Mar 18, 2025

From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education

Feb 19, 2025

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Jan 03, 2025

TimeRAF: Retrieval-Augmented Foundation model for Zero-shot Time Series Forecasting

Dec 30, 2024

MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs

Nov 22, 2024

ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection

Oct 06, 2024