
Yi-Fan Zhang

VITA-VLA: Efficiently Teaching Vision-Language Models to Act via Action Expert Distillation

Oct 10, 2025
Via arXiv

BaseReward: A Strong Baseline for Multimodal Reward Model

Sep 19, 2025
Via arXiv

Kwai Keye-VL Technical Report

Jul 02, 2025
Via arXiv

MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios

May 27, 2025
Via arXiv

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning

May 05, 2025
Via arXiv

MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models

Apr 07, 2025
Via arXiv

Aligning Multimodal LLM with Human Preference: A Survey

Mar 18, 2025
Via arXiv

From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education

Feb 19, 2025
Via arXiv

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Jan 03, 2025
Via arXiv

TimeRAF: Retrieval-Augmented Foundation model for Zero-shot Time Series Forecasting

Dec 30, 2024
Via arXiv