Yuanxin Liu

VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?

May 29, 2025

RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction

May 28, 2025

TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos

Apr 24, 2025

Kimi-VL Technical Report

Apr 10, 2025

UVE: Are MLLMs Unified Evaluators for AI-Generated Videos?

Mar 13, 2025

BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning

Jan 31, 2025

PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension

Dec 16, 2024

Temporal Reasoning Transfer from Text to Video

Oct 08, 2024

DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models

May 31, 2024

Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality

Mar 28, 2024