Yifan Du

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Jul 02, 2025

AVC-DPO: Aligned Video Captioning via Direct Preference Optimization

Jul 02, 2025

Seed1.5-VL Technical Report

May 11, 2025

Virgo: A Preliminary Exploration on Reproducing o1-like MLLM

Jan 03, 2025

Exploring the Design Space of Visual Context Representation in Video MLLMs

Oct 17, 2024

Towards Event-oriented Long Video Understanding

Jun 20, 2024

Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs

Jun 13, 2024

What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning

Nov 02, 2023

Learning to Imagine: Visually-Augmented Natural Language Generation

Jun 04, 2023

Zero-shot Visual Question Answering with Language Model Feedback

May 26, 2023