Jifeng Dai

EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning

May 07, 2025
Via arXiv

VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models

Apr 21, 2025
Via arXiv

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Apr 15, 2025
Via arXiv

LangBridge: Interpreting Image as a Combination of Language Embeddings

Mar 26, 2025
Via arXiv

Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy

Mar 25, 2025
Via arXiv

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

Mar 13, 2025
Via arXiv

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Mar 13, 2025
Via arXiv

MI-DETR: An Object Detection Model with Multi-time Inquiries Mechanism

Mar 03, 2025
Via arXiv

Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding

Jan 14, 2025
Via arXiv

HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding

Dec 20, 2024
Via arXiv