Jifeng Dai

GenExam: A Multidisciplinary Text-to-Image Exam

Sep 17, 2025
Via arXiv

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Aug 25, 2025
Via arXiv

Spatial Frequency Modulation for Semantic Segmentation

Jul 16, 2025
Via arXiv

CoMemo: LVLMs Need Image Context with Image Memory

Jun 06, 2025
Via arXiv

OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis

Jun 04, 2025
Via arXiv

Point or Line? Using Line-based Representation for Panoptic Symbol Spotting in CAD Drawings

May 29, 2025
Via arXiv

ZeroGUI: Automating Online GUI Learning at Zero Human Cost

May 29, 2025
Via arXiv

Learning Adaptive and Temporally Causal Video Tokenization in a 1D Latent Space

May 22, 2025
Via arXiv

EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning

May 07, 2025
Via arXiv

VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models

Apr 21, 2025
Via arXiv