Dasen Dai

OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents

May 06, 2026

UIPress: Bringing Optical Token Compression to UI-to-Code Generation

Apr 10, 2026

Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis

Apr 01, 2026

PaperVoyager: Building Interactive Web with Visual Language Models

Mar 24, 2026

VidDoS: Universal Denial-of-Service Attack on Video-based Large Language Models

Mar 02, 2026

FMVP: Masked Flow Matching for Adversarial Video Purification

Jan 05, 2026

VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models

Feb 23, 2025