Jaemin Cho

MolmoAct2: Action Reasoning Models for Real-world Deployment

May 04, 2026
Via arXiv

WildDet3D: Scaling Promptable 3D Detection in the Wild

Apr 09, 2026
Via arXiv

VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models

Mar 25, 2026
Via arXiv

V-Co: A Closer Look at Visual Representation Alignment via Co-Denoising

Mar 17, 2026
Via arXiv

InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions

Mar 04, 2026
Via arXiv

AnchorWeave: World-Consistent Video Generation with Retrieved Local Spatial Memories

Feb 16, 2026
Via arXiv

A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality

Jul 09, 2025
Via arXiv

CLaMR: Contextualized Late-Interaction for Multimodal Content Retrieval

Jun 06, 2025
Via arXiv

EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance

May 28, 2025
Via arXiv

CAPTURe: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting

Apr 21, 2025
Via arXiv