Sangho Lee

MolmoAct: Action Reasoning Models that can Reason in Space

Aug 12, 2025
Via arXiv

ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams

Apr 21, 2025
Via arXiv

MAMS: Model-Agnostic Module Selection Framework for Video Captioning

Jan 30, 2025
Via arXiv

One Diffusion to Generate Them All

Nov 25, 2024
Via arXiv

Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation

Nov 03, 2024
Via arXiv

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Sep 25, 2024
Via arXiv

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

Dec 28, 2023
Via arXiv

Integrated Path Tracking with DYC and MPC using LSTM Based Tire Force Estimator for Four-wheel Independent Steering and Driving Vehicle

Dec 13, 2023
Via arXiv

Can Language Models Laugh at YouTube Short-form Videos?

Oct 26, 2023
Via arXiv

X-CANIDS: Signal-Aware Explainable Intrusion Detection System for Controller Area Network-Based In-Vehicle Network

Mar 22, 2023
Via arXiv