Jae Sung Park

VideoNet: A Large-Scale Dataset for Domain-Specific Action Recognition

May 05, 2026
Via arXiv

MolmoAct2: Action Reasoning Models for Real-world Deployment

May 04, 2026
Via arXiv

MolmoPoint: Better Pointing for VLMs with Grounding Tokens

Mar 30, 2026
Via arXiv

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding

Jan 15, 2026
Via arXiv

Synthetic Visual Genome

Jun 09, 2025
Via arXiv

BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions

Nov 12, 2024
Via arXiv

ActionAtlas: A VideoQA Benchmark for Domain-specialized Action Recognition

Oct 08, 2024
Via arXiv

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Sep 25, 2024
Via arXiv

Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness

Jul 02, 2024
Via arXiv

Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass

May 29, 2024
Via arXiv