Rohun Tripathi

VideoNet: A Large-Scale Dataset for Domain-Specific Action Recognition

May 05, 2026 (via arXiv)

MolmoPoint: Better Pointing for VLMs with Grounding Tokens

Mar 30, 2026 (via arXiv)

Unified Spatio-Temporal Token Scoring for Efficient Video VLMs

Mar 18, 2026 (via arXiv)

MolmoB0T: Large-Scale Simulation Enables Zero-Shot Manipulation

Mar 17, 2026 (via arXiv)

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding

Jan 15, 2026 (via arXiv)

SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning

Dec 15, 2025 (via arXiv)

HinTel-AlignBench: A Framework and Benchmark for Hindi-Telugu with English-Aligned Samples

Nov 19, 2025 (via arXiv)

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Sep 25, 2024 (via arXiv)

ModEFormer: Modality-Preserving Embedding for Audio-Video Synchronization using Transformers

Mar 21, 2023 (via arXiv)

ASAP-NMS: Accelerating Non-Maximum Suppression Using Spatially Aware Priors

Aug 21, 2020 (via arXiv)