Karan Sapra

MMOU: A Massive Multi-Task Omni Understanding and Reasoning Benchmark for Long and Complex Real-World Videos

Mar 14, 2026
Via arXiv

Towards Multimodal Lifelong Understanding: A Dataset and Agentic Baseline

Mar 05, 2026
Via arXiv

Stateful Token Reduction for Long-Video Hybrid VLMs

Feb 27, 2026
Via arXiv

NVIDIA Nemotron Nano V2 VL

Nov 07, 2025
Via arXiv

Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

Apr 10, 2025
Via arXiv

AIDE: Agentically Improve Visual Language Model with Domain Experts

Feb 13, 2025
Via arXiv

Éclair -- Extracting Content and Layout with Integrated Reading Order for Documents

Feb 06, 2025
Via arXiv

OMCAT: Omni Context Aware Transformer

Oct 15, 2024
Via arXiv

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Aug 28, 2024
Via arXiv

Personalized Federated Learning with First Order Model Optimization

Jan 28, 2021
Via arXiv