Babak Damavandi

Plan, Watch, Recover: A Benchmark and Architectures for Proactive Procedural Assistance

Jun 03, 2026
Via arXiv

CRAG-MM: Multi-modal Multi-turn Comprehensive RAG Benchmark

Oct 30, 2025
Via arXiv

Proactive Assistant Dialogue Generation from Streaming Egocentric Videos

Jun 06, 2025
Via arXiv

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Apr 17, 2025
Via arXiv

Doppelgänger's Watch: A Split Objective Approach to Large Language Models

Sep 09, 2024
Via arXiv

SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM

Mar 07, 2024
Via arXiv

AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model

Sep 27, 2023
Via arXiv

Navigating Connected Memories with a Task-oriented Dialog System

Nov 15, 2022
Via arXiv

Tell Your Story: Task-Oriented Dialogs for Interactive Content Creation

Nov 08, 2022
Via arXiv

IMU2CLIP: Multimodal Contrastive Learning for IMU Motion Sensors from Egocentric Videos and Text

Oct 26, 2022
Via arXiv