Alkesh Patel

LVSum: A Benchmark for Timestamp-Aware Long Video Summarization

Apr 11, 2026
Via arXiv

Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants

Apr 01, 2026
Via arXiv

Advancing Egocentric Video Question Answering with Multimodal Large Language Models

Apr 06, 2025
Via arXiv

MARRS: Multimodal Reference Resolution System

Nov 03, 2023
Via arXiv

Referring to Screen Texts with Voice Assistants

Jun 10, 2023
Via arXiv

MMIU: Dataset for Visual Intent Understanding in Multimodal Assistants

Oct 31, 2021
Via arXiv

Generating Natural Questions from Images for Multimodal Assistants

Nov 17, 2020
Via arXiv

Noise-robust Named Entity Understanding for Virtual Assistants

May 29, 2020
Via arXiv