Seungwhan Moon

Pixel-Grounded Retrieval for Knowledgeable Large Multimodal Models

Jan 27, 2026

Stream RAG: Instant and Accurate Spoken Dialogue Systems with Streaming Tool Usage

Oct 02, 2025

Proactive Assistant Dialogue Generation from Streaming Egocentric Videos

Jun 06, 2025

VisualLens: Personalization through Visual History

Nov 25, 2024

Doppelgänger's Watch: A Split Objective Approach to Large Language Models

Sep 09, 2024

SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM

Mar 07, 2024

Large Language Models as Zero-shot Dialogue State Tracker through Function Calling

Feb 16, 2024

AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model

Sep 27, 2023

Embodied Executable Policy Learning with Language-based Scene Summarization

Jun 09, 2023

Normalized Contrastive Learning for Text-Video Retrieval

Nov 30, 2022