Seungwhan Moon

Stream RAG: Instant and Accurate Spoken Dialogue Systems with Streaming Tool Usage

Oct 02, 2025
Via arXiv

Proactive Assistant Dialogue Generation from Streaming Egocentric Videos

Jun 06, 2025
Via arXiv

VisualLens: Personalization through Visual History

Nov 25, 2024
Via arXiv

Doppelgänger's Watch: A Split Objective Approach to Large Language Models

Sep 09, 2024
Via arXiv

SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM

Mar 07, 2024
Via arXiv

Large Language Models as Zero-shot Dialogue State Tracker through Function Calling

Feb 16, 2024
Via arXiv

AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model

Sep 27, 2023
Via arXiv

Embodied Executable Policy Learning with Language-based Scene Summarization

Jun 09, 2023
Via arXiv

Normalized Contrastive Learning for Text-Video Retrieval

Nov 30, 2022
Via arXiv

Navigating Connected Memories with a Task-oriented Dialog System

Nov 15, 2022
Via arXiv