Seungwhan Moon

SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM

Mar 07, 2024

Large Language Models as Zero-shot Dialogue State Tracker through Function Calling

Feb 16, 2024

AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model

Sep 27, 2023

Embodied Executable Policy Learning with Language-based Scene Summarization

Jun 09, 2023

Normalized Contrastive Learning for Text-Video Retrieval

Nov 30, 2022

Navigating Connected Memories with a Task-oriented Dialog System

Nov 15, 2022

Tell Your Story: Task-Oriented Dialogs for Interactive Content Creation

Nov 08, 2022

IMU2CLIP: Multimodal Contrastive Learning for IMU Motion Sensors from Egocentric Videos and Text

Oct 26, 2022

Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks

Oct 10, 2022

KETOD: Knowledge-Enriched Task-Oriented Dialogue

May 11, 2022