Seungwhan Moon

SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM

Mar 07, 2024
Jielin Qiu, Andrea Madotto, Zhaojiang Lin, Paul A. Crook, Yifan Ethan Xu, Xin Luna Dong, Christos Faloutsos, Lei Li, Babak Damavandi, Seungwhan Moon

Via arXiv

Large Language Models as Zero-shot Dialogue State Tracker through Function Calling

Feb 16, 2024
Zekun Li, Zhiyu Zoey Chen, Mike Ross, Patrick Huber, Seungwhan Moon, Zhaojiang Lin, Xin Luna Dong, Adithya Sagar, Xifeng Yan, Paul A. Crook

Via arXiv

AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model

Sep 27, 2023
Seungwhan Moon, Andrea Madotto, Zhaojiang Lin, Tushar Nagarajan, Matt Smith, Shashank Jain, Chun-Fu Yeh, Prakash Murugesan, Peyman Heidari, Yue Liu, Kavya Srinet, Babak Damavandi, Anuj Kumar

Via arXiv

Embodied Executable Policy Learning with Language-based Scene Summarization

Jun 09, 2023
Jielin Qiu, Mengdi Xu, William Han, Seungwhan Moon, Ding Zhao

Via arXiv

Normalized Contrastive Learning for Text-Video Retrieval

Nov 30, 2022
Yookoon Park, Mahmoud Azab, Bo Xiong, Seungwhan Moon, Florian Metze, Gourab Kundu, Kirmani Ahmed

Via arXiv

Navigating Connected Memories with a Task-oriented Dialog System

Nov 15, 2022
Seungwhan Moon, Satwik Kottur, Alborz Geramifard, Babak Damavandi

Via arXiv

Tell Your Story: Task-Oriented Dialogs for Interactive Content Creation

Nov 08, 2022
Satwik Kottur, Seungwhan Moon, Aram H. Markosyan, Hardik Shah, Babak Damavandi, Alborz Geramifard

Via arXiv

IMU2CLIP: Multimodal Contrastive Learning for IMU Motion Sensors from Egocentric Videos and Text

Oct 26, 2022
Seungwhan Moon, Andrea Madotto, Zhaojiang Lin, Alireza Dirafzoon, Aparajita Saraf, Amy Bearman, Babak Damavandi

Via arXiv

Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks

Oct 10, 2022
Pedro Rodriguez, Mahmoud Azab, Becka Silvert, Renato Sanchez, Linzy Labson, Hardik Shah, Seungwhan Moon

Via arXiv

KETOD: Knowledge-Enriched Task-Oriented Dialogue

May 11, 2022
Zhiyu Chen, Bing Liu, Seungwhan Moon, Chinnadhurai Sankar, Paul Crook, William Yang Wang

Via arXiv