Paul Hongsuck Seo

Learning Correlation Structures for Vision Transformers

Apr 05, 2024
Manjin Kim, Paul Hongsuck Seo, Cordelia Schmid, Minsu Cho

Zero-shot Referring Image Segmentation with Global-Local Context Features

Apr 03, 2023
Seonghoon Yu, Paul Hongsuck Seo, Jeany Son

AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR

Mar 29, 2023
Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid

IFSeg: Image-free Semantic Segmentation via Vision-Language Model

Mar 25, 2023
Sukmin Yun, Seong Hyeon Park, Paul Hongsuck Seo, Jinwoo Shin

CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation

Mar 21, 2023
Seokju Cho, Heeseong Shin, Sunghwan Hong, Seungjun An, Seungjun Lee, Anurag Arnab, Paul Hongsuck Seo, Seungryong Kim

Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning

Mar 21, 2023
Antoine Yang, Arsha Nagrani, Paul Hongsuck Seo, Antoine Miech, Jordi Pont-Tuset, Ivan Laptev, Josef Sivic, Cordelia Schmid

AVATAR submission to the Ego4D AV Transcription Challenge

Nov 18, 2022
Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid

AVATAR: Unconstrained Audiovisual Speech Recognition

Jun 15, 2022
Valentin Gabeur, Paul Hongsuck Seo, Arsha Nagrani, Chen Sun, Karteek Alahari, Cordelia Schmid

Learning Audio-Video Modalities from Image Captions

Apr 01, 2022
Arsha Nagrani, Paul Hongsuck Seo, Bryan Seybold, Anja Hauth, Santiago Manen, Chen Sun, Cordelia Schmid
