Wooyoung Kang

Honeybee: Locality-enhanced Projector for Multimodal LLM

Dec 11, 2023
Via arXiv

Large Language Models are Temporal and Causal Reasoners for Video Question Answering

Nov 06, 2023
Via arXiv

NICE: CVPR 2023 Challenge on Zero-shot Image Captioning

Sep 11, 2023
Via arXiv

Open-Vocabulary Object Detection using Pseudo Caption Labels

Mar 23, 2023
Via arXiv

Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning

Dec 27, 2022
Via arXiv

Dense but Efficient VideoQA for Intricate Compositional Reasoning

Oct 19, 2022
Via arXiv