Weicheng Kuo

3D Open-Vocabulary Panoptic Segmentation with 2D-3D Vision-Language Distillation

Jan 04, 2024
Zihao Xiao, Longlong Jing, Shangxuan Wu, Alex Zihao Zhu, Jingwei Ji, Chiyu Max Jiang, Wei-Chih Hung, Thomas Funkhouser, Weicheng Kuo, Anelia Angelova, Yin Zhou, Shiwei Sheng

Detection-Oriented Image-Text Pretraining for Open-Vocabulary Detection

Sep 29, 2023
Dahun Kim, Anelia Angelova, Weicheng Kuo

Contrastive Feature Masking Open-Vocabulary Vision Transformer

Sep 02, 2023
Dahun Kim, Anelia Angelova, Weicheng Kuo

DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model

Jun 02, 2023
Xiuye Gu, Yin Cui, Jonathan Huang, Abdullah Rashwan, Xuan Yang, Xingyi Zhou, Golnaz Ghiasi, Weicheng Kuo, Huizhong Chen, Liang-Chieh Chen, David A Ross

Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers

May 11, 2023
Dahun Kim, Anelia Angelova, Weicheng Kuo

RECLIP: Resource-efficient CLIP by Training with Small Images

Apr 12, 2023
Runze Li, Dahun Kim, Bir Bhanu, Weicheng Kuo

MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks

Mar 30, 2023
Weicheng Kuo, AJ Piergiovanni, Dahun Kim, Xiyang Luo, Ben Caine, Wei Li, Abhijit Ogale, Luowei Zhou, Andrew Dai, Zhifeng Chen, Claire Cui, Anelia Angelova

Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning

Dec 06, 2022
AJ Piergiovanni, Weicheng Kuo, Anelia Angelova

F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models

Sep 30, 2022
Weicheng Kuo, Yin Cui, Xiuye Gu, AJ Piergiovanni, Anelia Angelova

PaLI: A Jointly-Scaled Multilingual Language-Image Model

Sep 16, 2022
Xi Chen, Xiao Wang, Soravit Changpinyo, AJ Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, Adam Grycner, Basil Mustafa, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Nan Ding, Keran Rong, Hassan Akbari, Gaurav Mishra, Linting Xue, Ashish Thapliyal, James Bradbury, Weicheng Kuo, Mojtaba Seyedhosseini, Chao Jia, Burcu Karagol Ayan, Carlos Riquelme, Andreas Steiner, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut
