Ronghang Hu

ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders

Jan 02, 2023
Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie

UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding

Dec 01, 2022
Dave Zhenyu Chen, Ronghang Hu, Xinlei Chen, Matthias Nießner, Angel X. Chang

Scaling Language-Image Pre-training via Masking

Dec 01, 2022
Yanghao Li, Haoqi Fan, Ronghang Hu, Christoph Feichtenhofer, Kaiming He

Exploring Long-Sequence Masked Autoencoders

Oct 13, 2022
Ronghang Hu, Shoubhik Debnath, Saining Xie, Xinlei Chen

FLAVA: A Foundational Language And Vision Alignment Model

Dec 08, 2021
Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, Douwe Kiela

Transformer is All You Need: Multimodal Multitask Learning with a Unified Transformer

Feb 22, 2021
Ronghang Hu, Amanpreet Singh

Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis from a Single Image

Dec 17, 2020
Ronghang Hu, Deepak Pathak

TextCaps: a Dataset for Image Captioning with Reading Comprehension

Mar 24, 2020
Oleksii Sidorov, Ronghang Hu, Marcus Rohrbach, Amanpreet Singh

Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA

Dec 05, 2019
Ronghang Hu, Amanpreet Singh, Trevor Darrell, Marcus Rohrbach
