Rohit Girdhar

Jack

Omnivore: A Single Model for Many Visual Modalities

Jan 20, 2022

Detecting Twenty-thousand Classes using Image-level Supervision

Jan 10, 2022

Mask2Former for Video Instance Segmentation

Dec 20, 2021

Masked-attention Mask Transformer for Universal Image Segmentation

Dec 10, 2021

Ego4D: Around the World in 3,000 Hours of Egocentric Video

Oct 13, 2021

An End-to-End Transformer Model for 3D Object Detection

Sep 16, 2021

Anticipative Video Transformer

Jun 03, 2021

3D Spatial Recognition without Spatially Labeled 3D

May 13, 2021

Physical Reasoning Using Dynamics-Aware Models

Feb 20, 2021

Self-Supervised Pretraining of 3D Features on any Point-Cloud

Jan 07, 2021