Publications by Haoqi Fan

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

Jun 01, 2023

Diffusion Models as Masked Autoencoders

Apr 06, 2023

The effectiveness of MAE pre-pretraining for billion-scale pretraining

Mar 23, 2023

Reversible Vision Transformers

Feb 09, 2023

MAViL: Masked Audio-Video Learners

Dec 15, 2022

Scaling Language-Image Pre-training via Masking

Dec 01, 2022

Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference

Nov 18, 2022

Masked Autoencoders As Spatiotemporal Learners

May 18, 2022

On the Importance of Asymmetry for Siamese Representation Learning

Apr 01, 2022

Unified Transformer Tracker for Object Tracking

Mar 29, 2022