Yanghao Li

MAViL: Masked Audio-Video Learners

Dec 15, 2022
Via arXiv

Scaling Language-Image Pre-training via Masking

Dec 01, 2022
Via arXiv

Where is my Wallet? Modeling Object Proposal Sets for Egocentric Visual Query Localization

Nov 18, 2022
Via arXiv

Bit Allocation using Optimization

Sep 20, 2022
Via arXiv

Negative Frames Matter in Egocentric Visual Query 2D Localization

Aug 03, 2022
Via arXiv

Masked Autoencoders As Spatiotemporal Learners

May 18, 2022
Via arXiv

Exploring Plain Vision Transformer Backbones for Object Detection

Mar 30, 2022
Via arXiv

MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition

Jan 20, 2022
Via arXiv

Improved Multiscale Vision Transformers for Classification and Detection

Dec 02, 2021
Via arXiv

Masked Autoencoders Are Scalable Vision Learners

Dec 02, 2021
Via arXiv