Huiyu Wang

Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data

Jul 18, 2024

Finding Dino: A plug-and-play framework for unsupervised detection of out-of-distribution objects using prototypes

Apr 11, 2024

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Nov 30, 2023

Diffusion Models as Masked Autoencoders

Apr 06, 2023

Ego-Only: Egocentric Action Detection without Exocentric Pretraining

Jan 03, 2023

Unleashing the Power of Visual Prompting At the Pixel Level

Dec 20, 2022

SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training

Nov 30, 2022

Masked Autoencoders Enable Efficient Knowledge Distillers

Aug 25, 2022

k-means Mask Transformer

Jul 08, 2022

CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation

Jun 17, 2022