Xitong Yang

Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization

Feb 01, 2023
Via arXiv

Vision Transformers Are Good Mask Auto-Labelers

Jan 10, 2023
Via arXiv

ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization

Mar 29, 2022
Via arXiv

Efficient Video Transformers with Spatial-Temporal Token Selection

Nov 23, 2021
Via arXiv

Semi-Supervised Vision Transformers

Nov 22, 2021
Via arXiv

Beyond Short Clips: End-to-End Video-Level Learning with Collaborative Memories

Apr 02, 2021
Via arXiv

GTA: Global Temporal Attention for Video Action Understanding

Dec 15, 2020
Via arXiv

Hierarchical Contrastive Motion Learning for Video Action Recognition

Jul 20, 2020
Via arXiv

A Generic Visualization Approach for Convolutional Neural Networks

Jul 19, 2020
Via arXiv

Cross-X Learning for Fine-Grained Visual Categorization

Sep 10, 2019
Via arXiv