Xitong Yang

Video ReCap: Recursive Captioning of Hour-Long Videos

Feb 28, 2024

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Nov 30, 2023

Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data

Oct 08, 2023

Towards Scalable Neural Representation for Diverse Videos

Mar 24, 2023

MINOTAUR: Multi-task Video Grounding From Multimodal Queries

Feb 16, 2023

Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization

Feb 01, 2023

Vision Transformers Are Good Mask Auto-Labelers

Jan 10, 2023

ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization

Mar 29, 2022

Efficient Video Transformers with Spatial-Temporal Token Selection

Nov 23, 2021

Semi-Supervised Vision Transformers

Nov 22, 2021