Xuehan Xiong

Carnegie Mellon University

Streaming Dense Video Captioning

Apr 01, 2024
Via arXiv

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Mar 08, 2024
Via arXiv

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023
Via arXiv

UnLoc: A Unified Framework for Video Localization Tasks

Aug 21, 2023
Via arXiv

End-to-End Spatio-Temporal Action Localisation with Video Transformers

Apr 24, 2023
Via arXiv

Beyond Transfer Learning: Co-finetuning for Action Localisation

Jul 08, 2022
Via arXiv

M&M Mix: A Multimodal Multiview Transformer Ensemble

Jun 20, 2022
Via arXiv

Multiview Transformers for Video Recognition

Jan 20, 2022
Via arXiv

Learning from Weakly-labeled Web Videos via Exploring Sub-Concepts

Jan 11, 2021
Via arXiv

Spatial-Temporal Alignment Network for Action Recognition and Detection

Dec 04, 2020
Via arXiv