Yizhuo Li

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

Dec 03, 2023
Via arXiv

Harvest Video Foundation Models via Efficient Post-Pretraining

Oct 30, 2023
Via arXiv

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

Jul 13, 2023
Via arXiv

VideoChat: Chat-Centric Video Understanding

May 10, 2023
Via arXiv

Unmasked Teacher: Towards Training-Efficient Video Foundation Models

Mar 28, 2023
Via arXiv

InternVideo: General Video Foundation Models via Generative and Discriminative Learning

Dec 07, 2022
Via arXiv

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges

Nov 17, 2022
Via arXiv

UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer

Nov 17, 2022
Via arXiv

HAKE: A Knowledge Engine Foundation for Human Activity Understanding

Feb 14, 2022
Via arXiv

Test-Time Personalization with a Transformer for Human Pose Estimation

Jul 05, 2021
Via arXiv