Kunchang Li

VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model

Jul 09, 2024

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

Mar 22, 2024

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

Mar 14, 2024

VideoMamba: State Space Model for Efficient Video Understanding

Mar 12, 2024

Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition

Feb 29, 2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

Jan 29, 2024

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

Jan 25, 2024

Vlogger: Make Your Dream A Vlog

Jan 17, 2024

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

Dec 03, 2023

Harvest Video Foundation Models via Efficient Post-Pretraining

Oct 30, 2023