Xiaojie Jin

VCoME: Verbal Video Composition with Multimodal Editing Effects

Jul 05, 2024

Hierarchical Memory for Long Video QA

Jun 30, 2024

Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams

Jun 12, 2024

The SkatingVerse Workshop & Challenge: Methods and Results

May 27, 2024

Video Recognition in Portrait Mode

Dec 21, 2023

Vista-LLaMA: Reliable Video Narrator via Equal Distance to Visual Tokens

Dec 12, 2023

PixelLM: Pixel Reasoning with Large Multimodal Model

Dec 04, 2023

Selective Feature Adapter for Dense Vision Transformers

Oct 03, 2023

Realistic Full-Body Tracking from Sparse Observations via Joint-Level Modeling

Aug 17, 2023

COSA: Concatenated Sample Pretrained Vision-Language Foundation Model

Jun 15, 2023