Xiaojie Jin

The SkatingVerse Workshop & Challenge: Methods and Results

May 27, 2024

Video Recognition in Portrait Mode

Dec 21, 2023

Vista-LLaMA: Reliable Video Narrator via Equal Distance to Visual Tokens

Dec 12, 2023

PixelLM: Pixel Reasoning with Large Multimodal Model

Dec 04, 2023

Selective Feature Adapter for Dense Vision Transformers

Oct 03, 2023

Realistic Full-Body Tracking from Sparse Observations via Joint-Level Modeling

Aug 17, 2023

COSA: Concatenated Sample Pretrained Vision-Language Foundation Model

Jun 15, 2023

Delving Deeper into Data Scaling in Masked Image Modeling

May 24, 2023

VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending

May 22, 2023

Multimodal Video Adapter for Parameter Efficient Video Text Retrieval

Jan 19, 2023