Can Qin

xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs

Oct 21, 2024

Triple Point Masking

Sep 26, 2024

xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

Aug 22, 2024

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Aug 16, 2024

STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical

Jun 28, 2024

MuseumMaker: Continual Style Customization without Catastrophic Forgetting

Apr 29, 2024

SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant

Mar 17, 2024

M3SOT: Multi-frame, Multi-field, Multi-space 3D Single Object Tracking

Dec 11, 2023

Camouflaged Image Synthesis Is All You Need to Boost Camouflaged Detection

Aug 13, 2023

UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild

May 25, 2023