Boqing Gong

VideoPrism: A Foundational Visual Encoder for Video Understanding

Feb 20, 2024
Via arXiv

Distilling Vision-Language Models on Millions of Videos

Jan 11, 2024
Via arXiv

Instruct-Imagen: Image Generation with Multi-modal Instruction

Jan 03, 2024
Via arXiv

Towards A Unified Neural Architecture for Visual Recognition and Reasoning

Nov 10, 2023
Via arXiv

Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

Oct 09, 2023
Via arXiv

Module-wise Adaptive Distillation for Multimodality Foundation Models

Oct 06, 2023
Via arXiv

Multi-modal Domain Adaptation for REG via Relation Transfer

Sep 23, 2023
Via arXiv

Video Timeline Modeling For News Story Understanding

Sep 23, 2023
Via arXiv

VideoGLUE: Video General Understanding Evaluation of Foundation Models

Jul 06, 2023
Via arXiv

Federated Learning of Shareable Bases for Personalization-Friendly Image Classification

Apr 16, 2023
Via arXiv