Jinguo Zhu

Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance

Oct 21, 2024

Power-LLaVA: Large Language and Vision Assistant for Power Transmission Line Inspection

Jul 27, 2024

Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

Jun 11, 2024

SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

Apr 22, 2024

VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation

Dec 14, 2023

VLAttack: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models

Oct 07, 2023

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks

Nov 17, 2022

Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs

Jun 09, 2022

Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks

Dec 02, 2021

Multiple Domain Experts Collaborative Learning: Multi-Source Domain Generalization For Person Re-Identification

May 26, 2021