Weiming Zhuang

On the Limits of Token Reduction for Efficient Unified Vision Language Training
May 31, 2026

VibeToken: Scaling 1D Image Tokenizers and Autoregressive Models for Dynamic Resolution Generations
Apr 27, 2026

Empirical Recipes for Efficient and Compact Vision-Language Models
Mar 17, 2026

UniCompress: Token Compression for Unified Vision-Language Understanding and Generation
Mar 11, 2026

UNIFORM: Unifying Knowledge from Large-scale and Diverse Pre-trained Models
Aug 27, 2025

Investigating the Design Space of Visual Grounding in Multimodal Large Language Model
Aug 11, 2025

Boundary Attention Constrained Zero-Shot Layout-To-Image Generation
Nov 15, 2024

Dual Low-Rank Adaptation for Continual Learning with Pre-Trained Models
Nov 01, 2024

A Simple Background Augmentation Method for Object Detection with Diffusion Model
Aug 01, 2024

COALA: A Practical and Vision-Centric Federated Learning Platform
Jul 23, 2024