Boyang Zheng

Benchmarking Visual State Tracking in Multimodal Video Understanding

Jun 02, 2026

Improved Baselines with Representation Autoencoders

May 18, 2026

Beyond Language Modeling: An Exploration of Multimodal Pretraining

Mar 03, 2026

Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

Jan 22, 2026

Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis

May 15, 2025

LM4LV: A Frozen Large Language Model for Low-level Vision Tasks

May 24, 2024

Understanding and Improving Adversarial Attacks on Latent Diffusion Model

Oct 07, 2023