Chengyue Wu

Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding

May 28, 2025

FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities

May 26, 2025

Towards Self-Improving Systematic Cognition for Next-Generation Foundation MLLMs

Mar 16, 2025

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Dec 13, 2024

JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation

Nov 12, 2024

Autoregressive Models in Vision: A Survey

Nov 08, 2024

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

Oct 17, 2024

Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots

May 13, 2024

Adapting LLaMA Decoder to Vision Transformer

Apr 13, 2024

FiT: Flexible Vision Transformer for Diffusion Model

Feb 19, 2024