Chunwei Wang

IMUG-Bench: Benchmarking Unified Multimodal Models on Interleaved Understanding and Generation

Jun 08, 2026

Towards Unified Multimodal Interleaved Generation via Group Relative Policy Optimization

Mar 10, 2026

InterCoG: Towards Spatially Precise Image Editing with Interleaved Chain-of-Grounding Reasoning

Mar 03, 2026

SlowFocus: Enhancing Fine-grained Temporal Understanding in Video LLM

Feb 03, 2026

KFFocus: Highlighting Keyframes for Enhanced Video Understanding

Aug 12, 2025

ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement

Apr 03, 2025

SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation

Mar 09, 2025

FreqPrior: Improving Video Diffusion Models with Frequency Filtering Gaussian Noise

Feb 05, 2025

Brick-Diffusion: Generating Long Videos with Brick-to-Wall Denoising

Jan 06, 2025

ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance

Dec 09, 2024