Wenhao Sun

Just-in-Time: Training-Free Spatial Acceleration for Diffusion Transformers

Mar 11, 2026
Via arXiv

MixerCSeg: An Efficient Mixer Architecture for Crack Segmentation via Decoupled Mamba Attention

Mar 02, 2026
Via arXiv

SPA-Cache: Singular Proxies for Adaptive Caching in Diffusion Language Models

Jan 30, 2026
Via arXiv

AD-FM: Multimodal LLMs for Anomaly Detection via Multi-Stage Reasoning and Fine-Grained Reward Optimization

Aug 06, 2025
Via arXiv

Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval

May 26, 2025
Via arXiv

VORTA: Efficient Video Diffusion via Routing Sparse Attention

May 24, 2025
Via arXiv

Attentive Eraser: Unleashing Diffusion Model's Object Removal Potential via Self-Attention Redirection Guidance

Dec 17, 2024
Via arXiv

AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration

Dec 16, 2024
Via arXiv

DaDu-E: Rethinking the Role of Large Language Model in Robotic Computing Pipeline

Dec 02, 2024
Via arXiv

SPAgent: Adaptive Task Decomposition and Model Selection for General Video Generation and Editing

Nov 28, 2024
Via arXiv