Wenhao Sun

AD-FM: Multimodal LLMs for Anomaly Detection via Multi-Stage Reasoning and Fine-Grained Reward Optimization

Aug 06, 2025

Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval

May 26, 2025

VORTA: Efficient Video Diffusion via Routing Sparse Attention

May 24, 2025

Attentive Eraser: Unleashing Diffusion Model's Object Removal Potential via Self-Attention Redirection Guidance

Dec 17, 2024

AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration

Dec 16, 2024

DaDu-E: Rethinking the Role of Large Language Model in Robotic Computing Pipeline

Dec 02, 2024

SPAgent: Adaptive Task Decomposition and Model Selection for General Video Generation and Editing

Nov 28, 2024

Diffusion Model-Based Video Editing: A Survey

Jun 26, 2024

Class-based Quantization for Neural Networks

Nov 27, 2022

SteppingNet: A Stepping Neural Network with Incremental Accuracy Enhancement

Nov 27, 2022