Xiaoshuai Sun

Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach

Apr 16, 2025
Via arXiv

An Efficient and Mixed Heterogeneous Model for Image Restoration

Apr 15, 2025
Via arXiv

Exploring the Collaborative Advantage of Low-level Information on Generalizable AI-Generated Image Detection

Apr 01, 2025
Via arXiv

MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning

Mar 26, 2025
Via arXiv

Grounded Chain-of-Thought for Multimodal Large Language Models

Mar 17, 2025
Via arXiv

AdaFlow: Efficient Long Video Editing via Adaptive Attention Slimming And Keyframe Selection

Feb 08, 2025
Via arXiv

IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation

Jan 09, 2025
Via arXiv

StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization

Dec 10, 2024
Via arXiv

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression

Dec 05, 2024
Via arXiv

RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation

Dec 03, 2024
Via arXiv