Shouhong Ding

Switchable Token-Specific Codebook Quantization For Face Image Compression

Oct 27, 2025
Via arXiv

Towards Rationale-Answer Alignment of LVLMs via Self-Rationale Calibration

Sep 17, 2025
Via arXiv

VISA: Group-wise Visual Token Selection and Aggregation via Graph Summarization for Efficient MLLMs Inference

Aug 25, 2025
Via arXiv

MS-DETR: Towards Effective Video Moment Retrieval and Highlight Detection by Joint Motion-Semantic Learning

Jul 16, 2025
Via arXiv

AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models

Jul 03, 2025
Via arXiv

Guard Me If You Know Me: Protecting Specific Face-Identity from Deepfakes

May 26, 2025
Via arXiv

Dual Data Alignment Makes AI-Generated Image Detector Easier Generalizable

May 20, 2025
Via arXiv

Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception

Apr 29, 2025
Via arXiv

All Patches Matter, More Patches Better: Enhance AI-Generated Image Detection via Panoptic Patch Learning

Apr 02, 2025
Via arXiv

ToVE: Efficient Vision-Language Learning via Knowledge Transfer from Vision Experts

Apr 01, 2025
Via arXiv