Yunhong Wang

GODBench: A Benchmark for Multimodal Large Language Models in Video Comment Art

May 16, 2025

DDAE++: Enhancing Diffusion Models Towards Unified Generative and Discriminative Learning

May 16, 2025

SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding

Apr 30, 2025

SkeletonX: Data-Efficient Skeleton-based Action Recognition via Cross-sample Feature Aggregation

Apr 16, 2025

Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation

Apr 13, 2025

APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers

Apr 03, 2025

A Survey on Remote Sensing Foundation Models: From Vision to Multimodality

Mar 28, 2025

AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset

Mar 25, 2025

Generalizable AI-Generated Image Detection Based on Fractal Self-Similarity in the Spectrum

Mar 11, 2025

KwaiChat: A Large-Scale Video-Driven Multilingual Mixed-Type Dialogue Corpus

Mar 10, 2025