Kaipeng Zhang

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

Oct 07, 2024

HRVMamba: High-Resolution Visual State Space Model for Dense Prediction

Oct 04, 2024

T3M: Text Guided 3D Human Motion Synthesis from Speech

Aug 23, 2024

Prioritize Alignment in Dataset Distillation

Aug 06, 2024

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

Aug 05, 2024

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model

Jul 24, 2024

Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification

Jul 11, 2024

PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models

Jun 17, 2024

Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality

Jun 13, 2024

GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

Jun 12, 2024