Yuxin Peng

TRKT: Weakly Supervised Dynamic Scene Graph Generation with Temporal-enhanced Relation-aware Knowledge Transferring

Aug 07, 2025
Via arXiv

Hierarchical Event Memory for Accurate and Low-latency Online Video Temporal Grounding

Aug 06, 2025
Via arXiv

SphereDrag: Spherical Geometry-Aware Panoramic Image Editing

Jun 13, 2025
Via arXiv

Scan-and-Print: Patch-level Data Summarization and Augmentation for Content-aware Layout Generation in Poster Design

May 27, 2025
Via arXiv

PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation

May 06, 2025
Via arXiv

DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding

Apr 21, 2025
Via arXiv

Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation

Apr 21, 2025
Via arXiv

ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer

Apr 03, 2025
Via arXiv

STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding

Mar 20, 2025
Via arXiv

SCAP: Transductive Test-Time Adaptation via Supportive Clique-based Attribute Prompting

Mar 17, 2025
Via arXiv