Gen Li

HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning

Sep 10, 2025
Via arXiv

Connections between reinforcement learning with feedback, test-time scaling, and diffusion guidance: An anthology

Sep 04, 2025
Via arXiv

Dual Enhancement on 3D Vision-Language Perception for Monocular 3D Visual Grounding

Aug 26, 2025
Via arXiv

Can Structured Templates Facilitate LLMs in Tackling Harder Tasks?: An Exploration of Scaling Laws by Difficulty

Aug 26, 2025
Via arXiv

Hydra-Bench: A Benchmark for Multi-Modal Leaf Wetness Sensing

Jul 30, 2025
Via arXiv

Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset

Jun 23, 2025
Via arXiv

EgoM2P: Egocentric Multimodal Multitask Pretraining

Jun 09, 2025
Via arXiv

Transformers Meet In-Context Learning: A Universal Approximation Theory

Jun 05, 2025
Via arXiv

A Convergence Theory for Diffusion Language Models: An Information-Theoretic Perspective

May 27, 2025
Via arXiv

TinyAlign: Boosting Lightweight Vision-Language Models by Mitigating Modal Alignment Bottlenecks

May 19, 2025
Via arXiv