Kun Zhou

Vision-G1: Towards General Vision Language Reasoning with Multi-Domain Data Curation

Aug 18, 2025
Via arXiv

Can Large Pretrained Depth Estimation Models Help With Image Dehazing?

Aug 01, 2025
Via arXiv

Motion-example-controlled Co-speech Gesture Generation Leveraging Large Language Models

Jul 27, 2025
Via arXiv

Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

Jun 17, 2025
Via arXiv

AttentionDrag: Exploiting Latent Correlation Knowledge in Pre-trained Diffusion Models for Image Editing

Jun 16, 2025
Via arXiv

Plug-and-Play Co-Occurring Face Attention for Robust Audio-Visual Speaker Extraction

May 27, 2025
Via arXiv

Activation Control for Efficiently Eliciting Long Chain-of-thought Ability of Language Models

May 23, 2025
Via arXiv

Towards General Continuous Memory for Vision-Language Models

May 23, 2025
Via arXiv

Decentralized Arena: Towards Democratic and Scalable Automatic Evaluation of Language Models

May 19, 2025
Via arXiv

A 2D Semantic-Aware Position Encoding for Vision Transformers

May 14, 2025
Via arXiv