Jian Zhang

Pts3D-LLM: Studying the Impact of Token Structure for 3D Scene Understanding With Large Language Models

Jun 06, 2025
Via arXiv

AssetDropper: Asset Extraction via Diffusion Models with Reward-Driven Optimization

Jun 06, 2025
Via arXiv

Unleashing the Power of Intermediate Domains for Mixed Domain Semi-Supervised Medical Image Segmentation

May 30, 2025
Via arXiv

LAFR: Efficient Diffusion-based Blind Face Restoration via Latent Codebook Alignment Adapter

May 29, 2025
Via arXiv

Robot Operation of Home Appliances by Reading User Manuals

May 26, 2025
Via arXiv

VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction

May 26, 2025
Via arXiv

MIND-Edit: MLLM Insight-Driven Editing via Language-Vision Projection

May 25, 2025
Via arXiv

AvatarShield: Visual Reinforcement Learning for Human-Centric Video Forgery Detection

May 21, 2025
Via arXiv

Teach2Eval: An Indirect Evaluation Method for LLM by Judging How It Teaches

May 18, 2025
Via arXiv

Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning

May 18, 2025
Via arXiv