
Jiaya Jia

Logits-Based Finetuning

May 30, 2025 (via arXiv)

RTime-QA: A Benchmark for Atomic Temporal Event Understanding in Large Multi-modal Models

May 25, 2025 (via arXiv)

Training-Free Efficient Video Generation via Dynamic Token Carving

May 22, 2025 (via arXiv)

ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay

May 22, 2025 (via arXiv)

VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning

May 17, 2025 (via arXiv)

TraveLLaMA: Facilitating Multi-modal Large Language Models to Understand Urban Scenes and Provide Travel Assistance

Apr 23, 2025 (via arXiv)

Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?

Mar 16, 2025 (via arXiv)

STEVE: A Step Verification Pipeline for Computer-use Agent Training

Mar 16, 2025 (via arXiv)

Modular Customization of Diffusion Models via Blockwise-Parameterized Low-Rank Adaptation

Mar 11, 2025 (via arXiv)

Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement

Mar 09, 2025 (via arXiv)