Tianyuan Qu

RTime-QA: A Benchmark for Atomic Temporal Event Understanding in Large Multi-modal Models

May 25, 2025

VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning

May 17, 2025

Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?

Mar 16, 2025

Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition

Dec 12, 2024

An Improved Baseline for Reasoning Segmentation with Large Language Model

Jan 03, 2024