Junbin Xiao

MuKV: Multi-Grained KV Cache Compression for Long Streaming Video Question-Answering

May 21, 2026

Audio-Visual Intelligence in Large Foundation Models

May 05, 2026

Ego-Grounding for Personalized Question-Answering in Egocentric Videos

Apr 02, 2026

EgoExo-Con: Exploring View-Invariant Video Temporal Understanding

Oct 30, 2025

Object-Level Verbalized Confidence Calibration in Vision-Language Models via Semantic Perturbation

Apr 21, 2025

Visual Intention Grounding for Egocentric Assistants

Apr 18, 2025

EgoBlind: Towards Egocentric Visual Assistance for the Blind People

Mar 11, 2025

EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering

Feb 11, 2025

On the Consistency of Video Large Language Models in Temporal Comprehension

Nov 20, 2024

Scene-Text Grounding for Text-Based Video Question Answering

Sep 22, 2024