Yunyang Xiong

VLM3: Vision Language Models Are Native 3D Learners

May 28, 2026
Via arXiv

Exploring Audio Hallucination in Egocentric Video Understanding

Apr 26, 2026
Via arXiv

Small Vision-Language Models are Smart Compressors for Long Video Understanding

Apr 09, 2026
Via arXiv

Neural Computers

Apr 07, 2026
Via arXiv

Efficient Universal Perception Encoder

Mar 23, 2026
Via arXiv

EgoAVU: Egocentric Audio-Visual Understanding

Feb 05, 2026
Via arXiv

VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice

Jan 08, 2026
Via arXiv

ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization

Feb 04, 2025
Via arXiv

EdgeTAM: On-Device Track Anything Model

Jan 13, 2025
Via arXiv

MetaMorph: Multimodal Understanding and Generation via Instruction Tuning

Dec 18, 2024
Via arXiv