Peiwen Sun

Which Speech Representation Better Matches Text-Native Reasoning? A Study of Speech-Text Alignment on Frame Rate and Representation

Jun 10, 2026
Via arXiv

LongSpace: Exploring Long-Horizon Spatial Memory from Perception to Recall in Video

Jun 04, 2026
Via arXiv

Benchmark Everything Everywhere All at Once

Jun 04, 2026
Via arXiv

X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding

Jun 01, 2026
Via arXiv

Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling

Apr 26, 2026
Via arXiv

AURA: Always-On Understanding and Real-Time Assistance via Video Streams

Apr 05, 2026
Via arXiv

PhoStream: Benchmarking Real-World Streaming for Omnimodal Assistants in Mobile Scenarios

Jan 30, 2026
Via arXiv

SpaceVista: All-Scale Visual Spatial Reasoning from mm to km

Oct 10, 2025
Via arXiv

OmniAudio: Generating Spatial Audio from 360-Degree Video

Apr 21, 2025
Via arXiv

Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation

Oct 14, 2024
Via arXiv