Zuxuan Wu

VidLaDA: Bidirectional Diffusion Large Language Models for Efficient Video Understanding

Jan 25, 2026
Via arXiv

FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions

Jan 19, 2026
Via arXiv

A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

Jan 16, 2026
Via arXiv

VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding

Jan 12, 2026
Via arXiv

Thinking with Deltas: Incentivizing Reinforcement Learning via Differential Visual Reasoning Policy

Jan 11, 2026
Via arXiv

FlashPortrait: 6x Faster Infinite Portrait Animation with Adaptive Latent Prediction

Dec 18, 2025
Via arXiv

UniGen-1.5: Enhancing Image Generation and Editing through Reward Unification in Reinforcement Learning

Nov 18, 2025
Via arXiv

TempoMaster: Efficient Long Video Generation via Next-Frame-Rate Prediction

Nov 16, 2025
Via arXiv

Preserving Cross-Modal Consistency for CLIP-based Class-Incremental Learning

Nov 14, 2025
Via arXiv

LRANet++: Low-Rank Approximation Network for Accurate and Efficient Text Spotting

Nov 08, 2025
Via arXiv