Zhiyang Xu

Think, Act, Build: An Agentic Framework with Vision Language Models for Zero-Shot 3D Visual Grounding

Apr 02, 2026

Incentivizing Temporal-Awareness in Egocentric Video Understanding Models

Mar 28, 2026

SuperFlow: Training Flow Matching Models with RL on the Fly

Dec 17, 2025

COMPASS: A Multi-Turn Benchmark for Tool-Mediated Planning & Preference Optimization

Oct 08, 2025

Pisces: An Auto-regressive Foundation Model for Image Understanding and Generation

Jun 12, 2025

AR-RAG: Autoregressive Retrieval Augmentation for Image Generation

Jun 08, 2025

LaTtE-Flow: Layerwise Timestep-Expert Flow-based Transformer

Jun 08, 2025

R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation

May 29, 2025

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

May 14, 2025

LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models

Apr 14, 2025