Zhiyang Xu

Pisces: An Auto-regressive Foundation Model for Image Understanding and Generation

Jun 12, 2025
Via arXiv

LaTtE-Flow: Layerwise Timestep-Expert Flow-based Transformer

Jun 08, 2025
Via arXiv

AR-RAG: Autoregressive Retrieval Augmentation for Image Generation

Jun 08, 2025
Via arXiv

R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation

May 29, 2025
Via arXiv

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

May 14, 2025
Via arXiv

LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models

Apr 14, 2025
Via arXiv

Transfer between Modalities with MetaQueries

Apr 08, 2025
Via arXiv

A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models

Feb 22, 2025
Via arXiv

UniHGKR: Unified Instruction-aware Heterogeneous Knowledge Retrievers

Oct 26, 2024
Via arXiv

RoRA-VLM: Robust Retrieval-Augmented Vision Language Models

Oct 11, 2024
Via arXiv