Zhiyang Xu

AR-RAG: Autoregressive Retrieval Augmentation for Image Generation

Jun 08, 2025

LaTtE-Flow: Layerwise Timestep-Expert Flow-based Transformer

Jun 08, 2025

R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation

May 29, 2025

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

May 14, 2025

LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models

Apr 14, 2025

Transfer between Modalities with MetaQueries

Apr 08, 2025

A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models

Feb 22, 2025

UniHGKR: Unified Instruction-aware Heterogeneous Knowledge Retrievers

Oct 26, 2024

RoRA-VLM: Robust Retrieval-Augmented Vision Language Models

Oct 11, 2024

SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Models

Oct 04, 2024