Zhiyang Xu

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

May 14, 2025
Via arXiv

LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models

Apr 14, 2025
Via arXiv

Transfer between Modalities with MetaQueries

Apr 08, 2025
Via arXiv

A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models

Feb 22, 2025
Via arXiv

UniHGKR: Unified Instruction-aware Heterogeneous Knowledge Retrievers

Oct 26, 2024
Via arXiv

RoRA-VLM: Robust Retrieval-Augmented Vision Language Models

Oct 11, 2024
Via arXiv

SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Models

Oct 04, 2024
Via arXiv

Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models

Oct 04, 2024
Via arXiv

Do Large Language Models Possess Sensitive to Sentiment?

Sep 04, 2024
Via arXiv

NeuroBind: Towards Unified Multimodal Representations for Neural Signals

Jul 19, 2024
Via arXiv