Zijia Zhao

VL-Mamba: Exploring State Space Models for Multimodal Learning

Mar 20, 2024
Yanyuan Qiao, Zheng Yu, Longteng Guo, Sihan Chen, Zijia Zhao, Mingzhen Sun, Qi Wu, Jing Liu

SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models

Mar 20, 2024
Tongtian Yue, Jie Cheng, Longteng Guo, Xingyuan Dai, Zijia Zhao, Xingjian He, Gang Xiong, Yisheng Lv, Jing Liu

Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions

Feb 17, 2024
Wenxuan Wang, Yisi Zhang, Xingjian He, Yichen Yan, Zijia Zhao, Xinlong Wang, Jing Liu

VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

May 29, 2023
Sihan Chen, Handong Li, Qunbo Wang, Zijia Zhao, Mingzhen Sun, Xinxin Zhu, Jing Liu

ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst

May 25, 2023
Zijia Zhao, Longteng Guo, Tongtian Yue, Sihan Chen, Shuai Shao, Xinxin Zhu, Zehuan Yuan, Jing Liu

MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning

Oct 09, 2022
Zijia Zhao, Longteng Guo, Xingjian He, Shuai Shao, Zehuan Yuan, Jing Liu

OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation

Jul 06, 2021
Jing Liu, Xinxin Zhu, Fei Liu, Longteng Guo, Zijia Zhao, Mingzhen Sun, Weining Wang, Hanqing Lu, Shiyu Zhou, Jiajun Zhang, Jinqiao Wang
