Sihan Chen

Calibration & Reconstruction: Deep Integrated Language for Referring Image Segmentation

Apr 12, 2024
Yichen Yan, Xingjian He, Sihan Chen, Jing Liu

VL-Mamba: Exploring State Space Models for Multimodal Learning

Mar 20, 2024
Yanyuan Qiao, Zheng Yu, Longteng Guo, Sihan Chen, Zijia Zhao, Mingzhen Sun, Qi Wu, Jing Liu

Semantic Entropy Can Simultaneously Benefit Transmission Efficiency and Channel Security of Wireless Semantic Communications

Feb 07, 2024
Yankai Rong, Guoshun Nan, Minwei Zhang, Sihan Chen, Songtao Wang, Xuefei Zhang, Nan Ma, Shixun Gong, Zhaohui Yang, Qimei Cui, Xiaofeng Tao, Tony Q. S. Quek

GLOBER: Coherent Non-autoregressive Video Generation via GLOBal Guided Video DecodER

Sep 23, 2023
Mingzhen Sun, Weining Wang, Zihan Qin, Jiahui Sun, Sihan Chen, Jing Liu

EAVL: Explicitly Align Vision and Language for Referring Image Segmentation

Aug 22, 2023
Yichen Yan, Xingjian He, Wenxuan Wang, Sihan Chen, Jing Liu

COSA: Concatenated Sample Pretrained Vision-Language Foundation Model

Jun 15, 2023
Sihan Chen, Xingjian He, Handong Li, Xiaojie Jin, Jiashi Feng, Jing Liu

VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

May 29, 2023
Sihan Chen, Handong Li, Qunbo Wang, Zijia Zhao, Mingzhen Sun, Xinxin Zhu, Jing Liu

ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst

May 25, 2023
Zijia Zhao, Longteng Guo, Tongtian Yue, Sihan Chen, Shuai Shao, Xinxin Zhu, Zehuan Yuan, Jing Liu

VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending

May 22, 2023
Xingjian He, Sihan Chen, Fan Ma, Zhicheng Huang, Xiaojie Jin, Zikang Liu, Dongmei Fu, Yi Yang, Jing Liu, Jiashi Feng
