Shuhuai Ren

LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?

Apr 16, 2024
Yuchi Wang, Shuhuai Ren, Rundong Gao, Linli Yao, Qingyan Guo, Kaikai An, Jianhong Bai, Xu Sun

Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality

Mar 28, 2024
Sishuo Chen, Lei Li, Shuhuai Ren, Rundong Gao, Yuanxin Liu, Xiaohan Bi, Xu Sun, Lu Hou

TempCompass: Do Video LLMs Really Understand Videos?

Mar 01, 2024
Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, Lei Li, Sishuo Chen, Xu Sun, Lu Hou

PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain

Feb 21, 2024
Liang Chen, Yichi Zhang, Shuhuai Ren, Haozhe Zhao, Zefan Cai, Yuchi Wang, Peiyi Wang, Xiangdi Meng, Tianyu Liu, Baobao Chang

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Dec 04, 2023
Shuhuai Ren, Linli Yao, Shicheng Li, Xu Sun, Lu Hou

VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models

Nov 29, 2023
Shicheng Li, Lei Li, Shuhuai Ren, Yuanxin Liu, Yi Liu, Rundong Gao, Xu Sun, Lu Hou

FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation

Nov 08, 2023
Yuanxin Liu, Lei Li, Shuhuai Ren, Rundong Gao, Shicheng Li, Sishuo Chen, Xu Sun, Lu Hou

TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding

Oct 29, 2023
Shuhuai Ren, Sishuo Chen, Shicheng Li, Xu Sun, Lu Hou

Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond

Oct 16, 2023
Liang Chen, Yichi Zhang, Shuhuai Ren, Haozhe Zhao, Zefan Cai, Yuchi Wang, Peiyi Wang, Tianyu Liu, Baobao Chang
