Yuanhan Zhang

Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward

Apr 02, 2024
Ruohong Zhang, Liangke Gui, Zhiqing Sun, Yihao Feng, Keyang Xu, Yuanhan Zhang, Di Fu, Chunyuan Li, Alexander Hauptmann, Yonatan Bisk, Yiming Yang

VBench: Comprehensive Benchmark Suite for Video Generative Models

Nov 29, 2023
Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, Limin Wang, Dahua Lin, Yu Qiao, Ziwei Liu

OtterHD: A High-Resolution Multi-modality Model

Nov 07, 2023
Bo Li, Peiyuan Zhang, Jingkang Yang, Yuanhan Zhang, Fanyi Pu, Ziwei Liu

Multimodal Foundation Models for Zero-shot Animal Species Recognition in Camera Trap Images

Nov 02, 2023
Zalan Fabian, Zhongqi Miao, Chunyuan Li, Yuanhan Zhang, Ziwei Liu, Andrés Hernández, Andrés Montes-Rojas, Rafael Escucha, Laura Siabatto, Andrés Link, Pablo Arbeláez, Rahul Dodhia, Juan Lavista Ferres

Octopus: Embodied Vision-Language Programmer from Environmental Feedback

Oct 12, 2023
Jingkang Yang, Yuhao Dong, Shuai Liu, Bo Li, Ziyue Wang, Chencheng Jiang, Haoran Tan, Jiamu Kang, Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu

MMBench: Is Your Multi-modal Model an All-around Player?

Jul 26, 2023
Yuan Liu, Haodong Duan, Yuanhan Zhang, Bo Li, Songyang Zhang, Wangbo Zhao, Yike Yuan, Jiaqi Wang, Conghui He, Ziwei Liu, Kai Chen, Dahua Lin

FunQA: Towards Surprising Video Comprehension

Jun 26, 2023
Binzhu Xie, Sicheng Zhang, Zitang Zhou, Bo Li, Yuanhan Zhang, Jack Hessel, Jingkang Yang, Ziwei Liu

MIMIC-IT: Multi-Modal In-Context Instruction Tuning

Jun 08, 2023
Bo Li, Yuanhan Zhang, Liangyu Chen, Jinghao Wang, Fanyi Pu, Jingkang Yang, Chunyuan Li, Ziwei Liu

Learning without Forgetting for Vision-Language Models

May 30, 2023
Da-Wei Zhou, Yuanhan Zhang, Jingyi Ning, Han-Jia Ye, De-Chuan Zhan, Ziwei Liu
