Yuhang Cao

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

Apr 09, 2024
Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Songyang Zhang, Haodong Duan, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Zhe Chen, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Kai Chen, Conghui He, Xingcheng Zhang, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang

DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models

Feb 22, 2024
Yuhang Cao, Pan Zhang, Xiaoyi Dong, Dahua Lin, Jiaqi Wang

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model

Jan 29, 2024
Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Xilin Wei, Songyang Zhang, Haodong Duan, Maosong Cao, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition

Sep 29, 2023
Pan Zhang, Xiaoyi Dong, Bin Wang, Yuhang Cao, Chao Xu, Linke Ouyang, Zhiyuan Zhao, Shuangrui Ding, Songyang Zhang, Haodong Duan, Hang Yan, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang

PP-MeT: a Real-world Personalized Prompt based Meeting Transcription System

Sep 28, 2023
Xiang Lyu, Yuhang Cao, Qing Wang, Jingjing Yin, Yuguang Yang, Pengpeng Zou, Yanni Hu, Heng Lu

DiaCorrect: Error Correction Back-end For Speaker Diarization

Sep 15, 2023
Jiangyu Han, Federico Landini, Johan Rohdin, Mireia Diez, Lukas Burget, Yuhang Cao, Heng Lu, Jan Cernocky

V3Det: Vast Vocabulary Visual Detection Dataset

Apr 07, 2023
Jiaqi Wang, Pan Zhang, Tao Chu, Yuhang Cao, Yujie Zhou, Tong Wu, Bin Wang, Conghui He, Dahua Lin

DiaCorrect: End-to-end error correction for speaker diarization

Oct 31, 2022
Jiangyu Han, Yuhang Cao, Heng Lu, Yanhua Long

MINI: Mining Implicit Novel Instances for Few-Shot Object Detection

May 06, 2022
Yuhang Cao, Jiaqi Wang, Yiqi Lin, Dahua Lin

The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge

Feb 10, 2022
Maokui He, Xiang Lv, Weilin Zhou, JingJing Yin, Xiaoqi Zhang, Yuxuan Wang, Shutong Niu, Yuhang Cao, Heng Lu, Jun Du, Chin-Hui Lee
