Yuhang Cao

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

Apr 09, 2024

DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models

Feb 22, 2024

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model

Jan 29, 2024

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition

Sep 29, 2023

PP-MeT: a Real-world Personalized Prompt based Meeting Transcription System

Sep 28, 2023

DiaCorrect: Error Correction Back-end For Speaker Diarization

Sep 15, 2023

V3Det: Vast Vocabulary Visual Detection Dataset

Apr 07, 2023

DiaCorrect: End-to-end error correction for speaker diarization

Oct 31, 2022

MINI: Mining Implicit Novel Instances for Few-Shot Object Detection

May 06, 2022

The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge

Feb 10, 2022