Zhou Huan

Doc-CoB: Enhancing Multi-Modal Document Understanding with Visual Chain-of-Boxes Reasoning

May 24, 2025
Via arXiv

SCDiar: a streaming diarization system based on speaker change detection and speech recognition

Jan 28, 2025
Via arXiv

An efficient text augmentation approach for contextualized Mandarin speech recognition

Jun 14, 2024
Via arXiv