Shiliang Zhang

CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens

Jul 09, 2024

Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision

Jun 17, 2024

Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers

Jun 17, 2024

MaLa-ASR: Multimedia-Assisted LLM-Based ASR

Jun 09, 2024

OVMR: Open-Vocabulary Recognition with Multi-Modal References

Jun 07, 2024

LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model

Jun 07, 2024

ERes2NetV2: Boosting Short-Duration Speaker Verification Performance with Computational Efficiency

Jun 04, 2024

MV-VTON: Multi-View Virtual Try-On with Diffusion Models

Apr 29, 2024

3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization

Mar 29, 2024

Decoupled Contrastive Learning for Long-Tailed Recognition

Mar 10, 2024