Jialong Zuo

MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis

Jul 19, 2024

ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling

Jun 25, 2024

Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM

Jun 18, 2024

ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec

Jun 03, 2024

Spatial Cascaded Clustering and Weighted Memory for Unsupervised Person Re-identification

Mar 01, 2024

MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech

Feb 14, 2024

UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity

Dec 11, 2023

TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models

Aug 28, 2023

FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models

May 23, 2023

PLIP: Language-Image Pre-training for Person Representation Learning

May 15, 2023