Xin Jing

Enhancing Emotional Text-to-Speech Controllability with Natural Language Guidance through Contrastive Learning and Diffusion Models

Sep 10, 2024

DB3V: A Dialect Dominated Dataset of Bird Vocalisation for Cross-corpus Bird Species Recognition

Jun 11, 2024

ParaCLAP -- Towards a general language-audio model for computational paralinguistic tasks

Jun 11, 2024

STAA-Net: A Sparse and Transferable Adversarial Attack for Speech Emotion Recognition

Feb 02, 2024

U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech

May 22, 2023

HEAR4Health: A blueprint for making computer audition a staple of modern healthcare

Jan 25, 2023

Redundancy Reduction Twins Network: A Training framework for Multi-output Emotion Regression

Jun 28, 2022

Dynamic Restrained Uncertainty Weighting Loss for Multitask Learning of Vocal Expression

Jun 27, 2022

Exploring speaker enrolment for few-shot personalisation in emotional vocalisation prediction

Jun 20, 2022

A Temporal-oriented Broadcast ResNet for COVID-19 Detection

Mar 31, 2022