Jiaen Liang

Zipper-LoRA: Dynamic Parameter Decoupling for Speech-LLM based Multilingual Speech Recognition

Mar 19, 2026
Via arXiv

Anchoring Emotions in Text: Robust Multimodal Fusion for Mimicry Intensity Estimation

Mar 16, 2026
Via arXiv

Solution to the 10th ABAW Expression Recognition Challenge: A Robust Multimodal Framework with Safe Cross-Attention and Modality Dropout

Mar 09, 2026
Via arXiv

FocalOrder: Focal Preference Optimization for Reading Order Detection

Jan 12, 2026
Via arXiv

PARL: Position-Aware Relation Learning Network for Document Layout Analysis

Jan 12, 2026
Via arXiv

Bridging the gap: A comparative exploration of Speech-LLM and end-to-end architecture for multilingual conversational ASR

Jan 04, 2026
Via arXiv

Rhythm-controllable Attention with High Robustness for Long Sentence Speech Synthesis

Jun 05, 2023
Via arXiv

M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis

May 03, 2023
Via arXiv

ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis

Mar 26, 2022
Via arXiv

Selective Pseudo-labeling and Class-wise Discriminative Fusion for Sound Event Detection

Mar 04, 2022
Via arXiv