Yanzhen Ren

When AVSR Meets Video Conferencing: Dataset, Degradation, and the Hidden Mechanism Behind Performance Collapse

Mar 24, 2026
Via arXiv

Robust Provably Secure Image Steganography via Latent Iterative Optimization

Mar 10, 2026
Via arXiv

Unifying Speech Editing Detection and Content Localization via Prior-Enhanced Audio LLMs

Jan 29, 2026
Via arXiv

Audio-visual Event Localization on Portrait Mode Short Videos

Apr 09, 2025
Via arXiv

SE4Lip: Speech-Lip Encoder for Talking Head Synthesis to Solve Phoneme-Viseme Alignment Ambiguity

Apr 08, 2025
Via arXiv

Improving Speech Enhancement by Cross- and Sub-band Processing with State Space Model

Feb 22, 2025
Via arXiv

FA-GAN: Artifacts-free and Phase-aware High-fidelity GAN-based Vocoder

Jul 05, 2024
Via arXiv

Semantic Proximity Alignment: Towards Human Perception-consistent Audio Tagging by Aligning with Label Text Description

Sep 28, 2023
Via arXiv

A Snoring Sound Dataset for Body Position Recognition: Collection, Annotation, and Analysis

Jul 25, 2023
Via arXiv

Who is Speaking Actually? Robust and Versatile Speaker Traceability for Voice Conversion

May 09, 2023
Via arXiv