Tao Wang

DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech

Sep 18, 2024

WMCodec: End-to-End Neural Speech Codec with Deep Watermarking for Authenticity Verification

Sep 18, 2024

Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation

Sep 14, 2024

LLM-GAN: Construct Generative Adversarial Network Through Large Language Models For Explainable Fake News Detection

Sep 03, 2024

Multi-modal Adversarial Training for Zero-Shot Voice Cloning

Aug 28, 2024

GrassNet: State Space Model Meets Graph Neural Network

Aug 16, 2024

DiffSteISR: Harnessing Diffusion Prior for Superior Real-world Stereo Image Super-Resolution

Aug 15, 2024

CorrAdaptor: Adaptive Local Context Learning for Correspondence Pruning

Aug 15, 2024

P/D-Serve: Serving Disaggregated Large Language Model at Scale

Aug 15, 2024

VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing

Aug 11, 2024