Picture for Ruibo Fu

Ruibo Fu

Mitigating Audiovisual Mismatch in Visual-Guide Audio Captioning

Add code
May 28, 2025
Viaarxiv icon

Hearing from Silence: Reasoning Audio Descriptions from Silent Videos via Vision-Language Model

Add code
May 19, 2025
Viaarxiv icon

Deconfounded Reasoning for Multimodal Fake News Detection via Causal Intervention

Add code
Apr 12, 2025
Viaarxiv icon

Exploring Modality Disruption in Multimodal Fake News Detection

Add code
Apr 12, 2025
Viaarxiv icon

Detect All-Type Deepfake Audio: Wavelet Prompt Tuning for Enhanced Auditory Perception

Add code
Apr 09, 2025
Viaarxiv icon

MTPareto: A MultiModal Targeted Pareto Framework for Fake News Detection

Add code
Jan 12, 2025
Viaarxiv icon

Neural Codec Source Tracing: Toward Comprehensive Attribution in Open-Set Condition

Add code
Jan 11, 2025
Figure 1 for Neural Codec Source Tracing: Toward Comprehensive Attribution in Open-Set Condition
Figure 2 for Neural Codec Source Tracing: Toward Comprehensive Attribution in Open-Set Condition
Figure 3 for Neural Codec Source Tracing: Toward Comprehensive Attribution in Open-Set Condition
Figure 4 for Neural Codec Source Tracing: Toward Comprehensive Attribution in Open-Set Condition
Viaarxiv icon

Mel-Refine: A Plug-and-Play Approach to Refine Mel-Spectrogram in Audio Generation

Add code
Dec 11, 2024
Viaarxiv icon

LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis

Add code
Nov 24, 2024
Figure 1 for LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis
Figure 2 for LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis
Figure 3 for LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis
Figure 4 for LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis
Viaarxiv icon

DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech

Add code
Sep 18, 2024
Figure 1 for DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech
Figure 2 for DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech
Figure 3 for DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech
Figure 4 for DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech
Viaarxiv icon