Yapeng Tian

SignLLM: Sign Languages Production Large Language Models

May 17, 2024

T-VSL: Text-Guided Visual Sound Source Localization in Mixtures

Apr 02, 2024

Robust Active Speaker Detection in Noisy Environments

Mar 30, 2024

Text-to-Audio Generation Synchronized with Videos

Mar 08, 2024

OSCaR: Object State Captioning and State Change Representation

Feb 28, 2024

Efficiently Leveraging Linguistic Priors for Scene Text Spotting

Feb 27, 2024

DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation

Dec 21, 2023

Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning

Nov 02, 2023

LAVSS: Location-Guided Audio-Visual Spatial Audio Separation

Oct 31, 2023

Separating Invisible Sounds Toward Universal Audiovisual Scene-Aware Sound Separation

Oct 18, 2023