
Fengyu Yang

Tri-Ergon: Fine-grained Video-to-Audio Generation with Multi-modal Conditions and LUFS Control

Dec 29, 2024

PriorDiffusion: Leverage Language Prior in Diffusion Models for Monocular Depth Estimation

Nov 24, 2024

Differentiable Gaussian Representation for Incomplete CT Reconstruction

Nov 07, 2024

Differentiation Through Black-Box Quadratic Programming Solvers

Oct 10, 2024

RSA: Resolving Scale Ambiguities in Monocular Depth Estimators through Language Descriptions

Oct 03, 2024

TextToucher: Fine-Grained Text-to-Touch Generation

Sep 09, 2024

NeuroBind: Towards Unified Multimodal Representations for Neural Signals

Jul 19, 2024

Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling

Jun 11, 2024

Tactile-Augmented Radiance Fields

May 07, 2024

WorDepth: Variational Language Prior for Monocular Depth Estimation

Apr 05, 2024