Krishna Somandepalli

A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation

May 22, 2024

VideoPoet: A Large Language Model for Zero-Shot Video Generation

Dec 21, 2023

LanSER: Language-Model Supported Speech Emotion Recognition

Sep 07, 2023

MM-AU: Towards Multimodal Understanding of Advertisement Videos

Aug 27, 2023

Contextually-rich human affect perception using multimodal scene information

Mar 13, 2023

Heterogeneous Graph Learning for Acoustic Event Classification

Mar 12, 2023

A dataset for Audio-Visual Sound Event Detection in Movies

Feb 14, 2023

Visually-aware Acoustic Event Detection using Heterogeneous Graphs

Jul 16, 2022

Multitask vocal burst modeling with ResNets and pre-trained paralinguistic Conformers

Jun 24, 2022

To train or not to train adversarially: A study of bias mitigation strategies for speaker recognition

Mar 17, 2022