Hilde Kuehne

DASS: Distilled Audio State Space Models Are Stronger and More Duration-Scalable Learners

Jul 04, 2024

Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation

Jun 14, 2024

LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity

Apr 04, 2024

Uncertainty Quantification via Stable Distribution Propagation

Feb 13, 2024

Grounding Everything: Emerging Localization Properties in Vision-Language Transformers

Dec 05, 2023

Learning Human Action Recognition Representations Without Real Humans

Nov 10, 2023

HowToCaption: Prompting LLMs to Transform Video Annotations at Scale

Oct 07, 2023

In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval

Sep 16, 2023

Preserving Modality Structure Improves Multi-Modal Learning

Aug 24, 2023

Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages

May 21, 2023