Heinrich Dinkel

Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding

Jun 19, 2024

Scaling up masked audio encoder learning for general audio classification

Jun 11, 2024

Bridging Language Gaps in Audio-Text Retrieval

Jun 11, 2024

CED: Consistent ensemble distillation for audio tagging

Sep 08, 2023

Focus on the Sound around You: Monaural Target Speaker Extraction via Distance and Speaker Information

Jun 28, 2023

AV-SepFormer: Cross-Attention SepFormer for Audio-Visual Target Speaker Extraction

Jun 25, 2023

Understanding temporally weakly supervised training: A case study for keyword spotting

May 30, 2023

Streaming Audio Transformers for Online Audio Tagging

May 29, 2023

Unified Keyword Spotting and Audio Tagging on Mobile Devices with Transformers

Mar 03, 2023

An empirical study of weakly supervised audio tagging embeddings for general audio representations

Sep 30, 2022