Tatsuya Komatsu

CASTELLA: Long Audio Dataset with Captions and Temporal Boundaries

Nov 19, 2025
Via arXiv

Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos

Jul 16, 2025
Via arXiv

Self-supervised learning method using multiple sampling strategies for general-purpose audio representation

May 25, 2025
Via arXiv

Music Tagging with Classifier Group Chains

Jan 09, 2025
Via arXiv

Pre-training with Synthetic Patterns for Audio

Oct 01, 2024
Via arXiv

DETECLAP: Enhancing Audio-Visual Representation Learning with Object Information

Sep 18, 2024
Via arXiv

Lighthouse: A User-Friendly Library for Reproducible Video Moment Retrieval and Highlight Detection

Aug 06, 2024
Via arXiv

Audio Fingerprinting with Holographic Reduced Representations

Jun 19, 2024
Via arXiv

Universal Score-based Speech Enhancement with High Content Preservation

Jun 18, 2024
Via arXiv

Keep Decoding Parallel with Effective Knowledge Distillation from Language Models to End-to-end Speech Recognisers

Jan 22, 2024
Via arXiv