Arda Senocak

Seeing Through Touch: Tactile-Driven Visual Localization of Material Regions

Apr 13, 2026

Cinematic Audio Source Separation Using Visual Cues

Mar 27, 2026

Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization

May 08, 2025

Seeing Speech and Sound: Distinguishing and Locating Audios in Visual Scenes

Mar 24, 2025

Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment

Dec 09, 2024

AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models

Oct 23, 2024

Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment

Jul 18, 2024

ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions

Jul 11, 2024

Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Jun 05, 2024

From Coarse to Fine: Efficient Training for Audio Spectrogram Transformers

Jan 16, 2024