Marion Baranes

Beyond Musical Descriptors: Extracting Preference-Bearing Intent in Music Queries

Feb 11, 2026

Marion Baranes, Romain Hennequin, Elena V. Epure

Abstract: Although annotated music descriptor datasets for user queries are increasingly common, few consider the user's intent behind these descriptors, which is essential for effectively meeting their needs. We introduce MusicRecoIntent, a manually annotated corpus of 2,291 Reddit music requests, labeling musical descriptors across seven categories with positive, negative, or referential preference-bearing roles. We then investigate how reliably large language models (LLMs) can extract these music descriptors, finding that they do capture explicit descriptors but struggle with context-dependent ones. This work can further serve as a benchmark for fine-grained modeling of user intent and for gaining insights into improving LLM-based music understanding systems.

* Accepted at NLP4MusA 2026 (4th Workshop on NLP for Music and Audio)

Topic Modeling on Podcast Short-Text Metadata

Jan 12, 2022

Francisco B. Valero, Marion Baranes, Elena V. Epure

Abstract: Podcasts have emerged as massively consumed online content, notably due to the wider accessibility of production means and scaled distribution through large streaming platforms. Categorization systems and information access technologies typically use topics as the primary way to organize or navigate podcast collections. However, annotating podcasts with topics remains problematic because the assigned editorial genres are broad, heterogeneous, or misleading, or because of data challenges (e.g., short metadata text, noisy transcripts). Here, we assess the feasibility of discovering relevant topics from podcast metadata, namely titles and descriptions, using topic modeling techniques for short text. We also propose a new strategy to leverage named entities (NEs), often present in podcast metadata, in a Non-negative Matrix Factorization (NMF) topic modeling framework. Our experiments on two existing datasets from Spotify and iTunes, and on Deezer, a new dataset from an online service providing a catalog of podcasts, show that our proposed document representation, NEiCE, leads to improved topic coherence over the baselines. We release the code for experimental reproducibility of the results.

* Accepted for publication in the 44th European Conference on Information Retrieval (ECIR'22)
