Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sahitya Mantravadi

Contextualizing Spotify's Audiobook List Recommendations with Descriptive Shelves

Apr 18, 2025

Gustavo Penha, Alice Wang, Martin Achenbach, Kristen Sheets, Sahitya Mantravadi, Remi Galvez, Nico Guetta-Jeanrenaud, Divya Narayanan, Ofeliya Kalaydzhyan, Hugues Bouchard

Abstract:In this paper, we propose a pipeline to generate contextualized list recommendations with descriptive shelves in the domain of audiobooks. By creating several shelves for topics the user has an affinity to, e.g. Uplifting Women's Fiction, we can help them explore their recommendations according to their interests and at the same time recommend a diverse set of items. To do so, we use Large Language Models (LLMs) to enrich each item's metadata based on a taxonomy created for this domain. Then we create diverse descriptive shelves for each user. A/B tests show improvements in user engagement and audiobook discovery metrics, demonstrating benefits for users and content creators.

* Accepted for publication in the 47th European Conference on Information Retrieval (ECIR'25)

Via

Access Paper or Ask Questions

PODTILE: Facilitating Podcast Episode Browsing with Auto-generated Chapters

Oct 21, 2024

Azin Ghazimatin, Ekaterina Garmash, Gustavo Penha, Kristen Sheets, Martin Achenbach, Oguz Semerci, Remi Galvez, Marcus Tannenberg, Sahitya Mantravadi, Divya Narayanan(+7 more)

Abstract:Listeners of long-form talk-audio content, such as podcast episodes, often find it challenging to understand the overall structure and locate relevant sections. A practical solution is to divide episodes into chapters--semantically coherent segments labeled with titles and timestamps. Since most episodes on our platform at Spotify currently lack creator-provided chapters, automating the creation of chapters is essential. Scaling the chapterization of podcast episodes presents unique challenges. First, episodes tend to be less structured than written texts, featuring spontaneous discussions with nuanced transitions. Second, the transcripts are usually lengthy, averaging about 16,000 tokens, which necessitates efficient processing that can preserve context. To address these challenges, we introduce PODTILE, a fine-tuned encoder-decoder transformer to segment conversational data. The model simultaneously generates chapter transitions and titles for the input transcript. To preserve context, each input text is augmented with global context, including the episode's title, description, and previous chapter titles. In our intrinsic evaluation, PODTILE achieved an 11% improvement in ROUGE score over the strongest baseline. Additionally, we provide insights into the practical benefits of auto-generated chapters for listeners navigating episode content. Our findings indicate that auto-generated chapters serve as a useful tool for engaging with less popular podcasts. Finally, we present empirical evidence that using chapter titles can enhance effectiveness of sparse retrieval in search tasks.

* 9 pages, 4 figures, CIKM industry track 2024

Via

Access Paper or Ask Questions

Lights, Camera, Action! A Framework to Improve NLP Accuracy over OCR documents

Aug 06, 2021

Amit Gupte, Alexey Romanov, Sahitya Mantravadi, Dalitso Banda, Jianjie Liu, Raza Khan, Lakshmanan Ramu Meenal, Benjamin Han, Soundar Srinivasan

Figure 1 for Lights, Camera, Action! A Framework to Improve NLP Accuracy over OCR documents

Figure 2 for Lights, Camera, Action! A Framework to Improve NLP Accuracy over OCR documents

Figure 3 for Lights, Camera, Action! A Framework to Improve NLP Accuracy over OCR documents

Figure 4 for Lights, Camera, Action! A Framework to Improve NLP Accuracy over OCR documents

Abstract:Document digitization is essential for the digital transformation of our societies, yet a crucial step in the process, Optical Character Recognition (OCR), is still not perfect. Even commercial OCR systems can produce questionable output depending on the fidelity of the scanned documents. In this paper, we demonstrate an effective framework for mitigating OCR errors for any downstream NLP task, using Named Entity Recognition (NER) as an example. We first address the data scarcity problem for model training by constructing a document synthesis pipeline, generating realistic but degraded data with NER labels. We measure the NER accuracy drop at various degradation levels and show that a text restoration model, trained on the degraded data, significantly closes the NER accuracy gaps caused by OCR errors, including on an out-of-domain dataset. For the benefit of the community, we have made the document synthesis pipeline available as an open-source project.

* Accepted to the Document Intelligence Workshop at KDD 2021. The source code of Genalog is available at https://github.com/microsoft/genalog

Via

Access Paper or Ask Questions

Examination and Extension of Strategies for Improving Personalized Language Modeling via Interpolation

Jun 09, 2020

Liqun Shao, Sahitya Mantravadi, Tom Manzini, Alejandro Buendia, Manon Knoertzer, Soundar Srinivasan, Chris Quirk

Figure 1 for Examination and Extension of Strategies for Improving Personalized Language Modeling via Interpolation

Figure 2 for Examination and Extension of Strategies for Improving Personalized Language Modeling via Interpolation

Figure 3 for Examination and Extension of Strategies for Improving Personalized Language Modeling via Interpolation

Figure 4 for Examination and Extension of Strategies for Improving Personalized Language Modeling via Interpolation

Abstract:In this paper, we detail novel strategies for interpolating personalized language models and methods to handle out-of-vocabulary (OOV) tokens to improve personalized language models. Using publicly available data from Reddit, we demonstrate improvements in offline metrics at the user level by interpolating a global LSTM-based authoring model with a user-personalized n-gram model. By optimizing this approach with a back-off to uniform OOV penalty and the interpolation coefficient, we observe that over 80% of users receive a lift in perplexity, with an average of 5.2% in perplexity lift per user. In doing this research we extend previous work in building NLIs and improve the robustness of metrics for downstream tasks.

* ACL Natural Language Interface Workshop 2020, short paper

Via

Access Paper or Ask Questions