Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Adithi Satish

Multi-Dimensional Evaluation of Sustainable City Trips with LLM-as-a-Judge and Human-in-the-Loop

Apr 27, 2026

Ashmi Banerjee, Adithi Satish, Wolfgang Wörndl, Yashar Deldjoo

Abstract:Evaluating nuanced conversational travel recommendations is challenging when human annotations are costly and standard metrics ignore stakeholder-centric goals. We study LLMs-as-Judges for sustainable city-trip lists across four dimensions -- relevance, diversity, sustainability, and popularity balance, and propose a three-phase calibration framework: (1) baseline judging with multiple LLMs, (2) expert evaluation to identify systematic misalignment, and (3) dimension-specific calibration via rules and few-shot examples. Across two recommendation settings, we observe model-specific biases and high dimension-level variance, even when judges agree on overall rankings. Calibration clarifies reasoning per dimension but exposes divergent interpretations of sustainability, highlighting the need for transparent, bias-aware LLM evaluation. Prompts and code are released for reproducibility: https://github.com/ashmibanerjee/trs-llm-calibration.

Via

Access Paper or Ask Questions

TRACE: A Conversational Framework for Sustainable Tourism Recommendation with Agentic Counterfactual Explanations

Apr 14, 2026

Ashmi Banerjee, Adithi Satish, Wolfgang Wörndl, Yashar Deldjoo

Abstract:Traditional conversational travel recommender systems primarily optimize for user relevance and convenience, often reinforcing popular, overcrowded destinations and carbon-intensive travel choices. To address this, we present TRACE (Tourism Recommendation with Agentic Counterfactual Explanations), a multi-agent, LLM-based framework that promotes sustainable tourism through interactive nudging. TRACE uses a modular orchestrator-worker architecture where specialized agents elicit latent sustainability preferences, construct structured user personas, and generate recommendations that balance relevance with environmental impact. A key innovation lies in its use of agentic counterfactual explanations and LLM-driven clarifying questions, which together surface greener alternatives and refine understanding of intent, fostering user reflection without coercion. User studies and semantic alignment analyses demonstrate that TRACE effectively supports sustainable decision-making while preserving recommendation quality and interactive responsiveness. TRACE is implemented on Google's Agent Development Kit, with full code, Docker setup, prompts, and a publicly available demo video to ensure reproducibility. A project summary, including all resources, prompts, and demo access, is available at https://ashmibanerjee.github.io/trace-chatbot.

* Proceedings of the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '26), July 20--24, 2026, Melbourne, VIC, Australia

Via

Access Paper or Ask Questions

Collab-REC: An LLM-based Agentic Framework for Balancing Recommendations in Tourism

Aug 20, 2025

Ashmi Banerjee, Fitri Nur Aisyah, Adithi Satish, Wolfgang Wörndl, Yashar Deldjoo

Abstract:We propose Collab-REC, a multi-agent framework designed to counteract popularity bias and enhance diversity in tourism recommendations. In our setting, three LLM-based agents -- Personalization, Popularity, and Sustainability generate city suggestions from complementary perspectives. A non-LLM moderator then merges and refines these proposals via multi-round negotiation, ensuring each agent's viewpoint is incorporated while penalizing spurious or repeated responses. Experiments on European city queries show that Collab-REC improves diversity and overall relevance compared to a single-agent baseline, surfacing lesser-visited locales that often remain overlooked. This balanced, context-aware approach addresses over-tourism and better aligns with constraints provided by the user, highlighting the promise of multi-stakeholder collaboration in LLM-driven recommender systems.

Via

Access Paper or Ask Questions

SynthTRIPs: A Knowledge-Grounded Framework for Benchmark Query Generation for Personalized Tourism Recommenders

Apr 12, 2025

Ashmi Banerjee, Adithi Satish, Fitri Nur Aisyah, Wolfgang Wörndl, Yashar Deldjoo

Abstract:Tourism Recommender Systems (TRS) are crucial in personalizing travel experiences by tailoring recommendations to users' preferences, constraints, and contextual factors. However, publicly available travel datasets often lack sufficient breadth and depth, limiting their ability to support advanced personalization strategies -- particularly for sustainable travel and off-peak tourism. In this work, we explore using Large Language Models (LLMs) to generate synthetic travel queries that emulate diverse user personas and incorporate structured filters such as budget constraints and sustainability preferences. This paper introduces a novel SynthTRIPs framework for generating synthetic travel queries using LLMs grounded in a curated knowledge base (KB). Our approach combines persona-based preferences (e.g., budget, travel style) with explicit sustainability filters (e.g., walkability, air quality) to produce realistic and diverse queries. We mitigate hallucination and ensure factual correctness by grounding the LLM responses in the KB. We formalize the query generation process and introduce evaluation metrics for assessing realism and alignment. Both human expert evaluations and automatic LLM-based assessments demonstrate the effectiveness of our synthetic dataset in capturing complex personalization aspects underrepresented in existing datasets. While our framework was developed and tested for personalized city trip recommendations, the methodology applies to other recommender system domains. Code and dataset are made public at https://bit.ly/synthTRIPs

* Accepted for publication at SIGIR 2025

Via

Access Paper or Ask Questions

Enhancing Tourism Recommender Systems for Sustainable City Trips Using Retrieval-Augmented Generation

Sep 26, 2024

Ashmi Banerjee, Adithi Satish, Wolfgang Wörndl

Abstract:Tourism Recommender Systems (TRS) have traditionally focused on providing personalized travel suggestions, often prioritizing user preferences without considering broader sustainability goals. Integrating sustainability into TRS has become essential with the increasing need to balance environmental impact, local community interests, and visitor satisfaction. This paper proposes a novel approach to enhancing TRS for sustainable city trips using Large Language Models (LLMs) and a modified Retrieval-Augmented Generation (RAG) pipeline. We enhance the traditional RAG system by incorporating a sustainability metric based on a city's popularity and seasonal demand during the prompt augmentation phase. This modification, called Sustainability Augmented Reranking (SAR), ensures the system's recommendations align with sustainability goals. Evaluations using popular open-source LLMs, such as Llama-3.1-Instruct-8B and Mistral-Instruct-7B, demonstrate that the SAR-enhanced approach consistently matches or outperforms the baseline (without SAR) across most metrics, highlighting the benefits of incorporating sustainability into TRS.

* Accepted at the RecSoGood 2024 Workshop co-located with the 18th ACM Conference on Recommender Systems (RecSys 2024)

Via

Access Paper or Ask Questions

Beats of Bias: Analyzing Lyrics with Topic Modeling and Gender Bias Measurements

Sep 24, 2024

Danqing Chen, Adithi Satish, Rasul Khanbayov, Carolin M. Schuster, Georg Groh

Figure 1 for Beats of Bias: Analyzing Lyrics with Topic Modeling and Gender Bias Measurements

Figure 2 for Beats of Bias: Analyzing Lyrics with Topic Modeling and Gender Bias Measurements

Figure 3 for Beats of Bias: Analyzing Lyrics with Topic Modeling and Gender Bias Measurements

Figure 4 for Beats of Bias: Analyzing Lyrics with Topic Modeling and Gender Bias Measurements

Abstract:This paper uses topic modeling and bias measurement techniques to analyze and determine gender bias in English song lyrics. We utilize BERTopic to cluster 537,553 English songs into distinct topics and chart their development over time. Our analysis shows the thematic shift in song lyrics over the years, from themes of romance to the increasing sexualization of women in songs. We observe large amounts of profanity and misogynistic lyrics on various topics, especially in the overall biggest cluster. Furthermore, to analyze gender bias across topics and genres, we employ the Single Category Word Embedding Association Test (SC-WEAT) to compute bias scores for the word embeddings trained on the most popular topics as well as for each genre. We find that words related to intelligence and strength tend to show a male bias across genres, as opposed to appearance and weakness words, which are more female-biased; however, a closer look also reveals differences in biases across topics.

* Accepted and presented at the 17th International Conference on Social Computing, Behavioral-Cultural Modeling, & Prediction and Behavior Representation in Modeling and Simulation (see https://sbp-brims.org/2024/papers/working-papers/Chen_SBP-BRiMS2024_Final_31.pdf )

Via

Access Paper or Ask Questions