Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tobias Vente

Green Recommender Systems: Understanding and Minimizing the Carbon Footprint of AI-Powered Personalization

Sep 16, 2025

Lukas Wegmeth, Tobias Vente, Alan Said, Joeran Beel

Abstract:As global warming soars, the need to assess and reduce the environmental impact of recommender systems is becoming increasingly urgent. Despite this, the recommender systems community hardly understands, addresses, and evaluates the environmental impact of their work. In this study, we examine the environmental impact of recommender systems research by reproducing typical experimental pipelines. Based on our results, we provide guidelines for researchers and practitioners on how to minimize the environmental footprint of their work and implement green recommender systems - recommender systems designed to minimize their energy consumption and carbon footprint. Our analysis covers 79 papers from the 2013 and 2023 ACM RecSys conferences, comparing traditional "good old-fashioned AI" models with modern deep learning models. We designed and reproduced representative experimental pipelines for both years, measuring energy consumption using a hardware energy meter and converting it into CO2 equivalents. Our results show that papers utilizing deep learning models emit approximately 42 times more CO2 equivalents than papers using traditional models. On average, a single deep learning-based paper generates 2,909 kilograms of CO2 equivalents - more than the carbon emissions of a person flying from New York City to Melbourne or the amount of CO2 sequestered by one tree over 260 years. This work underscores the urgent need for the recommender systems and wider machine learning communities to adopt green AI principles, balancing algorithmic advancements and environmental responsibility to build a sustainable future with AI-powered personalization.

* Just Accepted at ACM TORS. arXiv admin note: substantial text overlap with arXiv:2408.08203

Via

Access Paper or Ask Questions

APS Explorer: Navigating Algorithm Performance Spaces for Informed Dataset Selection

Aug 26, 2025

Tobias Vente, Michael Heep, Abdullah Abbas, Theodor Sperle, Joeran Beel, Bart Goethals

Abstract:Dataset selection is crucial for offline recommender system experiments, as mismatched data (e.g., sparse interaction scenarios require datasets with low user-item density) can lead to unreliable results. Yet, 86\% of ACM RecSys 2024 papers provide no justification for their dataset choices, with most relying on just four datasets: Amazon (38\%), MovieLens (34\%), Yelp (15\%), and Gowalla (12\%). While Algorithm Performance Spaces (APS) were proposed to guide dataset selection, their adoption has been limited due to the absence of an intuitive, interactive tool for APS exploration. Therefore, we introduce the APS Explorer, a web-based visualization tool for interactive APS exploration, enabling data-driven dataset selection. The APS Explorer provides three interactive features: (1) an interactive PCA plot showing dataset similarity via performance patterns, (2) a dynamic meta-feature table for dataset comparisons, and (3) a specialized visualization for pairwise algorithm performance.

Via

Access Paper or Ask Questions

Optimal Dataset Size for Recommender Systems: Evaluating Algorithms' Performance via Downsampling

Feb 12, 2025

Ardalan Arabzadeh, Joeran Beel, Tobias Vente

Abstract:This thesis investigates dataset downsampling as a strategy to optimize energy efficiency in recommender systems while maintaining competitive performance. With increasing dataset sizes posing computational and environmental challenges, this study explores the trade-offs between energy efficiency and recommendation quality in Green Recommender Systems, which aim to reduce environmental impact. By applying two downsampling approaches to seven datasets, 12 algorithms, and two levels of core pruning, the research demonstrates significant reductions in runtime and carbon emissions. For example, a 30% downsampling portion can reduce runtime by 52% compared to the full dataset, leading to a carbon emission reduction of up to 51.02 KgCO2e during the training of a single algorithm on a single dataset. The analysis reveals that algorithm performance under different downsampling portions depends on factors like dataset characteristics, algorithm complexity, and the specific downsampling configuration (scenario dependent). Some algorithms, which showed lower nDCG@10 scores compared to higher-performing ones, exhibited lower sensitivity to the amount of training data, offering greater potential for efficiency in lower downsampling portions. On average, these algorithms retained 81% of full-size performance using only 50% of the training set. In certain downsampling configurations, where more users were progressively included while keeping the test set size fixed, they even showed higher nDCG@10 scores than when using the full dataset. These findings highlight the feasibility of balancing sustainability and effectiveness, providing insights for designing energy-efficient recommender systems and promoting sustainable AI practices.

Via

Access Paper or Ask Questions

e-Fold Cross-Validation for Recommender-System Evaluation

Dec 02, 2024

Moritz Baumgart, Lukas Wegmeth, Tobias Vente, Joeran Beel

Abstract:To combat the rising energy consumption of recommender systems we implement a novel alternative for k-fold cross validation. This alternative, named e-fold cross validation, aims to minimize the number of folds to achieve a reduction in power usage while keeping the reliability and robustness of the test results high. We tested our method on 5 recommender system algorithms across 6 datasets and compared it with 10-fold cross validation. On average e-fold cross validation only needed 41.5% of the energy that 10-fold cross validation would need, while it's results only differed by 1.81%. We conclude that e-fold cross validation is a promising approach that has the potential to be an energy efficient but still reliable alternative to k-fold cross validation.

* This preprint has not undergone peer review (when applicable) or any post-submission improvements or corrections. The Version of Record of this contribution is published in [TBA], and is available online at [TBA]

Via

Access Paper or Ask Questions

e-Fold Cross-Validation for energy-aware Machine Learning Evaluations

Oct 12, 2024

Christopher Mahlich, Tobias Vente, Joeran Beel

Abstract:This paper introduces e-fold cross-validation, an energy-efficient alternative to k-fold cross-validation. It dynamically adjusts the number of folds based on a stopping criterion. The criterion checks after each fold whether the standard deviation of the evaluated folds has consistently decreased or remained stable. Once met, the process stops early. We tested e-fold cross-validation on 15 datasets and 10 machine-learning algorithms. On average, it required 4 fewer folds than 10-fold cross-validation, reducing evaluation time, computational resources, and energy use by about 40%. Performance differences between e-fold and 10-fold cross-validation were less than 2% for larger datasets. More complex models showed even smaller discrepancies. In 96% of iterations, the results were within the confidence interval, confirming statistical significance. E-fold cross-validation offers a reliable and efficient alternative to k-fold, reducing computational costs while maintaining accuracy.

Via

Access Paper or Ask Questions

Green Recommender Systems: Optimizing Dataset Size for Energy-Efficient Algorithm Performance

Oct 12, 2024

Ardalan Arabzadeh, Tobias Vente, Joeran Beel

Abstract:As recommender systems become increasingly prevalent, the environmental impact and energy efficiency of training large-scale models have come under scrutiny. This paper investigates the potential for energy-efficient algorithm performance by optimizing dataset sizes through downsampling techniques in the context of Green Recommender Systems. We conducted experiments on the MovieLens 100K, 1M, 10M, and Amazon Toys and Games datasets, analyzing the performance of various recommender algorithms under different portions of dataset size. Our results indicate that while more training data generally leads to higher algorithm performance, certain algorithms, such as FunkSVD and BiasedMF, particularly with unbalanced and sparse datasets like Amazon Toys and Games, maintain high-quality recommendations with up to a 50% reduction in training data, achieving nDCG@10 scores within approximately 13% of full dataset performance. These findings suggest that strategic dataset reduction can decrease computational and environmental costs without substantially compromising recommendation quality. This study advances sustainable and green recommender systems by providing insights for reducing energy consumption while maintaining effectiveness.

* 2024. International Workshop on Recommender Systems for Sustainability and Social Good (RecSoGood) at the 18th ACM Conference on Recommender Systems (ACM RecSys)

Via

Access Paper or Ask Questions

Recommender Systems Algorithm Selection for Ranking Prediction on Implicit Feedback Datasets

Sep 09, 2024

Lukas Wegmeth, Tobias Vente, Joeran Beel

Abstract:The recommender systems algorithm selection problem for ranking prediction on implicit feedback datasets is under-explored. Traditional approaches in recommender systems algorithm selection focus predominantly on rating prediction on explicit feedback datasets, leaving a research gap for ranking prediction on implicit feedback datasets. Algorithm selection is a critical challenge for nearly every practitioner in recommender systems. In this work, we take the first steps toward addressing this research gap. We evaluate the NDCG@10 of 24 recommender systems algorithms, each with two hyperparameter configurations, on 72 recommender systems datasets. We train four optimized machine-learning meta-models and one automated machine-learning meta-model with three different settings on the resulting meta-dataset. Our results show that the predictions of all tested meta-models exhibit a median Spearman correlation ranging from 0.857 to 0.918 with the ground truth. We show that the median Spearman correlation between meta-model predictions and the ground truth increases by an average of 0.124 when the meta-model is optimized to predict the ranking of algorithms instead of their performance. Furthermore, in terms of predicting the best algorithm for an unknown dataset, we demonstrate that the best optimized traditional meta-model, e.g., XGBoost, achieves a recall of 48.6%, outperforming the best tested automated machine learning meta-model, e.g., AutoGluon, which achieves a recall of 47.2%.

* Accepted for presentation at the 18th ACM Conference on Recommender Systems in the Late-Breaking Results Track

Via

Access Paper or Ask Questions

From Clicks to Carbon: The Environmental Toll of Recommender Systems

Aug 15, 2024

Tobias Vente, Lukas Wegmeth, Alan Said, Joeran Beel

Abstract:As global warming soars, evaluating the environmental impact of research is more critical now than ever before. However, we find that few to no recommender systems research papers document their impact on the environment. Consequently, in this paper, we conduct a comprehensive analysis of the environmental impact of recommender system research by reproducing a characteristic recommender systems experimental pipeline. We focus on estimating the carbon footprint of recommender systems research papers, highlighting the evolution of the environmental impact of recommender systems research experiments over time. We thoroughly evaluated all 79 full papers from the ACM RecSys conference in the years 2013 and 2023 to analyze representative experimental pipelines for papers utilizing traditional, so-called good old-fashioned AI algorithms and deep learning algorithms, respectively. We reproduced these representative experimental pipelines, measured electricity consumption using a hardware energy meter, and converted the measured energy consumption into CO2 equivalents to estimate the environmental impact. Our results show that a recommender systems research paper utilizing deep learning algorithms emits approximately 42 times more CO2 equivalents than a paper utilizing traditional algorithms. Furthermore, on average, such a paper produces 3,297 kilograms of CO2 equivalents, which is more than one person produces by flying from New York City to Melbourne or the amount one tree sequesters in 300 years.

* Accepted for presentation at the 18th ACM Conference on Recommender Systems in the Reproducibility Track

Via

Access Paper or Ask Questions

Ensemble Boost: Greedy Selection for Superior Recommender Systems

Jul 07, 2024

Zainil Mehta, Tobias Vente

Figure 1 for Ensemble Boost: Greedy Selection for Superior Recommender Systems

Figure 2 for Ensemble Boost: Greedy Selection for Superior Recommender Systems

Figure 3 for Ensemble Boost: Greedy Selection for Superior Recommender Systems

Figure 4 for Ensemble Boost: Greedy Selection for Superior Recommender Systems

Abstract:Ensemble techniques have demonstrated remarkable success in improving predictive performance across various domains by aggregating predictions from multiple models [1]. In the realm of recommender systems, this research explores the application of ensemble technique to enhance recommendation quality. Specifically, we propose a novel approach to combine top-k recommendations from ten diverse recommendation models resulting in superior top-n recommendations using this novel ensemble technique. Our method leverages a Greedy Ensemble Selection(GES) strategy, effectively harnessing the collective intelligence of multiple models. We conduct experiments on five distinct datasets to evaluate the effectiveness of our approach. Evaluation across five folds using the NDCG metric reveals significant improvements in recommendation accuracy across all datasets compared to single best performing model. Furthermore, comprehensive comparisons against existing models underscore the efficacy of our ensemble approach in enhancing recommendation quality. Our ensemble approach yielded an average improvement of 21.67% across different NDCG@N metrics and the five datasets, compared to single best model. The popularity recommendation model serves as the baseline for comparison. This research contributes to the advancement of ensemble-based recommender systems, offering insights into the potential of combining diverse recommendation strategies to enhance user experience and satisfaction. By presenting a novel approach and demonstrating its superiority over existing methods, we aim to inspire further exploration and innovation in this domain.

Via

Access Paper or Ask Questions

The Potential of AutoML for Recommender Systems

Feb 06, 2024

Tobias Vente, Joeran Beel

Abstract:Automated Machine Learning (AutoML) has greatly advanced applications of Machine Learning (ML) including model compression, machine translation, and computer vision. Recommender Systems (RecSys) can be seen as an application of ML. Yet, AutoML has found little attention in the RecSys community; nor has RecSys found notable attention in the AutoML community. Only few and relatively simple Automated Recommender Systems (AutoRecSys) libraries exist that adopt AutoML techniques. However, these libraries are based on student projects and do not offer the features and thorough development of AutoML libraries. We set out to determine how AutoML libraries perform in the scenario of an inexperienced user who wants to implement a recommender system. We compared the predictive performance of 60 AutoML, AutoRecSys, ML, and RecSys algorithms from 15 libraries, including a mean predictor baseline, on 14 explicit feedback RecSys datasets. To simulate the perspective of an inexperienced user, the algorithms were evaluated with default hyperparameters. We found that AutoML and AutoRecSys libraries performed best. AutoML libraries performed best for six of the 14 datasets (43%), but it was not always the same AutoML library performing best. The single-best library was the AutoRecSys library Auto-Surprise, which performed best on five datasets (36%). On three datasets (21%), AutoML libraries performed poorly, and RecSys libraries with default parameters performed best. Although, while obtaining 50% of all placements in the top five per dataset, RecSys algorithms fall behind AutoML on average. ML algorithms generally performed the worst.

Via

Access Paper or Ask Questions