Abstract: In modern recommender systems, experimental setups typically filter out cold users and items using a minimum-interaction threshold. However, these thresholds are often chosen arbitrarily and vary widely across studies, leading to inconsistencies that can significantly affect the comparability and reliability of evaluation results. In this paper, we systematically explore the cold-start boundary by examining the criteria used to decide whether a user or an item should be considered cold. Our experiments incrementally vary the number of interactions per item during training and gradually extend the length of user interaction histories during inference. We investigate these thresholds across several widely used datasets, frequently featured in recent papers at top-tier conferences, and on multiple established recommender baselines. Our findings show that inconsistent selection of cold-start thresholds can either remove valuable data unnecessarily or misclassify cold instances as warm, introducing additional noise into the system.
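For concreteness, the preprocessing step under study can be sketched as the threshold filter below. This is a minimal illustration, not the paper's exact protocol: the synthetic interaction log, the column names (`user_id`, `item_id`), and the threshold range swept are all assumptions.

```python
import pandas as pd

# Tiny synthetic log; in practice this would be a real interaction dataset (assumption).
interactions = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 4, 4, 4, 4],
    "item_id": [10, 11, 12, 10, 13, 10, 11, 12, 13, 14],
})

def filter_cold(df: pd.DataFrame, min_interactions: int) -> pd.DataFrame:
    """Drop users and items with fewer than `min_interactions` events.

    Note: a single pass can leave entities below the threshold once their
    counterparts are removed; iterating to a fixed point yields a true k-core.
    """
    user_counts = df["user_id"].value_counts()
    item_counts = df["item_id"].value_counts()
    warm_users = user_counts[user_counts >= min_interactions].index
    warm_items = item_counts[item_counts >= min_interactions].index
    return df[df["user_id"].isin(warm_users) & df["item_id"].isin(warm_items)]

# Sweep the threshold to see how much data each choice of "cold" discards.
for k in range(1, 5):
    kept = filter_cold(interactions, k)
    print(k, len(kept), kept["user_id"].nunique(), kept["item_id"].nunique())
```

The sweep makes the paper's core concern visible: each increment of the threshold changes not only the raw interaction count but also the set of users and items that survive into training and evaluation.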
Abstract: Cold-start challenges in recommender systems necessitate leveraging auxiliary features beyond user-item interactions. However, irrelevant or noisy features can degrade predictive performance, while an excessive number of features increases computational demands, leading to higher memory consumption and longer training times. To address this, we propose a feature selection strategy that prioritizes user behavioral information. Our method enriches the feature representation with correlations drawn from collaborative behavior data using a hybrid matrix factorization technique, and then ranks features with a mechanism based on the maximum-volume algorithm. This approach identifies the most influential features, striking a balance between recommendation accuracy and computational efficiency. We conduct an extensive evaluation across diverse datasets and hybrid recommendation models, demonstrating that our method excels in cold-start scenarios by selecting minimal yet highly effective feature subsets. Even under strict feature reduction, our approach surpasses existing feature selection techniques while maintaining superior efficiency.
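A hedged sketch of the two-stage idea described above: collaborative item embeddings are blended into the raw feature matrix, after which feature columns are ranked by a greedy volume-maximizing criterion. Pivoted QR serves here as a standard surrogate for the maxvol step; the synthetic matrices, the SVD-based factorization, and the 50/50 blend are assumptions standing in for the paper's hybrid method.

```python
import numpy as np
from scipy.linalg import qr
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
R = rng.random((500, 200))   # synthetic user-item interaction matrix (assumption)
F = rng.random((200, 40))    # synthetic item-feature matrix (assumption)

# Stage 1: collaborative item embeddings via truncated SVD of the interactions.
_, s, vt = svds(R, k=16)
item_emb = vt.T * s          # items x 16

# Blend behavioral correlations into the feature representation: each feature
# column is partially replaced by its projection onto the collaborative
# embedding space (a simple stand-in for the paper's hybrid factorization).
coef = np.linalg.lstsq(item_emb, F, rcond=None)[0]
F_enriched = 0.5 * F + 0.5 * (item_emb @ coef)   # 50/50 blend (assumption)

# Stage 2: greedy volume-maximizing ranking of feature columns. Pivoted QR
# orders columns so that each prefix approximately maximizes submatrix volume.
_, _, piv = qr(F_enriched, mode="economic", pivoting=True)
top_features = piv[:10]      # indices of the 10 highest-ranked features
print(top_features)
```

The appeal of a volume-based ranking is that it favors features that are both strong and mutually non-redundant, which is why aggressive truncation of the ranked list can retain most of the predictive signal.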