



Abstract:Large language models (LLMs) have been garnering increasing attention in the recommendation community. Some studies have observed that LLMs, when fine-tuned by the cross-entropy (CE) loss with a full softmax, could achieve `state-of-the-art' performance in sequential recommendation. However, most of the baselines used for comparison are trained using a pointwise/pairwise loss function. This inconsistent experimental setting leads to the underestimation of traditional methods and further fosters over-confidence in the ranking capability of LLMs. In this study, we provide theoretical justification for the superiority of the cross-entropy loss by demonstrating its two desirable properties: tightness and coverage. Furthermore, this study sheds light on additional novel insights: 1) Taking into account only the recommendation performance, CE is not yet optimal as it is not a quite tight bound in terms of some ranking metrics. 2) In scenarios that full softmax cannot be performed, an effective alternative is to scale up the sampled normalizing term. These findings then help unleash the potential of traditional recommendation models, allowing them to surpass LLM-based counterparts. Given the substantial computational burden, existing LLM-based methods are not as effective as claimed for sequential recommendation. We hope that these theoretical understandings in conjunction with the empirical results will facilitate an objective evaluation of LLM-based recommendation in the future.
Abstract:Evaluating LLMs and text-to-image models is a computationally intensive task often overlooked. Efficient evaluation is crucial for understanding the diverse capabilities of these models and enabling comparisons across a growing number of new models and benchmarks. To address this, we introduce SubLIME, a data-efficient evaluation framework that employs adaptive sampling techniques, such as clustering and quality-based methods, to create representative subsets of benchmarks. Our approach ensures statistically aligned model rankings compared to full datasets, evidenced by high Pearson correlation coefficients. Empirical analysis across six NLP benchmarks reveals that: (1) quality-based sampling consistently achieves strong correlations (0.85 to 0.95) with full datasets at a 10\% sampling rate such as Quality SE and Quality CPD (2) clustering methods excel in specific benchmarks such as MMLU (3) no single method universally outperforms others across all metrics. Extending this framework, we leverage the HEIM leaderboard to cover 25 text-to-image models on 17 different benchmarks. SubLIME dynamically selects the optimal technique for each benchmark, significantly reducing evaluation costs while preserving ranking integrity and score distribution. Notably, a minimal sampling rate of 1% proves effective for benchmarks like MMLU. Additionally, we demonstrate that employing difficulty-based sampling to target more challenging benchmark segments enhances model differentiation with broader score distributions. We also combine semantic search, tool use, and GPT-4 review to identify redundancy across benchmarks within specific LLM categories, such as coding benchmarks. This allows us to further reduce the number of samples needed to maintain targeted rank preservation. Overall, SubLIME offers a versatile and cost-effective solution for the robust evaluation of LLMs and text-to-image models.




Abstract:Large-scale high spatial resolution aboveground biomass (AGB) maps play a crucial role in determining forest carbon stocks and how they are changing, which is instrumental in understanding the global carbon cycle, and implementing policy to mitigate climate change. The advent of the new space-borne LiDAR sensor, NASA's GEDI instrument, provides unparalleled possibilities for the accurate and unbiased estimation of forest AGB at high resolution, particularly in dense and tall forests, where Synthetic Aperture Radar (SAR) and passive optical data exhibit saturation. However, GEDI is a sampling instrument, collecting dispersed footprints, and its data must be combined with that from other continuous cover satellites to create high-resolution maps, using local machine learning methods. In this study, we developed local models to estimate forest AGB from GEDI L2A data, as the models used to create GEDI L4 AGB data incorporated minimal field data from China. We then applied LightGBM and random forest regression to generate wall-to-wall AGB maps at 25 m resolution, using extensive GEDI footprints as well as Sentinel-1 data, ALOS-2 PALSAR-2 and Sentinel-2 optical data. Through a 5-fold cross-validation, LightGBM demonstrated a slightly better performance than Random Forest across two contrasting regions. However, in both regions, the computation speed of LightGBM is substantially faster than that of the random forest model, requiring roughly one-third of the time to compute on the same hardware. Through the validation against field data, the 25 m resolution AGB maps generated using the local models developed in this study exhibited higher accuracy compared to the GEDI L4B AGB data. We found in both regions an increase in error as slope increased. The trained models were tested on nearby but different regions and exhibited good performance.




Abstract:Recommender Systems (RSs) provide personalized recommendation service based on user interest, which are widely used in various platforms. However, there are lots of users with sparse interest due to lacking consumption behaviors, which leads to poor recommendation results for them. This problem is widespread in large-scale RSs and is particularly difficult to address. To solve this problem, we propose a novel solution named User Interest Enhancement (UIE) which enhances user interest including user profile and user history behavior sequences using the enhancement vectors and personalized enhancement vector generated based on stream clustering and memory networks from different perspectives. UIE not only remarkably improves model performance on the users with sparse interest but also significantly enhance model performance on other users. UIE is an end-to-end solution which is easy to be implemented based on ranking model. Moreover, we expand our solution and apply similar methods to long-tail items, which also achieves excellent improvement. Furthermore, we conduct extensive offline and online experiments in a large-scale industrial RS. The results demonstrate that our model outperforms other models remarkably, especially for the users with sparse interest. Until now, UIE has been fully deployed in multiple large-scale RSs and achieved remarkable improvements.




Abstract:Recommender Systems (RSs) are widely used to provide personalized recommendation service. As the last critical stage of RSs, Multi-Task Fusion (MTF) is responsible for combining multiple scores outputted by Multi-Task Learning (MTL) into a final score to maximize user satisfaction, which determines the ultimate recommendation results. Recently, to optimize long-term user satisfaction within a recommendation session, Reinforcement Learning (RL) is used for MTF in the industry. However, the off-policy RL algorithms used for MTF so far have the following severe problems: 1) to avoid out-of-distribution (OOD) problem, their constraints are overly strict, which seriously damage their performance; 2) they are unaware of the exploration policy used for producing training data and never interact with real environment, so only suboptimal policy can be learned; 3) the traditional exploration policies are inefficient and hurt user experience. To solve the above problems, we propose a novel off-policy RL algorithm customized for MTF in large-scale RSs. Our RL-MTF algorithm integrates off-policy RL model with our online exploration policy to relax overstrict and complicated constraints, which significantly improves the performance of our RL model. We also design an extremely efficient exploration policy, which eliminates low-value exploration space and focuses on exploring potential high-value state-action pairs. Moreover, we adopt progressive training mode to further enhance our RL model's performance with the help of our exploration policy. We conduct extensive offline and online experiments in the short video channel of Tencent News. The results demonstrate that our RL-MTF model outperforms other models remarkably. Our RL-MTF model has been fully deployed in the short video channel of Tencent News for about one year. In addition, our solution has been used in other large-scale RSs in Tencent.




Abstract:A transparent decision-making process is essential for developing reliable and trustworthy recommender systems. For sequential recommendation, it means that the model can identify critical items asthe justifications for its recommendation results. However, achieving both model transparency and recommendation performance simultaneously is challenging, especially for models that take the entire sequence of items as input without screening. In this paper,we propose an interpretable framework (named PTSR) that enables a pattern-wise transparent decision-making process. It breaks the sequence of items into multi-level patterns that serve as atomic units for the entire recommendation process. The contribution of each pattern to the outcome is quantified in the probability space. With a carefully designed pattern weighting correction, the pattern contribution can be learned in the absence of ground-truth critical patterns. The final recommended items are those items that most critical patterns strongly endorse. Extensive experiments on four public datasets demonstrate remarkable recommendation performance, while case studies validate the model transparency. Our code is available at https://anonymous.4open.science/r/PTSR-2237.




Abstract:Large language models (LLMs) have gained much attention in the recommendation community; some studies have observed that LLMs, fine-tuned by the cross-entropy loss with a full softmax, could achieve state-of-the-art performance already. However, these claims are drawn from unobjective and unfair comparisons. In view of the substantial quantity of items in reality, conventional recommenders typically adopt a pointwise/pairwise loss function instead for training. This substitute however causes severe performance degradation, leading to under-estimation of conventional methods and over-confidence in the ranking capability of LLMs. In this work, we theoretically justify the superiority of cross-entropy, and showcase that it can be adequately replaced by some elementary approximations with certain necessary modifications. The remarkable results across three public datasets corroborate that even in a practical sense, existing LLM-based methods are not as effective as claimed for next-item recommendation. We hope that these theoretical understandings in conjunction with the empirical results will facilitate an objective evaluation of LLM-based recommendation in the future.




Abstract:With the rapid development of detectors, Bounding Box Regression (BBR) loss function has constantly updated and optimized. However, the existing IoU-based BBR still focus on accelerating convergence by adding new loss terms, ignoring the limitations of IoU loss term itself. Although theoretically IoU loss can effectively describe the state of bounding box regression,in practical applications, it cannot adjust itself according to different detectors and detection tasks, and does not have strong generalization. Based on the above, we first analyzed the BBR model and concluded that distinguishing different regression samples and using different scales of auxiliary bounding boxes to calculate losses can effectively accelerate the bounding box regression process. For high IoU samples, using smaller auxiliary bounding boxes to calculate losses can accelerate convergence, while larger auxiliary bounding boxes are suitable for low IoU samples. Then, we propose Inner-IoU loss, which calculates IoU loss through auxiliary bounding boxes. For different datasets and detectors, we introduce a scaling factor ratio to control the scale size of the auxiliary bounding boxes for calculating losses. Finally, integrate Inner-IoU into the existing IoU-based loss functions for simulation and comparative experiments. The experiment result demonstrate a further enhancement in detection performance with the utilization of the method proposed in this paper, verifying the effectiveness and generalization ability of Inner-IoU loss. Code is available at https://github.com/malagoutou/Inner-IoU.




Abstract:Structured, or tabular, data is the most common format in data science. While deep learning models have proven formidable in learning from unstructured data such as images or speech, they are less accurate than simpler approaches when learning from tabular data. In contrast, modern tree-based Machine Learning (ML) models shine in extracting relevant information from structured data. An essential requirement in data science is to reduce model inference latency in cases where, for example, models are used in a closed loop with simulation to accelerate scientific discovery. However, the hardware acceleration community has mostly focused on deep neural networks and largely ignored other forms of machine learning. Previous work has described the use of an analog content addressable memory (CAM) component for efficiently mapping random forests. In this work, we focus on an overall analog-digital architecture implementing a novel increased precision analog CAM and a programmable network on chip allowing the inference of state-of-the-art tree-based ML models, such as XGBoost and CatBoost. Results evaluated in a single chip at 16nm technology show 119x lower latency at 9740x higher throughput compared with a state-of-the-art GPU, with a 19W peak power consumption.
Abstract:Although Physics-Informed Neural Networks (PINNs) have been successfully applied to various differential equations, accurately solving perturbed convection-diffusion-reaction problems is still extremely challenging for PINNs. This paper investigates the source of the learning difficulties and finds that the rapid transition of potential solution in the layer region causes the failure of convergence. Based on this finding, we present a curriculum learning method that encourages neural networks to ``prioritize the learning on easier non-layer regions''. The method helps PINNs to dynamically adjust the training data weights, speed up the learning procedure, and ultimately significantly improve the accuracy of the network approximation. Extensive evaluation on multiple typical model equations shows that the proposed approach accurately captures the resolution of the layer regions, and achieves multiple orders of magnitude lower root-mean-squared error than ordinary PINNs. We provide our PyTorch code at https://github.com/WYu-Feng/CLPINN