Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Recommendation": models, code, and papers

Random Offset Block Embedding Array (ROBE) for CriteoTB Benchmark MLPerf DLRM Model : 1000$\times$ Compression and 2.7$\times$ Faster Inference

Aug 04, 2021
Aditya Desai, Li Chou, Anshumali Shrivastava

Deep learning for recommendation data is the one of the most pervasive and challenging AI workload in recent times. State-of-the-art recommendation models are one of the largest models rivalling the likes of GPT-3 and Switch Transformer. Challenges in deep learning recommendation models (DLRM) stem from learning dense embeddings for each of the categorical values. These embedding tables in industrial scale models can be as large as hundreds of terabytes. Such large models lead to a plethora of engineering challenges, not to mention prohibitive communication overheads, and slower training and inference times. Of these, slower inference time directly impacts user experience. Model compression for DLRM is gaining traction and the community has recently shown impressive compression results. In this paper, we present Random Offset Block Embedding Array (ROBE) as a low memory alternative to embedding tables which provide orders of magnitude reduction in memory usage while maintaining accuracy and boosting execution speed. ROBE is a simple fundamental approach in improving both cache performance and the variance of randomized hashing, which could be of independent interest in itself. We demonstrate that we can successfully train DLRM models with same accuracy while using $1000 \times$ less memory. A $1000\times$ compressed model directly results in faster inference without any engineering. In particular, we show that we can train DLRM model using ROBE Array of size 100MB on a single GPU to achieve AUC of 0.8025 or higher as required by official MLPerf CriteoTB benchmark DLRM model of 100GB while achieving about $2.7\times$ (170\%) improvement in inference throughput.

  Access Paper or Ask Questions

PreSizE: Predicting Size in E-Commerce using Transformers

May 04, 2021
Yotam Eshel, Or Levi, Haggai Roitman, Alexander Nus

Recent advances in the e-commerce fashion industry have led to an exploration of novel ways to enhance buyer experience via improved personalization. Predicting a proper size for an item to recommend is an important personalization challenge, and is being studied in this work. Earlier works in this field either focused on modeling explicit buyer fitment feedback or modeling of only a single aspect of the problem (e.g., specific category, brand, etc.). More recent works proposed richer models, either content-based or sequence-based, better accounting for content-based aspects of the problem or better modeling the buyer's online journey. However, both these approaches fail in certain scenarios: either when encountering unseen items (sequence-based models) or when encountering new users (content-based models). To address the aforementioned gaps, we propose PreSizE - a novel deep learning framework which utilizes Transformers for accurate size prediction. PreSizE models the effect of both content-based attributes, such as brand and category, and the buyer's purchase history on her size preferences. Using an extensive set of experiments on a large-scale e-commerce dataset, we demonstrate that PreSizE is capable of achieving superior prediction performance compared to previous state-of-the-art baselines. By encoding item attributes, PreSizE better handles cold-start cases with unseen items, and cases where buyers have little past purchase data. As a proof of concept, we demonstrate that size predictions made by PreSizE can be effectively integrated into an existing production recommender system yielding very effective features and significantly improving recommendations.

* Accepted for publication in SIGIR'2021 

  Access Paper or Ask Questions

Dynamic Planning of Bicycle Stations in Dockless Public Bicycle-sharing System Using Gated Graph Neural Network

Jan 19, 2021
Jianguo Chen, Kenli Li, Keqin Li, Philip S. Yu, Zeng Zeng

Benefiting from convenient cycling and flexible parking locations, the Dockless Public Bicycle-sharing (DL-PBS) network becomes increasingly popular in many countries. However, redundant and low-utility stations waste public urban space and maintenance costs of DL-PBS vendors. In this paper, we propose a Bicycle Station Dynamic Planning (BSDP) system to dynamically provide the optimal bicycle station layout for the DL-PBS network. The BSDP system contains four modules: bicycle drop-off location clustering, bicycle-station graph modeling, bicycle-station location prediction, and bicycle-station layout recommendation. In the bicycle drop-off location clustering module, candidate bicycle stations are clustered from each spatio-temporal subset of the large-scale cycling trajectory records. In the bicycle-station graph modeling module, a weighted digraph model is built based on the clustering results and inferior stations with low station revenue and utility are filtered. Then, graph models across time periods are combined to create a graph sequence model. In the bicycle-station location prediction module, the GGNN model is used to train the graph sequence data and dynamically predict bicycle stations in the next period. In the bicycle-station layout recommendation module, the predicted bicycle stations are fine-tuned according to the government urban management plan, which ensures that the recommended station layout is conducive to city management, vendor revenue, and user convenience. Experiments on actual DL-PBS networks verify the effectiveness, accuracy and feasibility of the proposed BSDP system.

  Access Paper or Ask Questions

To what extent should we trust AI models when they extrapolate?

Jan 27, 2022
Roozbeh Yousefzadeh, Xuenan Cao

Many applications affecting human lives rely on models that have come to be known under the umbrella of machine learning and artificial intelligence. These AI models are usually complicated mathematical functions that map from an input space to an output space. Stakeholders are interested to know the rationales behind models' decisions and functional behavior. We study this functional behavior in relation to the data used to create the models. On this topic, scholars have often assumed that models do not extrapolate, i.e., they learn from their training samples and process new input by interpolation. This assumption is questionable: we show that models extrapolate frequently; the extent of extrapolation varies and can be socially consequential. We demonstrate that extrapolation happens for a substantial portion of datasets more than one would consider reasonable. How can we trust models if we do not know whether they are extrapolating? Given a model trained to recommend clinical procedures for patients, can we trust the recommendation when the model considers a patient older or younger than all the samples in the training set? If the training set is mostly Whites, to what extent can we trust its recommendations about Black and Hispanic patients? Which dimension (race, gender, or age) does extrapolation happen? Even if a model is trained on people of all races, it still may extrapolate in significant ways related to race. The leading question is, to what extent can we trust AI models when they process inputs that fall outside their training set? This paper investigates several social applications of AI, showing how models extrapolate without notice. We also look at different sub-spaces of extrapolation for specific individuals subject to AI models and report how these extrapolations can be interpreted, not mathematically, but from a humanistic point of view.

  Access Paper or Ask Questions

Blind Source Separation for NMR Spectra with Negative Intensity

Feb 07, 2020
Ryan J. McCarty, Nimish Ronghe, Mandy Woo, Todd M. Alam

NMR spectral datasets, especially in systems with limited samples, can be difficult to interpret if they contain multiple chemical components (phases, polymorphs, molecules, crystals, glasses, etc...) and the possibility of overlapping resonances. In this paper, we benchmark several blind source separation techniques for analysis of NMR spectral datasets containing negative intensity. For benchmarking purposes, we generated a large synthetic datasbase of quadrupolar solid-state NMR-like spectra that model spin-lattice T1 relaxation or nutation tip/flip angle experiments. Our benchmarking approach focused exclusively on the ability of blind source separation techniques to reproduce the spectra of the underlying pure components. In general, we find that FastICA (Fast Independent Component Analysis), SIMPLISMA (SIMPLe-to-use-Interactive Self-modeling Mixture Analysis), and NNMF (Non-Negative Matrix Factorization) are top-performing techniques. We demonstrate that dataset normalization approaches prior to blind source separation do not considerably improve outcomes. Within the range of noise levels studied, we did not find drastic changes to the ranking of techniques. The accuracy of FastICA and SIMPLISMA degrades quickly if excess (unreal) pure components are predicted. Our results indicate poor performance of SVD (Singular Value Decomposition) methods, and we propose alternative techniques for matrix initialization. The benchmarked techniques are also applied to real solid state NMR datasets. In general, the recommendations from the synthetic datasets agree with the recommendations and results from the real data analysis. The discussion provides some additional recommendations for spectroscopists applying blind source separation to NMR datasets, and for future benchmark studies.

* 28 pages, 6 figures, 5 tables 

  Access Paper or Ask Questions

Learning Representations of Social Media Users

Dec 02, 2018
Adrian Benton

User representations are routinely used in recommendation systems by platform developers, targeted advertisements by marketers, and by public policy researchers to gauge public opinion across demographic groups. Computer scientists consider the problem of inferring user representations more abstractly; how does one extract a stable user representation - effective for many downstream tasks - from a medium as noisy and complicated as social media? The quality of a user representation is ultimately task-dependent (e.g. does it improve classifier performance, make more accurate recommendations in a recommendation system) but there are proxies that are less sensitive to the specific task. Is the representation predictive of latent properties such as a person's demographic features, socioeconomic class, or mental health state? Is it predictive of the user's future behavior? In this thesis, we begin by showing how user representations can be learned from multiple types of user behavior on social media. We apply several extensions of generalized canonical correlation analysis to learn these representations and evaluate them at three tasks: predicting future hashtag mentions, friending behavior, and demographic features. We then show how user features can be employed as distant supervision to improve topic model fit. Finally, we show how user features can be integrated into and improve existing classifiers in the multitask learning framework. We treat user representations - ground truth gender and mental health features - as auxiliary tasks to improve mental health state prediction. We also use distributed user representations learned in the first chapter to improve tweet-level stance classifiers, showing that distant user information can inform classification tasks at the granularity of a single message.

* PhD thesis 

  Access Paper or Ask Questions

Learning to Create Better Ads: Generation and Ranking Approaches for Ad Creative Refinement

Aug 17, 2020
Shaunak Mishra, Manisha Verma, Yichao Zhou, Kapil Thadani, Wei Wang

In the online advertising industry, the process of designing an ad creative (i.e., ad text and image) requires manual labor. Typically, each advertiser launches multiple creatives via online A/B tests to infer effective creatives for the target audience, that are then refined further in an iterative fashion. Due to the manual nature of this process, it is time-consuming to learn, refine, and deploy the modified creatives. Since major ad platforms typically run A/B tests for multiple advertisers in parallel, we explore the possibility of collaboratively learning ad creative refinement via A/B tests of multiple advertisers. In particular, given an input ad creative, we study approaches to refine the given ad text and image by: (i) generating new ad text, (ii) recommending keyphrases for new ad text, and (iii) recommending image tags (objects in image) to select new ad image. Based on A/B tests conducted by multiple advertisers, we form pairwise examples of inferior and superior ad creatives, and use such pairs to train models for the above tasks. For generating new ad text, we demonstrate the efficacy of an encoder-decoder architecture with copy mechanism, which allows some words from the (inferior) input text to be copied to the output while incorporating new words associated with higher click-through-rate. For the keyphrase and image tag recommendation task, we demonstrate the efficacy of a deep relevance matching model, as well as the relative robustness of ranking approaches compared to ad text generation in cold-start scenarios with unseen advertisers. We also share broadly applicable insights from our experiments using data from the Yahoo Gemini ad platform.

* 9 pages, accepted for publication in CIKM 2020 

  Access Paper or Ask Questions

Computing With Words for Student Strategy Evaluation in an Examination

May 02, 2020
Prashant K Gupta, Pranab K. Muhuri

In the framework of Granular Computing (GC), Interval type 2 Fuzzy Sets (IT2 FSs) play a prominent role by facilitating a better representation of uncertain linguistic information. Perceptual Computing (Per C), a well known computing with words (CWW) approach, and its various applications have nicely exploited this advantage. This paper reports a novel Per C based approach for student strategy evaluation. Examinations are generally oriented to test the subject knowledge of students. The number of questions that they are able to solve accurately judges success rates of students in the examinations. However, we feel that not only the solutions of questions, but also the strategy adopted for finding those solutions are equally important. More marks should be awarded to a student, who solves a question with a better strategy compared to a student, whose strategy is relatively not that good. Furthermore, the students strategy can be taken as a measure of his or her learning outcome as perceived by a faculty member. This can help to identify students, whose learning outcomes are not good, and, thus, can be provided with any relevant help, for improvement. The main contribution of this paper is to illustrate the use of CWW for student strategy evaluation and present a comparison of the recommendations generated by different CWW approaches. CWW provides us with two major advantages. First, it generates a numeric score for the overall evaluation of strategy adopted by a student in the examination. This enables comparison and ranking of the students based on their performances. Second, a linguistic evaluation describing the student strategy is also obtained from the system. Both these numeric score and linguistic recommendation are together used to assess the quality of a students strategy. We found that Per-C generates unique recommendations in all cases and outperforms other CWW approaches.

* Gupta, Prashant K., and Pranab K. Muhuri. "Computing with words for student strategy evaluation in an examination." Granular Computing 4, no. 2 (2019): 167-184 

  Access Paper or Ask Questions

Compositional Coding for Collaborative Filtering

May 09, 2019
Chenghao Liu, Tao Lu, Xin Wang, Zhiyong Cheng, Jianling Sun, Steven C. H. Hoi

Efficiency is crucial to the online recommender systems. Representing users and items as binary vectors for Collaborative Filtering (CF) can achieve fast user-item affinity computation in the Hamming space, in recent years, we have witnessed an emerging research effort in exploiting binary hashing techniques for CF methods. However, CF with binary codes naturally suffers from low accuracy due to limited representation capability in each bit, which impedes it from modeling complex structure of the data. In this work, we attempt to improve the efficiency without hurting the model performance by utilizing both the accuracy of real-valued vectors and the efficiency of binary codes to represent users/items. In particular, we propose the Compositional Coding for Collaborative Filtering (CCCF) framework, which not only gains better recommendation efficiency than the state-of-the-art binarized CF approaches but also achieves even higher accuracy than the real-valued CF method. Specifically, CCCF innovatively represents each user/item with a set of binary vectors, which are associated with a sparse real-value weight vector. Each value of the weight vector encodes the importance of the corresponding binary vector to the user/item. The continuous weight vectors greatly enhances the representation capability of binary codes, and its sparsity guarantees the processing speed. Furthermore, an integer weight approximation scheme is proposed to further accelerate the speed. Based on the CCCF framework, we design an efficient discrete optimization algorithm to learn its parameters. Extensive experiments on three real-world datasets show that our method outperforms the state-of-the-art binarized CF methods (even achieves better performance than the real-valued CF method) by a large margin in terms of both recommendation accuracy and efficiency.

* SIGIR2019 

  Access Paper or Ask Questions

Deep-learning based Tools for Automated Protocol Definition of Advanced Diagnostic Imaging Exams

May 28, 2021
Andrew S. Nencka, Mohammad Sherafati, Timothy Goebel, Parag Tolat, Kevin M. Koch

Purpose: This study evaluates the effectiveness and impact of automated order-based protocol assignment for magnetic resonance imaging (MRI) exams using natural language processing (NLP) and deep learning (DL). Methods: NLP tools were applied to retrospectively process orders from over 116,000 MRI exams with 200 unique sub-specialized protocols ("Local" protocol class). Separate DL models were trained on 70\% of the processed data for "Local" protocols as well as 93 American College of Radiology ("ACR") protocols and 48 "General" protocols. The DL Models were assessed in an "auto-protocoling (AP)" inference mode which returns the top recommendation and in a "clinical decision support (CDS)" inference mode which returns up to 10 protocols for radiologist review. The accuracy of each protocol recommendation was computed and analyzed based on the difference between the normalized output score of the corresponding neural net for the top two recommendations. Results: The top predicted protocol in AP mode was correct for 82.8%, 73.8%, and 69.3% of the test cases for "General", "ACR", and "Local" protocol classes, respectively. Higher levels of accuracy over 96% were obtained for all protocol classes in CDS mode. However, at current validation performance levels, the proposed models offer modest, positive, financial impact on large-scale imaging networks. Conclusions: DL-based protocol automation is feasible and can be tuned to route substantial fractions of exams for auto-protocoling, with higher accuracy with more general protocols. Economic analyses of the tested algorithms indicate that improved algorithm performance is required to yield a practical exam auto-protocoling tool for sub-specialized imaging exams.

  Access Paper or Ask Questions