Ido Guy

Measuring multi-calibration

Jun 12, 2025

Ido Guy, Daniel Haimovich, Fridolin Linder, Nastaran Okati, Lorenzo Perini, Niek Tax, Mark Tygert

Abstract:A suitable scalar metric can help measure multi-calibration, defined as follows. When the expected values of observed responses are equal to corresponding predicted probabilities, the probabilistic predictions are known as "perfectly calibrated." When the predicted probabilities are perfectly calibrated simultaneously across several subpopulations, the probabilistic predictions are known as "perfectly multi-calibrated." In practice, predicted probabilities are seldom perfectly multi-calibrated, so a statistic measuring the distance from perfect multi-calibration is informative. A recently proposed metric for calibration, based on the classical Kuiper statistic, is a natural basis for a new metric of multi-calibration and avoids well-known problems of metrics based on binning or kernel density estimation. The newly proposed metric weights the contributions of different subpopulations in proportion to their signal-to-noise ratios; data analyses' ablations demonstrate that the metric becomes noisy when omitting the signal-to-noise ratios from the metric. Numerical examples on benchmark data sets illustrate the new metric.

* 25 pages, 12 tables
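
To make the Kuiper-based idea concrete, here is a minimal sketch of a cumulative-difference calibration statistic for a single population. It is an illustration under stated assumptions, not the multi-calibration metric proposed in the paper: the function name `kuiper_calibration` is hypothetical, and the signal-to-noise weighting across subpopulations described in the abstract is not reproduced. The sketch sorts observations by predicted probability, accumulates the normalized differences between responses and predictions, and reports the range of that cumulative sum.

```python
from typing import Sequence


def kuiper_calibration(scores: Sequence[float], responses: Sequence[float]) -> float:
    """Range (max minus min) of the cumulative differences between observed
    responses and predicted probabilities, ordered by predicted probability.

    Hypothetical helper for illustration; a value of 0 means the cumulative
    differences never leave zero, as with perfectly calibrated predictions.
    """
    if len(scores) != len(responses):
        raise ValueError("scores and responses must have the same length")
    n = len(scores)
    if n == 0:
        raise ValueError("at least one observation is required")

    # Visit observations in order of increasing predicted probability.
    order = sorted(range(n), key=lambda i: scores[i])

    cumulative = 0.0
    highest = 0.0  # the running sum starts at zero, so zero is included
    lowest = 0.0
    for i in order:
        cumulative += (responses[i] - scores[i]) / n
        highest = max(highest, cumulative)
        lowest = min(lowest, cumulative)
    return highest - lowest
```

One way to move toward multi-calibration under the same assumptions would be to call this function once per subpopulation; how the per-subpopulation values are then combined is exactly what the paper addresses and is left out here.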

Leveraging World Events to Predict E-Commerce Consumer Demand under Anomaly

May 22, 2024

Dan Kalifa, Uriel Singer, Ido Guy, Guy D. Rosin, Kira Radinsky

Abstract:Consumer demand forecasting is of high importance for many e-commerce applications, including supply chain optimization, advertisement placement, and delivery speed optimization. However, reliable time series sales forecasting for e-commerce is difficult, especially during periods with many anomalies, as can often happen during pandemics, abnormal weather, or sports events. Although many time series algorithms have been applied to the task, prediction during anomalies still remains a challenge. In this work, we hypothesize that leveraging external knowledge found in world events can help overcome the challenge of prediction under anomalies. We mine a large repository of 40 years of world events and their textual representations. Further, we present a novel methodology based on transformers to construct an embedding of a day based on the relations of the day's events. Those embeddings are then used to forecast future consumer behavior. We empirically evaluate the methods over a large e-commerce products sales dataset, extracted from eBay, one of the world's largest online marketplaces. We show over numerous categories that our method outperforms state-of-the-art baselines during anomalies.

* In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining (WSDM 2022), 9 pages

Active learning with biased non-response to label requests

Dec 13, 2023

Thomas Robinson, Niek Tax, Richard Mudd, Ido Guy

Abstract:Active learning can improve the efficiency of training prediction models by identifying the most informative new labels to acquire. However, non-response to label requests can impact active learning's effectiveness in real-world contexts. We conceptualise this degradation by considering the type of non-response present in the data, demonstrating that biased non-response is particularly detrimental to model performance. We argue that this sort of non-response is particularly likely in contexts where the labelling process, by nature, relies on user interactions. To mitigate the impact of biased non-response, we propose a cost-based correction to the sampling strategy--the Upper Confidence Bound of the Expected Utility (UCB-EU)--that can, plausibly, be applied to any active learning algorithm. Through experiments, we demonstrate that our method successfully reduces the harm from labelling non-response in many settings. However, we also characterise settings where the non-response bias in the annotations remains detrimental under UCB-EU for particular sampling methods and data generating processes. Finally, we evaluate our method on a real-world dataset from e-commerce platform Taobao. We show that UCB-EU yields substantial performance improvements to conversion models that are trained on clicked impressions. Most generally, this research serves to both better conceptualise the interplay between types of non-response and model improvements via active learning, and to provide a practical, easy to implement correction that helps mitigate model degradation.

TCE: A Test-Based Approach to Measuring Calibration Error

Jun 25, 2023

Takuo Matsubara, Niek Tax, Richard Mudd, Ido Guy

Abstract:This paper proposes a new metric to measure the calibration error of probabilistic binary classifiers, called test-based calibration error (TCE). TCE incorporates a novel loss function based on a statistical test to examine the extent to which model predictions differ from probabilities estimated from data. It offers (i) a clear interpretation, (ii) a consistent scale that is unaffected by class imbalance, and (iii) an enhanced visual representation with respect to the standard reliability diagram. In addition, we introduce an optimality criterion for the binning procedure of calibration error metrics based on a minimal estimation error of the empirical probabilities. We provide a novel computational algorithm for optimal bins under bin-size constraints. We demonstrate properties of TCE through a range of experiments, including multiple real-world imbalanced datasets and ImageNet 1000.

Explaining Predictive Uncertainty with Information Theoretic Shapley Values

Jun 09, 2023

David S. Watson, Joshua O'Hara, Niek Tax, Richard Mudd, Ido Guy

Abstract:Researchers in explainable artificial intelligence have developed numerous methods for helping users understand the predictions of complex supervised learning models. By contrast, explaining the $\textit{uncertainty}$ of model outputs has received relatively little attention. We adapt the popular Shapley value framework to explain various types of predictive uncertainty, quantifying each feature's contribution to the conditional entropy of individual model outputs. We consider games with modified characteristic functions and find deep connections between the resulting Shapley values and fundamental quantities from information theory and conditional independence testing. We outline inference procedures for finite sample error rate control with provable guarantees, and implement an efficient algorithm that performs well in a range of experiments on real and simulated data. Our method has applications to covariate shift detection, active learning, feature selection, and active feature-value acquisition.

tBDFS: Temporal Graph Neural Network Leveraging DFS

Jun 12, 2022

Uriel Singer, Haggai Roitman, Ido Guy, Kira Radinsky

Abstract:Temporal graph neural networks (temporal GNNs) have been widely researched, reaching state-of-the-art results on multiple prediction tasks. A common approach employed by most previous works is to apply a layer that aggregates information from the historical neighbors of a node. Taking a different research direction, in this work, we propose tBDFS -- a novel temporal GNN architecture. tBDFS applies a layer that efficiently aggregates information from temporal paths to a given (target) node in the graph. For each given node, the aggregation is applied in two stages: (1) A single representation is learned for each temporal path ending in that node, and (2) all path representations are aggregated into a final node representation. Overall, our goal is not to add new information to a node, but rather to observe the same exact information from a new perspective. This allows our model to directly observe patterns that are path-oriented rather than neighborhood-oriented. This can be thought of as a Depth-First Search (DFS) traversal over the temporal graph, compared to the popular Breadth-First Search (BFS) traversal that is applied in previous works. We evaluate tBDFS over multiple link prediction tasks and show its favorable performance compared to state-of-the-art baselines. To the best of our knowledge, we are the first to apply a temporal-DFS neural network.

* 9 pages, 2 figures, 2 tables
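
The path-oriented view in the abstract can be illustrated with a small sketch that enumerates temporal paths ending in a target node by depth-first search. Everything here is an assumption made for illustration: the function name `temporal_paths_to`, the `(source, destination, time)` edge format, the `max_hops` limit, and the rule that timestamps must not increase when walking backward from the target. The learned path representations and their aggregation, which are the substance of tBDFS, are not shown.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (source, destination, time)


def temporal_paths_to(edges: List[Edge], target: Hashable, max_hops: int = 2) -> List[List[Edge]]:
    """Enumerate temporal paths of up to max_hops edges that end in target.

    Hypothetical helper: a path is kept only if edge times are
    non-decreasing from its first edge to its last edge.
    """
    # Index edges by destination so the search can walk backward in time.
    incoming: Dict[Hashable, List[Tuple[Hashable, float]]] = defaultdict(list)
    for source, destination, time in edges:
        incoming[destination].append((source, time))

    paths: List[List[Edge]] = []

    def dfs(node: Hashable, latest_time: float, suffix: List[Edge]) -> None:
        for source, time in incoming[node]:
            if time <= latest_time:
                path = [(source, node, time)] + suffix
                paths.append(path)
                # Go deeper along this path before trying sibling edges.
                if len(path) < max_hops:
                    dfs(source, time, path)

    dfs(target, float("inf"), [])
    return paths
```

In a neighborhood-oriented (BFS-style) layer, the edges into the target would be aggregated together hop by hop; here each returned list is one complete path, which is the unit a path-oriented layer would encode before combining paths into a node representation.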

Sequential Modeling with Multiple Attributes for Watchlist Recommendation in E-Commerce

Oct 24, 2021

Uriel Singer, Haggai Roitman, Yotam Eshel, Alexander Nus, Ido Guy, Or Levi, Idan Hasson, Eliyahu Kiperwasser

Abstract:In e-commerce, the watchlist enables users to track items over time and has emerged as a primary feature, playing an important role in users' shopping journey. Watchlist items typically have multiple attributes whose values may change over time (e.g., price, quantity). Since many users accumulate dozens of items on their watchlist, and since shopping intents change over time, recommending the top watchlist items in a given context can be valuable. In this work, we study the watchlist functionality in e-commerce and introduce a novel watchlist recommendation task. Our goal is to prioritize which watchlist items the user should pay attention to next by predicting the next items the user will click. We cast this task as a specialized sequential recommendation task and discuss its characteristics. Our proposed recommendation model, Trans2D, is built on top of the Transformer architecture, where we further suggest a novel extended attention mechanism (Attention2D) that allows learning complex item-item, attribute-attribute and item-attribute patterns from sequential data with multiple item attributes. Using a large-scale watchlist dataset from eBay, we evaluate our proposed model and demonstrate its superiority compared to multiple state-of-the-art baselines, many of which are adapted for this task.

* International Conference on Web Search and Data Mining (WSDM), 2022

Time Masking for Temporal Language Models

Oct 22, 2021

Guy D. Rosin, Ido Guy, Kira Radinsky

Abstract:Our world is constantly evolving, and so is the content on the web. Consequently, our languages, often said to mirror the world, are dynamic in nature. However, most current contextual language models are static and cannot adapt to changes over time. In this work, we propose a temporal contextual language model called TempoBERT, which uses time as an additional context of texts. Our technique is based on modifying texts with temporal information and performing time masking - specific masking for the supplementary time information. We leverage our approach for the tasks of semantic change detection and sentence time prediction, experimenting on diverse datasets in terms of time, size, genre, and language. Our extensive evaluation shows that both tasks benefit from exploiting time masking.

* 9 pages, accepted to WSDM 2022
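
The two steps named in the abstract, modifying a text with temporal information and then masking that information, can be sketched as plain string operations. This is a simplified illustration, not TempoBERT's actual preprocessing: the `<year>` token format, the choice to prepend it, the `[MASK]` string, and both function names are assumptions.

```python
from typing import Tuple


def add_time_token(text: str, year: int) -> str:
    """Prepend a time token such as "<2021>" to the text (assumed format)."""
    return f"<{year}> {text}"


def mask_time_token(text_with_time: str, mask_token: str = "[MASK]") -> Tuple[str, str]:
    """Replace the leading time token with a mask.

    Returns the masked text and the original time token, which a model
    would be trained to recover.
    """
    time_token, _, rest = text_with_time.partition(" ")
    if not (time_token.startswith("<") and time_token.endswith(">")):
        raise ValueError("text does not start with a time token")
    return f"{mask_token} {rest}", time_token


# Usage: build a training pair for predicting the time of a sentence.
masked, label = mask_time_token(add_time_token("the cell rang twice", 2021))
```

Predicting the masked token from the remaining words is what would tie the supplementary time information to the language in the text, which is the mechanism the abstract applies to semantic change detection and sentence time prediction.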

E-Commerce Dispute Resolution Prediction

Oct 13, 2021

David Tsurel, Michael Doron, Alexander Nus, Arnon Dagan, Ido Guy, Dafna Shahaf

Abstract:E-Commerce marketplaces support millions of daily transactions, and some disagreements between buyers and sellers are unavoidable. Resolving disputes in an accurate, fast, and fair manner is of great importance for maintaining a trustworthy platform. Simple cases can be automated, but intricate cases are not sufficiently addressed by hard-coded rules, and therefore most disputes are currently resolved by people. In this work we take a first step towards automatically assisting human agents in dispute resolution at scale. We construct a large dataset of disputes from the eBay online marketplace, and identify several interesting behavioral and linguistic patterns. We then train classifiers to predict dispute outcomes with high accuracy. We explore the model and the dataset, reporting interesting correlations, important features, and insights.

* CIKM'20: Proceedings of the 29th ACM International Conference on Information and Knowledge Management, Oct 2020, Pages 1465-1474

Event-Driven Query Expansion

Dec 22, 2020

Guy D. Rosin, Ido Guy, Kira Radinsky

Abstract:A significant number of event-related queries are issued in Web search. In this paper, we seek to improve retrieval performance by leveraging events and specifically target the classic task of query expansion. We propose a method to expand an event-related query by first detecting the events related to it. Then, we derive the candidates for expansion as terms semantically related to both the query and the events. To identify the candidates, we utilize a novel mechanism to simultaneously embed words and events in the same vector space. We show that our proposed method of leveraging events improves query expansion performance significantly compared with state-of-the-art methods on various newswire TREC datasets.

* 9 pages, WSDM 2021
