We propose modeling real-world data markets, where sellers post fixed prices and buyers are free to purchase from any set of sellers they please, as a simultaneous-move game between the buyers. A key component of this model is the negative externality buyers induce on one another by purchasing similar data, a phenomenon exacerbated by the easy replicability of data. In the complete-information setting, where all buyers know their valuations, we characterize both the existence and the quality (with respect to optimal social welfare) of the pure-strategy Nash equilibrium under various models of buyer externality. While this picture is bleak without any market intervention, reinforcing the inadequacy of modern data markets, we prove that for a broad class of externality functions, market intervention in the form of a revenue-neutral transaction cost can lead to a pure-strategy equilibrium with strong welfare guarantees. We further show that this intervention is amenable to the more realistic setting where buyers start with unknown valuations and learn them over time through repeated market interactions. For such a setting, we provide an online learning algorithm for each buyer that achieves low regret with respect to both the buyer's optimal strategy and the social-welfare optimum. Our work paves the way for simple intervention strategies that address the shortcomings of existing fixed-price data markets and the unique challenges posed by data products.
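A toy sketch of a buyer's utility in such a fixed-price market, assuming additive values, a linear externality in purchase overlap, and a flat per-purchase transaction cost (all of these functional forms are illustrative assumptions, not the paper's model):

```python
def buyer_utility(i, purchases, values, prices, tau=0.0, ext_coef=0.1):
    """Toy utility for buyer i in a fixed-price data market (illustrative only).

    purchases[j] : set of sellers buyer j purchases from
    values[i][s] : buyer i's (assumed additive) value for seller s's dataset
    prices[s]    : seller s's posted price
    tau          : per-purchase transaction cost (the market intervention)
    ext_coef     : assumed linear externality coefficient
    """
    S_i = purchases[i]
    value = sum(values[i][s] for s in S_i)
    payment = sum(prices[s] + tau for s in S_i)
    # Negative externality grows with how many other buyers bought the same data.
    overlap = sum(1 for s in S_i
                  for j, S_j in enumerate(purchases) if j != i and s in S_j)
    return value - payment - ext_coef * overlap

# Example: two buyers, two sellers (0 and 1); evaluate buyer 0's bundle.
purchases = [{0, 1}, {1}]
values = [{0: 1.0, 1: 0.8}, {0: 0.2, 1: 0.9}]
prices = {0: 0.3, 1: 0.4}
print(buyer_utility(0, purchases, values, prices, tau=0.05))
```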
The classic Bayesian persuasion model assumes a Bayesian and best-responding receiver. We study a relaxation of the Bayesian persuasion model where the receiver can approximately best respond to the sender's signaling scheme. We show that, under natural assumptions, (1) the sender can find a signaling scheme that guarantees itself an expected utility almost as good as its optimal utility in the classic model, no matter what approximately best-responding strategy the receiver uses; (2) on the other hand, there is no signaling scheme that gives the sender much more utility than its optimal utility in the classic model, even if the receiver uses the approximately best-responding strategy that is best for the sender. Together, (1) and (2) imply that the approximately best-responding behavior of the receiver does not significantly affect the sender's maximal achievable utility in the Bayesian persuasion problem. The proofs of both results rely on the idea of robustifying a Bayesian persuasion scheme: given a pair of the sender's signaling scheme and the receiver's strategy, we can construct another signaling scheme under which the receiver prefers to use that strategy more than under the original scheme, and the two schemes give the sender similar utilities. As an application of our main result (1), we show that, in a repeated Bayesian persuasion model where the receiver learns to respond to the sender using some learning algorithm, the sender can do almost as well as in the classic model. Interestingly, unlike (2), with a learning receiver the sender can sometimes do much better than in the classic model.
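One natural formalization of an approximately best-responding receiver (an assumed reading of the abstract, with $\delta \ge 0$ the approximation parameter and $u_R$ the receiver's utility): for every posterior $\mu$ induced by the signaling scheme, the action $a$ the receiver takes satisfies
$$\mathbb{E}_{\omega \sim \mu}\!\left[u_R(a, \omega)\right] \;\ge\; \max_{a'} \mathbb{E}_{\omega \sim \mu}\!\left[u_R(a', \omega)\right] - \delta,$$
with $\delta = 0$ recovering the classic Bayesian, exactly best-responding receiver.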
Artificial intelligence (AI) systems are increasingly used to provide advice that facilitates human decision making. While a large body of work has explored how AI systems can be optimized to produce accurate and fair advice and how algorithmic advice should be presented to human decision makers, in this work we ask a different basic question: When should algorithms provide advice? Motivated by limitations of the current practice of constantly providing algorithmic advice, we propose the design of AI systems that interact with the human user in a two-sided manner and provide advice only when it is likely to be beneficial to the human in making their decision. Our AI systems learn advising policies from past human decisions. Then, for new cases, the learned policies utilize input from the human to identify cases where algorithmic advice would be useful, as well as those where the human is better off deciding alone. We conduct a large-scale experiment to evaluate our approach using data from the US criminal justice system on pretrial-release decisions. In our experiment, participants were asked to assess the risk that defendants, if released, would violate their release terms, and were advised by different advising approaches. The results show that our interactive-advising approach manages to provide advice at times of need and to significantly improve human decision making compared to fixed, non-interactive advising approaches. Our approach has additional advantages in facilitating human learning, preserving the complementary strengths of human decision makers, and leading to more positive responsiveness to the advice.
We consider a Bayesian forecast aggregation model where $n$ experts, after observing private signals about an unknown binary event, report their posterior beliefs about the event to a principal, who then aggregates the reports into a single prediction for the event. The signals of the experts and the outcome of the event follow a joint distribution that is unknown to the principal, but the principal has access to i.i.d. "samples" from the distribution, where each sample is a tuple of experts' reports (not signals) and the realization of the event. Using these samples, the principal aims to find an $\varepsilon$-approximately optimal (Bayesian) aggregator. We study the sample complexity of this problem. We show that, for arbitrary discrete distributions, the number of samples must be at least $\tilde \Omega(m^{n-2} / \varepsilon)$, where $m$ is the size of each expert's signal space. This sample complexity grows exponentially in the number of experts $n$. But if experts' signals are independent conditioned on the realization of the event, then the sample complexity is significantly reduced, to $\tilde O(1 / \varepsilon^2)$, which does not depend on $n$.
This paper introduces an optimization problem for proper scoring rule design. Consider a principal who wants to collect an agent's prediction about an unknown state. The agent can either report his prior prediction or access a costly signal and report the posterior prediction. Given a collection of possible distributions containing the agent's posterior prediction distribution, the principal's objective is to design a bounded scoring rule that maximizes the agent's worst-case payoff increment between reporting his posterior prediction and reporting his prior prediction. We study this optimization problem in two settings: a static setting and an asymptotic setting. In the static setting, where the agent can access one signal, we propose an efficient algorithm to compute an optimal scoring rule when the collection of distributions is finite. In the asymptotic setting, the agent can adaptively and indefinitely refine his prediction. We first consider a sequence of collections of posterior distributions with vanishing covariance, which emulates general estimators with large samples, and show the optimality of the quadratic scoring rule. Then, when the agent's posterior distribution is a Beta-Bernoulli process, we find that the log scoring rule is optimal. We also prove the optimality of the log scoring rule over a smaller set of functions for categorical distributions with Dirichlet priors.
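For reference, standard forms of the two scoring rules singled out in the asymptotic setting, where $p$ is the reported distribution, $\omega$ is the realized state, and any positive affine transformation remains proper:
$$S_{\mathrm{quad}}(p, \omega) = 2p_\omega - \sum_k p_k^2, \qquad S_{\log}(p, \omega) = \log p_\omega.$$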
In real-life auctions, a widely observed phenomenon is the winner's curse -- the winner's high bid implies that the winner often over-estimates the value of the good for sale, resulting in negative utility. The seminal work of Eyster and Rabin [Econometrica'05] introduced a behavioral model aimed at explaining this observed anomaly. We term agents who display this bias "cursed agents". We adopt their model in the interdependent value setting, and aim to devise mechanisms that prevent the cursed agents from obtaining negative utility. We design mechanisms that are cursed ex-post IC, that is, they incentivize agents to bid their true signal even though they are cursed, while ensuring that the outcome is individually rational -- the price the agents pay is no more than the agents' true value. Since the agents might over-estimate the good's value, such mechanisms might require the seller to make positive transfers to the agents to prevent agents from over-paying. For revenue maximization, we give the optimal deterministic and anonymous mechanism. For welfare maximization, we require ex-post budget balance (EPBB), as positive transfers might lead to negative revenue. We propose a masking operation that takes any deterministic mechanism and imposes that the seller makes no positive transfers, thereby enforcing EPBB. We show that in typical settings, EPBB implies that the mechanism cannot make any positive transfers, so applying the masking operation to the fully efficient mechanism results in a socially optimal EPBB mechanism. This further implies that if the valuation function is the maximum of agents' signals, the optimal EPBB mechanism obtains zero welfare. In contrast, we show that for sum-concave valuations, which include weighted-sum valuations and $\ell_p$-norms, the welfare-optimal EPBB mechanism obtains half of the optimal welfare as the number of agents grows large.
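As a rough sketch of the bias, following Eyster and Rabin's $\chi$-cursed model (the exact parameterization in the paper may differ), a $\chi$-cursed agent $i$ with signal $s_i$ values the good at a mixture of the fully informed expectation and the expectation conditioned on his own signal alone:
$$\mathbb{E}^{\chi}\!\left[v_i \mid s_1, \dots, s_n\right] \;=\; (1-\chi)\,\mathbb{E}\!\left[v_i \mid s_1, \dots, s_n\right] + \chi\,\mathbb{E}\!\left[v_i \mid s_i\right], \qquad \chi \in [0, 1],$$
so for $\chi > 0$ the agent discounts the information in others' signals and, upon winning, tends to over-estimate the good's value.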
Governments are increasingly turning to algorithmic risk assessments when making important decisions, believing that these algorithms will improve public servants' ability to make policy-relevant predictions and thereby lead to more informed decisions. Yet because many policy decisions require balancing risk-minimization with competing social goals, evaluating the impacts of risk assessments requires considering how public servants are influenced by risk assessments when making policy decisions rather than just how accurately these algorithms make predictions. Through an online experiment with 2,140 lay participants simulating two high-stakes government contexts, we provide the first large-scale evidence that risk assessments can systematically alter decision-making processes by increasing the salience of risk as a factor in decisions and that these shifts could exacerbate racial disparities. These results demonstrate that improving human prediction accuracy with algorithms does not necessarily improve human decisions and highlight the need to experimentally test how government algorithms are used by human decision-makers.
The last decade saw the emergence of systematic large-scale replication projects in the social and behavioral sciences (Camerer et al., 2016, 2018; Ebersole et al., 2016; Klein et al., 2014, 2018; Open Science Collaboration, 2015). These projects were driven by theoretical and conceptual concerns about a high fraction of "false positives" in scientific publications (Ioannidis, 2005) and a high prevalence of "questionable research practices" (Simmons, Nelson, and Simonsohn, 2011). Concerns about the credibility of research findings are not unique to the behavioral and social sciences; within Computer Science, Artificial Intelligence (AI) and Machine Learning (ML) are areas of particular concern (Lucic et al., 2018; Freire, Bonnet, and Shasha, 2012; Gundersen and Kjensmo, 2018; Henderson et al., 2018). Given the pioneering role of the behavioral and social sciences in promoting novel methodologies to improve the credibility of research, a promising approach is to analyze the lessons learned in this field and adapt its strategies to Computer Science, AI, and ML. In this paper, we review approaches used in the behavioral and social sciences and in the DARPA SCORE project. We particularly focus on the role of human forecasting of replication outcomes, and how forecasting can leverage the information gained from relatively labor- and resource-intensive replications. We discuss opportunities and challenges of using these approaches to monitor and improve the credibility of research areas in Computer Science, AI, and ML.
We study the problem of online learning in strategic classification settings from the perspective of the learner, who repeatedly faces myopically rational strategic agents. We model this interplay as a repeated Stackelberg game, where at each timestep the learner first deploys a high-dimensional linear classifier and an agent, after observing the classifier, best-responds with a (potentially altered) feature vector, based on his real feature vector and his underlying utility function. We measure the performance of the learner in terms of Stackelberg regret for her 0-1 loss function. Surprisingly, we prove that in strategic settings like the one considered in this paper there exist worst-case scenarios where any sequence of actions providing sublinear external regret might result in linear Stackelberg regret, and vice versa. We then provide the Grinder Algorithm, an adaptive discretization algorithm that is potentially of independent interest in the online learning community, and prove a data-dependent upper bound on its Stackelberg regret given oracle access, while the algorithm remains computationally efficient. We also provide a nearly matching lower bound for the problem of strategic classification. We complement our theoretical analysis with simulation results, which suggest that our algorithm outperforms the benchmarks even when given access only to approximation oracles. Our results advance the state of the art in the growing literature on online learning from revealed preferences, which has so far focused on smoother utility and loss functions for the agents and the learner, respectively.
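As an illustration of the agent's side of this interaction, the sketch below computes a myopic best response to a linear classifier under a commonly used (assumed, not necessarily the paper's exact) utility model: the agent gains a fixed amount from a positive label and pays a cost proportional to the Euclidean distance moved, so the cheapest manipulation is a projection onto the decision boundary.

```python
import numpy as np

def agent_best_response(x, w, b, cost_per_unit=1.0, gain=1.0):
    """Myopic best response of a strategic agent to a linear classifier.

    The agent is labeled positive iff w @ x' + b >= 0, gains `gain` from a
    positive label, and pays cost_per_unit * ||x' - x||_2 for altering its
    features (an assumed utility model for illustration).
    """
    score = w @ x + b
    if score >= 0:
        return x.copy()                        # already classified positive
    dist = -score / np.linalg.norm(w)          # distance to decision boundary
    if cost_per_unit * dist > gain:
        return x.copy()                        # manipulation not worth the cost
    return x + dist * w / np.linalg.norm(w)    # move just onto the boundary

# Example: 2-D features, classifier x1 + x2 >= 1
x = np.array([0.2, 0.3])
w = np.array([1.0, 1.0])
b = -1.0
print(agent_best_response(x, w, b))            # projected onto x1 + x2 = 1
```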
Crowdsourcing is a popular paradigm for soliciting forecasts on future events. As people may have different forecasts, how to aggregate solicited forecasts into a single accurate prediction remains an important challenge, especially when no historical accuracy information is available for identifying experts. In this paper, we borrow ideas from the peer prediction literature and assess the prediction accuracy of participants using solely the collected forecasts. This approach leverages the correlations among peer reports to cross-validate each participant's forecasts and allows us to assign a "peer assessment score" (PAS) to each agent as a proxy for the agent's prediction accuracy. We identify several empirically effective methods to generate PAS and propose an aggregation framework that uses PAS to identify experts and to boost existing aggregators' prediction accuracy. We evaluate our methods on 14 real-world datasets and show that i) PAS generated from peer prediction methods can approximately reflect the prediction accuracy of agents, and ii) our aggregation framework delivers consistent and significant improvements in prediction accuracy over existing aggregators for both binary and multi-choice questions under three popular accuracy measures: Brier score (mean squared error), log score (cross-entropy loss), and AUC-ROC.
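A minimal sketch of the aggregation idea for binary questions, assuming a PAS vector has already been produced by some peer prediction method; the PAS-weighted averaging and all names below are illustrative, not the paper's exact framework:

```python
import numpy as np

def brier_score(forecast, outcome):
    """Mean squared error between probabilistic forecasts and 0/1 outcomes."""
    forecast = np.asarray(forecast, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    return float(np.mean((forecast - outcome) ** 2))

def pas_weighted_aggregate(forecasts, pas):
    """Aggregate binary forecasts (agents x questions) with weights derived
    from peer assessment scores; higher PAS -> larger weight (assumed scheme)."""
    forecasts = np.asarray(forecasts, dtype=float)
    weights = np.maximum(np.asarray(pas, dtype=float), 0.0)
    if weights.sum() == 0:
        weights = np.ones_like(weights)        # fall back to a simple average
    weights = weights / weights.sum()
    return weights @ forecasts                 # one aggregated probability per question

# Example: 3 agents, 2 binary questions, hypothetical PAS values
forecasts = np.array([[0.9, 0.2],
                      [0.6, 0.4],
                      [0.1, 0.8]])
pas = np.array([0.7, 0.5, 0.1])
agg = pas_weighted_aggregate(forecasts, pas)
print(agg, brier_score(agg, np.array([1, 0])))
```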