David C. Parkes

Data Market Design through Deep Learning

Oct 31, 2023
Sai Srivatsa Ravindranath, Yanchen Jiang, David C. Parkes

The $\textit{data market design}$ problem is the problem in economic theory of finding a set of signaling schemes (statistical experiments), each with a corresponding price, that maximizes the expected revenue of an information seller, where each experiment reveals some of the information known to the seller [Bergemann et al., 2018]. Each buyer faces a decision in a world environment, and their subjective expected value for the information associated with a particular experiment comes from the improvement in this decision and depends on their prior and on their values for different outcomes. In a setting with multiple buyers, a buyer's expected value for an experiment may also depend on the information sold to others [Bonatti et al., 2022]. We introduce the application of deep learning to the design of revenue-optimal data markets, looking to expand the frontiers of what can be understood and achieved. Relative to earlier work on deep learning for auction design [Dütting et al., 2023], we must learn signaling schemes rather than allocation rules and handle $\textit{obedience constraints}$, which arise from modeling the downstream actions of buyers, in addition to incentive constraints on bids. Our experiments demonstrate that this new deep learning framework can almost precisely replicate all known solutions from theory, can expand to more complex settings, and can be used to establish the optimality of new designs for data markets and to make conjectures about the structure of optimal designs.
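
To make the underlying objects concrete, here is a minimal sketch of the standard value-of-information computation that defines a single buyer's willingness to pay for an experiment. It is a toy illustration of the setting, not the paper's learned mechanism, and all names in it are mine.

```python
import numpy as np

def value_of_experiment(prior, payoff, experiment):
    """Expected value a buyer assigns to a signaling scheme.

    prior:      shape (S,)   -- buyer's prior over world states
    payoff:     shape (A, S) -- buyer's payoff for action a in state s
    experiment: shape (S, K) -- P(signal k | state s), rows sum to 1
    """
    # Value of acting on the prior alone (no information purchased).
    v_prior = (payoff @ prior).max()

    # Joint P(state, signal) and marginal P(signal).
    joint = prior[:, None] * experiment        # (S, K)
    p_sig = joint.sum(axis=0)                  # (K,)

    v_exp = 0.0
    for k in range(experiment.shape[1]):
        if p_sig[k] == 0:
            continue
        posterior = joint[:, k] / p_sig[k]                  # P(state | signal k)
        v_exp += p_sig[k] * (payoff @ posterior).max()      # best response to signal
    return v_exp - v_prior    # willingness to pay for the experiment

# Binary state, two actions: a fully revealing vs. a noisy experiment.
prior = np.array([0.5, 0.5])
payoff = np.array([[1.0, 0.0],     # action 0 pays off in state 0
                   [0.0, 1.0]])    # action 1 pays off in state 1
full = np.eye(2)                                  # fully revealing
noisy = np.array([[0.8, 0.2], [0.2, 0.8]])
print(value_of_experiment(prior, payoff, full))   # 0.5
print(value_of_experiment(prior, payoff, noisy))  # 0.3
```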


Chain-of-Thought Reasoning is a Policy Improvement Operator

Sep 15, 2023
Hugh Zhang, David C. Parkes

Large language models have astounded the world with fascinating new capabilities. However, they currently lack the ability to teach themselves new skills, relying instead on being trained on large amounts of human-generated data. We introduce SECToR (Self-Education via Chain-of-Thought Reasoning), a proof-of-concept demonstration that language models can successfully teach themselves new skills using chain-of-thought reasoning. Inspired by previous work in both reinforcement learning (Silver et al., 2017) and human cognition (Kahneman, 2011), SECToR first uses chain-of-thought reasoning to slowly think its way through problems. SECToR then fine-tunes the model to generate those same answers, this time without using chain-of-thought reasoning. Language models trained via SECToR autonomously learn to add numbers of up to 29 digits without access to any ground-truth examples beyond an initial supervised fine-tuning phase consisting only of numbers with 6 or fewer digits. Our central hypothesis is that chain-of-thought reasoning can act as a policy improvement operator, analogous to how Monte Carlo Tree Search is used in AlphaZero. We hope that this research can lead to new directions in which language models learn to teach themselves without the need for human demonstrations.
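
The following toy, of my own construction, mirrors the loop the abstract describes: "slow" chain-of-thought reasoning decomposes a problem one digit beyond the model's direct competence into sub-problems it can already solve, and "fine-tuning" promotes those self-generated solutions to direct answers. It is an analogy for the method, not the paper's code.

```python
import random

class ToyModel:
    def __init__(self, digits):
        self.digits = digits                  # direct competence so far

    def direct(self, a, b):
        # Stands in for a fast forward pass; only valid within competence.
        assert max(len(str(a)), len(str(b))) <= self.digits
        return a + b

    def chain_of_thought(self, a, b):
        # Step-by-step reasoning: split off the last digit, add the shorter
        # prefixes directly, then resolve the final digit and carry.
        pa, da = divmod(a, 10)
        pb, db = divmod(b, 10)
        return self.direct(pa, pb) * 10 + da + db

def sector_step(model, n_problems=100):
    hi = 10 ** (model.digits + 1) - 1         # one digit beyond competence
    problems = [(random.randint(0, hi), random.randint(0, hi))
                for _ in range(n_problems)]
    answers = [model.chain_of_thought(a, b) for a, b in problems]
    # "Fine-tune" on self-generated data: direct competence grows one digit.
    assert all(ans == a + b for (a, b), ans in zip(problems, answers))
    model.digits += 1
    return model

model = ToyModel(digits=6)                    # initial supervised phase
while model.digits < 29:
    model = sector_step(model)
print(model.digits)   # 29, with no ground-truth labels past the 6-digit start
```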


Generative Social Choice

Sep 03, 2023
Sara Fish, Paul Gölz, David C. Parkes, Ariel D. Procaccia, Gili Rusak, Itai Shapira, Manuel Wüthrich

Traditionally, social choice theory has only been applicable to choices among a few predetermined alternatives but not to more complex decisions such as collectively selecting a textual statement. We introduce generative social choice, a framework that combines the mathematical rigor of social choice theory with large language models' capability to generate text and extrapolate preferences. This framework divides the design of AI-augmented democratic processes into two components: first, proving that the process satisfies rigorous representation guarantees when given access to oracle queries; second, empirically validating that these queries can be approximately implemented using a large language model. We illustrate this framework by applying it to the problem of generating a slate of statements that is representative of opinions expressed as free-form text, for instance in an online deliberative process.
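
As a rough illustration of the two-component split, the sketch below separates a greedy slate-selection procedure, which touches opinions only through an abstract agreement query, from the query itself, which is where a large language model would be substituted. The procedure, the query, and all data here are hypothetical stand-ins, not the paper's algorithm or guarantees.

```python
def most_agreeable_statement(opinions, candidates, agrees):
    """Oracle-style query: the candidate statement most opinions agree with.

    agrees(opinion, statement) -> bool is the primitive an LLM would implement.
    """
    return max(candidates, key=lambda s: sum(agrees(o, s) for o in opinions))

def build_slate(opinions, candidates, agrees, k):
    """Greedily pick k statements, dropping opinions already represented."""
    remaining, slate = list(opinions), []
    for _ in range(k):
        pool = [c for c in candidates if c not in slate]
        best = most_agreeable_statement(remaining, pool, agrees)
        slate.append(best)
        remaining = [o for o in remaining if not agrees(o, best)]
    return slate

# Toy usage with a keyword-overlap stand-in for the LLM-backed query.
opinions = ["lower fees", "fees are too high", "publish transparent reports"]
candidates = ["Fees should be lower", "Reports should be transparent"]
agrees = lambda o, s: bool({w for w in o.lower().split() if len(w) > 3}
                           & {w for w in s.lower().split() if len(w) > 3})
print(build_slate(opinions, candidates, agrees, k=2))
```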


Deep Contract Design via Discontinuous Piecewise Affine Neural Networks

Jul 05, 2023
Tonghan Wang, Paul Dütting, Dmitry Ivanov, Inbal Talgam-Cohen, David C. Parkes

Contract design involves a principal who establishes contractual agreements about payments for outcomes that arise from the actions of an agent. In this paper, we initiate the study of deep learning for the automated design of optimal contracts. We formulate this as an offline learning problem, where a deep network is used to represent the principal's expected utility as a function of the design of a contract. We introduce a novel representation: the Discontinuous ReLU (DeLU) network, which models the principal's utility as a discontinuous piecewise affine function where each piece corresponds to the agent taking a particular action. DeLU networks implicitly learn closed-form expressions for the incentive compatibility constraints of the agent and the utility maximization objective of the principal, and support parallel inference on each piece through linear programming or interior-point methods that solve for optimal contracts. We provide empirical results that demonstrate success in approximating the principal's utility function with a small number of training samples and scaling to find approximately optimal contracts on problems with a large number of actions and outcomes.
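
A minimal sketch of the representational idea, assuming PyTorch: a gating head assigns each contract to an agent action, and a separate affine head per action gives the principal's utility on that piece, making the overall function discontinuous and piecewise affine. The architecture shown is my reading of the abstract rather than the paper's exact network, and training details are omitted.

```python
import torch
import torch.nn as nn

class DeLU(nn.Module):
    def __init__(self, n_outcomes, n_actions, hidden=64):
        super().__init__()
        # Which action the agent best-responds with, as a function of the contract.
        self.gate = nn.Sequential(
            nn.Linear(n_outcomes, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))
        # One affine utility per action piece: u_a(t) = w_a . t + b_a.
        self.w = nn.Parameter(torch.randn(n_actions, n_outcomes))
        self.b = nn.Parameter(torch.zeros(n_actions))

    def forward(self, t):                        # t: (batch, n_outcomes) payments
        a = self.gate(t).argmax(dim=-1)          # hard piece selection ->
        return (self.w[a] * t).sum(-1) + self.b[a]   # discontinuous in t

net = DeLU(n_outcomes=3, n_actions=4)
contracts = torch.rand(8, 3)                     # candidate payment vectors
print(net(contracts).shape)                      # torch.Size([8])
```

Because each piece is affine in the contract, optimizing within a piece reduces to a linear program, which is what makes the parallel, piece-by-piece inference described in the abstract possible.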


Reinforcement Learning with Stepwise Fairness Constraints

Nov 08, 2022
Zhun Deng, He Sun, Zhiwei Steven Wu, Linjun Zhang, David C. Parkes

AI methods are used in societally important settings, ranging from credit to employment to housing, and it is crucial that algorithmic decision making in these settings be fair. Moreover, many settings are dynamic, with populations responding to sequential decision policies. We introduce the study of reinforcement learning (RL) with stepwise fairness constraints, which require group fairness at each time step. Our focus is on tabular episodic RL, and we provide learning algorithms with strong theoretical guarantees in regard to policy optimality and fairness violation. Our framework provides useful tools to study the impact of fairness constraints in sequential settings and raises new challenges for RL.
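
For intuition, here is what a per-step group-fairness audit could look like in the tabular episodic setting; it checks a demographic-parity-style gap at every time step. This is an illustrative check of my own, not the paper's learning algorithm, and the data structures are hypothetical.

```python
def stepwise_parity_gaps(policy, occupancy, group_of_state, H):
    """Max demographic-parity-style gap at each step h = 0..H-1.

    policy[h][s]:    P(action = 1 | state s) at step h
    occupancy[h][s]: P(state s at step h) under the policy
    """
    gaps = []
    for h in range(H):
        rates = {}
        for g in set(group_of_state.values()):
            states = [s for s, gg in group_of_state.items() if gg == g]
            mass = sum(occupancy[h][s] for s in states)
            # P(action = 1 | group g) at step h under the policy.
            rates[g] = sum(occupancy[h][s] * policy[h][s] for s in states) / mass
        gaps.append(max(rates.values()) - min(rates.values()))
    return gaps   # stepwise fairness requires every entry to be small

group = {0: "A", 1: "B"}                        # one state per group, for brevity
occ = [{0: 0.5, 1: 0.5}, {0: 0.6, 1: 0.4}]
pol = [{0: 0.9, 1: 0.4}, {0: 0.7, 1: 0.7}]
print(stepwise_parity_gaps(pol, occ, group, H=2))   # ~[0.5, 0.0]
```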

* Fairness, Reinforcement Learning 

Explainable Reinforcement Learning via Model Transforms

Sep 24, 2022
Mira Finkelstein, Lucy Liu, Nitsan Levy Schlot, Yoav Kolumbus, David C. Parkes, Jeffrey S. Rosenshein, Sarah Keren

Understanding the emergent behaviors of reinforcement learning (RL) agents can be difficult, since such agents are often trained in complex environments using highly complex decision-making procedures. This has given rise to a variety of approaches to explainability in RL, which aim to reconcile discrepancies between the behavior of an agent and the behavior anticipated by an observer. Most recent approaches rely on domain knowledge, which may not always be available, on an analysis of the agent's policy, or on an analysis of specific elements of the underlying environment, typically modeled as a Markov Decision Process (MDP). Our key claim is that even if the underlying MDP is not fully known (e.g., the transition probabilities have not been accurately learned) or is not maintained by the agent (i.e., when using model-free methods), it can nevertheless be exploited to automatically generate explanations. For this purpose, we suggest using formal MDP abstractions and transforms, previously used in the literature for expediting the search for optimal policies, to automatically produce explanations. Since such transforms are typically based on a symbolic representation of the environment, they can offer meaningful explanations for gaps between anticipated and actual agent behavior. We formally define this problem, suggest a class of transforms that can be used for explaining emergent behaviors, and suggest methods that enable efficient search for an explanation. We demonstrate the approach on a set of standard benchmarks.
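
A small sketch of this idea under stated assumptions: given an observer's tabular model of the environment, search a set of candidate transforms for one under which the agent's surprising action becomes optimal; that transform then names the modeling gap. The transforms and toy MDP below are mine, chosen only to make the search loop concrete.

```python
import numpy as np

def q_values(P, R, gamma=0.95, iters=500):
    """Tabular value iteration. P: (A, S, S) transitions, R: (A, S) rewards."""
    V = np.zeros(P.shape[1])
    for _ in range(iters):
        Q = R + gamma * P @ V          # (A, S)
        V = Q.max(axis=0)
    return Q

def explain(P, R, state, agent_action, transforms):
    """Return the first candidate transform making agent_action optimal."""
    for name, tf in transforms:
        P2, R2 = tf(P.copy(), R.copy())
        if q_values(P2, R2)[:, state].argmax() == agent_action:
            return name
    return None

# Observer's model: two self-looping states; action 0 looks strictly better
# at state 0, yet the agent was observed to choose action 1 there.
P = np.stack([np.eye(2), np.eye(2)])
R = np.array([[1.0, 0.0],      # rewards for action 0
              [0.5, 0.0]])     # rewards for action 1

def zero_action0_reward(P, R):
    R[0] = 0.0                 # hypothesis: action 0 is not actually rewarding
    return P, R

transforms = [("no transform", lambda P, R: (P, R)),
              ("action 0 yields no reward", zero_action0_reward)]
print(explain(P, R, state=0, agent_action=1, transforms=transforms))
```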

* Conference on Neural Information Processing Systems (NeurIPS) 2022 

Predictive Multiplicity in Probabilistic Classification

Jun 02, 2022
Jamelle Watson-Daniels, David C. Parkes, Berk Ustun

For a prediction task, there may exist multiple models that perform almost equally well. This multiplicity complicates how we typically develop and deploy machine learning models. We study how multiplicity affects predictions -- i.e., predictive multiplicity -- in probabilistic classification. We introduce new measures for this setting and present optimization-based methods to compute these measures for convex empirical risk minimization problems like logistic regression. We apply our methodology to gain insight into why predictive multiplicity arises. We study the incidence and prevalence of predictive multiplicity in real-world risk assessment tasks. Our results emphasize the need to report multiplicity more widely.
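
The paper computes its measures with optimization over the set of near-optimal models; as a crude empirical stand-in, one can refit a model on bootstrap resamples, keep the fits whose loss is within epsilon of the best, and report the per-individual spread of predicted probabilities. The sketch below does exactly that and is only a proxy for the optimization-based approach.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
rng = np.random.default_rng(0)

models, losses = [], []
for _ in range(50):
    idx = rng.integers(0, len(X), len(X))            # bootstrap resample
    m = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    models.append(m)
    losses.append(log_loss(y, m.predict_proba(X)[:, 1]))

eps = 0.01                                           # "almost equally good"
good = [m for m, l in zip(models, losses) if l <= min(losses) + eps]
probs = np.stack([m.predict_proba(X)[:, 1] for m in good])   # (models, n)
spread = probs.max(axis=0) - probs.min(axis=0)       # per-person ambiguity
print(f"{len(good)} near-optimal models; "
      f"max prob spread {spread.max():.2f}, mean {spread.mean():.2f}")
```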


Learning to Mitigate AI Collusion on Economic Platforms

Feb 15, 2022
Gianluca Brero, Nicolas Lepore, Eric Mibuari, David C. Parkes

Algorithmic pricing on online e-commerce platforms raises the concern of tacit collusion, where reinforcement learning algorithms learn to set collusive prices in a decentralized manner and through nothing more than profit feedback. This raises the question of whether collusive pricing can be prevented through the design of suitable "buy boxes," i.e., through the design of the rules that govern the elements of e-commerce sites that promote particular products and prices to consumers. In previous work, Johnson et al. (2020) designed hand-crafted buy box rules that use demand steering, based on the history of pricing by sellers, to prevent collusive behavior. Although effective against price collusion, these rules do so by imposing severe restrictions on consumer choice, at a cost to consumer welfare. In this paper, we demonstrate that reinforcement learning (RL) can also be used by platforms to learn buy box rules that are effective in preventing collusion by RL sellers, and to do so without reducing consumer choice. For this, we adopt the methodology of Stackelberg MDPs, and demonstrate success in learning robust rules that continue to provide high consumer welfare even when sellers employ different behavior models or have out-of-distribution costs for goods.
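
The bilevel structure can be made concrete with a toy of my own construction: a leader (the platform) chooses a demand-steering strength for the buy box, followers (two sellers) repeatedly best-respond with prices, and the leader evaluates each rule against the sellers' resulting behavior. Everything here, from the price grid to the demand function, is illustrative rather than the paper's Stackelberg MDP formulation.

```python
import numpy as np

PRICES = [1.0, 1.5, 2.0]          # admissible price grid; marginal cost is 0.5

def demand_share(p_own, p_other, steer):
    """Buy box rule: steering strength shifts demand toward the cheaper seller."""
    if p_own == p_other:
        return 0.5
    return 0.5 + (0.5 * steer if p_own < p_other else -0.5 * steer)

def followers_equilibrium(steer, rounds=200):
    """Two sellers alternate best responses to the current prices."""
    p = [2.0, 2.0]
    for t in range(rounds):
        i = t % 2
        profit = lambda q: (q - 0.5) * demand_share(q, p[1 - i], steer)
        p[i] = max(PRICES, key=profit)
    return p

# Leader's outer loop: choose the steering rule that yields the lowest
# equilibrium prices (a stand-in for consumer welfare), anticipating how
# the sellers adapt to each rule.
best = max(np.linspace(0, 1, 11), key=lambda s: -sum(followers_equilibrium(s)))
print(best, followers_equilibrium(best))
```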


The AI Economist: Optimal Economic Policy Design via Two-level Deep Reinforcement Learning

Aug 05, 2021
Stephan Zheng, Alexander Trott, Sunil Srinivasa, David C. Parkes, Richard Socher

AI and reinforcement learning (RL) have improved many areas, but are not yet widely adopted in economic policy design, mechanism design, or economics at large. At the same time, current economic methodology is limited by a lack of counterfactual data, simplistic behavioral models, and limited opportunities to experiment with policies and evaluate behavioral responses. Here we show that machine-learning-based economic simulation is a powerful policy and mechanism design framework that overcomes these limitations. The AI Economist is a two-level, deep RL framework that trains both agents and a social planner that co-adapt, providing a tractable solution to this novel and otherwise highly unstable two-level RL problem. From a simple specification of an economy, we learn rational agent behaviors that adapt to learned planner policies, and vice versa. We demonstrate the efficacy of the AI Economist on the problem of optimal taxation. In simple one-step economies, the AI Economist recovers the optimal tax policy of economic theory. In complex, dynamic economies, the AI Economist substantially improves both utilitarian social welfare and the trade-off between equality and productivity over baselines. It does so despite emergent tax-gaming strategies, while accounting for agent interactions and behavioral change more accurately than economic theory. These results demonstrate for the first time that two-level, deep RL can be used both for understanding and as a complement to theory in economic design, unlocking a new computational learning-based approach to understanding economic policy.
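
A stripped-down version of the two-level structure, on a toy economy of my own design: at the inner level, agents best-respond with labor choices given a tax policy; at the outer level, the planner searches over tax rates to maximize welfare after redistribution. The real framework trains both levels with deep RL; this sketch only shows the nesting.

```python
import numpy as np

skills = np.array([1.0, 2.0, 4.0])         # heterogeneous agents

def agent_labor(skill, tax, grid=np.linspace(0, 1, 101)):
    # Inner level: after-tax income minus convex effort cost; agents best-respond.
    u = (1 - tax) * skill * grid - grid ** 2
    return grid[u.argmax()]

def welfare(tax):
    labor = np.array([agent_labor(s, tax) for s in skills])
    income = (1 - tax) * skills * labor
    rebate = tax * (skills * labor).sum() / len(skills)   # lump-sum redistribution
    utils = income + rebate - labor ** 2
    return np.log(utils + 1e-9).sum()       # welfare with an equality flavor

# Outer level: planner searches tax rates, anticipating agent responses.
taxes = np.linspace(0, 0.9, 10)
best = taxes[np.argmax([welfare(t) for t in taxes])]
print(f"planner's best flat tax on this toy economy: {best:.1f}")
```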

* Substantial Extension of https://arxiv.org/abs/2004.13332. SZ and AT contributed equally 

Deep Learning for Two-Sided Matching

Jul 07, 2021
Sai Srivatsa Ravindranath, Zhe Feng, Shira Li, Jonathan Ma, Scott D. Kominers, David C. Parkes

We initiate the use of a multi-layer neural network to model two-sided matching and to explore the design space between strategy-proofness and stability. It is well known that the two properties cannot be achieved simultaneously, but the efficient frontier of this design space is not understood. We show empirically that it is possible to achieve a good compromise between stability and strategy-proofness, substantially better than that achievable through a convex combination of deferred acceptance (stable, and strategy-proof for only one side of the market) and randomized serial dictatorship (strategy-proof but not stable).
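
A sketch of one plausible representational choice, assuming PyTorch: a network maps a preference profile to an approximately doubly stochastic matrix of match probabilities via Sinkhorn normalization, so that each worker and each firm is matched once in expectation. The stability and strategy-proofness losses that would drive training are omitted, and the architecture is my construction based on the abstract.

```python
import torch
import torch.nn as nn

class MatchNet(nn.Module):
    def __init__(self, n_workers, n_firms, hidden=128):
        super().__init__()
        in_dim = 2 * n_workers * n_firms      # both sides' cardinal preferences
        self.body = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_workers * n_firms))
        self.shape = (n_workers, n_firms)

    def forward(self, prefs, sinkhorn_iters=20):
        logits = self.body(prefs).view(-1, *self.shape)
        m = logits.exp()
        for _ in range(sinkhorn_iters):       # alternate row/column normalization
            m = m / m.sum(-1, keepdim=True)   # each worker matched once
            m = m / m.sum(-2, keepdim=True)   # each firm matched once
        return m                              # approximately doubly stochastic

net = MatchNet(n_workers=3, n_firms=3)
prefs = torch.rand(4, 2 * 3 * 3)              # a batch of random profiles
print(net(prefs).sum(-1))                     # row sums ~ 1 after Sinkhorn
```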
