Many human-facing algorithms -- including those that power recommender systems or hiring decision tools -- are trained on data provided by their users. The developers of these algorithms commonly adopt the assumption that the data generating process is exogenous: that is, how a user reacts to a given prompt (e.g., a recommendation or hiring suggestion) depends on the prompt and not on the algorithm that generated it. For example, the assumption that a person's behavior follows a ground-truth distribution is an exogeneity assumption. In practice, when algorithms interact with humans, this assumption rarely holds because users can be strategic. Recent studies document, for example, TikTok users changing their scrolling behavior after learning that TikTok uses it to curate their feed, and Uber drivers changing how they accept and cancel rides in response to changes in Uber's algorithm. Our work studies the implications of this strategic behavior by modeling the interactions between a user and their data-driven platform as a repeated, two-player game. We first find that user strategization can actually help platforms in the short term. We then show that it corrupts platforms' data and ultimately hurts their ability to make counterfactual decisions. We connect this phenomenon to user trust, and show that designing trustworthy algorithms can go hand in hand with accurate estimation. Finally, we provide a formalization of trustworthiness that inspires potential interventions.
People form judgments and make decisions based on the information that they observe. A growing portion of that information is not only provided, but carefully curated by social media platforms. Although lawmakers largely agree that platforms should not operate without any oversight, there is little consensus on how to regulate social media. There is consensus, however, that creating a strict, global standard of "acceptable" content is untenable (e.g., in the US, it is incompatible with Section 230 of the Communications Decency Act and the First Amendment). In this work, we propose that algorithmic filtering should be regulated with respect to a flexible, user-driven baseline. We provide a concrete framework for regulating and auditing a social media platform according to such a baseline. In particular, we introduce the notion of a baseline feed: the content that a user would see without filtering (e.g., on Twitter, this could be the chronological timeline). We require that the feeds a platform filters contain "similar" informational content as their respective baseline feeds, and we design a principled way to measure similarity. This approach is motivated by related suggestions that regulations should increase user agency. We present an auditing procedure that checks whether a platform honors this requirement. Notably, the audit needs only black-box access to a platform's filtering algorithm, and it does not access or infer private user information. We provide theoretical guarantees on the strength of the audit. We further show that requiring closeness between filtered and baseline feeds does not impose a large performance cost, nor does it create echo chambers.
In recent years, multiple notions of algorithmic fairness have arisen. One such notion is individual fairness (IF), which requires that individuals who are similar receive similar treatment. In parallel, matrix estimation (ME) has emerged as a natural paradigm for handling noisy data with missing values. In this work, we connect the two concepts. We show that pre-processing data using ME can improve an algorithm's IF without sacrificing performance. Specifically, we show that using a popular ME method known as singular value thresholding (SVT) to pre-process the data provides a strong IF guarantee under appropriate conditions. We then show that, under analogous conditions, SVT pre-processing also yields estimates that are consistent and approximately minimax optimal. As such, the ME pre-processing step does not, under the stated conditions, increase the prediction error of the base algorithm, i.e., does not impose a fairness-performance trade-off. We verify these results on synthetic and real data.
We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes). Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome. Episodes are long, with often hundreds of moves before a player wins, and situations in Stratego can not easily be broken down into manageably-sized sub-problems as in poker. For these reasons, Stratego has been a grand challenge for the field of AI for decades, and existing AI methods barely reach an amateur level of play. DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego via self-play. The Regularised Nash Dynamics (R-NaD) algorithm, a key component of DeepNash, converges to an approximate Nash equilibrium, instead of 'cycling' around it, by directly modifying the underlying multi-agent learning dynamics. DeepNash beats existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform, competing with human expert players.
We consider the two-sided matching market with bandit learners. In the standard matching problem, users and providers are matched to ensure incentive compatibility via the notion of stability. However, contrary to the core assumption of the matching problem, users and providers do not know their true preferences a priori and must learn them. To address this assumption, recent works propose to blend the matching and multi-armed bandit problems. They establish that it is possible to assign matchings that are stable (i.e., incentive-compatible) at every time step while also allowing agents to learn enough so that the system converges to matchings that are stable under the agents' true preferences. However, while some agents may incur low regret under these matchings, others can incur high regret -- specifically, $\Omega(T)$ optimal regret where $T$ is the time horizon. In this work, we incorporate costs and transfers in the two-sided matching market with bandit learners in order to faithfully model competition between agents. We prove that, under our framework, it is possible to simultaneously guarantee four desiderata: (1) incentive compatibility, i.e., stability, (2) low regret, i.e., $O(\log(T))$ optimal regret, (3) fairness in the distribution of regret among agents, and (4) high social welfare.
Radar detects stable, long-range objects under variable weather and lighting conditions, making it a reliable and versatile sensor well suited for ego-motion estimation. In this work, we propose a radar-only odometry pipeline that is highly robust to radar artifacts (e.g., speckle noise and false positives) and requires only one input parameter. We demonstrate its ability to adapt across diverse settings, from urban UK to off-road Iceland, achieving a scan matching accuracy of approximately 5.20 cm and 0.0929 deg when using GPS as ground truth (compared to visual odometry's 5.77 cm and 0.1032 deg). We present algorithms for keypoint extraction and data association, framing the latter as a graph matching optimization problem, and provide an in-depth system analysis.