This paper presents the challenge report for the 2021 Kidney and Kidney Tumor Segmentation Challenge (KiTS21) held in conjunction with the 2021 international conference on Medical Image Computing and Computer Assisted Interventions (MICCAI). KiTS21 is a sequel to its first edition in 2019, and it features a variety of innovations in how the challenge was designed, in addition to a larger dataset. A novel annotation method was used to collect three separate annotations for each region of interest, and these annotations were performed in a fully transparent setting using a web-based annotation tool. Further, the KiTS21 test set was collected from an outside institution, challenging participants to develop methods that generalize well to new populations. Nonetheless, the top-performing teams achieved a significant improvement over the state of the art set in 2019, and this performance is shown to inch ever closer to human-level performance. An in-depth meta-analysis is presented describing which methods were used and how they faired on the leaderboard, as well as the characteristics of which cases generally saw good performance, and which did not. Overall KiTS21 facilitated a significant advancement in the state of the art in kidney tumor segmentation, and provides useful insights that are applicable to the field of semantic segmentation as a whole.
Manual engineering of high-performance implementations typically consumes many resources and requires in-depth knowledge of the hardware. Compilers try to address these problems; however, they are limited by design in what they can do. To address this, we present CryptOpt, an automatic optimizer for long stretches of straightline code. Experimental results across eight hardware platforms show that CryptOpt achieves a speed-up factor of up to 2.56 over current off-the-shelf compilers.
The objectives of option hedging/trading extend beyond mere protection against downside risks, with a desire to seek gains also driving agent's strategies. In this study, we showcase the potential of robust risk-aware reinforcement learning (RL) in mitigating the risks associated with path-dependent financial derivatives. We accomplish this by leveraging a policy gradient approach that optimises robust risk-aware performance criteria. We specifically apply this methodology to the hedging of barrier options, and highlight how the optimal hedging strategy undergoes distortions as the agent moves from being risk-averse to risk-seeking. As well as how the agent robustifies their strategy. We further investigate the performance of the hedge when the data generating process (DGP) varies from the training DGP, and demonstrate that the robust strategies outperform the non-robust ones.
Despite many recent advancements in language modeling, state-of-the-art language models lack grounding in the real world and struggle with tasks involving complex reasoning. Meanwhile, advances in the symbolic reasoning capabilities of AI have led to systems that outperform humans in games like chess and Go (Silver et al., 2018). Chess commentary provides an interesting domain for bridging these two fields of research, as it requires reasoning over a complex board state and providing analyses in natural language. In this work we demonstrate how to combine symbolic reasoning engines with controllable language models to generate chess commentaries. We conduct experiments to demonstrate that our approach generates commentaries that are preferred by human judges over previous baselines.
Most software domains rely on compilers to translate high-level code to multiple different machine languages, with performance not too much worse than what developers would have the patience to write directly in assembly language. However, cryptography has been an exception, where many performance-critical routines have been written directly in assembly (sometimes through metaprogramming layers). Some past work has shown how to do formal verification of that assembly, and other work has shown how to generate C code automatically along with formal proof, but with consequent performance penalties vs. the best-known assembly. We present CryptOpt, the first compilation pipeline that specializes high-level cryptographic functional programs into assembly code significantly faster than what GCC or Clang produce, with mechanized proof (in Coq) whose final theorem statement mentions little beyond the input functional program and the operational semantics of x86-64 assembly. On the optimization side, we apply randomized search through the space of assembly programs, with repeated automatic benchmarking on target CPUs. On the formal-verification side, we connect to the Fiat Cryptography framework (which translates functional programs into C-like IR code) and extend it with a new formally verified program-equivalence checker, incorporating a modest subset of known features of SMT solvers and symbolic-execution engines. The overall prototype is quite practical, e.g. producing new fastest-known implementations for the relatively new Intel i9 12G, of finite-field arithmetic for both Curve25519 (part of the TLS standard) and the Bitcoin elliptic curve secp256k1.
The 2021 Waymo Interaction Prediction Challenge introduced a problem of predicting the future trajectories and confidences of two interacting agents jointly. We developed a solution that takes an anchored marginal motion prediction model with rasterization and augments it to model agent interaction. We do this by predicting the joint confidences using a rasterized image that highlights the ego agent and the interacting agent. Our solution operates on the cartesian product space of the anchors; hence the $"^2"$ in $AIR^2$. Our model achieved the highest mAP (the primary metric) on the leaderboard.
Self-supervised representation learning is a fundamental problem in computer vision with many useful applications (e.g., image search, instance level recognition, copy detection). In this paper we present a new contrastive self-supervised representation learning algorithm in the context of Copy Detection in the 2021 Image Similarity Challenge hosted by Facebook AI Research. Previous work in contrastive self-supervised learning has identified the importance of being able to optimize representations while ``pushing'' against a large number of negative examples. Representative previous solutions either use large batches enabled by modern distributed training systems or maintain queues or memory banks holding recently evaluated representations while relaxing some consistency properties. We approach this problem from a new angle: We directly learn a query model and a key model jointly and push representations against a very large number (e.g., 1 million) of negative representations in each SGD step. We achieve this by freezing the backbone on one side and by alternating between a Q-optimization step and a K-optimization step. During the competition timeframe, our algorithms achieved a micro-AP score of 0.3401 on the Phase 1 leaderboard, significantly improving over the baseline $\mu$AP of 0.1556. On the final Phase 2 leaderboard, our model scored 0.1919, while the baseline scored 0.0526. Continued training yielded further improvement. We conducted an empirical study to compare the proposed approach with a SimCLR style strategy where the negative examples are taken from the batch only. We found that our method ($\mu$AP of 0.3403) significantly outperforms this SimCLR-style baseline ($\mu$AP of 0.2001).
Prior AI successes in complex games have largely focused on settings with at most hundreds of actions at each decision point. In contrast, Diplomacy is a game with more than 10^20 possible actions per turn. Previous attempts to address games with large branching factors, such as Diplomacy, StarCraft, and Dota, used human data to bootstrap the policy or used handcrafted reward shaping. In this paper, we describe an algorithm for action exploration and equilibrium approximation in games with combinatorial action spaces. This algorithm simultaneously performs value iteration while learning a policy proposal network. A double oracle step is used to explore additional actions to add to the policy proposals. At each state, the target state value and policy for the model training are computed via an equilibrium search procedure. Using this algorithm, we train an agent, DORA, completely from scratch for a popular two-player variant of Diplomacy and show that it achieves superhuman performance. Additionally, we extend our methods to full-scale no-press Diplomacy and for the first time train an agent from scratch with no human data. We present evidence that this agent plays a strategy that is incompatible with human-data bootstrapped agents. This presents the first strong evidence of multiple equilibria in Diplomacy and suggests that self play alone may be insufficient for achieving superhuman performance in Diplomacy.
The standard problem setting in Dec-POMDPs is self-play, where the goal is to find a set of policies that play optimally together. Policies learned through self-play may adopt arbitrary conventions and rely on multi-step counterfactual reasoning based on assumptions about other agents' actions and thus fail when paired with humans or independently trained agents. In contrast, no current methods can learn optimal policies that are fully grounded, i.e., do not rely on counterfactual information from observing other agents' actions. To address this, we present off-belief learning} (OBL): at each time step OBL agents assume that all past actions were taken by a given, fixed policy ($\pi_0$), but that future actions will be taken by an optimal policy under these same assumptions. When $\pi_0$ is uniform random, OBL learns the optimal grounded policy. OBL can be iterated in a hierarchy, where the optimal policy from one level becomes the input to the next. This introduces counterfactual reasoning in a controlled manner. Unlike independent RL which may converge to any equilibrium policy, OBL converges to a unique policy, making it more suitable for zero-shot coordination. OBL can be scaled to high-dimensional settings with a fictitious transition mechanism and shows strong performance in both a simple toy-setting and the benchmark human-AI/zero-shot coordination problem Hanabi.