Practitioners conducting adaptive experiments often encounter two competing priorities: reducing the cost of experimentation by effectively assigning treatments during the experiment itself, and gathering information swiftly to conclude the experiment and implement a treatment across the population. Currently, the literature is divided, with studies on regret minimization addressing the former priority in isolation, and research on best-arm identification focusing solely on the latter. This paper proposes a unified model that accounts for both within-experiment performance and post-experiment outcomes. We then provide a sharp theory of optimal performance in large populations that unifies canonical results in the literature. This unification also uncovers novel insights. For example, the theory reveals that familiar algorithms, like the recently proposed top-two Thompson sampling algorithm, can be adapted to optimize a broad class of objectives by simply adjusting a single scalar parameter. In addition, the theory reveals that enormous reductions in experiment duration can sometimes be achieved with minimal impact on both within-experiment and post-experiment regret.
We consider pure-exploration problems in the context of stochastic sequential adaptive experiments with a finite set of alternative options. The goal of the decision-maker is to accurately answer a query question regarding the alternatives with high confidence with minimal measurement efforts. A typical query question is to identify the alternative with the best performance, leading to ranking and selection problems, or best-arm identification in the machine learning literature. We focus on the fixed-precision setting and derive a sufficient condition for optimality in terms of a notion of strong convergence to the optimal allocation of samples. Using dual variables, we characterize the necessary and sufficient conditions for an allocation to be optimal. The use of dual variables allow us to bypass the combinatorial structure of the optimality conditions that relies solely on primal variables. Remarkably, these optimality conditions enable an extension of top-two algorithm design principle, initially proposed for best-arm identification. Furthermore, our optimality conditions give rise to a straightforward yet efficient selection rule, termed information-directed selection, which adaptively picks from a candidate set based on information gain of the candidates. We outline the broad contexts where our algorithmic approach can be implemented. We establish that, paired with information-directed selection, top-two Thompson sampling is (asymptotically) optimal for Gaussian best-arm identification, solving a glaring open problem in the pure exploration literature. Our algorithm is optimal for $\epsilon$-best-arm identification and thresholding bandit problems. Our analysis also leads to a general principle to guide adaptations of Thompson sampling for pure-exploration problems. Numerical experiments highlight the exceptional efficiency of our proposed algorithms relative to existing ones.
In drone racing, the time-minimum trajectory is affected by the drone's capabilities, the layout of the race track, and the configurations of the gates (e.g., their shapes and sizes). However, previous studies neglect the configuration of the gates, simply rendering drone racing a waypoint-passing task. This formulation often leads to a conservative choice of paths through the gates, as the spatial potential of the gates is not fully utilized. To address this issue, we present a time-optimal planner that can faithfully model gate constraints with various configurations and thereby generate a more time-efficient trajectory while considering the single-rotor-thrust limits. Our approach excels in computational efficiency which only takes a few seconds to compute the full state and control trajectories of the drone through tracks with dozens of different gates. Extensive simulations and experiments confirm the effectiveness of the proposed methodology, showing that the lap time can be further reduced by taking into account the gate's configuration. We validate our planner in real-world flights and demonstrate super-extreme flight trajectory through race tracks.
Detecting breast lesion in videos is crucial for computer-aided diagnosis. Existing video-based breast lesion detection approaches typically perform temporal feature aggregation of deep backbone features based on the self-attention operation. We argue that such a strategy struggles to effectively perform deep feature aggregation and ignores the useful local information. To tackle these issues, we propose a spatial-temporal deformable attention based framework, named STNet. Our STNet introduces a spatial-temporal deformable attention module to perform local spatial-temporal feature fusion. The spatial-temporal deformable attention module enables deep feature aggregation in each stage of both encoder and decoder. To further accelerate the detection speed, we introduce an encoder feature shuffle strategy for multi-frame prediction during inference. In our encoder feature shuffle strategy, we share the backbone and encoder features, and shuffle encoder features for decoder to generate the predictions of multiple frames. The experiments on the public breast lesion ultrasound video dataset show that our STNet obtains a state-of-the-art detection performance, while operating twice as fast inference speed. The code and model are available at https://github.com/AlfredQin/STNet.
Best arm identification or pure exploration problems have received much attention in the COLT community since Bubeck et al. (2009) and Audibert et al. (2010). For any bandit instance with a unique best arm, its asymptotic complexity in the so-called fixed-confidence setting has been completely characterized in Garivier and Kaufmann (2016) and Chernoff (1959), while little is known about the asymptotic complexity in its "dual" setting called fixed-budget setting. This note discusses the open problems and conjectures about the instance-dependent asymptotic complexity in the fixed-budget setting.
This paper presents an image-based visual servo control (IBVS) method for a first-person-view (FPV) quadrotor to conduct aggressive aerial tracking. There are three major challenges to maneuvering an underactuated vehicle using IBVS: (i) finding a visual feature representation that is robust to large rotations and is suited to be an optimization variable; (ii) keeping the target visible without sacrificing the robot's agility; and (iii) compensating for the rotational effects in the detected features. We propose a complete design framework to address these problems. First, we employ a rotation on $SO(3)$ to represent a spherical image feature on $S^{2}$ to gain singularity-free and second-order differentiable properties. To ensure target visibility, we formulate the IBVS as a nonlinear model predictive control (NMPC) problem with three constraints taken into account: the robot's physical limits, target visibility, and time-to-collision (TTC). Furthermore, we propose a novel attitude-compensation scheme to enable formulating the visibility constraint in the actual image plane instead of a virtual fix-orientation image plane. It guarantees that the visibility constraint is valid under large rotations. Extensive experimental results show that our method can track a fast-moving target stably and aggressively without the aid of a localization system.
Information-directed sampling (IDS) has recently demonstrated its potential as a data-efficient reinforcement learning algorithm. However, it is still unclear what is the right form of information ratio to optimize when contextual information is available. We investigate the IDS design through two contextual bandit problems: contextual bandits with graph feedback and sparse linear contextual bandits. We provably demonstrate the advantage of contextual IDS over conditional IDS and emphasize the importance of considering the context distribution. The main message is that an intelligent agent should invest more on the actions that are beneficial for the future unseen contexts while the conditional IDS can be myopic. We further propose a computationally-efficient version of contextual IDS based on Actor-Critic and evaluate it empirically on a neural network contextual bandit.
We explore a new model of bandit experiments where a potentially nonstationary sequence of contexts influences arms' performance. Context-unaware algorithms risk confounding while those that perform correct inference face information delays. Our main insight is that an algorithm we call deconfounted Thompson sampling strikes a delicate balance between adaptivity and robustness. Its adaptivity leads to optimal efficiency properties in easy stationary instances, but it displays surprising resilience in hard nonstationary ones which cause other adaptive algorithms to fail.
Ensemble sampling serves as a practical approximation to Thompson sampling when maintaining an exact posterior distribution over model parameters is computationally intractable. In this paper, we establish a Bayesian regret bound that ensures desirable behavior when ensemble sampling is applied to the linear bandit problem. This represents the first rigorous regret analysis of ensemble sampling and is made possible by leveraging information-theoretic concepts and novel analytic techniques that may prove useful beyond the scope of this paper.