Abstract: Firms increasingly rely on dynamic pricing to respond to evolving customer demand, yet in many applications they observe only the revenue generated by a single posted price in each period. At the same time, market conditions may shift gradually or abruptly due to changes in customer preferences, competition, or external shocks. These features create two intertwined challenges: learning the revenue--demand relationship from limited feedback and adapting pricing decisions to a changing environment. We study how a seller can learn and earn effectively under these constraints, without assuming a specific parametric form for demand. We develop a learning framework that updates prices using revenue-based gradient approximations constructed from one observation per period. To address environmental changes, we incorporate a restarting mechanism that periodically refreshes the learning process so that outdated information is discounted. When the degree of nonstationarity is unknown, we further introduce a meta-learning layer to adaptively hedge across multiple restarting schedules. We provide performance guarantees for our approach, showing how cumulative revenue loss relative to a fully informed benchmark depends on both the time horizon and the magnitude of market variation. Simulation experiments using synthetic and real-world data illustrate the effectiveness of the proposed procedures.
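The two core ingredients described above, a gradient approximation built from a single revenue observation per period and a periodic restart, can be illustrated with a minimal sketch. This is not the paper's algorithm: the function name `restarted_one_point_pricing`, the random-sign perturbation, the step-size schedule `eta0 / sqrt(k + 1)`, the fixed restart length `batch_len`, and the toy revenue function `shifting_revenue` are all assumptions chosen for illustration, and the meta-learning layer is omitted.

```python
import random


def restarted_one_point_pricing(revenue, horizon, batch_len, p_min, p_max,
                                delta=0.05, eta0=0.1, seed=0):
    """One-point gradient ascent on revenue with periodic restarts (illustrative).

    revenue(t, price) returns the revenue observed in period t at the posted price.
    Assumes delta <= (p_max - p_min) / 2 so that posted prices stay feasible.
    """
    rng = random.Random(seed)
    lo, hi = p_min + delta, p_max - delta   # shrunken interval for the center price
    p0 = 0.5 * (p_min + p_max)              # assumed restart point: the midpoint
    center = p0
    posted, earned = [], []
    for t in range(horizon):
        k = t % batch_len                   # position inside the current batch
        if k == 0:
            center = p0                     # restart: discard the learned state
        u = rng.choice((-1.0, 1.0))         # random perturbation direction
        price = center + delta * u          # the single price posted this period
        r = revenue(t, price)               # the only feedback observed
        grad = r * u / delta                # one-point gradient estimate (1-D)
        eta = eta0 / (k + 1) ** 0.5         # step size restarts with each batch
        center = min(hi, max(lo, center + eta * grad))  # projected ascent step
        posted.append(price)
        earned.append(r)
    return posted, earned


def shifting_revenue(t, price):
    # Hypothetical noiseless linear demand whose intercept jumps at t = 500.
    intercept = 2.0 if t < 500 else 3.0
    return price * max(0.0, intercept - price)


posted, earned = restarted_one_point_pricing(
    shifting_revenue, horizon=1000, batch_len=100, p_min=0.0, p_max=3.0)
```

In this sketch a shorter `batch_len` forgets outdated information faster at the cost of relearning more often, which is the trade-off that hedging across several restarting schedules is meant to handle when the degree of nonstationarity is unknown.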
Abstract: Black-box optimization is often encountered in decision-making for complex systems management, where knowledge of the system is limited. Under these circumstances, it is essential to balance the utilization of new information with computational efficiency. In practice, decision-makers often face the dual tasks of optimizing and performing statistical inference on the optimal performance, so that it can be achieved with high reliability. Our goal is to address both tasks in an online fashion. Wu et al. (2022) [arXiv preprint: 2210.06737] point out that the sample average of the performance estimates generated by the optimization algorithm need not admit a central limit theorem. We propose an algorithm that not only tackles this issue but also provides an online consistent estimator for the variance of the performance. Furthermore, we characterize the convergence rate of the coverage probabilities of the asymptotic confidence intervals.