Online bidding and auction are crucial aspects of the online advertising industry. Conventionally, there is only one slot for ad display and most current studies focus on it. Nowadays, multi-slot display advertising is gradually becoming popular where many ads could be displayed in a list and shown as a whole to users. However, multi-slot display advertising leads to different cost-effectiveness. Advertisers have the incentive to adjust bid prices so as to win the most economical ad positions. In this study, we introduce bid shading into multi-slot display advertising for bid price adjustment with a Multi-task End-to-end Bid Shading(MEBS) method. We prove the optimality of our method theoretically and examine its performance experimentally. Through extensive offline and online experiments, we demonstrate the effectiveness and efficiency of our method, and we obtain a 7.01% lift in Gross Merchandise Volume, a 7.42% lift in Return on Investment, and a 3.26% lift in ad buy count.
In online advertising, advertisers participate in ad auctions to acquire ad opportunities, often by utilizing auto-bidding tools provided by demand-side platforms (DSPs). The current auto-bidding algorithms typically employ reinforcement learning (RL). However, due to safety concerns, most RL-based auto-bidding policies are trained in simulation, leading to a performance degradation when deployed in online environments. To narrow this gap, we can deploy multiple auto-bidding agents in parallel to collect a large interaction dataset. Offline RL algorithms can then be utilized to train a new policy. The trained policy can subsequently be deployed for further data collection, resulting in an iterative training framework, which we refer to as iterative offline RL. In this work, we identify the performance bottleneck of this iterative offline RL framework, which originates from the ineffective exploration and exploitation caused by the inherent conservatism of offline RL algorithms. To overcome this bottleneck, we propose Trajectory-wise Exploration and Exploitation (TEE), which introduces a novel data collecting and data utilization method for iterative offline RL from a trajectory perspective. Furthermore, to ensure the safety of online exploration while preserving the dataset quality for TEE, we propose Safe Exploration by Adaptive Action Selection (SEAS). Both offline experiments and real-world experiments on Alibaba display advertising platform demonstrate the effectiveness of our proposed method.
Automated auction design seeks to discover empirically high-revenue and incentive-compatible mechanisms using machine learning. Ensuring dominant strategy incentive compatibility (DSIC) is crucial, and the most effective approach is to confine the mechanism to Affine Maximizer Auctions (AMAs). Nevertheless, existing AMA-based approaches encounter challenges such as scalability issues (arising from combinatorial candidate allocations) and the non-differentiability of revenue. In this paper, to achieve a scalable AMA-based method, we further restrict the auction mechanism to Virtual Valuations Combinatorial Auctions (VVCAs), a subset of AMAs with significantly fewer parameters. Initially, we employ a parallelizable dynamic programming algorithm to compute the winning allocation of a VVCA. Subsequently, we propose a novel optimization method that combines both zeroth-order and first-order techniques to optimize the VVCA parameters. Extensive experiments demonstrate the efficacy and scalability of our proposed approach, termed Zeroth-order and First-order Optimization of VVCAs (ZFO-VVCA), particularly when applied to large-scale auctions.
In online advertising, auto-bidding has become an essential tool for advertisers to optimize their preferred ad performance metrics by simply expressing the high-level campaign objectives and constraints. Previous works consider the design of auto-bidding agents from the single-agent view without modeling the mutual influence between agents. In this paper, we instead consider this problem from the perspective of a distributed multi-agent system, and propose a general Multi-Agent reinforcement learning framework for Auto-Bidding, namely MAAB, to learn the auto-bidding strategies. First, we investigate the competition and cooperation relation among auto-bidding agents, and propose temperature-regularized credit assignment for establishing a mixed cooperative-competitive paradigm. By carefully making a competition and cooperation trade-off among the agents, we can reach an equilibrium state that guarantees not only individual advertiser's utility but also the system performance (social welfare). Second, due to the observed collusion behaviors of bidding low prices underlying the cooperation, we further propose bar agents to set a personalized bidding bar for each agent, and then to alleviate the degradation of revenue. Third, to deploy MAAB to the large-scale advertising system with millions of advertisers, we propose a mean-field approach. By grouping advertisers with the same objective as a mean auto-bidding agent, the interactions among advertisers are greatly simplified, making it practical to train MAAB efficiently. Extensive experiments on the offline industrial dataset and Alibaba advertising platform demonstrate that our approach outperforms several baseline methods in terms of social welfare and guarantees the ad platform's revenue.
In e-commerce advertising, it is crucial to jointly consider various performance metrics, e.g., user experience, advertiser utility, and platform revenue. Traditional auction mechanisms, such as GSP and VCG auctions, can be suboptimal due to their fixed allocation rules to optimize a single performance metric (e.g., revenue or social welfare). Recently, data-driven auctions, learned directly from auction outcomes to optimize multiple performance metrics, have attracted increasing research interests. However, the procedure of auction mechanisms involves various discrete calculation operations, making it challenging to be compatible with continuous optimization pipelines in machine learning. In this paper, we design \underline{D}eep \underline{N}eural \underline{A}uctions (DNAs) to enable end-to-end auction learning by proposing a differentiable model to relax the discrete sorting operation, a key component in auctions. We optimize the performance metrics by developing deep models to efficiently extract contexts from auctions, providing rich features for auction design. We further integrate the game theoretical conditions within the model design, to guarantee the stability of the auctions. DNAs have been successfully deployed in the e-commerce advertising system at Taobao. Experimental evaluation results on both large-scale data set as well as online A/B test demonstrated that DNAs significantly outperformed other mechanisms widely adopted in industry.
In e-commerce advertising, the ad platform usually relies on auction mechanisms to optimize different performance metrics, such as user experience, advertiser utility, and platform revenue. However, most of the state-of-the-art auction mechanisms only focus on optimizing a single performance metric, e.g., either social welfare or revenue, and are not suitable for e-commerce advertising with various, dynamic, difficult to estimate, and even conflicting performance metrics. In this paper, we propose a new mechanism called Deep GSP auction, which leverages deep learning to design new rank score functions within the celebrated GSP auction framework. These new rank score functions are implemented via deep neural network models under the constraints of monotone allocation and smooth transition. The requirement of monotone allocation ensures Deep GSP auction nice game theoretical properties, while the requirement of smooth transition guarantees the advertiser utilities would not fluctuate too much when the auction mechanism switches among candidate mechanisms to achieve different optimization objectives. We deployed the proposed mechanisms in a leading e-commerce ad platform and conducted comprehensive experimental evaluations with both offline simulations and online A/B tests. The results demonstrated the effectiveness of the Deep GSP auction compared to the state-of-the-art auction mechanisms.
In the field of fixed wing aircraft, many morphing technologies have been applied to the wing, such as adaptive airfoil, variable span aircraft, variable swept angle aircraft, etc., but few are aimed at the tail. The traditional fixed wing tail includes horizontal and vertical tail. Inspired by the bird tail, this paper will introduce a new bionic tail. The tail has a novel control mode, which has multiple control variables. Compared with the traditional fixed wing tail, it adds the area control and rotation control around the longitudinal symmetry axis, so it can control the pitch and yaw of the aircraft at the same time. When the area of the tail changes, the maneuverability and stability of the aircraft can be changed, and the aerodynamic efficiency of the aircraft can also be improved. The aircraft with morphing ability is often difficult to establish accurate mathematical model, because the model has a strong nonlinear, model-based control method is difficult to deal with the strong nonlinear aircraft. In recent years, with the rapid development of artificial intelligence technology, learning based control methods are also brilliant, in which the deep reinforcement learning algorithm can be a good solution to the control object which is difficult to establish model. In this paper, the model-free control algorithm PPO is used to control the tail, and the traditional PID is used to control the aileron and throttle. After training in simulation, the tail shows excellent attitude control ability.
Multi-valued data are commonly found in many real applications. During the process of clustering multi-valued data, most existing methods use sampling or aggregation mechanisms that cannot reflect the real distribution of objects and their instances and thus fail to obtain high-quality clusters. In this paper, a concept of $\alpha$-approximation distance is introduced to measure the connectivity between multi-valued objects by taking account of the distribution of the instances. An $\alpha$-approximation density-based clustering algorithm (DBCMO) is proposed to efficiently cluster the multi-valued objects by using global and local R* tree structures. To speed up the algorithm, four pruning rules on the tree structures are implemented. Empirical studies on synthetic and real datasets demonstrate that DBCMO can efficiently and effectively discover the multi-valued object clusters. A comparison with two existing methods further shows that DBCMO can better handle a continuous decrease in the cluster density and detect clusters of varying density.
Energy consumption is an important issue in continuous wireless telemonitoring of physiological signals. Compressed sensing (CS) is a promising framework to address it, due to its energy-efficient data compression procedure. However, most CS algorithms have difficulty in data recovery due to non-sparsity characteristic of many physiological signals. Block sparse Bayesian learning (BSBL) is an effective approach to recover such signals with satisfactory recovery quality. However, it is time-consuming in recovering multichannel signals, since its computational load almost linearly increases with the number of channels. This work proposes a spatiotemporal sparse Bayesian learning algorithm to recover multichannel signals simultaneously. It not only exploits temporal correlation within each channel signal, but also exploits inter-channel correlation among different channel signals. Furthermore, its computational load is not significantly affected by the number of channels. The proposed algorithm was applied to brain computer interface (BCI) and EEG-based driver's drowsiness estimation. Results showed that the algorithm had both better recovery performance and much higher speed than BSBL. Particularly, the proposed algorithm ensured that the BCI classification and the drowsiness estimation had little degradation even when data were compressed by 80%, making it very suitable for continuous wireless telemonitoring of multichannel signals.