Yuning Jiang

Hierarchical Masked 3D Diffusion Model for Video Outpainting

Sep 05, 2023
Fanda Fan, Chaoxu Guo, Litong Gong, Biao Wang, Tiezheng Ge, Yuning Jiang, Chunjie Luo, Jianfeng Zhan

Video outpainting aims to adequately complete missing areas at the edges of video frames. Compared to image outpainting, it presents an additional challenge: the model must maintain the temporal consistency of the filled area. In this paper, we introduce a masked 3D diffusion model for video outpainting. We use the technique of mask modeling to train the 3D diffusion model. This allows us to use multiple guide frames to connect the results of multiple video clip inferences, thus ensuring temporal consistency and reducing jitter between adjacent frames. Meanwhile, we extract the global frames of the video as prompts and use cross-attention to guide the model with information beyond the current video clip. We also introduce a hybrid coarse-to-fine inference pipeline to alleviate the artifact accumulation problem. The existing coarse-to-fine pipeline uses only the infilling strategy, which causes degradation because the time interval between the sparse frames is too large. Our pipeline benefits from the bidirectional learning of mask modeling and can thus employ a hybrid strategy of infilling and interpolation when generating sparse frames. Experiments show that our method achieves state-of-the-art results on video outpainting tasks. More results are available at https://fanfanda.github.io/M3DDM/.
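
The mask-modeling idea — training with a few frames kept fully visible as guides so that clip-by-clip inference stays temporally consistent — can be sketched as below. The clip size, guide-frame count, and margin width are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_clip(clip, n_guide=2, margin=0.25):
    """Toy mask-modeling setup for outpainting training: keep a few
    randomly chosen guide frames fully visible and zero out the outer
    margins of every other frame, which the model learns to fill in."""
    t, h, w = clip.shape
    guide = np.zeros(t, dtype=bool)
    guide[rng.choice(t, size=n_guide, replace=False)] = True
    mh, mw = int(h * margin), int(w * margin)
    masked = clip.copy()
    for i in range(t):
        if not guide[i]:
            masked[i, :mh, :] = 0.0   # top margin to outpaint
            masked[i, -mh:, :] = 0.0  # bottom margin
            masked[i, :, :mw] = 0.0   # left margin
            masked[i, :, -mw:] = 0.0  # right margin
    return masked, guide

clip = rng.random((8, 16, 16)) + 0.1     # toy clip, strictly positive values
masked, guide = mask_clip(clip)
```

At inference, frames already generated for a neighboring clip would play the role of the unmasked guide frames, which is what stitches consecutive clips together.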

* ACM MM 2023 accepted 

Federated Linear Bandit Learning via Over-the-Air Computation

Aug 28, 2023
Jiali Wang, Yuning Jiang, Xin Liu, Ting Wang, Yuanming Shi

In this paper, we investigate federated contextual linear bandit learning within a wireless system that comprises a server and multiple devices. Each device interacts with the environment, selects an action based on the received reward, and sends model updates to the server. The primary objective is to minimize cumulative regret across all devices within a finite time horizon. To reduce the communication overhead, devices communicate with the server via over-the-air computation (AirComp) over noisy fading channels, where the channel noise may distort the signals. In this context, we propose a customized federated linear bandits scheme, where each device transmits an analog signal, and the server receives a superposition of these signals distorted by channel noise. A rigorous mathematical analysis is conducted to determine the regret bound of the proposed scheme. Both theoretical analysis and numerical experiments demonstrate the competitive performance of our proposed scheme in terms of regret bounds in various settings.
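
The key AirComp property — the server observing a single noisy superposition of all analog transmissions rather than individual updates — can be sketched as follows; the fading range and noise level are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def aircomp_round(updates, noise_std=0.05):
    """Toy over-the-air aggregation: every device transmits its analog
    update simultaneously, the channel scales each signal by a fading
    coefficient, and the server receives one noisy superposition."""
    d = updates[0].shape[0]
    received = np.zeros(d)
    for u in updates:
        fading = rng.uniform(0.9, 1.1)              # per-device channel gain
        received += fading * u                       # signals superpose in the air
    received += rng.normal(0.0, noise_std, size=d)   # additive channel noise
    return received / len(updates)                   # estimate of the mean update

updates = [rng.random(4) for _ in range(10)]
estimate = aircomp_round(updates)
true_mean = np.mean(updates, axis=0)
```

The communication cost of one round is thus independent of the number of devices, at the price of the fading/noise distortion that the paper's regret analysis accounts for.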

Federated Reinforcement Learning for Electric Vehicles Charging Control on Distribution Networks

Aug 17, 2023
Junkai Qian, Yuning Jiang, Xin Liu, Qing Wang, Ting Wang, Yuanming Shi, Wei Chen

With the growing popularity of electric vehicles (EVs), maintaining power grid stability has become a significant challenge. To address this issue, EV charging control strategies have been developed to manage the switch between vehicle-to-grid (V2G) and grid-to-vehicle (G2V) modes for EVs. In this context, multi-agent deep reinforcement learning (MADRL) has proven its effectiveness in EV charging control. However, existing MADRL-based approaches fail to consider the natural power flow of EV charging/discharging in the distribution network and ignore driver privacy. To deal with these problems, this paper proposes a novel approach that combines multi-EV charging/discharging with a radial distribution network (RDN) operating under optimal power flow (OPF) to distribute power flow in real time. A mathematical model is developed to describe the RDN load. The EV charging control problem is formulated as a Markov Decision Process (MDP) to find an optimal charging control strategy that balances V2G profits, RDN load, and driver anxiety. To effectively learn the optimal EV charging control strategy, a federated deep reinforcement learning algorithm named FedSAC is further proposed. Comprehensive simulation results demonstrate the effectiveness and superiority of our proposed algorithm in terms of the diversity of the charging control strategy, the power fluctuations on RDN, the convergence efficiency, and the generalization ability.
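
The three-way trade-off the MDP formulation balances can be sketched as a per-step scalar reward; the linear form and the weights below are assumptions for illustration, as the abstract only names the three terms:

```python
def charging_reward(v2g_profit, rdn_load, anxiety, weights=(1.0, 0.5, 0.5)):
    """Hypothetical per-step reward for the EV-charging MDP: reward
    vehicle-to-grid profit while penalizing distribution-network (RDN)
    load and driver range anxiety. Weights are illustrative only."""
    w_p, w_l, w_a = weights
    return w_p * v2g_profit - w_l * rdn_load - w_a * anxiety

r = charging_reward(v2g_profit=10.0, rdn_load=4.0, anxiety=2.0)
```

In the federated setting, each EV agent would optimize this reward locally and share only model updates, which is how driver privacy is preserved.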

TextPainter: Multimodal Text Image Generation with Visual-harmony and Text-comprehension for Poster Design

Aug 13, 2023
Yifan Gao, Jinpeng Lin, Min Zhou, Chuanbin Liu, Hongtao Xie, Tiezheng Ge, Yuning Jiang

Text design is one of the most critical procedures in poster design, as it relies heavily on human creativity and expertise to design text images with both visual harmony and text semantics in mind. This study introduces TextPainter, a novel multimodal approach that leverages contextual visual information and corresponding text semantics to generate text images. Specifically, TextPainter takes the global-local background image as a style hint and guides text image generation toward visual harmony. Furthermore, we leverage a language model and introduce a text comprehension module to achieve both sentence-level and word-level style variations. In addition, we construct the PosterT80K dataset, consisting of about 80K posters annotated with sentence-level bounding boxes and text contents. We hope this dataset will pave the way for further research on multimodal text image generation. Extensive quantitative and qualitative experiments demonstrate that TextPainter can generate visually and semantically harmonious text images for posters.

* Accepted to ACM MM 2023. Dataset Link: https://tianchi.aliyun.com/dataset/160034 

Visual Captioning at Will: Describing Images and Videos Guided by a Few Stylized Sentences

Jul 31, 2023
Dingyi Yang, Hongyu Chen, Xinglin Hou, Tiezheng Ge, Yuning Jiang, Qin Jin

Stylized visual captioning aims to generate image or video descriptions with specific styles, making them more attractive and emotionally appropriate. One major challenge with this task is the lack of paired stylized captions for visual content, so most existing works focus on unsupervised methods that do not rely on parallel datasets. However, these approaches still require training with sufficient examples that have style labels, and the generated captions are limited to predefined styles. To address these limitations, we explore the problem of Few-Shot Stylized Visual Captioning, which aims to generate captions in any desired style, using only a few examples as guidance during inference, without requiring further training. We propose a framework called FS-StyleCap for this task, which utilizes a conditional encoder-decoder language model and a visual projection module. Our two-step training scheme proceeds as follows: first, we train a style extractor to generate style representations on an unlabeled text-only corpus. Then, we freeze the extractor and enable our decoder to generate stylized descriptions based on the extracted style vector and projected visual content vectors. During inference, our model can generate desired stylized captions by deriving the style representation from user-supplied examples. Our automatic evaluation results for few-shot sentimental visual captioning outperform state-of-the-art approaches and are comparable to models that are fully trained on labeled style corpora. Human evaluations further confirm our model's ability to handle multiple styles.
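
The inference-time conditioning step — deriving one style representation from a few user-supplied examples — can be sketched as below; how FS-StyleCap actually pools the frozen extractor's outputs is an assumption here (simple mean pooling with normalization):

```python
import numpy as np

def infer_style_vector(example_embeddings):
    """Sketch of few-shot style conditioning: run the frozen style
    extractor on each user-supplied stylized sentence, then average the
    embeddings into one unit-norm style vector for the decoder."""
    stacked = np.stack(example_embeddings)   # (n_examples, dim)
    vec = stacked.mean(axis=0)
    return vec / np.linalg.norm(vec)

# toy extractor outputs for two example sentences
examples = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
style = infer_style_vector(examples)
```

Because no training is involved at this stage, swapping in a different set of example sentences immediately yields captions in a different style.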

* 9 pages, 6 figures 

Bayesian Optimization of Expensive Nested Grey-Box Functions

Jun 08, 2023
Wenjie Xu, Yuning Jiang, Bratislav Svetozarevic, Colin N. Jones

We consider the problem of optimizing a grey-box objective function, i.e., a nested function composed of both black-box and white-box functions. A general formulation for such grey-box problems is given, which covers the existing grey-box optimization formulations as special cases. We then design an optimism-driven algorithm to solve it. Under certain regularity assumptions, our algorithm achieves a regret bound similar to that of the standard black-box Bayesian optimization algorithm, up to a constant multiplicative term depending on the Lipschitz constants of the functions considered. We further extend our method to the constrained case and discuss several special cases. For the commonly used kernel functions, the regret bounds allow us to derive a convergence rate to the optimal solution. Experimental results show that our grey-box optimization method empirically improves the speed of finding the global optimal solution significantly, as compared to the standard black-box optimization algorithm.
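
The optimism step for a nested grey-box objective g(h(x)) can be sketched as follows: g is white-box and known in closed form, while the black-box inner function h(x) is only known to lie in a confidence interval (e.g., from a GP posterior). The grid search below is a simplification for illustration, not the paper's algorithm:

```python
import numpy as np

def optimistic_value(g, h_lcb, h_ucb, n_grid=201):
    """Optimistic estimate at a point x: the best value of the known
    outer function g over the confidence interval [h_lcb, h_ucb] that
    the surrogate model gives for the unknown inner value h(x)."""
    candidates = np.linspace(h_lcb, h_ucb, n_grid)
    return float(np.max(g(candidates)))

# White-box outer function g(z) = -(z - 1)^2; suppose the surrogate
# bounds at some x are h(x) in [0.5, 2.0], so the optimum g(1) = 0
# is attainable within the interval.
val = optimistic_value(lambda z: -(z - 1.0) ** 2, 0.5, 2.0)
```

Because the optimism is taken through the known g rather than over the composed objective directly, the confidence interval is exploited more tightly than a pure black-box method could.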

Rec4Ad: A Free Lunch to Mitigate Sample Selection Bias for Ads CTR Prediction in Taobao

Jun 06, 2023
Jingyue Gao, Shuguang Han, Han Zhu, Siran Yang, Yuning Jiang, Jian Xu, Bo Zheng

Click-Through Rate (CTR) prediction serves as a fundamental component in online advertising. A common practice is to train a CTR model on advertisement (ad) impressions with user feedback. Since ad impressions are purposely selected by the model itself, their distribution differs from the inference distribution and thus exhibits sample selection bias (SSB) that affects model performance. Existing studies on SSB mainly employ sample re-weighting techniques, which suffer from high variance and poor model calibration. Another line of work relies on costly uniform data that is inadequate to train industrial models. Thus, mitigating SSB in industrial models with a uniform-data-free framework is worth exploring. Fortunately, many platforms display mixed results of organic items (i.e., recommendations) and sponsored items (i.e., ads) to users, where impressions of ads and recommendations are selected by different systems but share the same user decision rationales. Based on the above characteristics, we propose to leverage recommendation samples as a free lunch to mitigate SSB for the ads CTR model (Rec4Ad). After elaborate data augmentation, Rec4Ad learns disentangled representations with alignment and decorrelation modules for enhancement. When deployed in the Taobao display advertising system, Rec4Ad achieves substantial gains in key business metrics, with a lift of up to +6.6% CTR and +2.9% RPM.
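
The two auxiliary objectives the abstract names can be sketched as below; the exact losses in Rec4Ad are not given here, so both forms (squared-distance alignment, an elementwise-product proxy for decorrelation) are assumptions:

```python
import numpy as np

def rec4ad_aux_losses(ad_shared, rec_shared, shared, specific):
    """Toy stand-ins for the alignment and decorrelation objectives:
    alignment pulls the shared parts of ad and recommendation
    representations together; decorrelation pushes the shared and
    ad-specific parts of a representation apart."""
    align = float(np.mean((ad_shared - rec_shared) ** 2))
    decor = float(np.abs(np.mean(shared * specific)))
    return align, decor

align, decor = rec4ad_aux_losses(
    ad_shared=np.array([1.0, 2.0]), rec_shared=np.array([1.0, 4.0]),
    shared=np.array([1.0, -1.0]),   specific=np.array([1.0, 1.0]))
```

The intuition is that only the shared, bias-free user rationales should transfer across the ad and recommendation systems, while ad-specific factors stay isolated.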

COPR: Consistency-Oriented Pre-Ranking for Online Advertising

Jun 06, 2023
Zhishan Zhao, Jingyue Gao, Yu Zhang, Shuguang Han, Siyuan Lou, Xiang-Rong Sheng, Zhe Wang, Han Zhu, Yuning Jiang, Jian Xu, Bo Zheng

Cascading architecture has been widely adopted in large-scale advertising systems to balance efficiency and effectiveness. In this architecture, the pre-ranking model is expected to be a lightweight approximation of the ranking model, which handles more candidates with strict latency requirements. Due to the gap in model capacity, the pre-ranking and ranking models usually generate inconsistent ranked results, thus hurting the overall system effectiveness. The paradigm of score alignment is proposed to regularize their raw scores to be consistent. However, it suffers from inevitable alignment errors and error amplification by bids when applied in online advertising. To this end, we introduce a consistency-oriented pre-ranking framework for online advertising, which employs a chunk-based sampling module and a plug-and-play rank alignment module to explicitly optimize consistency of ECPM-ranked results. A ΔNDCG-based weighting mechanism is adopted to better distinguish the importance of inter-chunk samples in optimization. Both online and offline experiments have validated the superiority of our framework. When deployed in Taobao display advertising system, it achieves an improvement of up to +12.3% CTR and +5.6% RPM.
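
A ΔNDCG-style weight is a standard LambdaRank-type quantity; its exact role inside COPR's weighting mechanism is an assumption here, but the core computation can be sketched as the change in DCG from swapping two ranked items:

```python
import numpy as np

def delta_dcg_weight(gains, i, j):
    """Absolute change in (unnormalized) DCG caused by swapping the
    items currently at ranks i and j; pairs whose mis-ordering hurts
    the ranking more receive a larger weight."""
    discounts = 1.0 / np.log2(np.arange(len(gains)) + 2.0)  # rank discounts
    return abs((gains[i] - gains[j]) * (discounts[i] - discounts[j]))

# swapping a gain-3 item at rank 0 with a gain-2 item at rank 2
w = delta_dcg_weight(np.array([3.0, 1.0, 2.0]), 0, 2)
```

Swaps between items of equal gain get zero weight, so the optimization focuses on the inter-chunk inversions that actually change ranking quality.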

Capturing Conversion Rate Fluctuation during Sales Promotions: A Novel Historical Data Reuse Approach

May 22, 2023
Zhangming Chan, Yu Zhang, Shuguang Han, Yong Bai, Xiang-Rong Sheng, Siyuan Lou, Jiacen Hu, Baolin Liu, Yuning Jiang, Jian Xu, Bo Zheng

Conversion rate (CVR) prediction is one of the core components in online recommender systems, and various approaches have been proposed to obtain accurate and well-calibrated CVR estimation. However, we observe that a well-trained CVR prediction model often performs sub-optimally during sales promotions. This can be largely ascribed to the problem of the data distribution shift, in which the conventional methods no longer work. To this end, we seek to develop alternative modeling techniques for CVR prediction. Observing similar purchase patterns across different promotions, we propose reusing the historical promotion data to capture the promotional conversion patterns. Herein, we propose a novel Historical Data Reuse (HDR) approach that first retrieves historically similar promotion data and then fine-tunes the CVR prediction model with the acquired data for better adaptation to the promotion mode. HDR consists of three components: an automated data retrieval module that seeks similar data from historical promotions, a distribution shift correction module that re-weights the retrieved data for better aligning with the target promotion, and a TransBlock module that quickly fine-tunes the original model for better adaptation to the promotion mode. Experiments conducted with real-world data demonstrate the effectiveness of HDR, as it improves both ranking and calibration metrics to a large extent. HDR has also been deployed on the display advertising system in Alibaba, bringing a lift of 9% RPM and 16% CVR during Double 11 Sales in 2022.
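
The first of the three components — retrieving the most similar historical promotion — can be sketched as below; cosine similarity over summary statistics is an assumption for illustration, since the paper's retrieval module is automated and its exact measure is not stated here:

```python
import numpy as np

def retrieve_promotion(target, history):
    """Toy stand-in for the retrieval module: return the index of the
    historical promotion whose feature statistics are most similar to
    the upcoming target promotion's, by cosine similarity."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return int(np.argmax([cosine(target, h) for h in history]))

target = np.array([0.9, 0.1, 0.4])
history = [np.array([0.1, 0.9, 0.3]),   # dissimilar past promotion
           np.array([0.8, 0.2, 0.5])]   # similar past promotion
best = retrieve_promotion(target, history)
```

The retrieved data would then be re-weighted toward the target distribution and used to fine-tune the base model, per the two remaining components.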

* Accepted at KDD 2023 (camera-ready version coming soon). This work has already been deployed on the display advertising system in Alibaba, bringing substantial economic gains 