Yun Xiao

Scenario-based Multi-product Advertising Copywriting Generation for E-Commerce

May 21, 2022

Xueying Zhang, Kai Shen, Chi Zhang, Xiaochuan Fan, Yun Xiao, Zhen He, Bo Long, Lingfei Wu

Abstract: In this paper, we propose an automatic Scenario-based Multi-product Advertising Copywriting Generation system (SMPACG) for e-commerce, which has been deployed on a leading Chinese e-commerce platform. SMPACG consists of two main components: 1) an automatic multi-product combination selection module, which itself consists of a topic prediction model, a pattern- and attribute-based selection model, and an arbitrator model; and 2) an automatic multi-product advertising copywriting generation module, which combines our proposed domain-specific pretrained language model and knowledge-based data enhancement model. SMPACG is the first system to realize automatic scenario-based multi-product advertising content generation, and it achieves significant improvements over other state-of-the-art methods. SMPACG has not only been developed to directly serve our e-commerce recommendation system but is also used as a real-time writing assistant tool for merchants.

SwinNet: Swin Transformer drives edge-aware RGB-D and RGB-T salient object detection

Apr 12, 2022

Zhengyi Liu, Yacheng Tan, Qian He, Yun Xiao

Abstract: Convolutional neural networks (CNNs) are good at extracting contextual features within certain receptive fields, while transformers can model global long-range dependency features. By absorbing the advantages of both the transformer and the CNN, Swin Transformer shows strong feature representation ability. Based on it, we propose a cross-modality fusion model, SwinNet, for RGB-D and RGB-T salient object detection. It is driven by Swin Transformer to extract hierarchical features, boosted by an attention mechanism to bridge the gap between the two modalities, and guided by edge information to sharpen the contour of the salient object. Specifically, a two-stream Swin Transformer encoder first extracts multi-modality features, and then a spatial alignment and channel re-calibration module is presented to optimize intra-level cross-modality features. To clarify the fuzzy boundary, an edge-guided decoder achieves inter-level cross-modality fusion under the guidance of edge features. The proposed model outperforms the state-of-the-art models on RGB-D and RGB-T datasets, showing that it provides more insight into the cross-modality complementarity task.

* IEEE Transactions on Circuits and Systems for Video Technology, 2021
* Published online in TCSVT

Beyond Fixation: Dynamic Window Visual Transformer

Apr 08, 2022

Pengzhen Ren, Changlin Li, Guangrun Wang, Yun Xiao, Qing Du, Xiaodan Liang, Xiaojun Chang

Abstract: Recently, there has been a surge of interest in reducing the computational cost of visual transformers by limiting the calculation of self-attention to a local window. Most current work uses a fixed single-scale window for modeling by default, ignoring the impact of window size on model performance. However, this may limit the modeling potential of these window-based models for multi-scale information. In this paper, we propose a novel method, named Dynamic Window Vision Transformer (DW-ViT). The dynamic window strategy proposed by DW-ViT goes beyond models that employ a fixed single-window setting. To the best of our knowledge, we are the first to use dynamic multi-scale windows to explore the upper limit of the effect of window settings on model performance. In DW-ViT, multi-scale information is obtained by assigning windows of different sizes to different head groups of window multi-head self-attention. Then, the information is dynamically fused by assigning different weights to the multi-scale window branches. We conducted a detailed performance evaluation on three datasets: ImageNet-1K, ADE20K, and COCO. Compared with related state-of-the-art (SoTA) methods, DW-ViT obtains the best performance. Specifically, compared with the current SoTA Swin Transformers (Liu et al., 2021), DW-ViT achieves consistent and substantial improvements on all three datasets with similar parameters and computational costs. In addition, DW-ViT exhibits good scalability and can be easily inserted into any window-based visual transformer.
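
The two mechanisms named in the abstract (window sizes assigned per head group, then weighted fusion of the branches) can be illustrated with a minimal sketch. This is not the DW-ViT implementation: the function names, the window sizes 7/14/21, and the use of softmax over per-branch logits as the fusion rule are all assumptions made for illustration.

```python
import math
from typing import List


def assign_windows(num_heads: int, window_sizes: List[int]) -> List[int]:
    """Split heads into equal groups and give each group one window size.

    Returns a list whose h-th entry is the window size used by head h.
    (Hypothetical helper; equal-sized groups are an assumption.)
    """
    if num_heads % len(window_sizes) != 0:
        raise ValueError("num_heads must be divisible by the number of window sizes")
    per_group = num_heads // len(window_sizes)
    return [w for w in window_sizes for _ in range(per_group)]


def fuse_branches(branches: List[List[float]], logits: List[float]) -> List[float]:
    """Fuse one feature vector per window branch with softmax weights.

    `logits` stands in for whatever scores the model produces per branch;
    how they are computed is outside this sketch.
    """
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(branches[0])
    return [sum(w * b[k] for w, b in zip(weights, branches)) for k in range(dim)]


# Six heads, three hypothetical window sizes: two heads per window size.
head_windows = assign_windows(6, [7, 14, 21])

# Two branches with equal logits are averaged.
fused = fuse_branches([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

With equal logits the fusion reduces to a plain average; unequal logits let one window scale dominate, which is the sense in which the fusion is "dynamic" in this sketch.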

* CVPR 2022

Reducing Flipping Errors in Deep Neural Networks

Mar 16, 2022

Xiang Deng, Yun Xiao, Bo Long, Zhongfei Zhang

Abstract: Deep neural networks (DNNs) have been widely applied in various domains of artificial intelligence, including computer vision and natural language processing. A DNN is typically trained for many epochs, and then a validation dataset is used to select the DNN in an epoch (we simply call this epoch "the last epoch") as the final model for making predictions on unseen samples; this model usually cannot achieve perfect accuracy on unseen samples. An interesting question is "how many test (unseen) samples that a DNN misclassifies in the last epoch were ever correctly classified by the DNN before the last epoch?". In this paper, we empirically study this question and find on several benchmark datasets that the vast majority of the samples misclassified in the last epoch were classified correctly at some point before the last epoch, which means that the predictions for these samples were flipped from "correct" to "wrong". Motivated by this observation, we propose to restrict the behavior changes of a DNN on the correctly classified samples so that the correct local boundaries can be maintained and the flipping error on unseen samples can be largely reduced. Extensive experiments on different benchmark datasets with different modern network architectures demonstrate that the proposed flipping error reduction (FER) approach can substantially improve the generalization, robustness, and transferability of DNNs without introducing any additional network parameters or inference cost, and with only a negligible training overhead.
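
The question posed in the abstract can be made concrete with a small counting sketch. It only measures flipping errors from a recorded prediction history; it is not the FER training method. The data layout (`history[e][i]` as the label predicted for sample `i` at epoch `e`, with the selected "last epoch" stored last) and the function name are assumptions.

```python
from typing import List, Tuple


def count_flipped(history: List[List[int]], labels: List[int]) -> Tuple[int, int]:
    """Return (flipped, misclassified) for the last epoch in `history`.

    A sample is "flipped" if it is wrong in the last epoch but was
    predicted correctly in at least one earlier epoch.
    """
    last = history[-1]
    misclassified = [i for i, y in enumerate(labels) if last[i] != y]
    flipped = [
        i for i in misclassified
        if any(epoch[i] == labels[i] for epoch in history[:-1])
    ]
    return len(flipped), len(misclassified)


# Toy data: 4 samples, 3 epochs (values are made up for illustration).
labels = [0, 1, 1, 0]
history = [
    [0, 1, 0, 1],  # epoch 0
    [0, 0, 1, 1],  # epoch 1
    [0, 0, 0, 1],  # selected "last epoch"
]
flipped, wrong = count_flipped(history, labels)
# Samples 1 and 2 were right earlier and wrong at the end; sample 3 was never right.
```

In this toy example, two of the three final errors are flipping errors, mirroring the abstract's observation that most last-epoch errors were correct at some earlier epoch.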

Givens Coordinate Descent Methods for Rotation Matrix Learning in Trainable Embedding Indexes

Mar 09, 2022

Yunjiang Jiang, Han Zhang, Yiming Qiu, Yun Xiao, Bo Long, Wen-Yun Yang

Abstract: Product quantization (PQ) coupled with a space rotation is widely used in modern approximate nearest neighbor (ANN) search systems to significantly compress the disk storage for embeddings and speed up the inner product computation. Existing rotation learning methods, however, minimize quantization distortion for fixed embeddings and are therefore not applicable to an end-to-end training scenario where embeddings are updated constantly. In this paper, based on geometric intuitions from Lie group theory, in particular the special orthogonal group $SO(n)$, we propose a family of block Givens coordinate descent algorithms for rotation matrix learning that are provably convergent on any convex objective. Compared to the state-of-the-art SVD method, the Givens algorithms are much more parallelizable, reducing runtime by orders of magnitude on modern GPUs, and converge more stably according to experimental studies. They further improve upon vanilla product quantization significantly in an end-to-end training scenario.
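
As background for the abstract, the sketch below shows the basic building block only: a single Givens rotation applied to a matrix, which keeps the matrix orthogonal. It is not the paper's block coordinate descent algorithm; the helper names, the sign convention, and the choice of angles are assumptions, and no objective is being minimized here.

```python
import math
from typing import List

Matrix = List[List[float]]


def identity(n: int) -> Matrix:
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]


def apply_givens(R: Matrix, i: int, j: int, theta: float) -> Matrix:
    """Return R @ G(i, j, theta); only columns i and j of R change."""
    c, s = math.cos(theta), math.sin(theta)
    out = [row[:] for row in R]
    for row_in, row_out in zip(R, out):
        row_out[i] = c * row_in[i] - s * row_in[j]
        row_out[j] = s * row_in[i] + c * row_in[j]
    return out


def is_orthogonal(R: Matrix, tol: float = 1e-9) -> bool:
    """Check that R^T R is the identity up to `tol`."""
    n = len(R)
    for a in range(n):
        for b in range(n):
            dot = sum(R[k][a] * R[k][b] for k in range(n))
            if abs(dot - (1.0 if a == b else 0.0)) > tol:
                return False
    return True


# Two rotations on disjoint coordinate pairs touch different columns,
# so they commute and could be applied independently of each other.
R = identity(4)
R = apply_givens(R, 0, 1, 0.3)
R = apply_givens(R, 2, 3, -1.1)
```

Because each update touches only two columns, updates on disjoint index pairs do not interfere with one another. This is one plausible reading of why such updates parallelize well, stated here as an interpretation rather than as the paper's argument.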

* The Tenth International Conference on Learning Representations (ICLR 2022)
* published in ICLR 2022

Sequential Search with Off-Policy Reinforcement Learning

Feb 01, 2022

Dadong Miao, Yanan Wang, Guoyu Tang, Lin Liu, Sulong Xu, Bo Long, Yun Xiao, Lingfei Wu, Yunjiang Jiang

Abstract: Recent years have seen significant interest in Sequential Recommendation (SR), which aims to understand and model sequential user behaviors and the interactions between users and items over time. Surprisingly, despite the huge success Sequential Recommendation has achieved, there has been little study of Sequential Search (SS), a twin learning task that takes into account a user's current and past search queries, in addition to behavior on historical query sessions. The SS learning task is even more important than its SR counterpart for most e-commerce companies due to its much larger online serving demands as well as traffic volume. To this end, we propose a highly scalable hybrid learning model that consists of an RNN learning framework leveraging all features in short-term user-item interactions, and an attention model utilizing selected item-only features from long-term interactions. As a novel optimization step, we fit multiple short user sequences in a single RNN pass within a training batch by solving a greedy knapsack problem on the fly. Moreover, we explore the use of off-policy reinforcement learning in multi-session personalized search ranking. Specifically, we design a pairwise Deep Deterministic Policy Gradient model that efficiently captures users' long-term reward in terms of pairwise classification error. Extensive ablation experiments demonstrate the significant improvement each component brings over its state-of-the-art baseline on a variety of offline and online metrics.
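
The batching step mentioned in the abstract (fitting several short sequences into one RNN pass) can be sketched as a greedy packing problem. The abstract does not state the exact greedy rule, so the first-fit-decreasing heuristic below, the function name, and the `capacity` parameter (the maximum number of time steps per pass) are assumptions.

```python
from typing import List


def pack_sequences(lengths: List[int], capacity: int) -> List[List[int]]:
    """Greedily group sequence indices so each group's total length <= capacity.

    Uses first-fit decreasing: longest sequences first, each placed in the
    first row that still has room. Each returned row could be concatenated
    and run through the RNN in a single pass.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    rows: List[List[int]] = []
    free: List[int] = []  # remaining room in each row
    for i in order:
        if lengths[i] > capacity:
            raise ValueError(f"sequence {i} is longer than capacity {capacity}")
        for r, room in enumerate(free):
            if lengths[i] <= room:
                rows[r].append(i)
                free[r] -= lengths[i]
                break
        else:
            # No existing row fits: open a new one.
            rows.append([i])
            free.append(capacity - lengths[i])
    return rows


# Six toy sequences packed into passes of at most 8 steps.
lengths = [7, 3, 2, 5, 4, 1]
rows = pack_sequences(lengths, capacity=8)
```

In a real model the hidden state would also need to be reset at each sequence boundary inside a packed row; that detail is omitted here.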

* 10 pages, 7 figures, CIKM 2021

Intelligent Online Selling Point Extraction for E-Commerce Recommendation

Dec 16, 2021

Xiaojie Guo, Shugen Wang, Hanqing Zhao, Shiliang Diao, Jiajia Chen, Zhuoye Ding, Zhen He, Yun Xiao, Bo Long, Han Yu (+1 more)

Abstract: In the past decade, automatic product description generation for e-commerce has witnessed significant advancement. As the services provided by e-commerce platforms become more diverse, it is necessary to dynamically adapt the patterns of the generated descriptions. The selling point of a product is an important type of product description, which should be as short as possible while still conveying key information. In addition, this kind of product description should be eye-catching to readers. Currently, product selling points are normally written by human experts, so creating and maintaining this content incurs high costs. These costs can be significantly reduced if product selling points can be automatically generated by machines. In this paper, we report our experience developing and deploying the Intelligent Online Selling Point Extraction (IOSPE) system to serve the recommendation system in the JD.com e-commerce platform. Since July 2020, IOSPE has become a core service for 62 key categories of products (covering more than 4 million products). So far, it has generated more than 0.1 billion selling points, thereby significantly scaling up the selling point creation operation and saving human labour. These IOSPE-generated selling points have increased the click-through rate (CTR) by 1.89% and the average duration customers spent on the products by more than 2.03% compared to the previous practice, which are significant improvements for such a large-scale e-commerce platform.

* IAAI 2022 industry award

Automatic Product Copywriting for E-Commerce

Dec 15, 2021

Xueying Zhang, Yanyan Zou, Hainan Zhang, Jing Zhou, Shiliang Diao, Jiajia Chen, Zhuoye Ding, Zhen He, Xueqi He, Yun Xiao (+3 more)

Abstract: Product copywriting is a critical component of e-commerce recommendation platforms. It aims to attract users' interest and improve user experience by highlighting product characteristics with textual descriptions. In this paper, we report our experience deploying the proposed Automatic Product Copywriting Generation (APCG) system into the JD.com e-commerce product recommendation platform. It consists of two main components: 1) natural language generation, which is built from a transformer-pointer network and a pre-trained sequence-to-sequence model based on millions of training data from our in-house platform; and 2) copywriting quality control, which is based on both automatic evaluation and human screening. For selected domains, the models are trained and updated daily with the updated training data. In addition, the model is also used as a real-time writing assistant tool on our live broadcast platform. The APCG system has been deployed in JD.com since Feb 2021. By Sep 2021, it had generated 2.53 million product descriptions and improved the overall averaged click-through rate (CTR) and conversion rate (CVR) by 4.22% and 3.61%, respectively, compared to baselines on a year-on-year basis. The accumulated Gross Merchandise Volume (GMV) made by our system improved by 213.42% compared to the number in Feb 2021.

* Accepted by AAAI 2022/IAAI 2022 under the track of "Highly Innovative Applications of AI"

DSGPT: Domain-Specific Generative Pre-Training of Transformers for Text Generation in E-commerce Title and Review Summarization

Dec 15, 2021

Xueying Zhang, Yunjiang Jiang, Yue Shang, Zhaomeng Cheng, Chi Zhang, Xiaochuan Fan, Yun Xiao, Bo Long

Abstract: We propose a novel domain-specific generative pre-training (DS-GPT) method for text generation and apply it to the product title and review summarization problems on e-commerce mobile display. First, we adopt a decoder-only transformer architecture, which fits well for fine-tuning tasks by combining input and output all together. Second, we demonstrate that utilizing only a small amount of pre-training data in related domains is powerful. Pre-training a language model from a general corpus such as Wikipedia or the Common Crawl requires tremendous time and resource commitment, and can be wasteful if the downstream tasks are limited in variety. Our DSGPT is pre-trained on a limited dataset, the Chinese short text summarization dataset (LCSTS). Third, our model does not require product-related human-labeled data. For the title summarization task, the state of the art explicitly uses additional background knowledge in the training and predicting stages. In contrast, our model implicitly captures this knowledge and achieves significant improvement over other methods after fine-tuning on the public Taobao.com dataset. For the review summarization task, we utilize a JD.com in-house dataset and observe similar improvement over standard machine translation methods, which lack the flexibility of fine-tuning. Our proposed work can be simply extended to other domains for a wide range of text generation tasks.
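
The idea of "combining input and output all together" for a decoder-only model can be sketched as plain sequence concatenation. The abstract does not specify the actual format, so the separator and end tokens, the function names, and the example tokens below are all hypothetical.

```python
from typing import List

# Hypothetical special tokens; the real vocabulary is not given in the abstract.
SEP, EOS = "[SEP]", "[EOS]"


def build_training_tokens(source: List[str], target: List[str]) -> List[str]:
    """Concatenate input and output into one sequence for fine-tuning."""
    return source + [SEP] + target + [EOS]


def build_prompt_tokens(source: List[str]) -> List[str]:
    """At inference time, feed the input plus separator and let the model continue."""
    return source + [SEP]


def extract_summary(generated: List[str]) -> List[str]:
    """Take everything between the separator and the end token (if present)."""
    start = generated.index(SEP) + 1
    end = generated.index(EOS) if EOS in generated else len(generated)
    return generated[start:end]


tokens = build_training_tokens(["long", "product", "title"], ["short", "title"])
summary = extract_summary(tokens)
```

A single concatenated sequence lets the same next-token objective serve both pre-training and fine-tuning, which appears to be the property the abstract refers to.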

* SIGIR 2021: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2021, Pages 2146-2150

TriTransNet: RGB-D Salient Object Detection with a Triplet Transformer Embedding Network

Aug 09, 2021

Zhengyi Liu, Yuan Wang, Zhengzheng Tu, Yun Xiao, Bin Tang

Abstract: Salient object detection is a pixel-level dense prediction task that highlights the prominent object in a scene. Recently, the U-Net framework has been widely used, and its successive convolution and pooling operations generate multi-level features that are complementary to each other. Since high-level features contribute more to performance, we propose a triplet transformer embedding module to enhance them by learning long-range dependencies across layers. It is the first to use three transformer encoders with shared weights to enhance multi-level features. By further designing a scale adjustment module to process the input, devising a three-stream decoder to process the output, and attaching depth features to color features for multi-modal fusion, the proposed triplet transformer embedding network (TriTransNet) achieves state-of-the-art performance in RGB-D salient object detection and pushes the performance to a new level. Experimental results demonstrate the effectiveness of the proposed modules and the competitiveness of TriTransNet.
