
Yun Xiao


Automatic Generation of Product-Image Sequence in E-commerce

Jun 26, 2022
Xiaochuan Fan, Chi Zhang, Yong Yang, Yue Shang, Xueying Zhang, Zhen He, Yun Xiao, Bo Long, Lingfei Wu

Product images are essential for providing a desirable user experience on an e-commerce platform. For a platform with billions of products, it is extremely time-consuming and labor-intensive to manually pick and organize qualified images. Furthermore, there are numerous and complicated image rules that a product image must comply with in order to be generated/selected. To address these challenges, in this paper, we present a new learning framework to achieve Automatic Generation of Product-Image Sequence (AGPIS) in e-commerce. To this end, we propose a Multi-modality Unified Image-sequence Classifier (MUIsC), which is able to simultaneously detect all categories of rule violations through learning. MUIsC leverages textual review feedback as an additional training target and utilizes the product's textual description to provide extra semantic information. Based on offline evaluations, we show that the proposed MUIsC significantly outperforms various baselines. Besides MUIsC, we also integrate several other important modules into the proposed framework, such as primary image selection, noncompliant content detection, and image deduplication. With all these modules, our framework works effectively and efficiently on the JD.com recommendation platform. By Dec 2021, our AGPIS framework had generated high-standard images for about 1.5 million products and achieved a reject rate of 13.6%.
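
The rule-checking idea behind MUIsC can be viewed as a multi-label decision: each image receives one violation score per rule category, and a sequence passes only if no rule is violated. A minimal sketch of that decision layer (thresholds and function names are assumptions, not the paper's implementation):

```python
def violated_rules(scores, threshold=0.5):
    """Return the indices of rules whose violation score exceeds the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]

def sequence_is_compliant(per_image_scores, threshold=0.5):
    """A product-image sequence passes only if no image violates any rule."""
    return all(not violated_rules(scores, threshold) for scores in per_image_scores)
```

In the full system, the scores would come from the multi-modal classifier rather than being hand-set as here.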

* Accepted by KDD 2022 ADS 

Automatic Controllable Product Copywriting for E-Commerce

Jun 21, 2022
Xiaojie Guo, Qingkai Zeng, Meng Jiang, Yun Xiao, Bo Long, Lingfei Wu

Automatic product description generation for e-commerce has witnessed significant advancement in the past decade. Product copywriting aims to attract users' interest and improve user experience by highlighting product characteristics with textual descriptions. As the services provided by e-commerce platforms become more diverse, it is necessary to adapt the patterns of automatically generated descriptions dynamically. In this paper, we report our experience in deploying an E-commerce Prefix-based Controllable Copywriting Generation (EPCCG) system on the JD.com e-commerce product recommendation platform. The development of the system comprises four main components: 1) copywriting aspect extraction; 2) weakly supervised aspect labeling; 3) text generation with a prefix-based language model; and 4) copywriting quality control. We conduct experiments to validate the effectiveness of the proposed EPCCG. In addition, we describe the deployed architecture that integrates EPCCG into the real-time JD.com e-commerce recommendation platform, and the significant payoff since deployment.
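
Prefix-based controllable generation conditions a language model on a control code prepended to the input, one prefix per desired copywriting aspect. A hypothetical sketch of prompt assembly (the token format and names are assumptions, not EPCCG's actual interface):

```python
def build_prefixed_input(aspect, product_text, sep="<sep>"):
    """Prepend a control prefix naming the desired copywriting aspect."""
    return f"<{aspect}>{sep}{product_text}"

def batch_inputs(aspects, product_text):
    """One prompt per aspect, so each generation call is controlled independently."""
    return [build_prefixed_input(a, product_text) for a in aspects]
```

A prefix-conditioned model trained on such inputs learns to steer its output toward the named aspect.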

* This paper has been accepted by KDD 2022 ADS 

TeKo: Text-Rich Graph Neural Networks with External Knowledge

Jun 15, 2022
Zhizhi Yu, Di Jin, Jianguo Wei, Ziyang Liu, Yue Shang, Yun Xiao, Jiawei Han, Lingfei Wu

Graph Neural Networks (GNNs) have gained great popularity in tackling various analytical tasks on graph-structured data (i.e., networks). Typical GNNs and their variants follow a message-passing paradigm that obtains network representations by propagating features along the network topology, which, however, ignores the rich textual semantics (e.g., local word sequences) that exist in many real-world networks. Existing methods for text-rich networks integrate textual semantics mainly by utilizing internal information such as topics or phrases/words, and often cannot comprehensively mine the text semantics, limiting the reciprocal guidance between network structure and text semantics. To address these problems, we propose TeKo, a novel text-rich graph neural network with external knowledge, to take full advantage of both structural and textual information within text-rich networks. Specifically, we first present a flexible heterogeneous semantic network that incorporates high-quality entities and the interactions among documents and entities. We then introduce two types of external knowledge, structured triplets and unstructured entity descriptions, to gain a deeper insight into textual semantics. We further design a reciprocal convolutional mechanism for the constructed heterogeneous semantic network, enabling network structure and textual semantics to collaboratively enhance each other and learn high-level network representations. Extensive experimental results on four public text-rich networks as well as a large-scale e-commerce search dataset illustrate the superior performance of TeKo over state-of-the-art baselines.
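
The heterogeneous semantic network pairs documents with the entities they mention. A toy construction using substring matching (real systems would use proper entity linking; all names here are assumptions):

```python
def build_doc_entity_edges(docs, entity_vocab):
    """Return (doc_index, entity) edges for every entity mentioned in a document."""
    edges = set()
    for d, text in enumerate(docs):
        lowered = text.lower()
        for entity in entity_vocab:
            if entity.lower() in lowered:
                edges.add((d, entity))
    return edges
```

The resulting bipartite edges, plus entity-entity triplets from an external knowledge base, form the heterogeneous graph on which the reciprocal convolution operates.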

Meta Policy Learning for Cold-Start Conversational Recommendation

May 24, 2022
Zhendong Chu, Hongning Wang, Yun Xiao, Bo Long, Lingfei Wu

Conversational recommender systems (CRS) explicitly solicit users' preferences to improve recommendations on the fly. Most existing CRS solutions employ reinforcement learning methods to train a single policy for a population of users. However, for users new to the system, such a global policy becomes ineffective at producing conversational recommendations, i.e., the cold-start challenge. In this paper, we study CRS policy learning for cold-start users via meta reinforcement learning. We propose to learn a meta policy and adapt it to new users with only a few trials of conversational recommendation. To facilitate policy adaptation, we design three synergetic components: first, a meta-exploration policy dedicated to identifying user preferences via exploratory conversations; second, a Transformer-based state encoder to model both a user's positive and negative feedback during the conversation; and third, an adaptive item recommender based on the embedded states. Extensive experiments on three datasets demonstrate the advantage of our solution in serving new users, compared with a rich set of state-of-the-art CRS solutions.
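
The adapt-then-update pattern of meta policy learning can be illustrated with a Reptile-style sketch on scalar parameters. This is a generic meta-learning illustration under simplified assumptions, not the paper's RL algorithm:

```python
def adapt(theta, grad_fn, lr=0.1, steps=5):
    """Inner loop: a few gradient steps specializing the meta parameters to one user."""
    for _ in range(steps):
        theta -= lr * grad_fn(theta)
    return theta

def meta_update(theta_meta, user_grad_fns, inner_lr=0.1, meta_lr=0.5):
    """Outer loop: move the meta parameters toward the per-user adapted ones."""
    adapted = [adapt(theta_meta, g, inner_lr) for g in user_grad_fns]
    mean_adapted = sum(adapted) / len(adapted)
    return theta_meta + meta_lr * (mean_adapted - theta_meta)
```

The meta parameters settle at a point from which a few inner steps reach any single user's optimum quickly, which is exactly the property a cold-start CRS policy needs.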

* 22 pages 

Scenario-based Multi-product Advertising Copywriting Generation for E-Commerce

May 21, 2022
Xueying Zhang, Kai Shen, Chi Zhang, Xiaochuan Fan, Yun Xiao, Zhen He, Bo Long, Lingfei Wu

In this paper, we propose an automatic Scenario-based Multi-product Advertising Copywriting Generation system (SMPACG) for e-commerce, which has been deployed on a leading Chinese e-commerce platform. The proposed SMPACG consists of two main components: 1) an automatic multi-product combination selection module, which itself consists of a topic prediction model, a pattern- and attribute-based selection model, and an arbitrator model; and 2) an automatic multi-product advertising copywriting generation module, which combines our proposed domain-specific pretrained language model and a knowledge-based data enhancement model. SMPACG is the first system that realizes automatic scenario-based multi-product advertising content generation, and it achieves significant improvements over other state-of-the-art methods. SMPACG not only directly serves our e-commerce recommendation system, but is also used as a real-time writing assistant tool for merchants.
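
The combination-selection step can be caricatured as filtering candidate products by a predicted scenario topic and letting a score arbitrate among them. A toy sketch (the record fields and scoring are assumptions, far simpler than the paper's three-model pipeline):

```python
def select_combination(products, topic, max_n=3):
    """Keep products tagged with the scenario topic, ranked by arbitrator score."""
    candidates = [p for p in products if topic in p["topics"]]
    candidates.sort(key=lambda p: p["score"], reverse=True)
    return [p["id"] for p in candidates[:max_n]]
```

The selected combination then becomes the input to the multi-product copywriting generator.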

SwinNet: Swin Transformer drives edge-aware RGB-D and RGB-T salient object detection

Apr 12, 2022
Zhengyi Liu, Yacheng Tan, Qian He, Yun Xiao

Convolutional neural networks (CNNs) are good at extracting contextual features within certain receptive fields, while transformers can model global long-range dependencies. By absorbing the advantages of both transformers and CNNs, the Swin Transformer shows strong feature representation ability. Based on it, we propose SwinNet, a cross-modality fusion model for RGB-D and RGB-T salient object detection. It is driven by the Swin Transformer to extract hierarchical features, boosted by an attention mechanism to bridge the gap between the two modalities, and guided by edge information to sharpen the contours of salient objects. To be specific, a two-stream Swin Transformer encoder first extracts multi-modality features, and then a spatial alignment and channel re-calibration module is presented to optimize intra-level cross-modality features. To clarify fuzzy boundaries, an edge-guided decoder achieves inter-level cross-modality fusion under the guidance of edge features. The proposed model outperforms state-of-the-art models on RGB-D and RGB-T datasets, showing that it provides more insight into the cross-modality complementarity task.
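
The channel re-calibration idea, one modality gating the other's channels, can be sketched with plain lists. This is a simplified squeeze-and-excitation-style gate for illustration, not SwinNet's exact module:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def recalibrate(feat, other):
    """Scale each channel of `feat` by a gate computed from the other modality's
    channel mean, so informative channels in one stream amplify the other."""
    out = []
    for channel, other_channel in zip(feat, other):
        gate = sigmoid(sum(other_channel) / len(other_channel))
        out.append([v * gate for v in channel])
    return out
```

Applying the gate in both directions (RGB gated by depth/thermal and vice versa) gives the symmetric cross-modality interaction the abstract describes.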

* IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2021; published online 

Beyond Fixation: Dynamic Window Visual Transformer

Apr 08, 2022
Pengzhen Ren, Changlin Li, Guangrun Wang, Yun Xiao, Qing Du, Xiaodan Liang, Xiaojun Chang

Recently, there has been a surge of interest in reducing the computational cost of vision transformers by limiting the calculation of self-attention to a local window. Most current work uses a fixed single-scale window by default, ignoring the impact of window size on model performance; this may limit the modeling potential of these window-based models for multi-scale information. In this paper, we propose a novel method named Dynamic Window Vision Transformer (DW-ViT). The dynamic window strategy proposed by DW-ViT goes beyond models that employ a fixed single-window setting. To the best of our knowledge, we are the first to use dynamic multi-scale windows to explore the upper limit of the effect of window settings on model performance. In DW-ViT, multi-scale information is obtained by assigning windows of different sizes to different head groups of window-based multi-head self-attention. The information is then dynamically fused by assigning different weights to the multi-scale window branches. We conduct a detailed performance evaluation on three datasets: ImageNet-1K, ADE20K, and COCO. Compared with related state-of-the-art (SoTA) methods, DW-ViT obtains the best performance. Specifically, compared with the current SoTA Swin Transformer \cite{liu2021swin}, DW-ViT achieves consistent and substantial improvements on all three datasets with similar parameters and computational costs. In addition, DW-ViT exhibits good scalability and can be easily inserted into any window-based vision transformer.
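
The head-group assignment at the core of the dynamic window strategy is simple to state: split the attention heads evenly into groups and give each group its own window size. A minimal sketch of that bookkeeping (the even-split policy is an assumption for illustration):

```python
def assign_windows(num_heads, window_sizes):
    """Split attention heads evenly into groups; each group attends within
    a window of its own size, yielding multi-scale local attention."""
    assert num_heads % len(window_sizes) == 0, "heads must divide evenly into groups"
    per_group = num_heads // len(window_sizes)
    return {head: window_sizes[head // per_group] for head in range(num_heads)}
```

The per-branch outputs would then be fused with learned weights, which is the "dynamic" part of DW-ViT.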

* CVPR2022  

Reducing Flipping Errors in Deep Neural Networks

Mar 16, 2022
Xiang Deng, Yun Xiao, Bo Long, Zhongfei Zhang

Deep neural networks (DNNs) have been widely applied in various domains of artificial intelligence, including computer vision and natural language processing. A DNN is typically trained for many epochs, and a validation dataset is then used to select the DNN from one epoch (we simply call this epoch "the last epoch") as the final model for making predictions on unseen samples, although it usually cannot achieve perfect accuracy on those samples. An interesting question is: how many test (unseen) samples that a DNN misclassifies in the last epoch were ever correctly classified by the DNN before the last epoch? In this paper, we empirically study this question and find, on several benchmark datasets, that the vast majority of the samples misclassified in the last epoch were classified correctly at some earlier epoch, meaning that the predictions for these samples were flipped from "correct" to "wrong". Motivated by this observation, we propose to restrict the behavior changes of a DNN on correctly-classified samples so that the correct local boundaries can be maintained and the flipping error on unseen samples can be largely reduced. Extensive experiments on different benchmark datasets with different modern network architectures demonstrate that the proposed flipping error reduction (FER) approach can substantially improve the generalization, robustness, and transferability of DNNs without introducing any additional network parameters or inference cost, and with only a negligible training overhead.
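
The idea of restricting behavior changes on correctly-classified samples can be sketched as a penalty on output drift that is counted only where an earlier snapshot was right. This is an illustrative regularizer in the spirit of the abstract, not the paper's exact loss:

```python
def drift_penalty(current_outputs, previous_outputs, was_correct):
    """Sum of squared output drift, counted only on samples the earlier model
    classified correctly; added to the task loss, it discourages flips."""
    total = 0.0
    for cur, prev, ok in zip(current_outputs, previous_outputs, was_correct):
        if ok:
            total += sum((c - p) ** 2 for c, p in zip(cur, prev))
    return total
```

Samples the earlier model got wrong contribute nothing, so the network stays free to fix its genuine mistakes.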

Givens Coordinate Descent Methods for Rotation Matrix Learning in Trainable Embedding Indexes

Mar 09, 2022
Yunjiang Jiang, Han Zhang, Yiming Qiu, Yun Xiao, Bo Long, Wen-Yun Yang

Product quantization (PQ), coupled with a space rotation, is widely used in modern approximate nearest neighbor (ANN) search systems to significantly compress the disk storage for embeddings and speed up the inner product computation. Existing rotation learning methods, however, minimize quantization distortion for fixed embeddings, which is not applicable to an end-to-end training scenario where embeddings are updated constantly. In this paper, based on geometric intuitions from Lie group theory, in particular the special orthogonal group $SO(n)$, we propose a family of block Givens coordinate descent algorithms to learn rotation matrices that are provably convergent on any convex objective. Compared to the state-of-the-art SVD method, the Givens algorithms are much more parallelizable, reducing runtime by orders of magnitude on modern GPUs, and converge more stably according to experimental studies. They further improve upon vanilla product quantization significantly in an end-to-end training scenario.
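
A Givens rotation updates only two coordinates at a time, which is why coordinate descent over rotation angles parallelizes so well: rotations in disjoint planes commute and can be applied in blocks. A minimal sketch of applying one rotation (the paper's block scheduling and convergence machinery are omitted):

```python
import math

def apply_givens(M, i, j, theta):
    """Left-multiply matrix M (list of rows) by a Givens rotation in the (i, j)
    plane; only rows i and j change, all other rows are untouched."""
    c, s = math.cos(theta), math.sin(theta)
    out = [row[:] for row in M]
    for k in range(len(M[0])):
        out[i][k] = c * M[i][k] - s * M[j][k]
        out[j][k] = s * M[i][k] + c * M[j][k]
    return out
```

Composing such rotations over many coordinate pairs, each angle chosen to decrease the objective, traces a path inside $SO(n)$ without ever leaving the manifold.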

* Published in the Tenth International Conference on Learning Representations (ICLR 2022) 

Sequential Search with Off-Policy Reinforcement Learning

Feb 01, 2022
Dadong Miao, Yanan Wang, Guoyu Tang, Lin Liu, Sulong Xu, Bo Long, Yun Xiao, Lingfei Wu, Yunjiang Jiang

Recent years have seen a significant amount of interest in Sequential Recommendation (SR), which aims to understand and model sequential user behaviors and the interactions between users and items over time. Surprisingly, despite the huge success Sequential Recommendation has achieved, there is little study on Sequential Search (SS), a twin learning task that takes into account a user's current and past search queries, in addition to behavior on historical query sessions. The SS learning task is even more important than the counterpart SR task for most e-commerce companies due to its much larger online serving demands as well as traffic volume. To this end, we propose a highly scalable hybrid learning model that consists of an RNN learning framework leveraging all features in short-term user-item interactions, and an attention model utilizing selected item-only features from long-term interactions. As a novel optimization step, we fit multiple short user sequences into a single RNN pass within a training batch by solving a greedy knapsack problem on the fly. Moreover, we explore the use of off-policy reinforcement learning in multi-session personalized search ranking. Specifically, we design a pairwise Deep Deterministic Policy Gradient model that efficiently captures users' long-term reward in terms of pairwise classification error. Extensive ablation experiments demonstrate the significant improvement each component brings over its state-of-the-art baseline, on a variety of offline and online metrics.
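
Fitting multiple short sequences into one RNN pass is a bin-packing problem: each pass has a fixed length budget, and sequences must be grouped so that few slots are wasted. A greedy first-fit-decreasing sketch of that packing (the exact heuristic used in the paper is not specified here, so this is an illustrative assumption):

```python
def pack_sequences(lengths, capacity):
    """Greedy first-fit decreasing: pack sequence lengths into as few
    fixed-capacity RNN passes as the heuristic manages; returns the
    grouping as lists of original sequence indices."""
    bins = []
    for idx, length in sorted(enumerate(lengths), key=lambda t: -t[1]):
        for b in bins:
            if b["free"] >= length:
                b["items"].append(idx)
                b["free"] -= length
                break
        else:
            bins.append({"items": [idx], "free": capacity - length})
    return [b["items"] for b in bins]
```

Padding waste drops roughly in proportion to how full each pass is packed, which is where the training-throughput win comes from.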

* 10 pages, 7 figures, CIKM 2021 