Runze Su

Short Video-based Advertisements Evaluation System: Self-Organizing Learning Approach

Oct 23, 2020
Yunjie Zhang, Fei Tao, Xudong Liu, Runze Su, Xiaorong Mei, Weicong Ding, Zhichen Zhao, Lei Yuan, Ji Liu

With the rise of short-video apps such as TikTok, Snapchat, and Kwai, advertising in short user-generated videos (UGVs) has become a trending form of advertising. Advertisers require prediction of user behavior without a specific user profile, as they expect to estimate advertisement performance in advance in cold-start scenarios. Current recommender systems do not take raw videos as input; additionally, most previous work in multi-modal machine learning cannot handle unconstrained videos such as UGVs. In this paper, we propose a novel end-to-end self-organizing framework for user behavior prediction. Our model learns the optimal topology of the neural network architecture, as well as the optimal weights, from training data. We evaluate the proposed method on our in-house dataset, and the experimental results show that our model achieves the best performance across all our experiments.

* Submitted to ICASSP 2021
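The abstract leaves the mechanics of the self-organizing framework unspecified. One common way to learn a network's topology jointly with its weights is to attach trainable gates to candidate connections and relax them continuously, in the spirit of differentiable architecture search; the sketch below only illustrates that general idea. All module names, dimensions, and the gating scheme are assumptions for illustration, not details from the paper.

```python
import torch
import torch.nn as nn

class SelfOrganizingBlock(nn.Module):
    """Layer whose incoming connections carry trainable gates (hypothetical).

    Each candidate edge from an earlier layer is scaled by the sigmoid of a
    learned logit, so gradient descent adjusts the topology (which edges
    survive) together with the ordinary weights.
    """
    def __init__(self, in_dims, out_dim):
        super().__init__()
        self.edges = nn.ModuleList([nn.Linear(d, out_dim) for d in in_dims])
        self.gate_logits = nn.Parameter(torch.zeros(len(in_dims)))

    def forward(self, inputs):  # inputs: one tensor per candidate edge
        gates = torch.sigmoid(self.gate_logits)
        return torch.relu(sum(g * e(x) for g, e, x in zip(gates, self.edges, inputs)))

class SelfOrganizingNet(nn.Module):
    """Two gated blocks over video/audio features, then a prediction head."""
    def __init__(self, video_dim=512, audio_dim=128, hidden=256, n_classes=2):
        super().__init__()
        self.block1 = SelfOrganizingBlock([video_dim, audio_dim], hidden)
        self.block2 = SelfOrganizingBlock([hidden, video_dim, audio_dim], hidden)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, video_feat, audio_feat):
        h1 = self.block1([video_feat, audio_feat])
        # Direct skip connections from the inputs are candidate edges too.
        h2 = self.block2([h1, video_feat, audio_feat])
        return self.head(h2)

model = SelfOrganizingNet()
out = model(torch.randn(4, 512), torch.randn(4, 128))
print(out.shape)  # torch.Size([4, 2])
```

After training, edges whose gates stay near zero can be pruned, leaving a learned topology rather than a hand-designed one.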

Ensemble Chinese End-to-End Spoken Language Understanding for Abnormal Event Detection from audio stream

Oct 19, 2020
Haoran Wei, Fei Tao, Runze Su, Sen Yang, Ji Liu

Conventional spoken language understanding (SLU) consists of two stages: the first maps speech to text via automatic speech recognition (ASR), and the second maps text to intent via natural language understanding (NLU). End-to-end SLU maps speech directly to intent through a single deep learning model. Previous end-to-end SLU models have been used primarily in English settings, owing to the lack of large-scale SLU datasets in Chinese, and they use only one ASR model to extract features from speech. With the help of Kuaishou Technology, a large-scale Chinese SLU dataset was collected to detect abnormal events in its live audio streams. Based on this dataset, this paper proposes an ensemble end-to-end SLU model for Chinese. The ensemble model extracts hierarchical features using multiple pre-trained ASR models, leading to better representations of phoneme-level and word-level information. The proposed approach achieves a 9.7% increase in accuracy over the previous end-to-end SLU model.

* Submitted to ICASSP 2021
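The paper's exact architecture is not given in the abstract. Below is a minimal sketch of the general idea, assuming frozen pre-trained ASR encoders whose pooled frame features are concatenated before intent classification; the ToyASREncoder stand-in and all dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class ToyASREncoder(nn.Module):
    """Stand-in for a pre-trained ASR encoder: waveform -> (B, T, D) features."""
    def __init__(self, dim):
        super().__init__()
        self.conv = nn.Conv1d(1, dim, kernel_size=400, stride=160)  # ~25 ms / 10 ms frames at 16 kHz

    def forward(self, wav):                                  # wav: (B, samples)
        return self.conv(wav.unsqueeze(1)).transpose(1, 2)   # (B, T, dim)

class EnsembleSLU(nn.Module):
    """Intent classifier over features from several (frozen) ASR encoders."""
    def __init__(self, asr_encoders, feat_dims, n_intents):
        super().__init__()
        self.encoders = nn.ModuleList(asr_encoders)
        for p in self.encoders.parameters():   # keep pre-trained ASR weights fixed
            p.requires_grad_(False)
        self.classifier = nn.Sequential(
            nn.Linear(sum(feat_dims), 256), nn.ReLU(), nn.Linear(256, n_intents))

    def forward(self, waveform):
        # Mean-pool each encoder's frame sequence, then fuse by concatenation,
        # so encoders tuned to different granularities all contribute.
        pooled = [enc(waveform).mean(dim=1) for enc in self.encoders]
        return self.classifier(torch.cat(pooled, dim=-1))

model = EnsembleSLU([ToyASREncoder(64), ToyASREncoder(80)], [64, 80], n_intents=5)
logits = model(torch.randn(2, 16000))  # two one-second clips at 16 kHz
print(logits.shape)                    # torch.Size([2, 5])
```

In practice one encoder might be trained with phoneme targets and another with word or character targets, which is one way to obtain the phoneme-level and word-level hierarchy the abstract mentions.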

Themes Inferred Audio-visual Correspondence Learning

Sep 14, 2020
Runze Su, Fei Tao, Xudong Liu, Haoran Wei, Xiaorong Mei, Zhiyao Duan, Lei Yuan, Ji Liu, Yuying Xie

Applications of short user-generated video (UGV), such as Snapchat and YouTube short videos, have boomed recently, raising many multimodal machine learning tasks. Among them, learning the correspondence between the audio and visual information in videos is a challenging one. Most previous work on audio-visual correspondence (AVC) learning only investigated constrained videos or simple settings, which may not fit UGV applications. In this paper, we propose new principles for AVC and introduce a new framework that attends to the themes of videos to facilitate AVC learning. We also release the KWAI-AD-AudVis corpus, which contains 85,432 short advertisement videos (around 913 hours) made by users. We evaluate our proposed approach on this corpus, and it outperforms the baseline by a 23.15% absolute difference.

* Submitted to ICASSP 2021
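The abstract does not specify how themes enter the model. Below is a minimal sketch of one plausible reading, assuming a learned theme embedding fused with audio and visual embeddings before a binary correspondence decision; all names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class ThemeAwareAVC(nn.Module):
    """Binary audio-visual correspondence classifier conditioned on a theme.

    Audio and visual features are projected into a shared space, a learned
    theme embedding is fused in, and the model scores whether the pair
    comes from the same video, letting "correspondence" vary by theme.
    """
    def __init__(self, audio_dim=128, visual_dim=512, n_themes=30, emb=128):
        super().__init__()
        self.audio_proj = nn.Sequential(nn.Linear(audio_dim, emb), nn.ReLU())
        self.visual_proj = nn.Sequential(nn.Linear(visual_dim, emb), nn.ReLU())
        self.theme_emb = nn.Embedding(n_themes, emb)
        self.match = nn.Sequential(nn.Linear(3 * emb, emb), nn.ReLU(), nn.Linear(emb, 1))

    def forward(self, audio_feat, visual_feat, theme_id):
        a = self.audio_proj(audio_feat)
        v = self.visual_proj(visual_feat)
        t = self.theme_emb(theme_id)
        return self.match(torch.cat([a, v, t], dim=-1)).squeeze(-1)  # match logit

model = ThemeAwareAVC()
logit = model(torch.randn(8, 128), torch.randn(8, 512), torch.randint(0, 30, (8,)))
# Train with 1 for matched audio-visual pairs, 0 for mismatched pairs.
loss = nn.functional.binary_cross_entropy_with_logits(logit, torch.ones(8))
print(loss.item())
```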