Jing Lu

Kuaishou

Leveraging Watch-time Feedback for Short-Video Recommendations: A Causal Labeling Framework

Jun 30, 2023
Yang Zhang, Yimeng Bai, Jianxin Chang, Xiaoxue Zang, Song Lu, Jing Lu, Fuli Feng, Yanan Niu, Yang Song

With the proliferation of short-video applications, the significance of short-video recommendation has vastly increased. Unlike other recommendation scenarios, short-video recommendation systems rely heavily on watch-time feedback. Existing approaches simply treat watch time as a direct label, failing to harness its rich semantics and introducing bias, which limits the potential for modeling user interests from watch time. To overcome this challenge, we propose a framework named Debiased Multiple-semantics-extracting Labeling (DML). DML constructs labels that encompass multiple semantics from quantiles of the watch-time distribution, prioritizing relative order over absolute label values. This eases model learning and aligns with the ranking objective of recommendation. Furthermore, we introduce a method inspired by causal adjustment to refine the label definitions, reducing the impact of bias on the labels and mitigating bias directly at the label level. We substantiate the effectiveness of the DML framework through both online and offline experiments. Extensive results demonstrate that DML effectively leverages watch time to discover users' real interests and enhances their engagement with our application.
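As a rough illustration of the quantile idea, here is a minimal sketch (hypothetical names; the paper's multiple semantic labels and its causal-adjustment step for debiasing, e.g., conditioning on confounders before taking quantiles, are not reproduced):

```python
import numpy as np

def quantile_labels(watch_times, n_bins=10):
    """Map raw watch times to quantile-bin labels so a model learns
    relative order rather than absolute seconds."""
    # Quantile edges estimated from the empirical watch-time distribution.
    edges = np.quantile(watch_times, np.linspace(0, 1, n_bins + 1)[1:-1])
    # Each sample's label is the index of the quantile bin it falls into.
    return np.digitize(watch_times, edges)

# Toy usage: heavy-tailed watch times (seconds) become ordinal labels 0..9.
rng = np.random.default_rng(0)
watch = rng.exponential(scale=30.0, size=1000)
labels = quantile_labels(watch)
```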

* 7 pages, 4 figures 

Harmonic enhancement using learnable comb filter for light-weight full-band speech enhancement model

Jun 01, 2023
Xiaohuai Le, Tong Lei, Li Chen, Yiqing Guo, Chao He, Cheng Chen, Xianjun Xia, Hua Gao, Yijian Xiao, Piao Ding, Shenyi Song, Jing Lu

Because they reduce the number of feature dimensions, filter banks are often used in lightweight full-band speech enhancement models. To further enhance the coarse speech in the sub-band domain, post-filtering for harmonic retrieval is necessary. The signal-processing-based comb filters used in RNNoise and PercepNet have limited performance and may degrade speech quality due to inaccurate fundamental frequency estimation. To tackle this problem, we propose a learnable comb filter to enhance harmonics. Building on the sub-band model, we design a DNN-based fundamental frequency estimator that predicts discrete fundamental frequencies, together with a comb filter for harmonic enhancement, both trained in an end-to-end manner. Experiments show the advantages of our proposed method over PercepNet and DeepFilterNet.
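For context, a minimal sketch of the classical feed-forward comb filter that such post-filters build on; in the paper the filter is learnable and F0 comes from a DNN estimator, so the fixed `alpha` here is purely illustrative:

```python
import numpy as np

def comb_filter(frame, f0, sr, alpha=0.5):
    """Feed-forward comb filter reinforcing harmonics of f0:
    y[n] = (1 - alpha) * x[n] + alpha * x[n - T], with T the pitch period."""
    T = int(round(sr / f0))  # pitch period in samples
    y = frame.copy()
    y[T:] = (1 - alpha) * frame[T:] + alpha * frame[:-T]
    return y

# Toy usage: enhance harmonics of a noisy 200 Hz voiced frame at 48 kHz.
sr, f0 = 48000, 200.0
t = np.arange(960) / sr
frame = np.sin(2 * np.pi * f0 * t) + 0.3 * np.random.randn(960)
enhanced = comb_filter(frame, f0, sr)
```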

* accepted by Interspeech 2023 

Personalized speech enhancement combining band-split RNN and speaker attentive module

Feb 20, 2023
Xiaohuai Le, Zhongshu Hou, Li Chen, Chao He, Yiqing Guo, Cheng Chen, Xianjun Xia, Jing Lu

Target speaker information can be utilized in speech enhancement (SE) models to extract the desired speech more effectively. Previous works introduce the speaker embedding into SE models through concatenation or affine transformation. In this paper, we propose a speaker attentive module that calculates attention scores between the speaker embedding and the intermediate features, which are then used to rescale the features. By merging this module into a state-of-the-art SE model, we construct our personalized SE model for the ICASSP Signal Processing Grand Challenge: DNS Challenge 5 (2023). Our system achieves a final score of 0.529 on the blind test set of track 1 and 0.549 on track 2.
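A minimal sketch of the rescaling idea, assuming a simple dot-product score with sigmoid gating (the challenge system's exact formulation and layer sizes are not given here):

```python
import torch
import torch.nn as nn

class SpeakerAttentiveModule(nn.Module):
    """Rescale intermediate features by attention between a speaker
    embedding and each frame's feature vector (illustrative sketch)."""
    def __init__(self, feat_dim, spk_dim):
        super().__init__()
        self.proj = nn.Linear(spk_dim, feat_dim)  # map speaker emb to feature space

    def forward(self, feats, spk_emb):
        # feats: (batch, time, feat_dim); spk_emb: (batch, spk_dim)
        key = self.proj(spk_emb).unsqueeze(1)               # (batch, 1, feat_dim)
        scores = (feats * key).sum(-1, keepdim=True)        # dot-product attention
        gate = torch.sigmoid(scores / feats.shape[-1] ** 0.5)
        return feats * gate                                 # rescaled features

# Toy usage
m = SpeakerAttentiveModule(feat_dim=64, spk_dim=128)
out = m(torch.randn(2, 100, 64), torch.randn(2, 128))
```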

Local spectral attention for full-band speech enhancement

Feb 11, 2023
Zhongshu Hou, Qinwen Hu, Kai Chen, Jing Lu

The attention mechanism has been widely utilized in speech enhancement (SE) because, in theory, it can effectively model the inherent connections of the signal in both the time and frequency domains. Usually, the span of attention is limited in the time domain, while attention in the frequency domain spans the whole frequency range. In this paper, we observe that attention over the whole frequency range hampers inference for full-band SE and can lead to excessive residual noise. To alleviate this problem, we introduce local spectral attention (LSA) into a full-band SE model by limiting the span of attention. An ablation test on a state-of-the-art (SOTA) full-band SE model reveals that local frequency attention effectively improves overall performance. The improved model achieves the best objective scores on the full-band VoiceBank+DEMAND dataset.
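A sketch of band-limited attention over frequency bins, with an assumed fixed span `band` (the paper's exact windowing may differ):

```python
import torch

def local_spectral_attention(q, k, v, band=16):
    """Self-attention over frequency bins restricted to a local band
    around each query bin (illustrative sketch of the LSA idea)."""
    # q, k, v: (batch, freq_bins, dim)
    f = q.shape[1]
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5  # (batch, f, f)
    idx = torch.arange(f)
    # Mask out bins farther than `band` from the query bin.
    mask = (idx[None, :] - idx[:, None]).abs() > band
    scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

out = local_spectral_attention(torch.randn(2, 257, 32),
                               torch.randn(2, 257, 32),
                               torch.randn(2, 257, 32))
```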

Attention does not guarantee best performance in speech enhancement

Feb 11, 2023
Zhongshu Hou, Qinwen Hu, Kai Chen, Jing Lu

The attention mechanism has been widely utilized in speech enhancement (SE) because, in theory, it can effectively model long-term inherent connections of the signal in both the time and frequency domains. However, the commonly used global attention mechanism may not be the best choice, since in speech enhancement adjacent information naturally exerts more influence than far-apart information. In this paper, we validate this conjecture by replacing attention with RNNs in two typical state-of-the-art (SOTA) models: the multi-scale temporal frequency convolutional network with axial attention (MTFAA) and the conformer-based metric-GAN network (CMGAN).
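The ablation can be pictured as a drop-in swap, sketched below with assumed dimensions:

```python
import torch
import torch.nn as nn

class RNNInsteadOfAttention(nn.Module):
    """Drop-in replacement of a temporal attention block with a GRU,
    in the spirit of the paper's ablation (dimensions are assumptions)."""
    def __init__(self, dim):
        super().__init__()
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, x):            # x: (batch, time, dim)
        y, _ = self.rnn(x)
        return y                     # same shape as the attention output

block = RNNInsteadOfAttention(64)
y = block(torch.randn(2, 100, 64))
```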

TWIN: TWo-stage Interest Network for Lifelong User Behavior Modeling in CTR Prediction at Kuaishou

Feb 05, 2023
Jianxin Chang, Chenbin Zhang, Zhiyi Fu, Xiaoxue Zang, Lin Guan, Jing Lu, Yiqun Hui, Dewei Leng, Yanan Niu, Yang Song, Kun Gai

Lifelong user behavior modeling, i.e., extracting a user's hidden interests from rich historical behaviors spanning months or even years, plays a central role in modern CTR prediction systems. Conventional algorithms mostly follow two cascading stages: a simple General Search Unit (GSU) for fast, coarse search over tens of thousands of long-term behaviors, and an Exact Search Unit (ESU) that applies effective Target Attention (TA) over the small number of finalists from the GSU. Although efficient, existing algorithms suffer from a crucial limitation: \textit{inconsistent} target-behavior relevance metrics between the GSU and the ESU. As a result, the GSU often misses highly relevant behaviors while retrieving ones the ESU considers irrelevant. In such cases, the TA in the ESU, no matter how attention is allocated, mostly deviates from the real user interests and thus degrades overall CTR prediction accuracy. To address this inconsistency, we propose the \textbf{TWo-stage Interest Network (TWIN)}, whose Consistency-Preserved GSU (CP-GSU) adopts the same target-behavior relevance metric as the TA in the ESU, making the two stages twins. Specifically, to break TA's computational bottleneck and extend it from the ESU to the GSU, i.e., from behavior lengths of $10^2$ to $10^4-10^5$, we build a novel attention mechanism based on behavior feature splitting. For the inherent video features of a behavior, we compute their linear projections with efficient pre-computing and caching strategies; for the user-item cross features, we compress each into a one-dimensional bias term in the attention score calculation to save computation. The consistency between the two stages, together with the effective TA-based relevance metric in CP-GSU, contributes to significant performance gains in CTR prediction.
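A minimal sketch of the score computation this splitting suggests, under assumed shapes: the behavior-side projections are precomputable and cacheable, and the cross features enter only as a scalar bias per behavior (the production attention is more involved):

```python
import torch

def twin_scores(target_proj, behavior_proj, cross_bias):
    """Attention scores over a long behavior sequence, sketched:
    inherent video features enter via (cacheable) linear projections,
    user-item cross features via a one-dimensional bias per behavior."""
    # target_proj: (dim,); behavior_proj: (n_behaviors, dim); cross_bias: (n_behaviors,)
    d = target_proj.shape[-1]
    scores = behavior_proj @ target_proj / d ** 0.5 + cross_bias
    return torch.softmax(scores, dim=-1)

# Toy usage over a 10^4-length behavior sequence.
w = twin_scores(torch.randn(32), torch.randn(10_000, 32), torch.randn(10_000))
```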

Distributed Active Noise Control System Based on a Block Diffusion FxLMS Algorithm with Bidirectional Communication

Dec 28, 2022
Tianyou Li, Hongji Duan, Sipei Zhao, Jing Lu, Ian S. Burnett

Recently, distributed active noise control systems based on diffusion adaptation have attracted significant research interest due to their balance between computational complexity and stability compared with conventional centralized and decentralized adaptation schemes. However, the existing diffusion FxLMS algorithm employs node-specific adaptation and neighborhood-wide combination, and assumes that the control filters of neighboring nodes are similar to each other. This assumption does not hold in practical applications and leads to performance inferior to that of the centralized controller. In contrast, this paper proposes a Block Diffusion FxLMS algorithm with bidirectional communication, which uses neighborhood-wide adaptation and node-specific combination to update the control filters. Simulation results validate that the proposed algorithm converges to the solution of the centralized controller with a reduced computational burden.
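A heavily simplified sketch of one update in this spirit (real FxLMS gradients are formed from filtered-reference and error signals, and the block and bidirectional-communication details are omitted):

```python
import numpy as np

def neighborhood_adapt_step(W, grads, neighbors, mu):
    """One sketched update: each node adapts using gradient information
    shared by its whole neighborhood, then keeps its own (node-specific)
    filter, loosely mirroring neighborhood-wide adaptation."""
    W_new = np.empty_like(W)
    for k, nbrs in enumerate(neighbors):
        g = np.mean([grads[j] for j in nbrs], axis=0)  # neighborhood-wide info
        W_new[k] = W[k] - mu * g                       # node-specific filter
    return W_new

# Toy usage: 4 nodes, length-8 control filters, a ring topology.
W = np.zeros((4, 8))
grads = np.random.randn(4, 8)
neighbors = [[3, 0, 1], [0, 1, 2], [1, 2, 3], [2, 3, 0]]
W = neighborhood_adapt_step(W, grads, neighbors, mu=0.01)
```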

Learning List-Level Domain-Invariant Representations for Ranking

Dec 21, 2022
Ruicheng Xian, Honglei Zhuang, Zhen Qin, Hamed Zamani, Jing Lu, Ji Ma, Kai Hui, Han Zhao, Xuanhui Wang, Michael Bendersky

Domain adaptation aims to transfer the knowledge acquired by models trained on (data-rich) source domains to (low-resource) target domains, and a popular method for it is invariant representation learning. While such methods have been studied extensively for classification and regression problems, how they apply to ranking problems, where the data and metrics have a list structure, is not well understood. Theoretically, we establish a domain adaptation generalization bound for ranking under listwise metrics such as MRR and NDCG. The bound suggests an adaptation method that learns list-level domain-invariant feature representations, whose benefits we demonstrate empirically through unsupervised domain adaptation experiments on real-world ranking tasks, including passage reranking. A key message is that for domain adaptation the representations should be analyzed at the same level at which the metric is computed: we show that learning invariant representations at the list level is most effective for adaptation on ranking problems.
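A minimal sketch of the list-level idea, using mean pooling and a simple mean-feature gap as a stand-in for the paper's discrepancy objective (adversarial or kernel-based measures would play this role in practice):

```python
import torch

def list_level_alignment_loss(src_items, tgt_items):
    """Align *list-level* representations across domains: items of each
    list are pooled first, and only then is a distribution discrepancy
    computed (here, a crude mean-feature gap)."""
    # src_items / tgt_items: (n_lists, list_len, dim) per-item encodings
    src_lists = src_items.mean(dim=1)   # (n_lists, dim) list-level pooling
    tgt_lists = tgt_items.mean(dim=1)
    return (src_lists.mean(0) - tgt_lists.mean(0)).pow(2).sum()

loss = list_level_alignment_loss(torch.randn(16, 10, 64),
                                 torch.randn(16, 10, 64))
```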

HYRR: Hybrid Infused Reranking for Passage Retrieval

Dec 20, 2022
Jing Lu, Keith Hall, Ji Ma, Jianmo Ni

We present Hybrid Infused Reranking for Passage Retrieval (HYRR), a framework for training rerankers based on a hybrid of BM25 and neural retrieval models. Retrievers based on hybrid models have been shown to outperform both BM25 and neural models alone. Our approach exploits this improved performance when training a reranker, leading to a robust reranking model. The reranker, a cross-attention neural model, is shown to be robust to different first-stage retrieval systems, achieving better performance than rerankers simply trained on the first-stage retrievers of multi-stage systems. We present evaluations on a supervised passage retrieval task using MS MARCO and on zero-shot retrieval tasks using BEIR. The empirical results show strong performance on both evaluations.
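A minimal sketch of the kind of hybrid first-stage scoring such a framework could build on, with a hypothetical interpolation weight `alpha` (the paper's exact fusion is not specified here):

```python
def hybrid_scores(bm25, dense, alpha=0.5):
    """Interpolate min-max-normalized BM25 and neural retrieval scores;
    candidates ranked this way would then supply reranker training data."""
    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo + 1e-9) for x in xs]
    return [alpha * b + (1 - alpha) * d
            for b, d in zip(norm(bm25), norm(dense))]

# Toy usage: score 4 candidate passages for one query.
scores = hybrid_scores([12.1, 8.4, 0.5, 3.3], [0.71, 0.69, 0.12, 0.33])
```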

Distilling Object Detectors With Global Knowledge

Oct 17, 2022
Sanli Tang, Zhongyu Zhang, Zhanzhan Cheng, Jing Lu, Yunlu Xu, Yi Niu, Fan He

Knowledge distillation learns a lightweight student model that mimics a cumbersome teacher. Existing methods regard the knowledge as the features of individual instances or their relations, i.e., instance-level knowledge from the teacher model only, which we call local knowledge. However, empirical studies show that local knowledge is quite noisy in object detection tasks, especially for blurred, occluded, or small instances. A more intrinsic approach is therefore to measure the representations of instances w.r.t. a group of common basis vectors in the two feature spaces of the teacher and student detectors, i.e., global knowledge, so that distillation can be cast as space alignment. To this end, a novel prototype generation module (PGM) is proposed to find the common basis vectors, dubbed prototypes, in the two feature spaces. A robust distilling module (RDM) is then applied to construct the global knowledge based on the prototypes and to filter out noisy global and local knowledge by measuring the discrepancy between the representations in the two feature spaces. Experiments with Faster R-CNN and RetinaNet on the PASCAL and COCO datasets show that our method achieves the best performance for distilling object detectors with various backbones, even surpassing the teacher model. We also show that existing methods can easily be combined with global knowledge for further improvement. Code is available at: https://github.com/hikvision-research/DAVAR-Lab-ML.
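A sketch of the prototype-coordinate view of global knowledge (the PGM/RDM machinery for generating prototypes and filtering noisy knowledge is not reproduced; shapes are assumptions):

```python
import torch
import torch.nn.functional as F

def prototype_representation(feats, prototypes):
    """Represent instance features by cosine similarity to a set of
    prototype (basis) vectors -- the 'global knowledge' coordinates."""
    # feats: (n, d); prototypes: (p, d) -> (n, p)
    return F.normalize(feats, dim=-1) @ F.normalize(prototypes, dim=-1).T

def global_distill_loss(f_student, f_teacher, proto_s, proto_t):
    """Align the two feature spaces through their prototype coordinates,
    even when student and teacher feature dimensions differ."""
    r_s = prototype_representation(f_student, proto_s)
    r_t = prototype_representation(f_teacher, proto_t)
    return F.mse_loss(r_s, r_t)

loss = global_distill_loss(torch.randn(8, 256), torch.randn(8, 512),
                           torch.randn(16, 256), torch.randn(16, 512))
```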

* Accepted by ECCV2022 