Jing Lu

Kuaishou

The feasibility of sound zone control using an array of parametric array loudspeakers

Jul 14, 2024

Tao Zhuang, Jia-Xin Zhong, Jing Lu

Abstract:Parametric array loudspeakers (PALs) are known for producing highly directional audio beams, a feat more challenging to achieve with conventional electro-dynamic loudspeakers (EDLs). Due to their intrinsic physical mechanisms, PALs hold promising potential for spatial audio applications such as virtual reality (VR). However, the feasibility of using an array of PALs for sound zone control (SZC) has remained unexplored, mainly due to the complexity of the nonlinear demodulation process inherent in PALs. Leveraging recent advancements in PAL modeling, this work proposes an optimization algorithm to achieve the acoustic contrast control (ACC) between two target areas using a PAL array. The performance and robustness of the proposed ACC-based SZC using PAL arrays are investigated through simulations, and the results are compared with those obtained using EDL arrays. The results show that the PAL array outperforms the EDL array in SZC performance and robustness at higher frequencies and lower signal-to-noise ratio, while being comparable under other conditions. This work paves the way for high-contrast acoustic control using PAL arrays.
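
The quantity that acoustic contrast control maximizes can be illustrated with a minimal sketch. It assumes ideal linear monopole sources in free field, which stands in for neither the PAL demodulation model nor the EDL model used in the paper; the function names, geometry, and frequency are hypothetical.

```python
import cmath
import math

def monopole(src, pt, k):
    """Free-field Green's function between a source and a field point."""
    r = math.dist(src, pt)
    return cmath.exp(-1j * k * r) / (4 * math.pi * r)

def zone_energy(weights, sources, points, k):
    """Mean squared pressure magnitude over the control points of one zone."""
    total = 0.0
    for pt in points:
        pressure = sum(w * monopole(s, pt, k) for w, s in zip(weights, sources))
        total += abs(pressure) ** 2
    return total / len(points)

def acoustic_contrast_db(weights, sources, bright, dark, k):
    """Ratio of bright-zone to dark-zone energy, in decibels."""
    e_bright = zone_energy(weights, sources, bright, k)
    e_dark = zone_energy(weights, sources, dark, k)
    return 10 * math.log10(e_bright / e_dark)

# Hypothetical setup: one source, one control point per zone, 1 kHz in air.
k = 2 * math.pi * 1000 / 343.0
contrast = acoustic_contrast_db(
    weights=[1.0],
    sources=[(0.0, 0.0)],
    bright=[(1.0, 0.0)],
    dark=[(2.0, 0.0)],
    k=k,
)
```

With a single source the contrast comes from distance alone; an array gains contrast by choosing complex weights that raise this ratio.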

SNR-Progressive Model with Harmonic Compensation for Low-SNR Speech Enhancement

Jun 24, 2024

Zhongshu Hou, Qinwen Hu, Zhanzhong Cao, Ming Tang, Jing Lu

Abstract:Despite significant progress made in the last decade, deep neural network (DNN) based speech enhancement (SE) still faces the challenge of notable degradation in the quality of recovered speech under low signal-to-noise ratio (SNR) conditions. In this letter, we propose an SNR-progressive speech enhancement model with harmonic compensation for low-SNR SE. Reliable pitch estimation is obtained from the intermediate output, which has the benefit of retaining more speech components than the coarse estimate while possessing a significantly higher SNR than the input noisy speech. An effective harmonic compensation mechanism is introduced for better harmonic recovery. Extensive experiments demonstrate the advantage of our proposed model. A multi-modal speech extraction system based on the proposed backbone model ranks first in the ICASSP 2024 MISP Challenge: https://mispchallenge.github.io/mispchallenge2023/index.html.

LabelCraft: Empowering Short Video Recommendations with Automated Label Crafting

Dec 18, 2023

Yimeng Bai, Yang Zhang, Jing Lu, Jianxin Chang, Xiaoxue Zang, Yanan Niu, Yang Song, Fuli Feng

Abstract:Short video recommendations often face limitations due to the quality of user feedback, which may not accurately depict user interests. To tackle this challenge, a new task has emerged: generating more dependable labels from original feedback. Existing label generation methods rely on manual rules, demanding substantial human effort and potentially misaligning with the desired objectives of the platform. To transcend these constraints, we introduce LabelCraft, a novel automated label generation method explicitly optimizing pivotal operational metrics for platform success. By formulating label generation as a higher-level optimization problem above recommender model optimization, LabelCraft introduces a trainable labeling model for automatic label mechanism modeling. Through meta-learning techniques, LabelCraft effectively addresses the bi-level optimization hurdle posed by the recommender and labeling models, enabling the automatic acquisition of intricate label generation mechanisms. Extensive experiments on real-world datasets corroborate LabelCraft's excellence across varied operational metrics, encompassing usage time, user engagement, and retention. Codes are available at https://github.com/baiyimeng/LabelCraft.

* Accepted by WSDM'24

Leveraging Watch-time Feedback for Short-Video Recommendations: A Causal Labeling Framework

Jun 30, 2023

Yang Zhang, Yimeng Bai, Jianxin Chang, Xiaoxue Zang, Song Lu, Jing Lu, Fuli Feng, Yanan Niu, Yang Song

Abstract:With the proliferation of short video applications, the significance of short video recommendations has vastly increased. Unlike other recommendation scenarios, short video recommendation systems heavily rely on feedback from watch time. Existing approaches simply treat watch time as a direct label, failing to effectively harness its extensive semantics and introducing bias, thereby limiting the potential for modeling user interests based on watch time. To overcome this challenge, we propose a framework named Debiased Multiple-semantics-extracting Labeling (DML). DML constructs labels that encompass various semantics by utilizing quantiles derived from the distribution of watch time, prioritizing relative order rather than absolute label values. This approach facilitates easier model learning while aligning with the ranking objective of recommendations. Furthermore, we introduce a method inspired by causal adjustment to refine label definitions, thereby reducing the impact of bias on the label and directly mitigating bias at the label level. We substantiate the effectiveness of our DML framework through both online and offline experiments. Extensive results demonstrate that our DML could effectively leverage watch time to discover users' real interests, enhancing their engagement in our application.

* 7 pages, 4 figures
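
The quantile-based labeling described in the abstract above can be sketched roughly as follows. This is only an illustration of turning raw watch times into order-based labels; the function name and the number of bins are hypothetical, and the causal adjustment step of DML is not modeled here.

```python
from bisect import bisect_left

def quantile_labels(watch_times, num_bins=4):
    """Map each watch time to the index of its quantile bin.

    The label depends only on how many samples are strictly shorter,
    so it reflects relative order rather than the absolute duration.
    """
    ordered = sorted(watch_times)
    n = len(ordered)
    return [bisect_left(ordered, t) * num_bins // n for t in watch_times]

# Hypothetical watch times in seconds; the outlier does not distort the labels.
labels = quantile_labels([10, 500, 30, 20])
```

Because only ranks matter, any monotonic rescaling of the watch times leaves the labels unchanged.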

Harmonic enhancement using learnable comb filter for light-weight full-band speech enhancement model

Jun 01, 2023

Xiaohuai Le, Tong Lei, Li Chen, Yiqing Guo, Chao He, Cheng Chen, Xianjun Xia, Hua Gao, Yijian Xiao, Piao Ding (+2 more)

Abstract:With fewer feature dimensions, filter banks are often used in light-weight full-band speech enhancement models. In order to further enhance the coarse speech in the sub-band domain, it is necessary to apply post-filtering for harmonic retrieval. The signal processing-based comb filters used in RNNoise and PercepNet have limited performance and may cause speech quality degradation due to inaccurate fundamental frequency estimation. To tackle this problem, we propose a learnable comb filter to enhance harmonics. Based on the sub-band model, we design a DNN-based fundamental frequency estimator to estimate the discrete fundamental frequencies and a comb filter for harmonic enhancement, which are trained in an end-to-end manner. The experiments show the advantages of our proposed method over PercepNet and DeepFilterNet.

* accepted by Interspeech 2023
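
To illustrate why a comb filter helps harmonic retrieval, here is a minimal sketch of a fixed feedforward comb filter. It is the classical signal-processing form, not the learnable filter proposed in the paper, and it assumes the pitch period is already known exactly; the period value and signal lengths are hypothetical.

```python
import math

def comb_filter(x, period):
    """Feedforward comb filter: y[n] = 0.5 * (x[n] + x[n - period]).

    Samples before the start of the signal are treated as zero.
    """
    y = []
    for n, sample in enumerate(x):
        delayed = x[n - period] if n >= period else 0.0
        y.append(0.5 * (sample + delayed))
    return y

period = 8  # assumed pitch period in samples (hypothetical value)

# A component at the fundamental repeats every `period` samples.
harmonic = [math.sin(2 * math.pi * n / period) for n in range(64)]
# A component halfway between harmonics flips sign every `period` samples.
inter_harmonic = [math.sin(math.pi * n / period) for n in range(64)]

kept = comb_filter(harmonic, period)
removed = comb_filter(inter_harmonic, period)
```

After the first `period` samples, the harmonic component passes unchanged while the inter-harmonic component cancels. The same structure also shows the weakness the abstract mentions: if the assumed period is wrong, the filter attenuates the speech harmonics instead.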

Personalized speech enhancement combining band-split RNN and speaker attentive module

Feb 20, 2023

Xiaohuai Le, Zhongshu Hou, Li Chen, Chao He, Yiqing Guo, Cheng Chen, Xianjun Xia, Jing Lu

Abstract:Target speaker information can be utilized in speech enhancement (SE) models to more effectively extract the desired speech. Previous works introduce the speaker embedding into speech enhancement models by means of concatenation or affine transformation. In this paper, we propose a speaker attentive module to calculate the attention scores between the speaker embedding and the intermediate features, which are used to rescale the features. By merging this module in the state-of-the-art SE model, we construct the personalized SE model for ICASSP Signal Processing Grand Challenge: DNS Challenge 5 (2023). Our system achieves a final score of 0.529 on the blind test set of track1 and 0.549 on track2.
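
The idea of rescaling features by a speaker-dependent attention score can be sketched as below. The abstract does not say how the scores are computed, so the sigmoid of a scaled dot product used here is purely an assumption, as are the function name and the toy dimensions.

```python
import math

def speaker_rescale(features, speaker_emb):
    """Scale each frame by a score derived from its match to the speaker.

    features: list of frames, each a list of floats with len(speaker_emb) entries.
    The score is sigmoid(frame . speaker_emb / sqrt(d)), an assumed form.
    """
    d = len(speaker_emb)
    rescaled = []
    for frame in features:
        similarity = sum(f * e for f, e in zip(frame, speaker_emb)) / math.sqrt(d)
        score = 1.0 / (1.0 + math.exp(-similarity))
        rescaled.append([score * f for f in frame])
    return rescaled

# Hypothetical two-dimensional example.
speaker = [1.0, 0.0]
frames = [[4.0, 1.0], [-4.0, 1.0]]
out = speaker_rescale(frames, speaker)
```

Frames aligned with the embedding keep most of their magnitude, while frames pointing away from it are suppressed, in contrast to concatenation, which leaves the features themselves untouched.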

Attention does not guarantee best performance in speech enhancement

Feb 11, 2023

Zhongshu Hou, Qinwen Hu, Kai Chen, Jing Lu

Abstract:Attention mechanism has been widely utilized in speech enhancement (SE) because theoretically it can effectively model the long-term inherent connection of signal both in time domain and spectrum domain. However, the generally used global attention mechanism might not be the best choice since the adjacent information naturally imposes more influence than the far-apart information in speech enhancement. In this paper, we validate this conjecture by replacing attention with RNN in two typical state-of-the-art (SOTA) models, multi-scale temporal frequency convolutional network (MTFAA) with axial attention and conformer-based metric-GAN network (CMGAN).

Local spectral attention for full-band speech enhancement

Feb 11, 2023

Zhongshu Hou, Qinwen Hu, Kai Chen, Jing Lu

Abstract:Attention mechanism has been widely utilized in speech enhancement (SE) because theoretically it can effectively model the inherent connection of signal both in time domain and spectrum domain. Usually, the span of attention is limited in time domain while the attention in frequency domain spans the whole frequency range. In this paper, we notice that the attention over the whole frequency range hampers the inference for full-band SE and possibly leads to excessive residual noise. To alleviate this problem, we introduce local spectral attention (LSA) into full-band SE model by limiting the span of attention. The ablation test on the state-of-the-art (SOTA) full-band SE model reveals that the local frequency attention can effectively improve overall performance. The improved model achieves the best objective score on the full-band VoiceBank+DEMAND set.
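
Limiting the span of attention along frequency can be sketched as a banded softmax. This is a generic illustration under the assumption that each frequency bin attends only to bins within a fixed distance; the function name and the span value are hypothetical and not taken from the paper.

```python
import math

def local_attention_weights(scores, span):
    """Row-wise softmax restricted to bins j with |i - j| <= span.

    scores: square list of lists of raw attention scores over frequency bins.
    Bins outside the band receive a weight of exactly zero.
    """
    n = len(scores)
    weights = []
    for i in range(n):
        lo, hi = max(0, i - span), min(n, i + span + 1)
        window = scores[i][lo:hi]
        peak = max(window)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in window]
        total = sum(exps)
        row = [0.0] * n
        for j, e in zip(range(lo, hi), exps):
            row[j] = e / total
        weights.append(row)
    return weights

# Hypothetical example: five frequency bins with uniform raw scores.
uniform = [[0.0] * 5 for _ in range(5)]
w = local_attention_weights(uniform, span=1)
```

Setting `span` to the number of bins recovers ordinary attention over the whole frequency range.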

TWIN: TWo-stage Interest Network for Lifelong User Behavior Modeling in CTR Prediction at Kuaishou

Feb 05, 2023

Jianxin Chang, Chenbin Zhang, Zhiyi Fu, Xiaoxue Zang, Lin Guan, Jing Lu, Yiqun Hui, Dewei Leng, Yanan Niu, Yang Song (+1 more)

Abstract:Life-long user behavior modeling, i.e., extracting a user's hidden interests from rich historical behaviors in months or even years, plays a central role in modern CTR prediction systems. Conventional algorithms mostly follow two cascading stages: a simple General Search Unit (GSU) for fast and coarse search over tens of thousands of long-term behaviors and an Exact Search Unit (ESU) for effective Target Attention (TA) over the small number of finalists from GSU. Although efficient, existing algorithms mostly suffer from a crucial limitation: the inconsistent target-behavior relevance metrics between GSU and ESU. As a result, their GSU usually misses highly relevant behaviors but retrieves ones considered irrelevant by ESU. In such cases, the TA in ESU, no matter how attention is allocated, mostly deviates from the real user interests and thus degrades the overall CTR prediction accuracy. To address such inconsistency, we propose TWo-stage Interest Network (TWIN), where our Consistency-Preserved GSU (CP-GSU) adopts the identical target-behavior relevance metric as the TA in ESU, making the two stages twins. Specifically, to break TA's computational bottleneck and extend it from ESU to GSU, or namely from behavior length $10^2$ to length $10^4-10^5$, we build a novel attention mechanism by behavior feature splitting. For the video inherent features of a behavior, we calculate their linear projection by efficient pre-computing & caching strategies. And for the user-item cross features, we compress each into a one-dimensional bias term in the attention score calculation to save the computational cost. The consistency between two stages, together with the effective TA-based relevance metric in CP-GSU, contributes to significant performance gain in CTR prediction.
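
The feature-splitting idea can be sketched as follows: projected inherent features are cached per item, and each cross feature contributes only a scalar bias to the score. The class and function names, the unweighted sum of biases, and the scaled dot-product form are assumptions made for illustration, not the exact formulation in the paper.

```python
import math

def project(vec, matrix):
    """Multiply a row vector by a matrix given as a list of rows."""
    cols = len(matrix[0])
    return [sum(v * matrix[i][j] for i, v in enumerate(vec)) for j in range(cols)]

class KeyCache:
    """Caches the linear projection of an item's inherent features by item id."""

    def __init__(self, weight):
        self.weight = weight
        self.misses = 0
        self._store = {}

    def get(self, item_id, inherent):
        if item_id not in self._store:
            self.misses += 1
            self._store[item_id] = project(inherent, self.weight)
        return self._store[item_id]

def relevance_scores(query, behaviors, cache):
    """Score each behavior as a scaled dot product plus scalar cross-feature biases."""
    scale = math.sqrt(len(query))
    scores = []
    for item_id, inherent, cross_biases in behaviors:
        key = cache.get(item_id, inherent)
        dot = sum(q * k for q, k in zip(query, key))
        scores.append(dot / scale + sum(cross_biases))
    return scores

def top_k(scores, k):
    """Indices of the k highest-scoring behaviors, as a search stage would keep."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Hypothetical history: item "a" appears twice, so its projection is computed once.
cache = KeyCache(weight=[[1.0, 0.0], [0.0, 1.0]])
history = [("a", [1.0, 0.0], [0.0]), ("b", [0.0, 1.0], [2.0]), ("a", [1.0, 0.0], [0.5])]
scores = relevance_scores([1.0, 0.0], history, cache)
finalists = top_k(scores, 2)
```

Using the same scoring function in both stages is what keeps the coarse search consistent with the final attention; the cache and scalar biases are what make that affordable over long histories.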

Distributed Active Noise Control System Based on a Block Diffusion FxLMS Algorithm with Bidirectional Communication

Dec 28, 2022

Tianyou Li, Hongji Duan, Sipei Zhao, Jing Lu, Ian S. Burnett

Abstract:Recently, distributed active noise control systems based on diffusion adaptation have attracted significant research interest due to their balance between computational complexity and stability compared to conventional centralized and decentralized adaptation schemes. However, the existing diffusion FxLMS algorithm employs node-specific adaptation and neighborhood-wide combination, and assumes that the control filters of neighbor nodes are similar to each other. This assumption is not true in practical applications, and it leads to inferior performance to the centralized controller approach. In contrast, this paper proposes a Block Diffusion FxLMS algorithm with bidirectional communication, which uses neighborhood-wide adaptation and node-specific combination to update the control filters. Simulation results validate that the proposed algorithm converges to the solution of the centralized controller with reduced computational burden.
