Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Topic": models, code, and papers

SkiM: Skipping Memory LSTM for Low-Latency Real-Time Continuous Speech Separation

Jan 26, 2022
Chenda Li, Lei Yang, Weiqin Wang, Yanmin Qian

Continuous speech separation for meeting pre-processing has recently become a focused research topic. Compared to the data in utterance-level speech separation, the meeting-style audio stream lasts longer, has an uncertain number of speakers. We adopt the time-domain speech separation method and the recently proposed Graph-PIT to build super low-latency online speech separation model, which is very important for the real application. The low-latency time-domain encoder with a small stride leads to an extremely long feature sequence. We proposed a simple yet efficient model named Skipping Memory (SkiM) for the long sequence modeling. Experimental results show that SkiM achieves on par or even better separation performance than DPRNN. Meanwhile, the computational cost of SkiM is reduced by 75% compared to DPRNN. The strong long sequence modeling capability and low computational cost make SkiM a suitable model for online CSS applications. Our fastest real-time model gets 17.1 dB signal-to-distortion (SDR) improvement with less than 1.0 millisecond latency in the simulated meeting-style evaluation.

* Accepted by ICASSP 2022 

  Access Paper or Ask Questions

Representation Learning on Heterostructures via Heterogeneous Anonymous Walks

Jan 18, 2022
Xuan Guo, Pengfei Jiao, Ting Pan, Wang Zhang, Mengyu Jia, Danyang Shi, Wenjun Wang

Capturing structural similarity has been a hot topic in the field of network embedding recently due to its great help in understanding the node functions and behaviors. However, existing works have paid very much attention to learning structures on homogeneous networks while the related study on heterogeneous networks is still a void. In this paper, we try to take the first step for representation learning on heterostructures, which is very challenging due to their highly diverse combinations of node types and underlying structures. To effectively distinguish diverse heterostructures, we firstly propose a theoretically guaranteed technique called heterogeneous anonymous walk (HAW) and its variant coarse HAW (CHAW). Then, we devise the heterogeneous anonymous walk embedding (HAWE) and its variant coarse HAWE in a data-driven manner to circumvent using an extremely large number of possible walks and train embeddings by predicting occurring walks in the neighborhood of each node. Finally, we design and apply extensive and illustrative experiments on synthetic and real-world networks to build a benchmark on heterostructure learning and evaluate the effectiveness of our methods. The results demonstrate our methods achieve outstanding performance compared with both homogeneous and heterogeneous classic methods, and can be applied on large-scale networks.

* 13 pages, 6 figures, 5 tables 

  Access Paper or Ask Questions

Scalable and Decentralized Algorithms for Anomaly Detection via Learning-Based Controlled Sensing

Dec 08, 2021
Geethu Joseph, Chen Zhong, M. Cenk Gursoy, Senem Velipasalar, Pramod K. Varshney

We address the problem of sequentially selecting and observing processes from a given set to find the anomalies among them. The decision-maker observes a subset of the processes at any given time instant and obtains a noisy binary indicator of whether or not the corresponding process is anomalous. In this setting, we develop an anomaly detection algorithm that chooses the processes to be observed at a given time instant, decides when to stop taking observations, and declares the decision on anomalous processes. The objective of the detection algorithm is to identify the anomalies with an accuracy exceeding the desired value while minimizing the delay in decision making. We devise a centralized algorithm where the processes are jointly selected by a common agent as well as a decentralized algorithm where the decision of whether to select a process is made independently for each process. Our algorithms rely on a Markov decision process defined using the marginal probability of each process being normal or anomalous, conditioned on the observations. We implement the detection algorithms using the deep actor-critic reinforcement learning framework. Unlike prior work on this topic that has exponential complexity in the number of processes, our algorithms have computational and memory requirements that are both polynomial in the number of processes. We demonstrate the efficacy of these algorithms using numerical experiments by comparing them with state-of-the-art methods.

* 13 pages, 4 figures. arXiv admin note: substantial text overlap with arXiv:2105.06289 

  Access Paper or Ask Questions

Escape saddle points by a simple gradient-descent based algorithm

Nov 28, 2021
Chenyi Zhang, Tongyang Li

Escaping saddle points is a central research topic in nonconvex optimization. In this paper, we propose a simple gradient-based algorithm such that for a smooth function $f\colon\mathbb{R}^n\to\mathbb{R}$, it outputs an $\epsilon$-approximate second-order stationary point in $\tilde{O}(\log n/\epsilon^{1.75})$ iterations. Compared to the previous state-of-the-art algorithms by Jin et al. with $\tilde{O}((\log n)^{4}/\epsilon^{2})$ or $\tilde{O}((\log n)^{6}/\epsilon^{1.75})$ iterations, our algorithm is polynomially better in terms of $\log n$ and matches their complexities in terms of $1/\epsilon$. For the stochastic setting, our algorithm outputs an $\epsilon$-approximate second-order stationary point in $\tilde{O}((\log n)^{2}/\epsilon^{4})$ iterations. Technically, our main contribution is an idea of implementing a robust Hessian power method using only gradients, which can find negative curvature near saddle points and achieve the polynomial speedup in $\log n$ compared to the perturbed gradient descent methods. Finally, we also perform numerical experiments that support our results.

* 34 pages, 8 figures, to appear in the 35th Conference on Neural Information Processing Systems (NeurIPS 2021) 

  Access Paper or Ask Questions

Aspect-driven User Preference and News Representation Learning for News Recommendation

Oct 12, 2021
Rongyao Wang, Wenpeng Lu, Shoujin Wang, Xueping Peng, Hao Wu, Qian Zhang

News recommender systems are essential for helping users to efficiently and effectively find out those interesting news from a large amount of news. Most of existing news recommender systems usually learn topic-level representations of users and news for recommendation, and neglect to learn more informative aspect-level features of users and news for more accurate recommendation. As a result, they achieve limited recommendation performance. Aiming at addressing this deficiency, we propose a novel Aspect-driven News Recommender System (ANRS) built on aspect-level user preference and news representation learning. Here, \textit{news aspect} is fine-grained semantic information expressed by a set of related words, which indicates specific aspects described by the news. In ANRS, \textit{news aspect-level encoder} and \textit{user aspect-level encoder} are devised to learn the fine-grained aspect-level representations of user's preferences and news characteristics respectively, which are fed into \textit{click predictor} to judge the probability of the user clicking the candidate news. Extensive experiments are done on the commonly used real-world dataset MIND, which demonstrate the superiority of our method compared with representative and state-of-the-art methods.

* 20 pages 

  Access Paper or Ask Questions

SimpleX: A Simple and Strong Baseline for Collaborative Filtering

Sep 26, 2021
Kelong Mao, Jieming Zhu, Jinpeng Wang, Quanyu Dai, Zhenhua Dong, Xi Xiao, Xiuqiang He

Collaborative filtering (CF) is a widely studied research topic in recommender systems. The learning of a CF model generally depends on three major components, namely interaction encoder, loss function, and negative sampling. While many existing studies focus on the design of more powerful interaction encoders, the impacts of loss functions and negative sampling ratios have not yet been well explored. In this work, we show that the choice of loss function as well as negative sampling ratio is equivalently important. More specifically, we propose the cosine contrastive loss (CCL) and further incorporate it to a simple unified CF model, dubbed SimpleX. Extensive experiments have been conducted on 11 benchmark datasets and compared with 29 existing CF models in total. Surprisingly, the results show that, under our CCL loss and a large negative sampling ratio, SimpleX can surpass most sophisticated state-of-the-art models by a large margin (e.g., max 48.5% improvement in [email protected] over LightGCN). We believe that SimpleX could not only serve as a simple strong baseline to foster future research on CF, but also shed light on the potential research direction towards improving loss function and negative sampling.

* Accepted in CIKM 2021 

  Access Paper or Ask Questions

Incremental Learning Techniques for Online Human Activity Recognition

Sep 20, 2021
Meysam Vakili, Masoumeh Rezaei

Unobtrusive and smart recognition of human activities using smartphones inertial sensors is an interesting topic in the field of artificial intelligence acquired tremendous popularity among researchers, especially in recent years. A considerable challenge that needs more attention is the real-time detection of physical activities, since for many real-world applications such as health monitoring and elderly care, it is required to recognize users' activities immediately to prevent severe damages to individuals' wellness. In this paper, we propose a human activity recognition (HAR) approach for the online prediction of physical movements, benefiting from the capabilities of incremental learning algorithms. We develop a HAR system containing monitoring software and a mobile application that collects accelerometer and gyroscope data and send them to a remote server via the Internet for classification and recognition operations. Six incremental learning algorithms are employed and evaluated in this work and compared with several batch learning algorithms commonly used for developing offline HAR systems. The Final results indicated that considering all performance evaluation metrics, Incremental K-Nearest Neighbors and Incremental Naive Bayesian outperformed other algorithms, exceeding a recognition accuracy of 95% in real-time.

* 16 pages, 5 figures, 7 tables 

  Access Paper or Ask Questions

PDAugment: Data Augmentation by Pitch and Duration Adjustments for Automatic Lyrics Transcription

Sep 17, 2021
Chen Zhang, Jiaxing Yu, LuChin Chang, Xu Tan, Jiawei Chen, Tao Qin, Kejun Zhang

Automatic lyrics transcription (ALT), which can be regarded as automatic speech recognition (ASR) on singing voice, is an interesting and practical topic in academia and industry. ALT has not been well developed mainly due to the dearth of paired singing voice and lyrics datasets for model training. Considering that there is a large amount of ASR training data, a straightforward method is to leverage ASR data to enhance ALT training. However, the improvement is marginal when training the ALT system directly with ASR data, because of the gap between the singing voice and standard speech data which is rooted in music-specific acoustic characteristics in singing voice. In this paper, we propose PDAugment, a data augmentation method that adjusts pitch and duration of speech at syllable level under the guidance of music scores to help ALT training. Specifically, we adjust the pitch and duration of each syllable in natural speech to those of the corresponding note extracted from music scores, so as to narrow the gap between natural speech and singing voice. Experiments on DSing30 and Dali corpus show that the ALT system equipped with our PDAugment outperforms previous state-of-the-art systems by 5.9% and 18.1% WERs respectively, demonstrating the effectiveness of PDAugment for ALT.

* 7 pages 

  Access Paper or Ask Questions

Elbert: Fast Albert with Confidence-Window Based Early Exit

Jul 01, 2021
Keli Xie, Siyuan Lu, Meiqi Wang, Zhongfeng Wang

Despite the great success in Natural Language Processing (NLP) area, large pre-trained language models like BERT are not well-suited for resource-constrained or real-time applications owing to the large number of parameters and slow inference speed. Recently, compressing and accelerating BERT have become important topics. By incorporating a parameter-sharing strategy, ALBERT greatly reduces the number of parameters while achieving competitive performance. Nevertheless, ALBERT still suffers from a long inference time. In this work, we propose the ELBERT, which significantly improves the average inference speed compared to ALBERT due to the proposed confidence-window based early exit mechanism, without introducing additional parameters or extra training overhead. Experimental results show that ELBERT achieves an adaptive inference speedup varying from 2$\times$ to 10$\times$ with negligible accuracy degradation compared to ALBERT on various datasets. Besides, ELBERT achieves higher accuracy than existing early exit methods used for accelerating BERT under the same computation cost. Furthermore, to understand the principle of the early exit mechanism, we also visualize the decision-making process of it in ELBERT.

  Access Paper or Ask Questions

MathBERT: A Pre-Trained Model for Mathematical Formula Understanding

May 02, 2021
Shuai Peng, Ke Yuan, Liangcai Gao, Zhi Tang

Large-scale pre-trained models like BERT, have obtained a great success in various Natural Language Processing (NLP) tasks, while it is still a challenge to adapt them to the math-related tasks. Current pre-trained models neglect the structural features and the semantic correspondence between formula and its context. To address these issues, we propose a novel pre-trained model, namely \textbf{MathBERT}, which is jointly trained with mathematical formulas and their corresponding contexts. In addition, in order to further capture the semantic-level structural features of formulas, a new pre-training task is designed to predict the masked formula substructures extracted from the Operator Tree (OPT), which is the semantic structural representation of formulas. We conduct various experiments on three downstream tasks to evaluate the performance of MathBERT, including mathematical information retrieval, formula topic classification and formula headline generation. Experimental results demonstrate that MathBERT significantly outperforms existing methods on all those three tasks. Moreover, we qualitatively show that this pre-trained model effectively captures the semantic-level structural information of formulas. To the best of our knowledge, MathBERT is the first pre-trained model for mathematical formula understanding.

  Access Paper or Ask Questions