"speech": models, code, and papers

Corpus-Driven Knowledge Acquisition for Discourse Analysis

Jun 07, 1994
Stephen Soderland, Wendy Lehnert

The availability of large on-line text corpora provides a natural and promising bridge between the worlds of natural language processing (NLP) and machine learning (ML). In recent years, the NLP community has been aggressively investigating statistical techniques to drive part-of-speech taggers, but application-specific text corpora can be used to drive knowledge acquisition at much higher levels as well. In this paper we will show how ML techniques can be used to support knowledge acquisition for information extraction systems. It is often very difficult to specify an explicit domain model for many information extraction applications, and it is always labor intensive to implement hand-coded heuristics for each new domain. We have discovered that it is nevertheless possible to use ML algorithms in order to capture knowledge that is only implicitly present in a representative text corpus. Our work addresses issues traditionally associated with discourse analysis and intersentential inference generation, and demonstrates the utility of ML algorithms at this higher level of language analysis. The benefits of our work address the portability and scalability of information extraction (IE) technologies. When hand-coded heuristics are used to manage discourse analysis in an information extraction system, months of programming effort are easily needed to port a successful IE system to a new domain. We will show how ML algorithms can reduce this

* 6 pages, AAAI-94 


Robustness Testing of Data and Knowledge Driven Anomaly Detection in Cyber-Physical Systems

Apr 20, 2022
Xugui Zhou, Maxfield Kouzel, Homa Alemzadeh

The growing complexity of Cyber-Physical Systems (CPS) and challenges in ensuring safety and security have led to the increasing use of deep learning methods for accurate and scalable anomaly detection. However, machine learning (ML) models often suffer from low performance in predicting unexpected data and are vulnerable to accidental or malicious perturbations. Although robustness testing of deep learning models has been extensively explored in applications such as image classification and speech recognition, less attention has been paid to ML-driven safety monitoring in CPS. This paper presents the preliminary results on evaluating the robustness of ML-based anomaly detection methods in safety-critical CPS against two types of accidental and malicious input perturbations, generated using a Gaussian-based noise model and the Fast Gradient Sign Method (FGSM). We test the hypothesis of whether integrating the domain knowledge (e.g., on unsafe system behavior) with the ML models can improve the robustness of anomaly detection without sacrificing accuracy and transparency. Experimental results with two case studies of Artificial Pancreas Systems (APS) for diabetes management show that ML-based safety monitors trained with domain knowledge can reduce on average up to 54.2% of robustness error and keep the average F1 scores high while improving transparency.

* 8 pages, 10 figures, to appear in the 52nd IEEE/IFIP International Conference on Dependable Systems and Networks Workshop on Dependable and Secure Machine Learning (DSN-DSML) 
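
The two perturbation types named in the abstract above (Gaussian noise and FGSM) can be illustrated with a minimal sketch. It assumes a toy logistic-regression monitor with hand-picked weights, so the input gradient has a closed form; the model, the names `gaussian_perturb` and `fgsm_perturb`, and all numeric values are hypothetical and are not taken from the paper.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of the positive class under a toy logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def gaussian_perturb(x, sigma, rng):
    """Accidental perturbation: add zero-mean Gaussian noise to each feature."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]


def fgsm_perturb(w, b, x, y, eps):
    """Malicious perturbation: x + eps * sign(dLoss/dx).

    For logistic regression with cross-entropy loss, dLoss/dx = (p - y) * w.
    """
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


# Hypothetical model and input, chosen only for illustration.
w, b = [2.0, -1.0, 0.5], -0.1
x, y = [0.4, 0.2, 0.1], 1
eps = 0.3

x_noisy = gaussian_perturb(x, sigma=0.05, rng=random.Random(0))
x_adv = fgsm_perturb(w, b, x, y, eps)

p_clean = predict(w, b, x)
p_adv = predict(w, b, x_adv)
```

FGSM moves every feature by exactly `eps` in the direction that increases the loss, so `p_adv` is lower than `p_clean` for the true label `y = 1`, whereas the Gaussian perturbation has no preferred direction.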


BiFSMN: Binary Neural Network for Keyword Spotting

Feb 15, 2022
Haotong Qin, Xudong Ma, Yifu Ding, Xiaoyang Li, Yang Zhang, Yao Tian, Zejun Ma, Jie Luo, Xianglong Liu

Deep neural networks, such as Deep-FSMN, have been widely studied for keyword spotting (KWS) applications. However, computational resources for these networks are significantly constrained since they usually run on-call on edge devices. In this paper, we present BiFSMN, an accurate and extremely efficient binary neural network for KWS. We first construct a High-frequency Enhancement Distillation scheme for binarization-aware training, which emphasizes the high-frequency information from the full-precision network's representation that is more crucial for the optimization of the binarized network. Then, to allow instant and adaptive accuracy-efficiency trade-offs at runtime, we also propose a Thinnable Binarization Architecture to further liberate the acceleration potential of the binarized network from the topology perspective. Moreover, we implement a Fast Bitwise Computation Kernel for BiFSMN on ARMv8 devices which fully utilizes registers and increases instruction throughput to push the limit of deployment efficiency. Extensive experiments show that BiFSMN outperforms existing binarization methods by convincing margins on various datasets and is even comparable with the full-precision counterpart (e.g., less than 3% drop on Speech Commands V1-12). We highlight that, benefiting from the thinnable architecture and the optimized 1-bit implementation, BiFSMN can achieve an impressive 22.3x speedup and 15.5x storage saving on real-world edge hardware.

* request from company 
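
As background for the 1-bit computation mentioned in the abstract above, the following is a minimal sketch of the generic XNOR/popcount dot product used in binary neural networks in general. It is not the paper's Fast Bitwise Computation Kernel; the helper names `binarize`, `pack_bits`, and `binary_dot` and the example values are hypothetical.

```python
def binarize(values):
    """Map real values to {+1, -1} by sign (zero is treated as +1)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack a {+1, -1} vector into an integer: bit i is set when signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def binary_dot(a_word, b_word, n):
    """Dot product of two packed {+1, -1} vectors of length n.

    Matching positions contribute +1 and differing positions -1,
    so the result is n - 2 * popcount(a XOR b).
    """
    mismatches = bin(a_word ^ b_word).count("1")
    return n - 2 * mismatches


# Hypothetical weights and activations, for illustration only.
weights = [0.7, -0.2, 0.1, -0.9]
activations = [0.3, 0.5, -0.4, -0.1]

w_signs, a_signs = binarize(weights), binarize(activations)
fast = binary_dot(pack_bits(w_signs), pack_bits(a_signs), len(weights))
reference = sum(w * a for w, a in zip(w_signs, a_signs))
```

Replacing multiply-accumulate with one XOR and one popcount per machine word is what makes storage and speed gains possible on bit-parallel hardware; `fast` and `reference` agree by construction.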


Automated Deep Learning: Neural Architecture Search Is Not the End

Jan 21, 2022
Xuanyi Dong, David Jacob Kedziora, Katarzyna Musial, Bogdan Gabrys

Deep learning (DL) has proven to be a highly effective approach for developing models in diverse contexts, including visual perception, speech recognition, and machine translation. However, the end-to-end process for applying DL is not trivial. It requires grappling with problem formulation and context understanding, data engineering, model development, deployment, continuous monitoring and maintenance, and so on. Moreover, each of these steps typically relies heavily on humans, in terms of both knowledge and interactions, which impedes the further advancement and democratization of DL. Consequently, in response to these issues, a new field has emerged over the last few years: automated deep learning (AutoDL). This endeavor seeks to minimize the need for human involvement and is best known for its achievements in neural architecture search (NAS), a topic that has been the focus of several surveys. That stated, NAS is not the be-all and end-all of AutoDL. Accordingly, this review adopts an overarching perspective, examining research efforts into automation across the entirety of an archetypal DL workflow. In so doing, this work also proposes a comprehensive set of ten criteria by which to assess existing work in both individual publications and broader research areas. These criteria are: novelty, solution quality, efficiency, stability, interpretability, reproducibility, engineering quality, scalability, generalizability, and eco-friendliness. Thus, ultimately, this review provides an evaluative overview of AutoDL in the early 2020s, identifying where future opportunities for progress may exist.

* 65 pages, 9 tables, 4 figures 


Control Architecture of the Double-Cross-Correlation Processor for Sampling-Rate-Offset Estimation in Acoustic Sensor Networks

May 28, 2021
Aleksej Chinaev, Sven Wienand, Gerald Enzner

Distributed hardware of acoustic sensor networks bears inconsistency of local sampling frequencies, which is detrimental to signal processing. Fundamentally, sampling rate offset (SRO) nonlinearly relates the discrete-time signals acquired by different sensor nodes. As such, retrieval of SRO from the available signals requires nonlinear estimation, like double-cross-correlation processing (DXCP), and frequently results in biased estimation. SRO compensation by asynchronous sampling rate conversion (ASRC) on the signals then leaves an unacceptable residual. As a remedy to this problem, multi-stage procedures have been devised to diminish the SRO residual with multiple iterations of SRO estimation and ASRC over the entire signal. This paper converts the mechanism of offline multi-stage processing into a continuous feedback-control loop comprising a controlled ASRC unit followed by an online implementation of DXCP-based SRO estimation. To support the design of an optimum internal model control unit for this closed-loop system, the paper deploys an analytical dynamical model of the proposed online DXCP. The resulting control architecture then merely applies a single treatment of each signal frame, while efficiently diminishing SRO bias with time. Evaluations with both speech and Gaussian input demonstrate that the high accuracy of multi-stage processing is maintained at the low complexity of single-stage (open-loop) processing.


The Challenges of Persian User-generated Textual Content: A Machine Learning-Based Approach

Jan 20, 2021
Mohammad Kasra Habib

Over recent years, many research papers and studies have been published on the development of effective approaches that benefit from large amounts of user-generated content and build intelligent predictive models on top of it. This research applies machine learning-based approaches to tackle the hurdles that come with Persian user-generated textual content. Unfortunately, there is still inadequate research on exploiting machine learning approaches to classify or cluster Persian text. Further, the analysis of Persian text suffers from a lack of resources, specifically datasets and text manipulation tools. Since the syntax and semantics of the Persian language differ from those of English and other languages, the resources available for these languages are not directly usable for Persian. In addition, recognition of nouns and pronouns, part-of-speech tagging, word boundary detection, stemming, and character manipulation for the Persian language are still unsolved issues that require further study. Therefore, efforts have been made in this research to address some of these challenges. The presented approach uses a machine-translated dataset to conduct sentiment analysis for the Persian language. Finally, the dataset has been evaluated with different classifiers and feature engineering approaches. The results of the experiments show promising state-of-the-art performance in contrast to previous efforts; the best classifier was Support Vector Machines, which achieved a precision of 91.22%, a recall of 91.71%, and an F1 score of 91.46%.

* 12 Pages bib inc., 5 Figures and 5 Tables 
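
The three figures reported in the abstract above are mutually consistent: the F1 score is the harmonic mean of precision and recall. A short check, using only the numbers quoted in the abstract (the function name `f1_score` is introduced here for illustration):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
f1 = f1_score(91.22, 91.71)
print(round(f1, 2))  # 91.46
```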


Detecting Suspicious Events in Fast Information Flows

Jan 07, 2021
Kristiaan Pelckmans, Moustafa Aboushady, Andreas Brosemyr

We describe a computationally feather-light and intuitive, yet provably efficient algorithm, named HALFADO. HALFADO is designed for detecting suspicious events in a high-frequency stream of complex entries, based on a relatively small number of examples of human judgement. Operating a sufficiently accurate detection system is vital for assisting teams of human experts in many different areas of the modern digital society. These systems intrinsically have a far-reaching normative effect, and public knowledge of the workings of such technology should be a human right. On a conceptual level, the present approach extends one of the most classical learning algorithms for classification, inheriting its theoretical properties. However, it works in a semi-supervised way, integrating human and computational intelligence. On a practical level, this algorithm transcends existing approaches (expert systems) by managing and boosting their performance into a single global detector. We illustrate HALFADO's efficacy on two challenging applications: (1) detecting hate speech messages in a flow of text messages gathered from a social media platform, and (2) a Transaction Monitoring System (TMS) in FinTech detecting fraudulent transactions in a stream of financial transactions. This algorithm illustrates that, contrary to popular belief, advanced methods of machine learning need require neither advanced levels of computation power nor expensive annotation efforts.


Efficiently Mitigating Classification Bias via Transfer Learning

Oct 24, 2020
Xisen Jin, Francesco Barbieri, Aida Mostafazadeh Davani, Brendan Kennedy, Leonardo Neves, Xiang Ren

Prediction bias in machine learning models refers to unintended model behaviors that discriminate against inputs mentioning or produced by certain groups; for example, hate speech classifiers predict more false positives for neutral text mentioning specific social groups. Mitigating bias for each task or domain is inefficient, as it requires repetitive model training, data annotation (e.g., demographic information), and evaluation. In pursuit of a more accessible solution, we propose the Upstream Bias Mitigation for Downstream Fine-Tuning (UBM) framework, which mitigates one or multiple bias factors in downstream classifiers by transfer learning from an upstream model. In the upstream bias mitigation stage, explanation regularization and adversarial training are applied to mitigate multiple bias factors. In the downstream fine-tuning stage, the classifier layer of the model is re-initialized, and the entire model is fine-tuned to downstream tasks in potentially novel domains without any further bias mitigation. We expect downstream classifiers to be less biased by transfer learning from de-biased upstream models. We conduct extensive experiments varying the similarity between the source and target data, as well as varying the number of dimensions of bias (e.g., discrimination against specific social groups or dialects). Our results indicate the proposed UBM framework can effectively reduce bias in downstream classifiers.

* 10 pages 


Multimodal Inductive Transfer Learning for Detection of Alzheimer's Dementia and its Severity

Aug 30, 2020
Utkarsh Sarawgi, Wazeer Zulfikar, Nouran Soliman, Pattie Maes

Alzheimer's disease is estimated to affect around 50 million people worldwide and is rising rapidly, with a global economic burden of nearly a trillion dollars. This calls for scalable, cost-effective, and robust methods for detection of Alzheimer's dementia (AD). We present a novel architecture that leverages acoustic, cognitive, and linguistic features to form a multimodal ensemble system. It uses specialized artificial neural networks with temporal characteristics to detect AD and its severity, which is reflected through Mini-Mental State Exam (MMSE) scores. We first evaluate it on the ADReSS challenge dataset, which is a subject-independent and balanced dataset matched for age and gender to mitigate biases, and is available through DementiaBank. Our system achieves state-of-the-art test accuracy, precision, recall, and F1-score of 83.3% each for AD classification, and state-of-the-art test root mean squared error (RMSE) of 4.60 for MMSE score regression. To the best of our knowledge, the system further achieves state-of-the-art AD classification accuracy of 88.0% when evaluated on the full benchmark DementiaBank Pitt database. Our work highlights the applicability and transferability of spontaneous speech to produce a robust inductive transfer learning model, and demonstrates generalizability through a task-agnostic feature-space. The source code is available at

* To appear in INTERSPEECH 2020 
