Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Text": models, code, and papers

A Comparison of Techniques for Language Model Integration in Encoder-Decoder Speech Recognition

Jul 27, 2018
Shubham Toshniwal, Anjuli Kannan, Chung-Cheng Chiu, Yonghui Wu, Tara N Sainath, Karen Livescu

Attention-based recurrent neural encoder-decoder models present an elegant solution to the automatic speech recognition problem. This approach folds the acoustic model, pronunciation model, and language model into a single network and requires only a parallel corpus of speech and text for training. However, unlike in conventional approaches that combine separate acoustic and language models, it is not clear how to use additional (unpaired) text. While there has been previous work on methods addressing this problem, a thorough comparison among methods is still lacking. In this paper, we compare a suite of past methods and some of our own proposed methods for using unpaired text data to improve encoder-decoder models. For evaluation, we use the medium-sized Switchboard data set and the large-scale Google voice search and dictation data sets. Our results confirm the benefits of using unpaired text across a range of methods and data sets. Surprisingly, for first-pass decoding, the rather simple approach of shallow fusion performs best across data sets. However, for Google data sets we find that cold fusion has a lower oracle error rate and outperforms other approaches after second-pass rescoring on the Google voice search data set.

* Submitted to SLT 2018 

  Access Paper or Ask Questions

Leveraging Search History for Improving Person-Job Fit

Mar 27, 2022
Yupeng Hou, Xingyu Pan, Wayne Xin Zhao, Shuqing Bian, Yang Song, Tao Zhang, Ji-Rong Wen

As the core technique of online recruitment platforms, person-job fit can improve hiring efficiency by accurately matching job positions with qualified candidates. However, existing studies mainly focus on the recommendation scenario, while neglecting another important channel for linking positions with job seekers, i.e. search. Intuitively, search history contains rich user behavior in job seeking, reflecting important evidence for job intention of users. In this paper, we present a novel Search History enhanced Person-Job Fit model, named as SHPJF. To utilize both text content from jobs/resumes and search histories from users, we propose two components with different purposes. For text matching component, we design a BERT-based text encoder for capturing the semantic interaction between resumes and job descriptions. For intention modeling component, we design two kinds of intention modeling approaches based on the Transformer architecture, either based on the click sequence or query text sequence. To capture underlying job intentions, we further propose an intention clustering technique to identify and summarize the major intentions from search logs. Extensive experiments on a large real-world recruitment dataset have demonstrated the effectiveness of our approach.

* Accepted by DASFAA 2022 

  Access Paper or Ask Questions

K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering

Sep 22, 2021
Fu Sun, Feng-Lin Li, Ruize Wang, Qianglong Chen, Xingyi Cheng, Ji Zhang

Knowledge enhanced pre-trained language models (K-PLMs) are shown to be effective for many public tasks in the literature but few of them have been successfully applied in practice. To address this problem, we propose K-AID, a systematic approach that includes a low-cost knowledge acquisition process for acquiring domain knowledge, an effective knowledge infusion module for improving model performance, and a knowledge distillation component for reducing the model size and deploying K-PLMs on resource-restricted devices (e.g., CPU) for real-world application. Importantly, instead of capturing entity knowledge like the majority of existing K-PLMs, our approach captures relational knowledge, which contributes to better-improving sentence-level text classification and text matching tasks that play a key role in question answering (QA). We conducted a set of experiments on five text classification tasks and three text matching tasks from three domains, namely E-commerce, Government, and Film&TV, and performed online A/B tests in E-commerce. Experimental results show that our approach is able to achieve substantial improvement on sentence-level question answering tasks and bring beneficial business value in industrial settings.

* CIKM 2021 

  Access Paper or Ask Questions

Deciphering Implicit Hate: Evaluating Automated Detection Algorithms for Multimodal Hate

Jun 10, 2021
Austin Botelho, Bertie Vidgen, Scott A. Hale

Accurate detection and classification of online hate is a difficult task. Implicit hate is particularly challenging as such content tends to have unusual syntax, polysemic words, and fewer markers of prejudice (e.g., slurs). This problem is heightened with multimodal content, such as memes (combinations of text and images), as they are often harder to decipher than unimodal content (e.g., text alone). This paper evaluates the role of semantic and multimodal context for detecting implicit and explicit hate. We show that both text- and visual- enrichment improves model performance, with the multimodal model (0.771) outperforming other models' F1 scores (0.544, 0.737, and 0.754). While the unimodal-text context-aware (transformer) model was the most accurate on the subtask of implicit hate detection, the multimodal model outperformed it overall because of a lower propensity towards false positives. We find that all models perform better on content with full annotator agreement and that multimodal models are best at classifying the content where annotators disagree. To conduct these investigations, we undertook high-quality annotation of a sample of 5,000 multimodal entries. Tweets were annotated for primary category, modality, and strategy. We make this corpus, along with the codebook, code, and final model, freely available.

* Findings of ACL, 2021 
* Please note the paper contains examples of hateful content 

  Access Paper or Ask Questions

Wavelet Scattering Networks for Atomistic Systems with Extrapolation of Material Properties

Jun 01, 2020
Paul Sinz, Michael W. Swift, Xavier Brumwell, Jialin Liu, Kwang Jin Kim, Yue Qi, Matthew Hirn

The dream of machine learning in materials science is for a machine learned model to learn the underlying physics of an atomic system, allowing the model to move beyond interpolation of the training set to the prediction of properties that were not present in the original training data. In addition to advances in machine learning architectures and training techniques, achieving this ambitious goal requires a method to convert a 3D atomic system into a feature representation that preserves rotational and translational symmetry, smoothness under small perturbations, and invariance under re-ordering. The atomic orbital wavelet scattering transform preserves these symmetries by construction, and has achieved great success as a featurization method for machine learning energy prediction. Both in small molecules and in the bulk amorphous $\text{Li}_{\alpha}\text{Si}$ system, machine learning models using wavelet scattering coefficients as features have demonstrated a comparable accuracy to Density Functional Theory at a small fraction of the computational cost. In this work, we test the generalizability of our $\text{Li}_{\alpha}\text{Si}$ energy predictor to properties that were not included in the training set, such as elastic constants and migration barriers. We demonstrate that statistical feature selection methods can reduce over-fitting and lead to remarkable accuracy in these extrapolation tasks.

* 15 pages; 10 figures; 4 tables 

  Access Paper or Ask Questions

An Audio-enriched BERT-based Framework for Spoken Multiple-choice Question Answering

May 25, 2020
Chia-Chih Kuo, Shang-Bao Luo, Kuan-Yu Chen

In a spoken multiple-choice question answering (SMCQA) task, given a passage, a question, and multiple choices all in the form of speech, the machine needs to pick the correct choice to answer the question. While the audio could contain useful cues for SMCQA, usually only the auto-transcribed text is utilized in system development. Thanks to the large-scaled pre-trained language representation models, such as the bidirectional encoder representations from transformers (BERT), systems with only auto-transcribed text can still achieve a certain level of performance. However, previous studies have evidenced that acoustic-level statistics can offset text inaccuracies caused by the automatic speech recognition systems or representation inadequacy lurking in word embedding generators, thereby making the SMCQA system robust. Along the line of research, this study concentrates on designing a BERT-based SMCQA framework, which not only inherits the advantages of contextualized language representations learned by BERT, but integrates the complementary acoustic-level information distilled from audio with the text-level information. Consequently, an audio-enriched BERT-based SMCQA framework is proposed. A series of experiments demonstrates remarkable improvements in accuracy over selected baselines and SOTA systems on a published Chinese SMCQA dataset.

  Access Paper or Ask Questions

Cloud-Based Face and Speech Recognition for Access Control Applications

May 08, 2020
Nathalie Tkauc, Thao Tran, Kevin Hernandez-Diaz, Fernando Alonso-Fernandez

This paper describes the implementation of a system to recognize employees and visitors wanting to gain access to a physical office through face images and speech-to-text recognition. The system helps employees to unlock the entrance door via face recognition without the need of tag-keys or cards. To prevent spoofing attacks and increase security, a randomly generated code is sent to the employee, who then has to type it into the screen. On the other hand, visitors and delivery persons are provided with a speech-to-text service where they utter the name of the employee that they want to meet, and the system then sends a notification to the right employee automatically. The hardware of the system is constituted by two Raspberry Pi, a 7-inch LCD-touch display, a camera, and a sound card with a microphone and speaker. To carry out face recognition and speech-to-text conversion, the cloud-based platforms Amazon Web Services and the Google Speech-to-Text API service are used respectively. The two-step face authentication mechanism for employees provides an increased level of security and protection against spoofing attacks without the need of carrying key-tags or access cards, while disturbances by visitors or couriers are minimized by notifying their arrival to the right employee, without disturbing other co-workers by means of ring-bells.

* Published at Proc. 6th International Workshop on Security and Privacy in the Cloud, SPC, in conjunction with IEEE Conference on Communications and Network Security, CNS, Avignon, France, 29 June - 1 July 2020 

  Access Paper or Ask Questions

Semantics of the Unwritten

Apr 05, 2020
He Bai, Peng Shi, Jimmy Lin, Luchen Tan, Kun Xiong, Wen Gao, Jie Liu, Ming Li

The semantics of a text is manifested not only by what is read, but also by what is not read. In this article, we will study how those implicit "not read" information such as end-of-paragraph (EOP) and end-of-sequence (EOS) affect the quality of text generation. Transformer-based pretrained language models (LMs) have demonstrated the ability to generate long continuations with good quality. This model gives us a platform for the first time to demonstrate that paragraph layouts and text endings are also important components of human writing. Specifically, we find that pretrained LMs can generate better continuations by learning to generate the end of the paragraph (EOP) in the fine-tuning stage. Experimental results on English story generation show that EOP can lead to higher BLEU score and lower EOS perplexity. To further investigate the relationship between text ending and EOP, we conduct experiments with a self-collected Chinese essay dataset on Chinese-GPT2, a character level LM without paragraph breaker or EOS during pre-training. Experimental results show that the Chinese GPT2 can generate better essay endings with paragraph information. Experiments on both English stories and Chinese essays demonstrate that learning to end paragraphs can benefit the continuation generation with pretrained LMs.

  Access Paper or Ask Questions

OCR accuracy improvement on document images through a novel pre-processing approach

Sep 11, 2015
Abdeslam El Harraj, Naoufal Raissouni

Digital camera and mobile document image acquisition are new trends arising in the world of Optical Character Recognition and text detection. In some cases, such process integrates many distortions and produces poorly scanned text or text-photo images and natural images, leading to an unreliable OCR digitization. In this paper, we present a novel nonparametric and unsupervised method to compensate for undesirable document image distortions aiming to optimally improve OCR accuracy. Our approach relies on a very efficient stack of document image enhancing techniques to recover deformation of the entire document image. First, we propose a local brightness and contrast adjustment method to effectively handle lighting variations and the irregular distribution of image illumination. Second, we use an optimized greyscale conversion algorithm to transform our document image to greyscale level. Third, we sharpen the useful information in the resulting greyscale image using Un-sharp Masking method. Finally, an optimal global binarization approach is used to prepare the final document image to OCR recognition. The proposed approach can significantly improve text detection rate and optical character recognition accuracy. To demonstrate the efficiency of our approach, an exhaustive experimentation on a standard dataset is presented.

* Signal & Image Processing : An International Journal (SIPIJ) Vol.6, No.4, August 2015 

  Access Paper or Ask Questions