"Topic": models, code, and papers

Improving low-resource ASR performance with untranscribed out-of-domain data

Jun 02, 2021
Jayadev Billa

Semi-supervised training (SST) is a common approach to leverage untranscribed/unlabeled speech data to improve automatic speech recognition performance in low-resource languages. However, if the available unlabeled speech is mismatched to the target domain, SST is not as effective, and in many cases performs worse than the original system. In this paper, we address the issue of low-resource ASR when only untranscribed out-of-domain speech data is readily available in the target language. Specifically, we look to improve performance on conversational/telephony speech (target domain) using web resources, in particular YouTube data, which more closely resembles news/topical broadcast data. Leveraging SST, we show that while in some cases simply pooling the out-of-domain data with the training data lowers word error rate (WER), in all cases we see improvements if we train first with the out-of-domain data and then fine-tune the resulting model with the original training data. Using 2000 hours of speed-perturbed YouTube audio in each target language, with semi-supervised transcripts, we show relative WER improvements on multiple languages/data sets of up to 16.3% over the baseline systems and up to 7.4% over a system that simply pools the out-of-domain data with the training data.
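
The abstract reports results as relative improvements in WER. As a rough illustration of how such a figure is typically computed, the sketch below calculates WER as word-level edit distance divided by the number of reference words, then the relative reduction between two systems. The function names and all example sentences and numbers are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far
    # (previous row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction of a new system over a baseline, in percent."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Made-up example: one substitution and one deletion over six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
# Made-up WER values (in percent) for a baseline and an improved system.
gain = relative_improvement(40.0, 36.0)
```

Note that a relative improvement is measured against the baseline WER, so the same absolute reduction counts for more when the baseline WER is lower.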


Vehicle Localization via Cooperative Channel Mapping

Feb 09, 2021
Xinghe Chu, Zhaoming Lu, David Gesbert, Luhan Wang, Xiangming Wen

This paper addresses vehicle positioning, a topic whose importance has risen dramatically in the context of future autonomous driving systems. While classical methods that use GPS and/or beacon signals from network infrastructure for triangulation tend to be sensitive to multipath and signal obstruction, our method is robust to such phenomena. Our approach builds on the recently proposed Channel-SLAM method, which first made it possible to leverage multipath to improve (single) vehicle positioning. Here, we propose a cooperative mapping approach that builds upon the Channel-SLAM concept, referred to here as Team Channel-SLAM. Team Channel-SLAM not only exploits the stationary nature of many reflecting objects around the vehicle, but also capitalizes on the multi-vehicle nature of road traffic. The key intuition behind our method is the exploitation, for the first time, of the correlation between reflectors around multiple neighboring vehicles. An algorithm is derived for reflector selection and estimation, combined with a team particle filter (TPF), so as to achieve high-precision simultaneous positioning of multiple vehicles. We obtain a large improvement over the single-vehicle positioning scenario, with gains already noticeable at moderate vehicle densities, such as over 40% improvement for a density as low as 4 vehicles on a 132-meter stretch of road.


Read, Retrospect, Select: An MRC Framework to Short Text Entity Linking

Jan 07, 2021
Yingjie Gu, Xiaoye Qu, Zhefeng Wang, Baoxing Huai, Nicholas Jing Yuan, Xiaolin Gui

Entity linking (EL) for the rapidly growing volume of short text (e.g. search queries and news titles) is critical to industrial applications. Most existing approaches, which rely on adequate context for long-text EL, are not effective for concise and sparse short text. In this paper, we propose a novel framework called Multi-turn Multiple-choice Machine reading comprehension (M3) to solve short-text EL from a new perspective: a query is generated for each ambiguous mention by exploiting its surrounding context, and an option selection module is employed to identify the golden entity from the candidates using the query. In this way, the M3 framework lets the limited context interact sufficiently with the candidate entities during the encoding process, and implicitly considers the dissimilarities within the candidate set in the selection stage. In addition, we design a two-stage verifier incorporated into M3 to address the common problem of unlinkable mentions in short text. To further consider the topical coherence and interdependence among referred entities, M3 works in a multi-turn fashion, handling mentions sequentially by retrospecting historical cues. Evaluation shows that our M3 framework achieves state-of-the-art performance on five Chinese and English datasets for real-world short-text EL.

* Accepted at AAAI 2021 


Scalable Verification of Quantized Neural Networks (Technical Report)

Dec 15, 2020
Thomas A. Henzinger, Mathias Lechner, Đorđe Žikelić

Formal verification of neural networks is an active topic of research, and recent advances have significantly increased the size of the networks that verification tools can handle. However, most methods are designed for verification of an idealized model of the actual network which works over real arithmetic and ignores rounding imprecisions. This idealization is in stark contrast to network quantization, which is a technique that trades numerical precision for computational efficiency and is, therefore, often applied in practice. Neglecting rounding errors of such low-bit quantized neural networks has been shown to lead to wrong conclusions about the network's correctness. Thus, the desired approach for verifying quantized neural networks would be one that takes these rounding errors into account. In this paper, we show that verifying the bit-exact implementation of quantized neural networks with bit-vector specifications is PSPACE-hard, even though verifying idealized real-valued networks and satisfiability of bit-vector specifications alone are each in NP. Furthermore, we explore several practical heuristics toward closing the complexity gap between idealized and bit-exact verification. In particular, we propose three techniques for making SMT-based verification of quantized neural networks more scalable. Our experiments demonstrate that our proposed methods allow a speedup of up to three orders of magnitude over existing approaches.
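
To make the gap between idealized and bit-exact semantics concrete, here is a minimal sketch of a single ReLU neuron evaluated once in real arithmetic and once in a toy fixed-point format. The format (2 fractional bits, truncating multiplication), the weights, and the property "output > 0" are all assumptions chosen for illustration; this is not the paper's SMT encoding or its benchmark networks.

```python
FRACTION_BITS = 2             # hypothetical fixed-point format: 2 fractional bits
SCALE = 1 << FRACTION_BITS    # 4, so the representable step is 0.25


def to_fixed(x: float) -> int:
    """Encode a real value as an integer number of 1/SCALE steps."""
    return int(round(x * SCALE))


def real_neuron(weights, inputs, bias) -> float:
    """Idealized ReLU neuron over real (here: floating-point) arithmetic."""
    return max(0.0, sum(w * x for w, x in zip(weights, inputs)) + bias)


def fixed_neuron(weights, inputs, bias) -> float:
    """Same neuron in fixed point; each product is shifted back with flooring."""
    acc = to_fixed(bias)
    for w, x in zip(weights, inputs):
        # The product carries 2 * FRACTION_BITS fractional bits; the right
        # shift drops the extra bits, which is where precision is lost.
        acc += (to_fixed(w) * to_fixed(x)) >> FRACTION_BITS
    return max(0, acc) / SCALE


weights, inputs, bias = (0.75, 0.75), (0.75, 0.75), -1.0
ideal = real_neuron(weights, inputs, bias)       # 0.5625 + 0.5625 - 1.0 = 0.125
bit_exact = fixed_neuron(weights, inputs, bias)  # (2 + 2 - 4) / 4 = 0.0
```

In this toy case the property "output > 0" holds for the real-valued model but fails for the fixed-point one, which is the kind of discrepancy that makes verifying only the idealized network unsafe.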


Tensor-based Intrinsic Subspace Representation Learning for Multi-view Clustering

Nov 12, 2020
Qinghai Zheng, Jihua Zhu, Zhongyu Li, Haoyu Tang, Shuangxun Ma

Multi-view clustering is a hot research topic, and many approaches have been proposed over the past few years. Nevertheless, most existing algorithms take only the consensus information among different views into consideration for clustering. This may hinder multi-view clustering performance in real-life applications, since different views usually have diverse statistical properties. To address this problem, we propose a novel Tensor-based Intrinsic Subspace Representation Learning (TISRL) method for multi-view clustering in this paper. Concretely, a rank-preserving decomposition is first proposed to effectively deal with the diverse statistical information contained in different views. Then, to achieve the intrinsic subspace representation, a low-rank tensor constraint based on the tensor singular value decomposition is also utilized in our method. The specific information contained in different views is fully investigated by the rank-preserving decomposition, and the high-order correlations of multi-view data are mined by the low-rank tensor constraint. The objective function can be optimized by an alternating direction minimization algorithm based on the augmented Lagrangian multiplier. Experimental results on nine commonly used real-world multi-view datasets illustrate the superiority of TISRL.


Self-supervised Exposure Trajectory Recovery for Dynamic Blur Estimation

Oct 06, 2020
Youjian Zhang, Chaoyue Wang, Stephen J. Maybank, Dacheng Tao

Dynamic scene blurring is an important yet challenging topic. Recently, deep learning methods have achieved impressive performance for dynamic scene deblurring. However, the motion information contained in a blurry image has yet to be fully explored and accurately formulated because: (i) the ground truth of blurry motion is difficult to obtain; (ii) the temporal ordering is destroyed during the exposure; and (iii) the motion estimation is highly ill-posed. By revisiting the principle of camera exposure, dynamic blur can be described by the relative motions of sharp content with respect to each exposed pixel. We define exposure trajectories, which record the trajectories of relative motions to represent the motion information contained in a blurry image and explain the causes of the dynamic blur. A new blur representation, which we call motion offset, is proposed to model pixel-wise displacements of the latent sharp image at multiple timepoints. Under mild constraints, the learned motion offsets can recover dense, (non-)linear exposure trajectories, which significantly reduce temporal disorder and ill-posed problems. Finally, we demonstrate that the estimated exposure trajectories can fit real-world dynamic blurs and further contribute to motion-aware image deblurring and warping-based video extraction from a single blurry image.
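
The idea of describing blur through per-pixel displacements at several timepoints can be sketched in one dimension. The code below assumes a simplified exposure model in which each blurry pixel is the average of the sharp signal sampled at that pixel's offset position for every timepoint; the integer offsets, nearest-pixel sampling, border clamping, and the name `synthesize_blur` are illustrative assumptions, whereas the paper learns dense offsets on real images.

```python
def synthesize_blur(sharp, offsets):
    """Average the sharp signal along each pixel's (hypothetical) trajectory.

    sharp:   list of pixel intensities of the latent sharp signal
    offsets: offsets[t][i] is the integer displacement of pixel i at timepoint t
    """
    n = len(sharp)
    blurry = []
    for i in range(n):
        samples = []
        for offsets_t in offsets:
            # Clamp the sampling position so it stays inside the signal.
            j = min(max(i + offsets_t[i], 0), n - 1)
            samples.append(sharp[j])
        blurry.append(sum(samples) / len(samples))
    return blurry


# A single bright pixel, and three timepoints of uniform motion (-1, 0, +1).
sharp = [0, 0, 1, 0, 0]
offsets = [[-1] * 5, [0] * 5, [1] * 5]
blurry = synthesize_blur(sharp, offsets)
```

With these made-up offsets the bright pixel is smeared evenly over its three neighbouring positions. Because the average discards the order of the timepoints, reversing `offsets` gives the same blurry signal, which illustrates the loss of temporal ordering mentioned in point (ii) of the abstract.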


Legal Judgment Prediction (LJP) Amid the Advent of Autonomous AI Legal Reasoning

Sep 29, 2020
Lance Eliot

Legal Judgment Prediction (LJP) is a longstanding and open topic in the theory and practice of law. Predicting the nature and outcomes of judicial matters is abundantly warranted, keenly sought, and vigorously pursued by those within the legal industry and also by society as a whole. The tenuous act of generating judicially laden predictions has been limited in utility and exactitude, requiring further advancement. Various methods and techniques to predict legal cases and judicial actions have emerged over time, especially with the advent of computer-based modeling. A wide range of approaches has been attempted, from simple calculative methods to highly sophisticated and complex statistical models. Artificial Intelligence (AI) based approaches have also been increasingly utilized. In this paper, a review of the literature encompassing Legal Judgment Prediction is undertaken, along with the proposal that the advent of AI Legal Reasoning (AILR) will have a pronounced impact on how LJP is performed and on its predictive accuracy. Legal Judgment Prediction is examined in particular using the Levels of Autonomy (LoA) of AI Legal Reasoning. Other considerations are also explored, including LJP probabilistic tendencies, biases handling, actor predictors, transparency, judicial reliance, legal case outcomes, and other crucial elements of the overarching legal judicial milieu.

* 39 pages, 13 figures 


Data Science for Motion and Time Analysis with Modern Motion Sensor Data

Aug 25, 2020
Chiwoo Park, Sang Do Noh, Anuj Srivastava

Motion-and-time analysis has been a popular research topic in operations research, especially for analyzing work performance in manufacturing and service operations. It is regaining attention as a continuous improvement tool for lean manufacturing and smart factories. This paper develops a framework for data-driven analysis of work motions and studies their correlations to work speeds or execution rates, using data collected from modern motion sensors. Past analyses largely relied on manual steps involving time-consuming stop-watching and video-taping, followed by manual data analysis. While modern sensing devices have automated the collection of motion data, the motion analytics that transform the new data into knowledge are largely underdeveloped. Unsolved technical questions include how the motion and time information can be extracted from the motion sensor data, how work motions and execution rates can be statistically modeled and compared, and what the statistical correlations of motions to the rates are. In this paper, we develop a novel mathematical framework for motion and time analysis with motion sensor data, by defining new mathematical representation spaces of human motions and execution rates and by developing statistical tools on these new spaces. This methodological research is demonstrated using five use cases applied to manufacturing motion data.

* Keywords: motion and time study, motion sensors, Riemannian manifold, probability distribution on manifold, temporal evolution of probability distributions 


Interpretable Contextual Team-aware Item Recommendation: Application in Multiplayer Online Battle Arena Games

Jul 30, 2020
Andrés Villa, Vladimir Araujo, Francisca Cattan, Denis Parra

The video game industry has adopted recommendation systems to boost users' interest, with a focus on game sales. Other exciting applications within video games are those that help the player make decisions that would maximize their playing experience, which is a desirable feature in real-time strategy video games such as Multiplayer Online Battle Arena (MOBA) games like DotA and LoL. Among these tasks, the recommendation of items is challenging, given both the contextual nature of the game and its dependence on the formation of each team. Existing works on this topic do not take advantage of all the available contextual match data and dismiss potentially valuable information. To address this problem we develop TTIR, a contextual recommender model derived from the Transformer neural architecture that suggests a set of items to every team member, based on the contexts of teams and roles that describe the match. TTIR outperforms several approaches and provides interpretable recommendations through visualization of attention weights. Our evaluation indicates that both the Transformer architecture and the contextual information are essential to get the best results for this item recommendation task. Furthermore, a preliminary user survey indicates the usefulness of attention weights for explaining recommendations as well as ideas for future work. The code and dataset are available at:


Exploiting Contextual Information with Deep Neural Networks

Jun 27, 2020
Ismail Elezi

Context matters! Nevertheless, there has not been much research on exploiting contextual information in deep neural networks. For the most part, the use of contextual information has been limited to recurrent neural networks. Attention models and capsule networks are two recent ways of introducing contextual information in non-recurrent models; however, both of these algorithms were developed after this work had started. In this thesis, we show that contextual information can be exploited in two fundamentally different ways: implicitly and explicitly. In the DeepScore project, where the use of context is very important for the recognition of many tiny objects, we show that by carefully crafting convolutional architectures, we can achieve state-of-the-art results, while also being able to implicitly and correctly distinguish between objects that are virtually identical but have different meanings based on their surroundings. In parallel, we show that by explicitly designing algorithms (motivated by graph theory and game theory) that take into consideration the entire structure of the dataset, we can achieve state-of-the-art results in different topics such as semi-supervised learning and similarity learning. To the best of our knowledge, we are the first to integrate graph-theoretical modules that are carefully crafted for the problem of similarity learning and designed to consider contextual information, not only outperforming the other models, but also gaining a speed improvement while using a smaller number of parameters.

* Ph.D. thesis 
