Clinical diagnosis, which aims to assign diagnosis codes for a patient based on the clinical note, plays an essential role in clinical decision-making. Considering that manual diagnosis could be error-prone and time-consuming, many intelligent approaches based on clinical text mining have been proposed to perform automatic diagnosis. However, these methods may not achieve satisfactory results due to the following challenges. First, most of the diagnosis codes are rare, and the distribution is extremely unbalanced. Second, existing methods are challenging to capture the correlation between diagnosis codes. Third, the lengthy clinical note leads to the excessive dispersion of key information related to codes. To tackle these challenges, we propose a novel framework to combine the inheritance-guided hierarchical assignment and co-occurrence graph propagation for clinical automatic diagnosis. Specifically, we propose a hierarchical joint prediction strategy to address the challenge of unbalanced codes distribution. Then, we utilize graph convolutional neural networks to obtain the correlation and semantic representations of medical ontology. Furthermore, we introduce multi attention mechanisms to extract crucial information. Finally, extensive experiments on MIMIC-III dataset clearly validate the effectiveness of our method.
We introduce the task of historical text summarisation, where documents in historical forms of a language are summarised in the corresponding modern language. This is a fundamentally important routine to historians and digital humanities researchers but has never been automated. We compile a high-quality gold-standard text summarisation dataset, which consists of historical German and Chinese news from hundreds of years ago summarised in modern German or Chinese. Based on cross-lingual transfer learning techniques, we propose a summarisation model that can be trained even with no cross-lingual (historical to modern) parallel data, and further benchmark it against state-of-the-art algorithms. We report automatic and human evaluations that distinguish the historic to modern language summarisation task from standard cross-lingual summarisation (i.e., modern to modern language), highlight the distinctness and value of our dataset, and demonstrate that our transfer learning approach outperforms standard cross-lingual benchmarks on this task.
A general three-dimensional (3D) non-stationary massive multiple-input multiple-output (MIMO) geometry-based stochastic model (GBSM) for the sixth generation (6G) communication systems is proposed in the paper. The novelty of the model is that the model is designed to cover a variety of channel characteristics, including space-time-frequency (STF) non-stationarity, spherical wavefront, spatial consistency, channel hardening, etc. Firstly, the introduction of the twin-cluster channel model is given in detail. Secondly, the key statistical properties such as space-time-frequency correlation function (STFCF), space cross-correlation function (CCF), temporal autocorrelation function (ACF), frequency correlation function (FCF), and performance indicators, e.g., singular value spread (SVS), and channel capacity are derived. Finally, the simulation results are given and consistent with some measurements in relevant literatures, which validate that the proposed channel model has a certain value as a reference to model massive MIMO channel characteristics.
A new label smoothing method that makes use of prior knowledge of a language at human level, homophone, is proposed in this paper for automatic speech recognition (ASR). Compared with its forerunners, the proposed method uses pronunciation knowledge of homophones in a more complex way. End-to-end ASR models that learn acoustic model and language model jointly and modelling units of characters are necessary conditions for this method. Experiments with hybrid CTC sequence-to-sequence model show that the new method can reduce character error rate (CER) by 0.4% absolutely.
We introduce a new framework for text detection named SA-Text meaning "Simple but Accurate," which utilizes heatmaps to detect text regions in natural scene images effectively. SA-Text detects text that occurs in various fonts, shapes, and orientations in natural scene images with complicated backgrounds. Experiments on three challenging and public scene-text-detection datasets, Total-Text, SCUT-CTW1500, and MSRA-TD500 show the effectiveness and generalization ability of SA-Text in detecting not only multi-lingual oriented straight but also curved text in scripts of multiple languages. To further show the practicality of SA-Text, we combine it with a powerful state-of-the-art text recognition model and thus propose a pipeline-based text spotting system called SAA ("text spotting" is used as the technical term for "detection and recognition of text"). Our experimental results of SAA on the Total-Text dataset show that SAA outperforms four state-of-the-art text spotting frameworks by at least 9 percent points in the F-measure, which means that SA-Text can be used as a complete text detection and recognition system in real applications.
Monitoring population-level changes in diet could be useful for education and for implementing interventions to improve health. Research has shown that data from social media sources can be used for monitoring dietary behavior. We propose a scrape-by-location methodology to create food image datasets from Instagram posts. We used it to collect 3.56 million images over a period of 20 days in March 2019. We also propose a scrape-by-keywords methodology and used it to scrape ~30,000 images and their captions of 38 Kenyan food types. We publish two datasets of 104,000 and 8,174 image/caption pairs, respectively. With the first dataset, Kenya104K, we train a Kenyan Food Classifier, called KenyanFC, to distinguish Kenyan food from non-food images posted in Kenya. We used the second dataset, KenyanFood13, to train a classifier KenyanFTR, short for Kenyan Food Type Recognizer, to recognize 13 popular food types in Kenya. The KenyanFTR is a multimodal deep neural network that can identify 13 types of Kenyan foods using both images and their corresponding captions. Experiments show that the average top-1 accuracy of KenyanFC is 99% over 10,400 tested Instagram images and of KenyanFTR is 81% over 8,174 tested data points. Ablation studies show that three of the 13 food types are particularly difficult to categorize based on image content only and that adding analysis of captions to the image analysis yields a classifier that is 9 percent points more accurate than a classifier that relies only on images. Our food trend analysis revealed that cakes and roasted meats were the most popular foods in photographs on Instagram in Kenya in March 2019.
State-of-the-art text spotting systems typically aim to detect isolated words or word-by-word text in images of natural scenes and ignore the semantic coherence within a region of text. However, when interpreted together, seemingly isolated words may be easier to recognize. On this basis, we propose a novel "semantic-based text recognition" (STR) deep learning model that reads text in images with the help of understanding context. STR consists of several modules. We introduce the Text Grouping and Arranging (TGA) algorithm to connect and order isolated text regions. A text-recognition network interprets isolated words. Benefiting from semantic information, a sequenceto-sequence network model efficiently corrects inaccurate and uncertain phrases produced earlier in the STR pipeline. We present experiments on two new distinct datasets that contain scanned catalog images of interior designs and photographs of protesters with hand-written signs, respectively. Our results show that our STR model outperforms a baseline method that uses state-of-the-art single-wordrecognition techniques on both datasets. STR yields a high accuracy rate of 90% on the catalog images and 71% on the more difficult protest images, suggesting its generality in recognizing text.