How to measure the degree of uncertainty of a given frame of discernment has been a hot topic for years. A lot of meaningful works have provided some effective methods to measure the degree properly. However, a crucial factor, sequence of propositions, is missing in the definition of traditional frame of discernment. In this paper, a detailed definition of ordinal frame of discernment has been provided. Besides, an innovative method utilizing a concept of computer vision to combine the order of propositions and the mass of them is proposed to better manifest relationships between the two important element of the frame of discernment. More than that, a specially designed method covering some powerful tools in indicating the degree of uncertainty of a traditional frame of discernment is also offered to give an indicator of level of uncertainty of an ordinal frame of discernment on the level of vector.
Paraphrasing is often performed with less concern for controlled style conversion. Especially for questions and commands, style-variant paraphrasing can be crucial in tone and manner, which also matters with industrial applications such as dialog system. In this paper, we attack this issue with a corpus construction scheme that simultaneously considers the core content and style of directives, namely intent and formality, for the Korean language. Utilizing manually generated natural language queries on six daily topics, we expand the corpus to formal and informal sentences by human rewriting and transferring. We verify the validity and industrial applicability of our approach by checking the adequate classification and inference performance that fit with the fine-tuning approaches, at the same time proposing a supervised formality transfer task.
Emotion and expressivity in music have been topics of considerable interest in the field of music information retrieval. In recent years, mid-level perceptual features have been suggested as means to explain computational predictions of musical emotion. We find that the diversity of musical styles and genres in the available dataset for learning these features is not sufficient for models to generalise well to specialised acoustic domains such as solo piano music. In this work, we show that by utilising unsupervised domain adaptation together with receptive-field regularised deep neural networks, it is possible to significantly improve generalisation to this domain. Additionally, we demonstrate that our domain-adapted models can better predict and explain expressive qualities in classical piano performances, as perceived and described by human listeners.
Missing value imputation is a challenging and well-researched topic in data mining. In this paper, we propose IFGAN, a missing value imputation algorithm based on Feature-specific Generative Adversarial Networks (GAN). Our idea is intuitive yet effective: a feature-specific generator is trained to impute missing values, while a discriminator is expected to distinguish the imputed values from observed ones. The proposed architecture is capable of handling different data types, data distributions, missing mechanisms, and missing rates. It also improves post-imputation analysis by preserving inter-feature correlations. We empirically show on several real-life datasets that IFGAN outperforms current state-of-the-art algorithm under various missing conditions.
Patient similarity analysis is important in health care applications. It takes patient information such as their electronic medical records and genetic data as input and computes the pairwise similarity between patients. Procedures of typical a patient similarity study can be divided into several steps including data integration, similarity measurement, and neighborhood identification. And according to an analysis of patient similarity, doctors can easily find the most suitable treatments. There are many methods to analyze the similarity such as cluster analysis. And during machine learning become more and more popular, Using neural networks such as CNN is a new hot topic. This review summarizes representative methods used in each step and discusses applications of patient similarity networks especially in the context of precision medicine.
Podcast summarization is different from summarization of other data formats, such as news, patents, and scientific papers in that podcasts are often longer, conversational, colloquial, and full of sponsorship and advertising information, which imposes great challenges for existing models. In this paper, we focus on abstractive podcast summarization and propose a two-phase approach: sentence selection and seq2seq learning. Specifically, we first select important sentences from the noisy long podcast transcripts. The selection is based on sentence similarity to the reference to reduce the redundancy and the associated latent topics to preserve semantics. Then the selected sentences are fed into a pre-trained encoder-decoder framework for the summary generation. Our approach achieves promising results regarding both ROUGE-based measures and human evaluations.
Text classification is a critical research topic with broad applications in natural language processing. Recently, graph neural networks (GNNs) have received increasing attention in the research community and demonstrated their promising results on this canonical task. Despite the success, their performance could be largely jeopardized in practice since they are: (1) unable to capture high-order interaction between words; (2) inefficient to handle large datasets and new documents. To address those issues, in this paper, we propose a principled model -- hypergraph attention networks (HyperGAT), which can obtain more expressive power with less computational consumption for text representation learning. Extensive experiments on various benchmark datasets demonstrate the efficacy of the proposed approach on the text classification task.
Large generative language models such as GPT-2 are well-known for their ability to generate text as well as their utility in supervised downstream tasks via fine-tuning. Our work is twofold: firstly we demonstrate via human evaluation that classifiers trained to discriminate between human and machine-generated text emerge as unsupervised predictors of "page quality", able to detect low quality content without any training. This enables fast bootstrapping of quality indicators in a low-resource setting. Secondly, curious to understand the prevalence and nature of low quality pages in the wild, we conduct extensive qualitative and quantitative analysis over 500 million web articles, making this the largest-scale study ever conducted on the topic.
The massive amount of misinformation spreading on the Internet on a daily basis has enormous negative impacts on societies. Therefore, we need automated systems helping fact-checkers in the combat against misinformation. In this paper, we propose a model prioritizing the claims based on their check-worthiness. We use BERT model with additional features including domain-specific controversial topics, word embeddings, and others. In our experiments, we show that our proposed model outperforms all state-of-the-art models in both test collections of CLEF Check That! Lab in 2018 and 2019. We also conduct a qualitative analysis to shed light-detecting check-worthy claims. We suggest requesting rationales behind judgments are needed to understand subjective nature of the task and problematic labels.
Recently, concerns regarding potential biases in the underlying algorithms of many automated systems (including biometrics) have been raised. In this context, a biased algorithm produces statistically different outcomes for different groups of individuals based on certain (often protected by anti-discrimination legislation) attributes such as sex and age. While several preliminary studies investigating this matter for facial recognition algorithms do exist, said topic has not yet been addressed for vascular biometric characteristics. Accordingly, in this paper, several popular types of recognition algorithms are benchmarked to ascertain the matter for fingervein recognition. The experimental evaluation suggests lack of bias for the tested algorithms, although future works with larger datasets are needed to validate and confirm those preliminary results.