Homophony's widespread presence in natural languages is a controversial topic. Recent theories of language optimality have tried to justify its prevalence, despite its negative effects on cognitive processing time; e.g., Piantadosi et al. (2012) argued homophony enables the reuse of efficient wordforms and is thus beneficial for languages. This hypothesis has recently been challenged by Trott and Bergen (2020), who posit that good wordforms are more often homophonous simply because they are more phonotactically probable. In this paper, we join the debate. We first propose a new information-theoretic quantification of a language's homophony: the sample Rényi entropy. Then, we use this quantification to revisit Trott and Bergen's claims. While their point is theoretically sound, a specific methodological issue in their experiments raises doubts about their results. After addressing this issue, we find no clear pressure either towards or against homophony, a much more nuanced result than either Piantadosi et al.'s or Trott and Bergen's findings.
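As a rough illustration of the kind of quantity involved, the following sketch computes the standard Rényi entropy of the empirical wordform distribution of a lexicon. It is a plug-in estimate under stated assumptions, not necessarily the estimator used in the paper: the function name `renyi_entropy`, the default order `alpha=2.0`, and the toy lexicon are all hypothetical. The lexicon is assumed to hold one wordform per meaning, so a repeated form stands for a homophone.

```python
import math
from collections import Counter


def renyi_entropy(wordforms, alpha=2.0):
    """Plug-in Rényi entropy (in bits) of the empirical wordform distribution.

    `wordforms` holds one entry per meaning, so repeated entries are homophones.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    counts = Counter(wordforms)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    if math.isclose(alpha, 1.0):
        # The limit alpha -> 1 is the Shannon entropy.
        return -sum(p * math.log2(p) for p in probs)
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)


# Hypothetical toy lexicon: six meanings, two pairs of homophones.
lexicon = ["bank", "bank", "bat", "bat", "tree", "river"]
# The same six meanings with every wordform made distinct.
no_homophony = ["bank1", "bank2", "bat1", "bat2", "tree", "river"]

homophonous_bits = renyi_entropy(lexicon)
distinct_bits = renyi_entropy(no_homophony)
```

Under this reading, more homophony concentrates probability mass on fewer wordforms and lowers the entropy, so `homophonous_bits` is smaller than `distinct_bits`.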
Ethics in AI has become a global topic of interest for both policymakers and academic researchers. In the last few years, various research organizations, lawyers, think tanks, and regulatory bodies have become involved in developing AI ethics guidelines and principles. However, there is still debate about the implications of these principles. We conducted a systematic literature review (SLR) to investigate the agreement on the significance of AI principles and to identify the challenging factors that could negatively impact the adoption of AI ethics principles. The results reveal that the global convergence set consists of 22 ethical principles and 15 challenges. Transparency, privacy, accountability, and fairness are identified as the most common AI ethics principles. Similarly, lack of ethical knowledge and vague principles are reported as the significant challenges for considering ethics in AI. The findings of this study are the preliminary inputs for proposing a maturity model that assesses the ethical capabilities of AI systems and provides best practices for further improvements.
In recent years, many design automation methods have been developed to routinely create approximate implementations of circuits and programs that show excellent trade-offs between the quality of output and required resources. This paper deals with evolutionary approximation as one of the popular approximation methods. The paper provides the first survey of evolutionary algorithm (EA)-based approaches applied in the context of approximate computing. The survey reveals that EAs are primarily applied as multi-objective optimizers. We propose to divide these approaches into two main classes: (i) parameter optimization, in which the EA optimizes a vector of system parameters, and (ii) synthesis and optimization, in which the EA is responsible for determining the architecture and parameters of the resulting system. Evolutionary approximation has been applied at all levels of design abstraction and in many different applications. Neural architecture search enabling the automated hardware-aware design of approximate deep neural networks was identified as a newly emerging topic in this area.
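The multi-objective view mentioned above can be made concrete with a Pareto filter, the basic selection step behind trading output quality against resources. This is a minimal sketch, not a method taken from the survey: the helper names `dominates` and `pareto_front` and the `(error, area)` candidate values are hypothetical, and both objectives are assumed to be minimized.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and strictly better in one.

    Both objectives are minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the candidates that no other candidate dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical approximate-circuit candidates as (error, area) pairs.
candidates = [(0.00, 100), (0.01, 80), (0.02, 85), (0.05, 40), (0.05, 60)]
front = pareto_front(candidates)
# front == [(0.0, 100), (0.01, 80), (0.05, 40)]
```

Here `(0.02, 85)` is dropped because `(0.01, 80)` is better in both objectives, and `(0.05, 60)` is dropped because `(0.05, 40)` has the same error with a smaller area.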
Electronic health records (EHRs), digital collections of patient healthcare events and observations, are ubiquitous in medicine and critical to healthcare delivery, operations, and research. Despite this central role, EHRs are notoriously difficult to process automatically. Well over half of the information stored within EHRs is in the form of unstructured text (e.g. provider notes, operation reports) and remains largely untapped for secondary use. Recently, however, newer neural network and deep learning approaches to Natural Language Processing (NLP) have made considerable advances, outperforming traditional statistical and rule-based systems on a variety of tasks. In this survey paper, we summarize current neural NLP methods for EHR applications. We focus on a broad scope of tasks, namely, classification and prediction, word embeddings, extraction, generation, and other topics such as question answering, phenotyping, knowledge graphs, medical dialogue, multilinguality, and interpretability.
Computer vision is one of the most widely discussed research areas, and stereo matching is one of its most important subfields. Once the disparity (parallax) map is transformed into a depth map, it can be applied in many intelligent applications. In this paper, a stereo matching algorithm based on visual sensitive information is proposed and evaluated on standard images from the Middlebury dataset. To address the limitation of traditional stereo matching algorithms regarding the cost window, a cost aggregation algorithm based on a dynamic window is proposed, and the disparity image is optimized using left-right consistency detection to further reduce the mismatch rate. The experimental results show that the proposed algorithm effectively improves stereo matching quality, providing a significant improvement in accuracy compared with the classical census algorithm. The proposed model code, dataset, and experimental results are available at https://github.com/WangHewei16/Stereo-Matching.
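The left-right consistency step mentioned above can be sketched as follows. This is a generic version of the check, not the code from the linked repository: the function name `left_right_check`, the parameters `max_diff` and `invalid`, and the toy disparity rows are hypothetical. It assumes integer disparities and the common convention that a left-image pixel at column `x` with disparity `d` matches the right-image pixel at column `x - d`.

```python
def left_right_check(disp_left, disp_right, max_diff=1, invalid=-1):
    """Mark left-image disparities that disagree with the right-image disparity map.

    Both maps are 2D lists of integer disparities with the same shape.
    """
    height = len(disp_left)
    width = len(disp_left[0])
    checked = []
    for y in range(height):
        row = []
        for x in range(width):
            d = disp_left[y][x]
            xr = x - d  # column of the matching pixel in the right image
            if 0 <= xr < width and abs(d - disp_right[y][xr]) <= max_diff:
                row.append(d)
            else:
                # Out-of-image match or inconsistent disparity: treat as a mismatch.
                row.append(invalid)
        checked.append(row)
    return checked


# Hypothetical one-row disparity maps.
left = [[0, 1, 1, 2, 5]]
right = [[1, 1, 2, 0, 0]]

tolerant = left_right_check(left, right, max_diff=1)  # [[0, 1, 1, 2, -1]]
strict = left_right_check(left, right, max_diff=0)    # [[-1, 1, 1, -1, -1]]
```

Pixels marked `invalid` are typically filled in afterwards from neighbouring valid disparities; that filling step is omitted here.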
Soccer broadcast video understanding has been drawing a lot of attention in recent years among data scientists and industrial companies. This is mainly due to the lucrative potential unlocked by effective deep learning techniques developed in the field of computer vision. In this work, we focus on the topic of camera calibration and on its current limitations for the scientific community. More precisely, we tackle the absence of a large-scale calibration dataset and of a public calibration network trained on such a dataset. Specifically, we distill a powerful commercial calibration tool into a recent neural network architecture on the large-scale SoccerNet dataset, composed of untrimmed broadcast videos of 500 soccer games. We further release our distilled network and leverage it to provide 3 ways of representing the calibration results along with player localization. Finally, we exploit those representations within the current best architecture for the action spotting task of SoccerNet-v2 and achieve new state-of-the-art performance.
We consider the problem of learning to simplify medical texts. This is important because most reliable, up-to-date information in biomedicine is dense with jargon and thus practically inaccessible to the lay audience. Furthermore, manual simplification does not scale to the rapidly growing body of biomedical literature, motivating the need for automated approaches. Unfortunately, there are no large-scale resources available for this task. In this work we introduce a new corpus of parallel texts in English comprising technical and lay summaries of all published evidence pertaining to different clinical topics. We then propose a new metric based on likelihood scores from a masked language model pretrained on scientific texts. We show that this automated measure better differentiates between technical and lay summaries than existing heuristics. We introduce and evaluate baseline encoder-decoder Transformer models for simplification and propose a novel augmentation to these in which we explicitly penalize the decoder for producing "jargon" terms; we find that this yields improvements over baselines in terms of readability.
Instance segmentation is an active topic in computer vision that is usually solved by using supervised learning approaches over very large datasets composed of object-level masks. Obtaining such a dataset for any new domain can be very expensive and time-consuming. In addition, models trained on certain annotated categories do not generalize well to unseen objects. The goal of this paper is to propose a method that can perform unsupervised discovery of long-tail categories in instance segmentation, through learning instance embeddings of masked regions. Leveraging the rich relationships and hierarchical structure between objects in the images, we propose self-supervised losses for learning mask embeddings. Trained on the COCO dataset without additional annotations of the long-tail objects, our model is able to discover novel and more fine-grained objects than the common categories in COCO. We show that the model achieves competitive quantitative results on LVIS as compared to the supervised and partially supervised methods.
Detecting facial forgery images and videos is an increasingly important topic in multimedia forensics. As forgery images and videos are usually compressed to different formats such as JPEG and H.264 when circulating on the Internet, existing forgery-detection methods trained on uncompressed data often perform significantly worse in identifying them. To solve this problem, we propose a novel anti-compression facial forgery detection framework, which learns a compression-insensitive embedding feature space utilizing both original and compressed forgeries. Specifically, our approach consists of two novel ideas: (i) extracting compression-insensitive features from both uncompressed and compressed forgeries using an adversarial learning strategy; (ii) learning a robust partition by constructing a metric loss that can reduce the distance between the paired original and compressed images in the embedding space. Experimental results demonstrate that the proposed method is highly effective in handling both compressed and uncompressed facial forgery images.
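To illustrate idea (ii), the sketch below computes one simple loss that shrinks the distance between paired embeddings: the mean squared Euclidean distance between each original embedding and the embedding of its compressed counterpart. The abstract does not specify the form of the metric loss, so this particular choice, the function name `paired_embedding_loss`, and the toy vectors are all assumptions made for illustration.

```python
def paired_embedding_loss(original, compressed):
    """Mean squared Euclidean distance between paired embeddings.

    `original[i]` and `compressed[i]` are embeddings of the same image
    before and after compression, given as equal-length lists of floats.
    """
    if len(original) != len(compressed) or not original:
        raise ValueError("need the same non-zero number of embeddings on both sides")
    total = 0.0
    for u, v in zip(original, compressed):
        total += sum((a - b) ** 2 for a, b in zip(u, v))
    return total / len(original)


# Hypothetical 2-D embeddings for two image pairs.
orig_emb = [[0.0, 0.0], [1.0, 1.0]]
comp_emb = [[3.0, 4.0], [1.0, 1.0]]
loss = paired_embedding_loss(orig_emb, comp_emb)  # (25 + 0) / 2 = 12.5
```

Minimizing such a term pulls each compressed image towards its original in the embedding space; in practice it would be combined with a classification objective and computed on tensors in a deep learning framework.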
Neural architecture search (NAS) is a hot topic in the field of AutoML and has begun to outperform human-designed architectures on many machine learning tasks. Motivated by the natural way in which Cartesian genetic programming (CGP) represents neural networks, we propose an evolutionary NAS approach based on CGP, called CPGNAS, for CNN architectures that solve the sentence classification task. To evolve the CNN architectures under the framework of CGP, the existing key operations are identified as the types of function nodes of CGP, and the evolutionary operations are designed based on the evolutionary strategy (ES). The experimental results show that the searched architecture can reach the accuracy of human-designed architectures. The ablation tests identify the Attention function as the single key function node and the Convolution and Attention as the joint key function nodes. However, linear transformations alone could keep the accuracy of the evolved architectures above 70%, which is worth investigating in the future.
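The CGP representation referred to above can be sketched as a list of function nodes, each naming an operation and the earlier nodes it reads from. Decoding a genotype into an architecture then means tracing back from the output to find the nodes that are actually used. This is a generic CGP decoding sketch, not the paper's encoding: the helper names `active_nodes` and `decode`, the node labels, and the example genotype are hypothetical.

```python
def active_nodes(genotype, n_inputs, output):
    """Return the indices of the function nodes that the output depends on.

    Indices 0..n_inputs-1 are network inputs; function node k of `genotype`
    has index n_inputs + k and is a pair (function_name, input_indices).
    """
    active = set()
    stack = [output]
    while stack:
        idx = stack.pop()
        if idx < n_inputs or idx in active:
            continue  # skip network inputs and nodes already visited
        active.add(idx)
        _, inputs = genotype[idx - n_inputs]
        stack.extend(inputs)
    return sorted(active)


def decode(genotype, n_inputs, output):
    """List the active nodes as (index, function_name, input_indices) in feed-forward order."""
    return [(i,) + tuple(genotype[i - n_inputs])
            for i in active_nodes(genotype, n_inputs, output)]


# Hypothetical genotype with one input (index 0, the embedded sentence).
genotype = [
    ("conv", (0,)),       # node 1
    ("attention", (0,)),  # node 2
    ("linear", (1,)),     # node 3, not reachable from the output
    ("sum", (1, 2)),      # node 4
    ("maxpool", (3,)),    # node 5, not reachable from the output
]
architecture = decode(genotype, n_inputs=1, output=4)
# architecture == [(1, "conv", (0,)), (2, "attention", (0,)), (4, "sum", (1, 2))]
```

Inactive nodes such as 3 and 5 stay in the genotype without affecting the decoded network, which is what lets mutation reconnect them later at no immediate cost.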