Computational cytology is a critical, rapid-developing, yet challenging topic in the field of medical image computing which analyzes the digitized cytology image by computer-aided technologies for cancer screening. Recently, an increasing number of deep learning (DL) algorithms have made significant progress in medical image analysis, leading to the boosting publications of cytological studies. To investigate the advanced methods and comprehensive applications, we survey more than 120 publications of DL-based cytology image analysis in this article. We first introduce various deep learning methods, including fully supervised, weakly supervised, unsupervised, and transfer learning. Then, we systematically summarize the public datasets, evaluation metrics, versatile cytology image analysis applications including classification, detection, segmentation, and other related tasks. Finally, we discuss current challenges and potential research directions of computational cytology.
Principal Component Analysis (PCA) and other multi-variate models are often used in the analysis of "omics" data. These models contain much information which is currently neither easily accessible nor interpretable. Here we present an algorithmic method which has been developed to integrate this information with existing databases of background knowledge, stored in the form of known sets (for instance genesets or pathways). To make this accessible we have produced a Graphical User Interface (GUI) in Matlab which allows the overlay of known set information onto the loadings plot and thus improves the interpretability of the multi-variate model. For each known set the optimal convex hull, covering a subset of elements from the known set, is found through a search algorithm and displayed. In this paper we discuss two main topics; the details of the search algorithm for the optimal convex hull for this problem and the GUI interface which is freely available for download for academic use.
Large-scale language models are rapidly improving, performing well on a wide variety of tasks with little to no customization. In this work we investigate how language models can support science writing, a challenging writing task that is both open-ended and highly constrained. We present a system for generating "sparks", sentences related to a scientific concept intended to inspire writers. We find that our sparks are more coherent and diverse than a competitive language model baseline, and approach a human-created gold standard. In a study with 13 PhD students writing on topics of their own selection, we find three main use cases of sparks: aiding with crafting detailed sentences, providing interesting angles to engage readers, and demonstrating common reader perspectives. We also report on the various reasons sparks were considered unhelpful, and discuss how we might improve language models as writing support tools.
The Human-Machine Interaction (HMI) research field is an important topic in machine learning that has been deeply investigated thanks to the rise of computing power in the last years. The first time, it is possible to use machine learning to classify images and/or videos instead of the traditional computer vision algorithms. The aim of this project is to builda symbiosis between a convolutional neural network (CNN) and a recurrent neural network (RNN)  to recognize cultural/anthropological Italian sign language gestures from videos. The CNN extracts important features that later areused by the RNN. With RNNs we are able to store temporal information inside the model to provide contextual information from previous frames to enhance the prediction accuracy. Our novel approach uses different data augmentation techniquesand regularization methods from only RGB frames to avoid overfitting and provide a small generalization error.
Named entity recognition (NER) is usually developed and tested on text from well-written sources. However, in intelligent voice assistants, where NER is an important component, input to NER may be noisy because of user or speech recognition error. In applications, entity labels may change frequently, and non-textual properties like topicality or popularity may be needed to choose among alternatives. We describe a NER system intended to address these problems. We test and train this system on a proprietary user-derived dataset. We compare with a baseline text-only NER system; the baseline enhanced with external gazetteers; and the baseline enhanced with the search and indirect labelling techniques we describe below. The final configuration gives around 6% reduction in NER error rate. We also show that this technique improves related tasks, such as semantic parsing, with an improvement of up to 5% in error rate.
Offline Chinese handwriting text recognition is a long-standing research topic in the field of pattern recognition. In previous studies, text detection and recognition are separated, which leads to the fact that text recognition is highly dependent on the detection results. In this paper, we propose a robust end-to-end Chinese text page spotter framework. It unifies text detection and text recognition with text kernel that integrates global text feature information to optimize the recognition from multiple scales, which reduces the dependence of detection and improves the robustness of the system. Our method achieves state-of-the-art results on the CASIA-HWDB2.0-2.2 dataset and ICDAR-2013 competition dataset. Without any language model, the correct rates are 99.12% and 94.27% for line-level recognition, and 99.03% and 94.20% for page-level recognition, respectively.
Gradient-descent based iterative algorithms pervade a variety of problems in estimation, prediction, learning, control, and optimization. Recently iterative algorithms based on higher-order information have been explored in an attempt to lead to accelerated learning. In this paper, we explore a specific a high-order tuner that has been shown to result in stability with time-varying regressors in linearly parametrized systems, and accelerated convergence with constant regressors. We show that this tuner continues to provide bounded parameter estimates even if the gradients are corrupted by noise. Additionally, we also show that the parameter estimates converge exponentially to a compact set whose size is dependent on noise statistics. As the HT algorithms can be applied to a wide range of problems in estimation, filtering, control, and machine learning, the result obtained in this paper represents an important extension to the topic of real-time and fast decision making.
Over the past decade, unprecedented progress in the development of neural networks influenced dozens of different industries, including weed recognition in the agro-industrial sector. The use of neural networks in agro-industrial activity in the task of recognizing cultivated crops is a new direction. The absence of any standards significantly complicates the understanding of the real situation of the use of the neural network in the agricultural sector. The manuscript presents the complete analysis of researches over the past 10 years on the use of neural networks for the classification and tracking of weeds due to neural networks. In particular, the analysis of the results of using various neural network algorithms for the task of classification and tracking was presented. As a result, we presented the recommendation for the use of neural networks in the tasks of recognizing a cultivated object and weeds. Using this standard can significantly improve the quality of research on this topic and simplify the analysis and understanding of any paper.
The novel coronavirus (SARS-CoV-2) which causes COVID-19 is an ongoing pandemic. There are ongoing studies with up to hundreds of publications uploaded to databases daily. We are exploring the use-case of artificial intelligence and natural language processing in order to efficiently sort through these publications. We demonstrate that clinical trial information, preclinical studies, and a general topic model can be used as text mining data intelligence tools for scientists all over the world to use as a resource for their own research. To evaluate our method, several metrics are used to measure the information extraction and clustering results. In addition, we demonstrate that our workflow not only have a use-case for COVID-19, but for other disease areas as well. Overall, our system aims to allow scientists to more efficiently research coronavirus. Our automatically updating modules are available on our information portal at https://ghddi-ailab.github.io/Targeting2019-nCoV/ for public viewing.