After decades of massive digitisation, an unprecedented amount of historical documents is available in digital format, along with their machine-readable texts. While this represents a major step forward with respect to preservation and accessibility, it also opens up new opportunities in terms of content mining and the next fundamental challenge is to develop appropriate technologies to efficiently search, retrieve and explore information from this 'big data of the past'. Among semantic indexing opportunities, the recognition and classification of named entities are in great demand among humanities scholars. Yet, named entity recognition (NER) systems are heavily challenged with diverse, historical and noisy inputs. In this survey, we present the array of challenges posed by historical documents to NER, inventory existing resources, describe the main approaches deployed so far, and identify key priorities for future developments.
Recently, many evolutionary computation methods have been developed to solve the feature selection problem. However, the studies focused mainly on small-scale issues, resulting in stagnation issues in local optima and numerical instability when dealing with large-scale feature selection dilemmas. To address these challenges, this paper proposes a novel weighted differential evolution algorithm based on self-adaptive mechanism, named SaWDE, to solve large-scale feature selection. First, a multi-population mechanism is adopted to enhance the diversity of the population. Then, we propose a new self-adaptive mechanism that selects several strategies from a strategy pool to capture the diverse characteristics of the datasets from the historical information. Finally, a weighted model is designed to identify the important features, which enables our model to generate the most suitable feature-selection solution. We demonstrate the effectiveness of our algorithm on twelve large-scale datasets. The performance of SaWDE is superior compared to six non-EC algorithms and six other EC algorithms, on both training and test datasets and on subset size, indicating that our algorithm is a favorable tool to solve the large-scale feature selection problem. Moreover, we have experimented SaWDE with six EC algorithms on twelve higher-dimensional data, which demonstrates that SaWDE is more robust and efficient compared to those state-of-the-art methods. SaWDE source code is available on Github at https://github.com/wangxb96/SaWDE.
This paper proposes an artificial neural network to determine orientation using polarized skylight. This neural network has specific dilated convolution, which can extract light intensity information of different polarization directions. Then, the degree of polarization (DOP) and angle of polarization (AOP) are directly extracted in the network. In addition, the exponential function encoding of orientation is designed as the network output, which can better reflect the insect's encoding of polarization information, and improve the accuracy of orientation determination. Finally, training and testing were conducted on a public polarized skylight navigation dataset, and the experimental results proved the stability and effectiveness of the network.
As unlabeled data carry rich task-relevant information, they are proven useful for few-shot learning of language model. The question is how to effectively make use of such data. In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM. Given two views of a text sample via weak and strong augmentation techniques, SFLM generates a pseudo label on the weakly augmented version. Then, the model predicts the same pseudo label when fine-tuned with the strongly augmented version. This simple approach is shown to outperform other state-of-the-art supervised and semi-supervised counterparts on six sentence classification and six sentence-pair classification benchmarking tasks. In addition, SFLM only relies on a few in-domain unlabeled data. We conduct a comprehensive analysis to demonstrate the robustness of our proposed approach under various settings, including augmentation techniques, model scale, and few-shot knowledge transfer across tasks.
Multi-modal information is essential to describe what has happened in a video. In this work, we represent videos by various appearance, motion and audio information guided with video topic. By following multi-stage training strategy, our experiments show steady and significant improvement on the VATEX benchmark. This report presents an overview and comparative analysis of our system designed for both Chinese and English tracks on VATEX Captioning Challenge 2019.
Accurate traffic state prediction is the foundation of transportation control and guidance. It is very challenging due to the complex spatiotemporal dependencies in traffic data. Existing works cannot perform well for multi-step traffic prediction that involves long future time period. The spatiotemporal information dilution becomes serve when the time gap between input step and predicted step is large, especially when traffic data is not sufficient or noisy. To address this issue, we propose a multi-spatial graph convolution based Seq2Seq model. Our main novelties are three aspects: (1) We enrich the spatiotemporal information of model inputs by fusing multi-view features (time, location and traffic states) (2) We build multiple kinds of spatial correlations based on both prior knowledge and data-driven knowledge to improve model performance especially in insufficient or noisy data cases. (3) A spatiotemporal attention mechanism based on reachability knowledge is novelly designed to produce high-level features fed into decoder of Seq2Seq directly to ease information dilution. Our model is evaluated on two real world traffic datasets and achieves better performance than other competitors.
3D to 2D retinal vessel segmentation is a challenging problem in Optical Coherence Tomography Angiography (OCTA) images. Accurate retinal vessel segmentation is important for the diagnosis and prevention of ophthalmic diseases. However, making full use of the 3D data of OCTA volumes is a vital factor for obtaining satisfactory segmentation results. In this paper, we propose a Progressive Attention-Enhanced Network (PAENet) based on attention mechanisms to extract rich feature representation. Specifically, the framework consists of two main parts, the three-dimensional feature learning path and the two-dimensional segmentation path. In the three-dimensional feature learning path, we design a novel Adaptive Pooling Module (APM) and propose a new Quadruple Attention Module (QAM). The APM captures dependencies along the projection direction of volumes and learns a series of pooling coefficients for feature fusion, which efficiently reduces feature dimension. In addition, the QAM reweights the features by capturing four-group cross-dimension dependencies, which makes maximum use of 4D feature tensors. In the two-dimensional segmentation path, to acquire more detailed information, we propose a Feature Fusion Module (FFM) to inject 3D information into the 2D path. Meanwhile, we adopt the Polarized Self-Attention (PSA) block to model the semantic interdependencies in spatial and channel dimensions respectively. Experimentally, our extensive experiments on the OCTA-500 dataset show that our proposed algorithm achieves state-of-the-art performance compared with previous methods.
Transit timing variations (TTVs) can provide useful information for systems observed by transit, as they allow us to put constraints on the masses and eccentricities of the observed planets, or even to constrain the existence of non-transiting companions. However, TTVs can also act as a detection bias that can prevent the detection of small planets in transit surveys that would otherwise be detected by standard algorithms such as the Boxed Least Square algorithm (BLS) if their orbit was not perturbed. This bias is especially present for surveys with a long baseline, such as Kepler, some of the TESS sectors, and the upcoming PLATO mission. Here we introduce a detection method that is robust to large TTVs, and illustrate its use by recovering and confirming a pair of resonant super-Earths with ten-hour TTVs around Kepler-1705. The method is based on a neural network trained to recover the tracks of low-signal-to-noise-ratio(S/N) perturbed planets in river diagrams. We recover the transit parameters of these candidates by fitting the light curve. The individual transit S/N of Kepler-1705b and c are about three times lower than all the previously known planets with TTVs of 3 hours or more, pushing the boundaries in the recovery of these small, dynamically active planets. Recovering this type of object is essential for obtaining a complete picture of the observed planetary systems, and solving for a bias not often taken into account in statistical studies of exoplanet populations. In addition, TTVs are a means of obtaining mass estimates which can be essential for studying the internal structure of planets discovered by transit surveys. Finally, we show that due to the strong orbital perturbations, it is possible that the spin of the outer resonant planet of Kepler-1705 is trapped in a sub- or super-synchronous spin-orbit resonance.
The Dice similarity coefficient (DSC) is both a widely used metric and loss function for biomedical image segmentation due to its robustness to class imbalance. However, it is well known that the DSC loss is poorly calibrated, resulting in overconfident predictions that cannot be usefully interpreted in biomedical and clinical practice. Performance is often the only metric used to evaluate segmentations produced by deep neural networks, and calibration is often neglected. However, calibration is important for translation into biomedical and clinical practice, providing crucial contextual information to model predictions for interpretation by scientists and clinicians. In this study, we identify poor calibration as an emerging challenge of deep learning based biomedical image segmentation. We provide a simple yet effective extension of the DSC loss, named the DSC++ loss, that selectively modulates the penalty associated with overconfident, incorrect predictions. As a standalone loss function, the DSC++ loss achieves significantly improved calibration over the conventional DSC loss across five well-validated open-source biomedical imaging datasets. Similarly, we observe significantly improved when integrating the DSC++ loss into four DSC-based loss functions. Finally, we use softmax thresholding to illustrate that well calibrated outputs enable tailoring of precision-recall bias, an important post-processing technique to adapt the model predictions to suit the biomedical or clinical task. The DSC++ loss overcomes the major limitation of the DSC, providing a suitable loss function for training deep learning segmentation models for use in biomedical and clinical practice.
Transformer models, which leverage architectural improvements like self-attention, perform remarkably well on Natural Language Processing (NLP) tasks. The self-attention mechanism is position agnostic. In order to capture positional ordering information, various flavors of absolute and relative position embeddings have been proposed. However, there is no systematic analysis on their contributions and a comprehensive comparison of these methods is missing in the literature. In this paper, we review major existing position embedding methods and compare their accuracy on downstream NLP tasks, using our own implementations. We also propose a novel multiplicative embedding method which leads to superior accuracy when compared to existing methods. Finally, we show that our proposed embedding method, served as a drop-in replacement of the default absolute position embedding, can improve the RoBERTa-base and RoBERTa-large models on SQuAD1.1 and SQuAD2.0 datasets.