



Abstract: Reliable data-driven estimation of Shannon entropy from small data sets, where the number of examples is potentially smaller than the number of possible outcomes, is a critical matter in several applications. In this paper, we introduce a discrete entropy estimator that uses the decomposability property of entropy in combination with estimates of the missing mass and of the number of unseen outcomes to compensate for the negative bias these unseen outcomes induce. Experimental results show that the proposed method outperforms some classical estimators in undersampled regimes, and performs comparably with some well-established state-of-the-art estimators.
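To fix ideas, a minimal sketch of the two classical ingredients the abstract alludes to, in bits: the Good-Turing estimate of the missing mass (the probability of unseen outcomes) and the Miller-Madow bias-corrected plug-in entropy. These are standard baselines, not the estimator proposed in the paper.

```python
from collections import Counter
from math import log2

def good_turing_missing_mass(samples):
    """Good-Turing estimate of the total probability of unseen outcomes:
    the fraction of observations that are singletons (seen exactly once)."""
    counts = Counter(samples)
    f1 = sum(1 for c in counts.values() if c == 1)
    return f1 / len(samples)

def miller_madow_entropy(samples):
    """Classical Miller-Madow bias-corrected plug-in entropy, in bits:
    plug-in entropy plus (observed support size - 1) / (2n)."""
    n = len(samples)
    counts = Counter(samples)
    h_plugin = -sum((c / n) * log2(c / n) for c in counts.values())
    return h_plugin + (len(counts) - 1) / (2 * n)
```

For a fair coin sampled 100 times, the plug-in estimate is 1 bit and the Miller-Madow correction adds 1/200 bit; for 10 samples that are all distinct, the Good-Turing missing mass is 1, signaling a severely undersampled regime.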




Abstract: An approach is proposed to quantify, in bits of information, the actual relevance of analogies in analogy tests. The main component of this approach is a soft-accuracy estimator that also yields entropy estimates with compensated biases. Experimental results obtained with pre-trained GloVe 300-D vectors and two public analogy test sets show that, from an information content perspective, proximity hints are much more relevant than analogies in analogy tests. Accordingly, a simple word embedding model is used to predict that analogies carry about one bit of information, which is experimentally corroborated.




Abstract: Intrinsic dimension and differential entropy estimators are studied in this paper, including their systematic bias. A pragmatic approach for joint estimation and bias correction of these two fundamental measures is proposed. Steps shared by both estimators are highlighted, along with their useful consequences for data analysis. It is shown that both estimators can be complementary parts of a single approach, and that the simultaneous estimation of differential entropy and intrinsic dimension gives meaning to each estimate, since estimates at different observation scales convey different perspectives of the underlying manifolds. Experiments with synthetic and real datasets are presented to illustrate how to extract meaning from visual inspections, and how to compensate for biases.
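An illustration of the shared machinery: k-nearest-neighbor distances underlie both classes of estimators. The sketch below implements the classical Levina-Bickel maximum-likelihood intrinsic dimension estimator (averaging inverse per-point estimates, as in the MacKay-Ghahramani variant); it is a standard baseline, not the paper's joint method.

```python
import numpy as np

def intrinsic_dim_mle(X, k=10):
    """Levina-Bickel MLE of intrinsic dimension from k-NN distances.
    X: (n, D) array of points. Brute-force distances; fine for small n.
    Per point x: m(x) = [ (1/(k-1)) * sum_{j<k} log(T_k(x)/T_j(x)) ]^-1,
    where T_j(x) is the distance from x to its j-th nearest neighbor.
    Returns the inverse of the averaged inverse estimates."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    D.sort(axis=1)                # row-wise: column 0 is the self-distance 0
    Tk = D[:, k][:, None]         # k-th nearest-neighbor distance
    Tj = D[:, 1:k]                # 1st through (k-1)-th neighbor distances
    inv_m = np.log(Tk / Tj).sum(axis=1) / (k - 1)
    return float(1.0 / inv_m.mean())
```

For points sampled along a straight line embedded in 3-D, the estimate comes out close to 1 regardless of the ambient dimension, which is the behavior that makes a scale-dependent reading of such estimators informative.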




Abstract: A method for offline signature verification is presented in this paper. It is based on the segmentation of the signature skeleton (obtained through standard image skeletonization) into unambiguous sequences of points, i.e., unambiguously connected skeleton segments corresponding to vectorial representations of signature portions. These segments are assumed to be the fundamental carriers of useful information for authenticity verification, and are compactly encoded as sets of 9 scalars (4 sampled 2-D coordinates and 1 length measure). Signature authenticity is then inferred through comparisons between pairs of such compact representations, based on Euclidean distance. The average performance of this method is evaluated through experiments with offline versions of signatures from the MCYT-100 database. For comparison purposes, three other approaches are applied to the same set of signatures, namely: (1) a straightforward approach based on Dynamic Time Warping (DTW) distances between segments, (2) a published method by [shanker2007], also based on DTW, and (3) the average human performance under an equivalent experimental protocol. Results suggest that if human performance is taken as a goal for automatic verification, then signature shape details should be discarded to approach this goal. Moreover, our best result, which is close to human performance, was obtained by the simplest strategy, in which equal weights were given to segment shape and length.
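A minimal sketch of the encoding-and-comparison idea, assuming the 9 scalars are 4 points resampled uniformly along the segment's arc length (8 coordinates) plus the total length; the paper's exact encoding and weighting may differ.

```python
import numpy as np

def encode_segment(points, n_samples=4):
    """Encode a skeleton segment (a polyline of (x, y) points) as a compact
    vector: n_samples points resampled uniformly along arc length, followed
    by the total length. With n_samples=4 this yields 9 scalars."""
    pts = np.asarray(points, dtype=float)
    steps = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(steps)])   # cumulative arc length
    length = s[-1]
    t = np.linspace(0.0, length, n_samples)         # uniform arc-length grid
    x = np.interp(t, s, pts[:, 0])
    y = np.interp(t, s, pts[:, 1])
    return np.concatenate([np.column_stack([x, y]).ravel(), [length]])

def segment_distance(u, v):
    """Euclidean distance between two encoded segments; with unscaled
    entries this gives equal weight to shape and length."""
    return float(np.linalg.norm(u - v))
```

A horizontal stroke from (0, 0) to (3, 0) encodes to [0, 0, 1, 0, 2, 0, 3, 0, 3]: four evenly spaced points and the length 3.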