A. N. Gorban

Correction of AI systems by linear discriminants: Probabilistic foundations

Nov 11, 2018

A. N. Gorban, A. Golubkov, B. Grechuk, E. M. Mirkes, I. Y. Tyukin

Abstract: Artificial Intelligence (AI) systems sometimes make errors and will make errors in the future, from time to time. These errors are usually unexpected and can lead to dramatic consequences. Intensive development of AI and its practical applications makes the problem of errors more important. Total re-engineering of the systems can create new errors and is not always possible due to the resources involved. The important challenge is to develop fast methods to correct errors without damaging existing skills. We formulated the technical requirements for the 'ideal' correctors. Such correctors include binary classifiers, which separate the situations with high risk of errors from the situations where the AI systems work properly. Surprisingly, for essentially high-dimensional data such methods are possible: a simple linear Fisher discriminant can separate the situations with errors from correctly solved tasks even for exponentially large samples. The paper presents the probabilistic basis for fast non-destructive correction of AI systems. A series of new stochastic separation theorems is proven. These theorems provide new instruments for fast non-iterative correction of errors of legacy AI systems. The new approaches become efficient in high dimensions, for correction of high-dimensional systems in a high-dimensional world (i.e. for processing of essentially high-dimensional data by large systems).

* Information Sciences 466 (2018), 303-322
* arXiv admin note: text overlap with arXiv:1809.07656 and arXiv:1802.02172

The unreasonable effectiveness of small neural ensembles in high-dimensional brain

Sep 20, 2018

A. N. Gorban, V. A. Makarov, I. Y. Tyukin

Abstract: Despite the widespread consensus on the brain's complexity, sprouts of the single neuron revolution emerged in neuroscience in the 1970s. They brought many unexpected discoveries, including grandmother or concept cells and sparse coding of information in the brain. In machine learning, the famous curse of dimensionality seemed for a long time to be an unsolvable problem. Nevertheless, the idea of the blessing of dimensionality is gradually becoming more and more popular. Ensembles of non-interacting or weakly interacting simple units prove to be an effective tool for solving essentially multidimensional problems. This approach is especially useful for one-shot (non-iterative) correction of errors in large legacy artificial intelligence systems. These simplicity revolutions in the era of complexity have deep fundamental reasons grounded in the geometry of multidimensional data spaces. To explore and understand these reasons we revisit the background ideas of statistical physics. In the course of the 20th century they were developed into the concentration of measure theory. New stochastic separation theorems reveal the fine structure of the data clouds. We review and analyse biological, physical, and mathematical problems at the core of the fundamental question: how can a high-dimensional brain organise reliable and fast learning in a high-dimensional world of data by simple tools? Two critical applications are reviewed to exemplify the approach: one-shot correction of errors in intellectual systems and emergence of static and associative memories in ensembles of single neurons.

* Review paper, accepted in Physics of Life Reviews

How deep should be the depth of convolutional neural networks: a backyard dog case study

May 03, 2018

A. N. Gorban, E. M. Mirkes, I. Y. Tyukin

Abstract: We present a straightforward non-iterative method for shallowing of a deep Convolutional Neural Network (CNN) by combining several layers of the CNN with Advanced Supervised Principal Component Analysis (ASPCA) of their outputs. We tested this new method on a practically important case of 'friend-or-foe' face recognition. This is the backyard dog problem: the dog should (i) distinguish the members of the family from possible strangers and (ii) identify the members of the family. Our experiments revealed that the method is capable of drastically reducing the depth of deep learning CNNs, albeit at the cost of mild performance deterioration.

Blessing of dimensionality: mathematical foundations of the statistical physics of data

Jan 10, 2018

A. N. Gorban, I. Y. Tyukin

Abstract: The concentration of measure phenomena were discovered as the mathematical background of statistical mechanics at the end of the XIX and the beginning of the XX century and were then explored in the mathematics of the XX-XXI centuries. At the beginning of the XXI century, it became clear that the proper utilisation of these phenomena in machine learning might transform the curse of dimensionality into the blessing of dimensionality. This paper summarises recently discovered phenomena of measure concentration which drastically simplify some machine learning problems in high dimension, and allow us to correct legacy artificial intelligence systems. The classical concentration of measure theorems state that i.i.d. random points are concentrated in a thin layer near a surface (a sphere or equators of a sphere, an average or median level set of energy or another Lipschitz function, etc.). The new stochastic separation theorems describe the fine structure of these thin layers: the random points are not only concentrated in a thin layer but are all linearly separable from the rest of the set, even for exponentially large random sets. The linear functionals for separation of points can be selected in the form of the linear Fisher's discriminant. All artificial intelligence systems make errors. Non-destructive correction requires separation of the situations (samples) with errors from the samples corresponding to correct behaviour by a simple and robust classifier. The stochastic separation theorems provide us with such classifiers and a non-iterative (one-shot) procedure for learning.

* Phil. Trans. R. Soc. A volume 376, issue 2118, 376 20170237, 2018
* Accepted for publication in Philosophical Transactions of the Royal Society A, 2018. Comprises 17 pages and 4 figures
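
The "thin layer" statement in the abstract above can be illustrated numerically. The following is a minimal sketch, not taken from the paper: it assumes points drawn uniformly from the unit ball in n dimensions (an illustrative choice of distribution), for which the fraction of points lying within distance eps of the boundary sphere is 1 - (1 - eps)^n. The function names are hypothetical.

```python
import random

def shell_fraction_exact(n, eps):
    """Fraction of the unit n-ball's volume within distance eps of its surface.

    The ball of radius (1 - eps) has relative volume (1 - eps)**n,
    so the outer shell holds the remainder.
    """
    return 1.0 - (1.0 - eps) ** n

def shell_fraction_sampled(n, eps, m=10000, seed=0):
    """Monte Carlo estimate of the same fraction.

    Assumption: points are uniform in the unit ball, so the radius of a
    random point is distributed as U**(1/n) with U uniform on [0, 1).
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(m) if rng.random() ** (1.0 / n) > 1.0 - eps)
    return hits / m

if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, shell_fraction_exact(n, 0.1), shell_fraction_sampled(n, 0.1))
```

Under this assumption the shell fraction grows quickly with n, which is the sense in which almost all of the sample ends up near the sphere.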

Stochastic Separation Theorems

Aug 03, 2017

A. N. Gorban, I. Y. Tyukin

Abstract: The problem of non-iterative one-shot and non-destructive correction of unavoidable mistakes arises in all Artificial Intelligence applications in the real world. Its solution requires robust separation of samples with errors from samples where the system works properly. We demonstrate that in (moderately) high dimension this separation could be achieved with probability close to one by linear discriminants. Surprisingly, separation of a new image from a very large set of known images is almost always possible even in moderately high dimensions by linear functionals, and coefficients of these functionals can be found explicitly. Based on fundamental properties of measure concentration, we show that for $M<a\exp(bn)$, random $M$-element sets in $\mathbb{R}^n$ are linearly separable with probability $p$, $p>1-\vartheta$, where $1>\vartheta>0$ is a given small constant. Exact values of $a,b>0$ depend on the probability distribution that determines how the random $M$-element sets are drawn, and on the constant $\vartheta$. These stochastic separation theorems provide a new instrument for the development, analysis, and assessment of machine learning methods and algorithms in high dimension. Theoretical statements are illustrated with numerical examples.

* Neural Networks 94 (2017), 255-259
* 6 pages, accepted for publication in Neural Networks (Letter section)
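
The separability claim in the abstract above lends itself to a small numerical experiment. The sketch below is an illustration under stated assumptions, not the authors' code: points are drawn uniformly from the unit ball in R^n, and a point x counts as separated from every other sample point y when (x, y) < alpha * (x, x), a simple explicit linear functional. The value alpha = 0.8, the sample sizes, and all function names are hypothetical choices.

```python
import math
import random

def sample_ball(n, rng):
    """Draw one point uniformly from the unit ball in R^n."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in g))
    radius = rng.random() ** (1.0 / n)  # radial law of the uniform ball
    return [radius * v / norm for v in g]

def separable_fraction(n, m, alpha=0.8, seed=0):
    """Fraction of m random points that a linear functional separates
    from all the other points in the sample.

    Point x is counted as separable when (x, y) < alpha * (x, x)
    holds for every other sample point y.
    """
    rng = random.Random(seed)
    pts = [sample_ball(n, rng) for _ in range(m)]
    count = 0
    for i, x in enumerate(pts):
        threshold = alpha * sum(v * v for v in x)
        if all(
            sum(a * b for a, b in zip(x, y)) < threshold
            for j, y in enumerate(pts)
            if j != i
        ):
            count += 1
    return count / m

if __name__ == "__main__":
    for n in (2, 10, 50):
        print(n, separable_fraction(n, 200))
```

In low dimension most points fail the check, while in moderately high dimension nearly every point passes it, which is the qualitative behaviour the theorems describe.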

Piece-wise quadratic approximations of arbitrary error functions for fast and robust machine learning

Aug 21, 2016

A. N. Gorban, E. M. Mirkes, A. Zinovyev

Abstract: Most machine learning approaches stem from the principle of minimizing the mean squared distance, based on computationally efficient quadratic optimization methods. However, when faced with high-dimensional and noisy data, the quadratic error functionals demonstrate many weaknesses, including high sensitivity to contaminating factors and the curse of dimensionality. Therefore, many recent applications in machine learning have exploited properties of non-quadratic error functionals based on the $L_1$ norm or even sub-linear potentials corresponding to quasinorms $L_p$ ($0<p<1$). The downside of these approaches is an increase in the computational cost of optimization. So far, no approaches have been suggested to deal with arbitrary error functionals in a flexible and computationally efficient framework. In this paper, we develop a theory and basic universal data approximation algorithms ($k$-means, principal components, principal manifolds and graphs, regularized and sparse regression), based on piece-wise quadratic error potentials of subquadratic growth (PQSQ potentials). We develop a new and universal framework to minimize arbitrary sub-quadratic error potentials using an algorithm with guaranteed fast convergence to the local or global error minimum. The theory of PQSQ potentials is based on the notion of the cone of minorant functions, and represents a natural approximation formalism based on the application of min-plus algebra. The approach can be applied in most existing machine learning methods, including methods of data approximation and regularized and sparse regression, leading to an improvement in the computational cost/accuracy trade-off. We demonstrate that on synthetic and real-life datasets PQSQ-based machine learning methods achieve orders of magnitude faster computational performance than the corresponding state-of-the-art methods.

* Neural Networks, Volume 84, December 2016, 28-38
* Edited and extended version with algorithms of regularized regression
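
To make the idea of a piece-wise quadratic potential concrete, here is a minimal sketch of one possible construction. It is an assumption for illustration, not a transcription of the paper's algorithm: a subquadratic function f is interpolated at a few thresholds by segments of the form b + a*x^2, and the potential is held constant beyond the last threshold. The function name `make_pqsq` and the threshold values are hypothetical.

```python
import bisect

def make_pqsq(f, thresholds):
    """Build a piece-wise quadratic approximation u of a potential f.

    Assumptions (illustrative): thresholds are positive and distinct;
    on each interval [r_k, r_{k+1}) the approximation is b_k + a_k * x**2,
    chosen to match f at both interval ends; beyond the last threshold
    the potential is flat at f(r_last).
    """
    r = [0.0] + sorted(thresholds)
    coeffs = []
    for lo, hi in zip(r, r[1:]):
        a = (f(hi) - f(lo)) / (hi * hi - lo * lo)
        b = f(lo) - a * lo * lo
        coeffs.append((a, b))
    cap = f(r[-1])

    def u(x):
        ax = abs(x)
        if ax >= r[-1]:
            return cap  # trimmed tail: large residuals stop growing
        k = bisect.bisect_right(r, ax) - 1  # index of the interval holding ax
        a, b = coeffs[k]
        return b + a * ax * ax

    return u

if __name__ == "__main__":
    u = make_pqsq(abs, [0.5, 1.0, 2.0])  # approximate the L1 potential |x|
    for x in (0.0, 0.25, 0.5, 1.0, 1.5, 5.0):
        print(x, u(x))
```

Because every segment is quadratic in x, each piece can be minimized with the same closed-form machinery as ordinary least squares, which is the practical appeal of such a representation.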

Nonlinear Quality of Life Index

Jul 24, 2014

A. Zinovyev, A. N. Gorban

Abstract: We present details of the analysis of the nonlinear quality of life index for 171 countries. This index is based on four indicators: GDP per capita by Purchasing Power Parities, Life expectancy at birth, Infant mortality rate, and Tuberculosis incidence. We analyze the structure of the data in order to find an optimal way, independent of expert opinion, to map several numerical indicators from a multidimensional space onto the one-dimensional space of the quality of life. In the 4D space we found a principal curve that goes "through the middle" of the dataset and projected the data points onto this curve. The order along this principal curve gives us the ranking of countries. Projection onto the principal curve provides a solution to the classical problem of unsupervised ranking of objects. It allows us to find a way, independent of expert opinion, to project several numerical indicators from a multidimensional space onto the one-dimensional space of the index values. This projection is, in some sense, optimal and preserves as much information as possible. For computation we used ViDaExpert, a tool for visualization and analysis of multidimensional vectorial data (arXiv:1406.5550).

* 9 pages, 1 figure, 1 table with data for 171 countries. In this case study we use only publicly available data taken from GAPMINDER online data base for 2005

Geometrical complexity of data approximators

May 04, 2013

E. M. Mirkes, A. Zinovyev, A. N. Gorban

Abstract: Many methods have been developed to approximate a cloud of vectors embedded in high-dimensional space by simpler objects: from principal points and linear manifolds to self-organizing maps, neural gas, elastic maps, various types of principal curves and principal trees, and so on. For each type of approximator, a measure of approximator complexity has been developed too. These measures are necessary to find the balance between accuracy and complexity and to define the optimal approximations of a given type. We propose a measure of complexity (geometrical complexity) which is applicable to approximators of several types and which allows comparing data approximations of different types.

* IWANN 2013, Advances in Computation Intelligence, Springer LNCS 7902, pp. 500-509, 2013
* 10 pages, 3 figures, minor correction and extension

Principal Graphs and Manifolds

May 09, 2011

A. N. Gorban, A. Y. Zinovyev

Abstract: In many physical, statistical, biological and other investigations it is desirable to approximate a system of points by objects of lower dimension and/or complexity. For this purpose, Karl Pearson invented principal component analysis in 1901 and found 'lines and planes of closest fit to system of points'. The famous k-means algorithm solves the approximation problem too, but by finite sets instead of lines and planes. This chapter gives a brief practical introduction to the methods of construction of general principal objects, i.e. objects embedded in the 'middle' of the multidimensional data set. As a basis, the unifying framework of mean squared distance approximation of finite datasets is selected. Principal graphs and manifolds are constructed as generalisations of principal components and k-means principal points. For this purpose, the family of expectation/maximisation algorithms with nearest generalisations is presented. Construction of principal graphs with controlled complexity is based on the graph grammar approach.

* Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods and Techniques, Ch. 2, Information Science Reference, 2009. 28-59
* 36 pages, 6 figures, minor corrections
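
The abstract above names k-means as the approximation of data by a finite set of points under the mean squared distance criterion. As a reminder of how that works, here is a minimal sketch of the standard Lloyd iteration in plain Python. It is a generic textbook version, not the chapter's own implementation, and the function name and the fixed initial centres are illustrative assumptions.

```python
def kmeans(points, centers, iterations=20):
    """Lloyd's iteration for k-means with caller-supplied initial centers.

    Each pass alternates two steps: assign every point to its nearest
    center (squared Euclidean distance), then move each center to the
    mean of the points assigned to it.
    """
    centers = [list(c) for c in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])),
            )
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers

if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans(data, centers=[(0, 0), (10, 10)]))
```

The two alternating steps are the simplest instance of the splitting scheme that the abstract refers to as the expectation/maximisation family.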

Principal manifolds and graphs in practice: from molecular biology to dynamical systems

Jul 25, 2010

A. N. Gorban, A. Zinovyev

Abstract: We present several applications of non-linear data modeling, using principal manifolds and principal graphs constructed using the metaphor of elasticity (elastic principal graph approach). These approaches are generalizations of Kohonen's self-organizing maps, a class of artificial neural networks. Using several examples, we show the advantages of non-linear objects for data approximation in comparison to linear ones. We propose four numerical criteria for comparing linear and non-linear mappings of datasets into spaces of lower dimension. The examples are taken from comparative political science, from analysis of high-throughput data in molecular biology, and from analysis of dynamical systems.

* International Journal of Neural Systems, Vol. 20, No. 3 (2010) 219-232
* 12 pages, 9 figures
