The extended Kalman filter is perhaps the most standard tool for estimating, in real time, the state of a dynamical system from noisy measurements of some function of the system, with extensive practical applications (such as position tracking via GPS). While the plain Kalman filter for linear systems is well understood, the extended Kalman filter relies on linearizations whose justification has been debated. We recover the exact extended Kalman filter equations from first principles in statistical learning: the extended Kalman filter is equal to Amari's online natural gradient, applied in the space of trajectories of the system. Namely, each possible trajectory of the dynamical system defines a probability law over possible observations. In principle, this makes it possible to treat the underlying trajectory as the parameter of a statistical model of the observations. The parameter can then be learned by gradient ascent on the log-likelihood of the observations as they become available. Using Amari's natural gradient from information geometry (a gradient step preconditioned with the inverse of the Fisher matrix, which provides parameterization-invariance) exactly recovers the extended Kalman filter. This equivalence holds only for a particular choice of process noise in the Kalman filter, namely, noise proportional to the posterior covariance, a canonical choice in the absence of specific model information.
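The simplest instance of this correspondence can be checked numerically. The sketch below is only an illustration under strong simplifying assumptions that are not part of the construction described above: the state is static (no dynamics and no process noise), the observation model is linear and Gaussian, and the preconditioner is the accumulated Fisher information rather than a trajectory-space object. All names (`kalman_update`, `natural_gradient_update`, `H`, `R`, `J`) are hypothetical. In this special case, a Kalman measurement update and a natural-gradient ascent step on the log-likelihood give the same estimate, as follows from the matrix inversion lemma.

```python
import numpy as np

def kalman_update(x, P, y, H, R):
    """Kalman measurement update for y = H x + noise, noise ~ N(0, R)."""
    S = H @ P @ H.T + R                     # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    x_new = x + K @ (y - H @ x)
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new

def natural_gradient_update(x, J, y, H, R):
    """One ascent step on log p(y | x), preconditioned by the inverse of
    the accumulated Fisher information J (assumed setup, static state)."""
    R_inv = np.linalg.inv(R)
    J_new = J + H.T @ R_inv @ H             # add Fisher information of this observation
    grad = H.T @ R_inv @ (y - H @ x)        # gradient of the log-likelihood w.r.t. x
    x_new = x + np.linalg.solve(J_new, grad)
    return x_new, J_new

rng = np.random.default_rng(0)
n, m = 3, 2                                 # state and observation dimensions
x_true = rng.normal(size=n)
R = 0.1 * np.eye(m)

# Same starting point for both: prior covariance P = I, Fisher matrix J = P^{-1} = I.
x_kf, P = np.zeros(n), np.eye(n)
x_ng, J = np.zeros(n), np.eye(n)

for _ in range(20):
    H = rng.normal(size=(m, n))             # a fresh linear observation map each step
    y = H @ x_true + rng.multivariate_normal(np.zeros(m), R)
    x_kf, P = kalman_update(x_kf, P, y, H, R)
    x_ng, J = natural_gradient_update(x_ng, J, y, H, R)

# Largest componentwise difference between the two estimates.
max_gap = np.max(np.abs(x_kf - x_ng))
print("max |x_kf - x_ng| =", max_gap)
```

In this toy setting the posterior covariance `P` plays the role of the inverse of the accumulated Fisher matrix `J`, which is why the two updates coincide up to floating-point error. The nonlinear, dynamical case with process noise requires the trajectory-space argument and is not captured by this sketch.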