A new evaluation framework for topic modeling algorithms based on synthetic corpora

Jan 28, 2019
Hanyu Shi, Martin Gerlach, Isabel Diersen, Doug Downey, Luis A. N. Amaral

* accepted for AISTATS 2019
* Main text (11 pages, 5 figures) and Supplementary Material (14 pages, 11 figures); code available at

A standardized Project Gutenberg corpus for statistical analysis of natural language and quantitative linguistics

Dec 19, 2018
Martin Gerlach, Francesc Font-Clos

A network approach to topic models

Jul 19, 2018
Martin Gerlach, Tiago P. Peixoto, Eduardo G. Altmann

* Science Advances 4, eaaq1360 (2018) 
* 22 pages, 10 figures, code available at 

Generalized Entropies and the Similarity of Texts

Nov 11, 2016
Eduardo G. Altmann, Laercio Dias, Martin Gerlach

* J. Stat. Mech. 014002 (2017) 
* 13 pages, 6 figures; Results presented at the StatPhys-2016 meeting in Lyon 

Similarity of symbol frequency distributions with heavy tails

Apr 15, 2016
Martin Gerlach, Francesc Font-Clos, Eduardo G. Altmann

* Phys. Rev. X 6, 021009 (2016) 
* 13 pages, 7 figures 

Statistical laws in linguistics

Feb 11, 2015
Eduardo G. Altmann, Martin Gerlach

* Proceedings of the Flow Machines Workshop: Creativity and Universality in Language, Paris, June 18 to 20, 2014 

Scaling laws and fluctuations in the statistics of word frequencies

Nov 04, 2014
Martin Gerlach, Eduardo G. Altmann

* New J. Phys. 16, 113010 (2014)
* 19 pages, 4 figures 

Extracting information from S-curves of language change

Oct 30, 2014
Fakhteh Ghanbarnejad, Martin Gerlach, Jose M. Miotto, Eduardo G. Altmann

* J. R. Soc. Interface 11 (101), 20141044 (6 December 2014)
* 9 pages, 5 figures, Supplementary Material is available at 

Stochastic model for the vocabulary growth in natural languages

Apr 04, 2013
Martin Gerlach, Eduardo G. Altmann

* Phys. Rev. X 3, 021006 (2013) 
* corrected typos and errors in reference list; 10 pages text, 15 pages supplemental material; to appear in Physical Review X 

