Abstract: The chapter starts with an introduction to notions of quantitative linguistics, such as rank-frequency dependence, Zipf's law, and frequency spectra. Similarities between the distribution of words in texts and level occupation in quantum ensembles hint at a superficial analogy with statistical physics. This analogy makes it possible to define various parameters for texts, including "temperature", "chemical potential", and entropy. Such parameters provide a set of variables for classifying texts, which serve as an example of complex systems. Moreover, texts are perhaps the easiest complex systems to collect and analyze. Similar approaches can be developed to study, for instance, genomes, owing to well-known linguistic analogies. We consider a couple of approaches to defining nucleotide sequences in mitochondrial DNAs and viral RNAs and demonstrate their possible application as an auxiliary tool for comparative analysis of genomes. Finally, we discuss entropy as one of the parameters that can be easily computed from rank-frequency dependences. Although entropy is a discriminating parameter in some problems of classification of complex systems, it can be given a proper interpretation only in a limited class of problems. Its overall role and significance remain an open issue.
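As an illustration of how entropy can be obtained from a rank-frequency dependence, here is a minimal sketch. It assumes the Shannon entropy of the relative frequencies, measured in bits; the abstract does not state which definition the chapter uses, and the function names and the toy sentence are hypothetical.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return absolute frequencies sorted in descending order (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def shannon_entropy(frequencies):
    """Shannon entropy (in bits) of the relative frequencies p_r = f_r / N.

    Assumption: the chapter's own entropy definition may differ.
    """
    total = sum(frequencies)
    return -sum((f / total) * math.log2(f / total) for f in frequencies if f > 0)


# Toy example: whitespace tokenization of a short made-up sentence.
tokens = "the cat saw the dog and the dog saw the cat".split()
freqs = rank_frequency(tokens)
print(freqs)                   # frequencies ordered by rank
print(shannon_entropy(freqs))  # entropy in bits
```

The same two functions apply to any sequence of discrete units, so "words" obtained from a nucleotide sequence under some segmentation could be passed in place of text tokens.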
Abstract: A new set of parameters describing the word frequency behavior of texts is proposed. An analogy between the word frequency distribution and the Bose distribution is suggested, and the notion of "temperature" is introduced for this case. Calculations are made for English, Ukrainian, and Guinean Maninka. A correlation is shown between deep language structure (the level of analyticity) and the defined parameters.
Abstract: In the paper, we analyze the distribution of complexities in the Vai script, an indigenous syllabic writing system from Liberia. We find that the uniformity hypothesis for complexities fails for this script. Models using the Poisson distribution for the number of components and the hyper-Poisson distribution for connections provide good fits in the case of the Vai script.
Abstract: We investigate the grapheme-phoneme relation in Ukrainian and some properties of the Ukrainian version of the Cyrillic alphabet.
Abstract: The article describes the theoretical principles and practical implementation of compiling a concordance to "Perekhresni stezhky" ("The Cross-Paths"), a novel by Ivan Franko. Two forms of context presentation are proposed. The electronic version of this lexicographic work is available online.
Abstract: In the paper, a definition of the clause suitable for automated processing of Ukrainian text is proposed. The Menzerath-Altmann law is verified at the sentence level, and the parameters of the dependence of clause length (counted in words and in syllables) on sentence length (counted in clauses) are calculated for "Perekhresni Stezhky" ("The Cross-Paths"), a novel by Ivan Franko.
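For orientation, the Menzerath-Altmann law is commonly written in the form below. The abstract does not say which variant was fitted, so the symbols are assumptions: $y$ is the mean constituent length (here, clause length in words or syllables), $x$ is the construct length (here, sentence length in clauses), and $a$, $b$, $c$ are fitted parameters.

```latex
% General form of the Menzerath-Altmann law
y = a\,x^{b}\,e^{-c x}
% Frequently used truncated form (c = 0), with b < 0 for decreasing constituent length
y = a\,x^{b}
```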
Abstract: In the paper, a comprehensive statistical characterization of a Ukrainian novel is given for the first time. The distribution of word-forms with respect to their size is studied. The Zipf-Mandelbrot and Menzerath-Altmann laws are analyzed.