



Abstract:The statistical methods derived and described in this thesis provide new ways to elucidate the structural properties of text and other symbolic sequences. Generically, these methods allow detection of a difference in the frequency of a single feature, the detection of a difference between the frequencies of an ensemble of features and the attribution of the source of a text. These three abstract tasks suffice to solve problems in a wide variety of settings. Furthermore, the techniques described in this thesis can be extended to provide a wide range of additional tests beyond the ones described here. A variety of applications for these methods are examined in detail. These applications are drawn from the area of text analysis and genetic sequence analysis. The textually oriented tasks include finding interesting collocations and cooccurent phrases, language identification, and information retrieval. The biologically oriented tasks include species identification and the discovery of previously unreported long range structure in genes. In the applications reported here where direct comparison is possible, the performance of these new methods substantially exceeds the state of the art. Overall, the methods described here provide new and effective ways to analyse text and other symbolic sequences. Their particular strength is that they deal well with situations where relatively little data are available. Since these methods are abstract in nature, they can be applied in novel situations with relative ease.




Abstract:Two meta-evolutionary optimization strategies described in this paper accelerate the convergence of evolutionary programming algorithms while still retaining much of their ability to deal with multi-modal problems. The strategies, called directional mutation and recorded step in this paper, can operate independently but together they greatly enhance the ability of evolutionary programming algorithms to deal with fitness landscapes characterized by long narrow valleys. The directional mutation aspect of this combined method uses correlated meta-mutation but does not introduce a full covariance matrix. These new methods are thus much more economical in terms of storage for problems with high dimensionality. Additionally, directional mutation is rotationally invariant which is a substantial advantage over self-adaptive methods which use a single variance per coordinate for problems where the natural orientation of the problem is not oriented along the axes.