Abstract:In the classical setting, the training of a Hidden Markov Model (HMM) typically relies on a single, sufficiently long observation sequence that can be regarded as representative of the underlying stochastic process. In this context, the Expectation Maximization (EM) algorithm is applied in its specialized form for HMMs, namely the Baum Welch algorithm, which has been extensively employed in applications such as speech recognition. The objective of this work is to present pseudocode formulations for both the training and decoding procedures of HMMs in a different scenario, where the available data consist of multiple independent temporal sequences generated by the same model, each of relatively short duration, i.e., containing only a limited number of samples. Special emphasis is placed on the relevance of this formulation to longitudinal studies in population health, where datasets are naturally structured as collections of short trajectories across individuals with point data at follow up.