Artificial Intelligence and digital health have the potential to transform global health. However, having access to representative data to test and validate algorithms in realistic production environments is essential. We introduce HealthSyn, an open-source synthetic data generator of user behavior for testing reinforcement learning algorithms in the context of mobile health interventions. The generator utilizes Markov processes to generate diverse user actions, with individual user behavioral patterns that can change in reaction to personalized interventions (i.e., reminders, recommendations, and incentives). These actions are translated into actual logs using an ML-purposed data schema specific to the mobile health application functionality included with HealthKit, and open-source SDK. The logs can be fed to pipelines to obtain user metrics. The generated data, which is based on real-world behaviors and simulation techniques, can be used to develop, test, and evaluate, both ML algorithms in research and end-to-end operational RL-based intervention delivery frameworks.
Providing front-line health workers in low- and middle- income countries with recommendations and predictions to improve health outcomes can have a tremendous impact on reducing healthcare inequalities, for instance by helping to prevent the thousands of maternal and newborn deaths that occur every day. To that end, we are developing a data-centric machine learning platform that leverages the behavioral logs from a wide range of mobile health applications running in those countries. Here we describe the platform architecture, focusing on the details that help us to maximize the quality and organization of the data throughout the whole process, from the data ingestion with a data-science purposed software development kit to the data pipelines, feature engineering and model management.