Alert button
Picture for Simon Finfer

Simon Finfer

Alert button

The Health Gym: Synthetic Health-Related Datasets for the Development of Reinforcement Learning Algorithms

Mar 12, 2022
Nicholas I-Hsien Kuo, Mark N. Polizzotto, Simon Finfer, Federico Garcia, Anders Sönnerborg, Maurizio Zazzi, Michael Böhm, Louisa Jorm, Sebastiano Barbieri

Figure 1 for The Health Gym: Synthetic Health-Related Datasets for the Development of Reinforcement Learning Algorithms
Figure 2 for The Health Gym: Synthetic Health-Related Datasets for the Development of Reinforcement Learning Algorithms
Figure 3 for The Health Gym: Synthetic Health-Related Datasets for the Development of Reinforcement Learning Algorithms
Figure 4 for The Health Gym: Synthetic Health-Related Datasets for the Development of Reinforcement Learning Algorithms

In recent years, the machine learning research community has benefited tremendously from the availability of openly accessible benchmark datasets. Clinical data are usually not openly available due to their highly confidential nature. This has hampered the development of reproducible and generalisable machine learning applications in health care. Here we introduce the Health Gym - a growing collection of highly realistic synthetic medical datasets that can be freely accessed to prototype, evaluate, and compare machine learning algorithms, with a specific focus on reinforcement learning. The three synthetic datasets described in this paper present patient cohorts with acute hypotension and sepsis in the intensive care unit, and people with human immunodeficiency virus (HIV) receiving antiretroviral therapy in ambulatory care. The datasets were created using a novel generative adversarial network (GAN). The distributions of variables, and correlations between variables and trends over time in the synthetic datasets mirror those in the real datasets. Furthermore, the risk of sensitive information disclosure associated with the public distribution of the synthetic datasets is estimated to be very low.

Viaarxiv icon

Synthetic Acute Hypotension and Sepsis Datasets Based on MIMIC-III and Published as Part of the Health Gym Project

Dec 07, 2021
Nicholas I-Hsien Kuo, Mark Polizzotto, Simon Finfer, Louisa Jorm, Sebastiano Barbieri

Figure 1 for Synthetic Acute Hypotension and Sepsis Datasets Based on MIMIC-III and Published as Part of the Health Gym Project
Figure 2 for Synthetic Acute Hypotension and Sepsis Datasets Based on MIMIC-III and Published as Part of the Health Gym Project
Figure 3 for Synthetic Acute Hypotension and Sepsis Datasets Based on MIMIC-III and Published as Part of the Health Gym Project
Figure 4 for Synthetic Acute Hypotension and Sepsis Datasets Based on MIMIC-III and Published as Part of the Health Gym Project

These two synthetic datasets comprise vital signs, laboratory test results, administered fluid boluses and vasopressors for 3,910 patients with acute hypotension and for 2,164 patients with sepsis in the Intensive Care Unit (ICU). The patient cohorts were built using previously published inclusion and exclusion criteria and the data were created using Generative Adversarial Networks (GANs) and the MIMIC-III Clinical Database. The risk of identity disclosure associated with the release of these data was estimated to be very low (0.045%). The datasets were generated and published as part of the Health Gym, a project aiming to publicly distribute synthetic longitudinal health data for developing machine learning algorithms (with a particular focus on offline reinforcement learning) and for educational purposes.

Viaarxiv icon