Synthetic datasets are often presented as a silver-bullet solution to the problem of privacy-preserving data publishing. However, for many applications, synthetic data has been shown to have limited utility when used to train predictive models. One promising potential application of these data is in the exploratory phase of the machine learning workflow, which involves understanding, engineering and selecting features. This phase often involves considerable time, and depends on the availability of data. There would be substantial value in synthetic data that permitted these steps to be carried out while, for example, data access was being negotiated, or with fewer information governance restrictions. This paper presents an empirical analysis of the agreement between the feature importance obtained from raw and from synthetic data, on a range of artificially generated and real-world datasets (where feature importance represents how useful each feature is when predicting a the outcome). We employ two differentially-private methods to produce synthetic data, and apply various utility measures to quantify the agreement in feature importance as this varies with the level of privacy. Our results indicate that synthetic data can sometimes preserve several representations of the ranking of feature importance in simple settings but their performance is not consistent and depends upon a number of factors. Particular caution should be exercised in more nuanced real-world settings, where synthetic data can lead to differences in ranked feature importance that could alter key modelling decisions. This work has important implications for developing synthetic versions of highly sensitive data sets in fields such as finance and healthcare.
Capturing and simulating intelligent adaptive behaviours within spatially explicit individual-based models remains an ongoing challenge for researchers. While an ever-increasing abundance of real-world behavioural data are collected, few approaches exist that can quantify and formalise key individual behaviours and how they change over space and time. Consequently, commonly used agent decision-making frameworks, such as event-condition-action rules, are often required to focus only on a narrow range of behaviours. We argue that these behavioural frameworks often do not reflect real-world scenarios and fail to capture how behaviours can develop in response to stimuli. There has been an increased interest in Machine Learning methods and their potential to simulate intelligent adaptive behaviours in recent years. One method that is beginning to gain traction in this area is Reinforcement Learning (RL). This paper explores how RL can be applied to create emergent agent behaviours using a simple predator-prey Agent-Based Model (ABM). Running a series of simulations, we demonstrate that agents trained using the novel Proximal Policy Optimisation (PPO) algorithm behave in ways that exhibit properties of real-world intelligent adaptive behaviours, such as hiding, evading and foraging.