Abstract:Synthetic data generation has emerged as a promising approach to address the challenges of using sensitive financial data in machine learning applications. By leveraging generative models, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), it is possible to create artificial datasets that preserve the statistical properties of real financial records while mitigating privacy risks and regulatory constraints. Despite the rapid growth of this field, a comprehensive synthesis of the current research landscape has been lacking. This systematic review consolidates and analyses 72 studies published since 2018 that focus on synthetic financial data generation. We categorise the types of financial information synthesised, the generative methods employed, and the evaluation strategies used to assess data utility and privacy. The findings indicate that GAN-based approaches dominate the literature, particularly for generating time-series market data and tabular credit data. While several innovative techniques demonstrate potential for improved realism and privacy preservation, there remains a notable lack of rigorous evaluation of privacy safeguards across studies. By providing an integrated overview of generative techniques, applications, and evaluation methods, this review highlights critical research gaps and offers guidance for future work aimed at developing robust, privacy-preserving synthetic data solutions for the financial domain.
Abstract:Software is omnipresent within all factors of society. It is thus important to ensure that software are well tested to mitigate bad user experiences as well as the potential for severe financial and human losses. Software testing is however expensive and absorbs valuable time and resources. As a result, the field of automated software testing has grown of interest to researchers in past decades. In our review of present and past research papers, we have identified an information gap in the areas of improvement for the Genetic Algorithm and Particle Swarm Optimisation. A gap in knowledge in the current challenges that face automated testing has also been identified. We therefore present this systematic literature review in an effort to consolidate existing knowledge in regards to the evolutionary approaches as well as their improvements and resulting limitations. These improvements include hybrid algorithm combinations as well as interoperability with mutation testing and neural networks. We will also explore the main test criterion that are used in these algorithms alongside the challenges currently faced in the field related to readability, mocking and more.




Abstract:Federated Learning (FL) enables local devices to collaboratively learn a shared predictive model by only periodically sharing model parameters with a central aggregator. However, FL can be disadvantaged by statistical heterogeneity produced by the diversity in each local devices data distribution, which creates different levels of Independent and Identically Distributed (IID) data. Furthermore, this can be more complex when optimising different combinations of FL parameters and choosing optimal aggregation. In this paper, we present an empirical analysis of different FL training parameters and aggregators over various levels of statistical heterogeneity on three datasets. We propose a systematic data partition strategy to simulate different levels of statistical heterogeneity and a metric to measure the level of IID. Additionally, we empirically identify the best FL model and key parameters for datasets of different characteristics. On the basis of these, we present recommended guidelines for FL parameters and aggregators to optimise model performance under different levels of IID and with different datasets