Abstract:Training Neural Networks (NNs) to behave as Model Predictive Control (MPC) algorithms is an effective way to implement them in constrained embedded devices. By collecting large amounts of input-output data, where inputs represent system states and outputs are MPC-generated control actions, NNs can be trained to replicate MPC behavior at a fraction of the computational cost. However, although the composition of the training data critically influences the final NN accuracy, methods for systematically optimizing it remain underexplored. In this paper, we introduce the concept of Optimally-Sampled Datasets (OSDs) as ideal training sets and present an efficient algorithm for generating them. An OSD is a parametrized subset of all the available data that (i) preserves existing MPC information up to a certain numerical resolution, (ii) avoids duplicate or near-duplicate states, and (iii) becomes saturated or complete. We demonstrate the effectiveness of OSDs by training NNs to replicate the University of Virginia's MPC algorithm for automated insulin delivery in Type-1 Diabetes, achieving a four-fold improvement in final accuracy. Notably, two OSD-trained NNs received regulatory clearance for clinical testing as the first NN-based control algorithm for direct human insulin dosing. This methodology opens new pathways for implementing advanced optimizations on resource-constrained embedded platforms, potentially revolutionizing how complex algorithms are deployed.