Abstract:Machine learning models commonly exhibit unexpected failures post-deployment due to either data shifts or uncommon situations in the training environment. Domain experts typically go through the tedious process of inspecting the failure cases manually, identifying failure modes and then attempting to fix the model. In this work, we aim to standardise and bring principles to this process through answering two critical questions: (i) how do we know that we have identified meaningful and distinct failure types?; (ii) how can we validate that a model has, indeed, been repaired? We suggest that the quality of the identified failure types can be validated through measuring the intra- and inter-type generalisation after fine-tuning and introduce metrics to compare different subtyping methods. Furthermore, we argue that a model can be considered repaired if it achieves high accuracy on the failure types while retaining performance on the previously correct data. We combine these two ideas into a principled framework for evaluating the quality of both the identified failure subtypes and model repairment. We evaluate its utility on a classification and an object detection tasks. Our code is available at https://github.com/Rokken-lab6/Failure-Analysis-and-Model-Repairment
Abstract:In the domain of the Soccer simulation 2D league of the RoboCup project, appropriate player positioning against a given opponent team is an important factor of soccer team performance. This work proposes a model which decides the strategy that should be applied regarding a particular opponent team. This task can be realized by applying preliminary a learning phase where the model determines the most effective strategies against clusters of opponent teams. The model determines the best strategies by using sequential Bayes' estimators. As a first trial of the system, the proposed model is used to determine the association of player formations against opponent teams in the particular situation of corner-kick. The implemented model shows satisfying abilities to compare player formations that are similar to each other in terms of performance and determines the right ranking even by running a decent number of simulation games.