Given the increasing amount and complexity of time series data in domains such as finance, weather forecasting, and healthcare, there is a growing need for models that achieve state-of-the-art performance while providing interpretable insights into underlying patterns and relationships. Attribution techniques extract explanations from time series models to provide such insights, but their robustness and trustworthiness are hard to evaluate. We propose the Attribution Stability Indicator (ASI), a measure that takes robustness and trustworthiness into account as properties of attribution techniques for time series. We extend a perturbation analysis with correlations of the original time series to the perturbed instance and the attributions to include the desired properties in the measure. We demonstrate the desired properties based on an analysis of the attributions in a dimension-reduced space and the distribution of ASI scores over three whole time series classification datasets.
With the rising necessity of explainable artificial intelligence (XAI), we see an increase in task-dependent XAI methods on varying abstraction levels. XAI techniques on a global level explain model behavior and on a local level explain sample predictions. We propose a visual analytics workflow to support seamless transitions between global and local explanations, focusing on attributions and counterfactuals for time series classification. In particular, we adapt local XAI techniques (attributions) that were developed for traditional datasets (images, text) to analyze time series classification, a data type that is typically less intelligible to humans. To generate a global overview, we apply local attribution methods to the data, creating explanations for the whole dataset. These explanations are projected onto two dimensions, depicting model behavior trends, strategies, and decision boundaries. To further inspect the model decision-making as well as potential data errors, a what-if analysis facilitates hypothesis generation and verification on both the global and local levels. We continuously collected and incorporated expert user feedback, as well as insights based on their domain knowledge, resulting in a tailored analysis workflow and system that tightly integrates time series transformations into explanations. Lastly, we present three use cases, verifying that our technique enables users to (1)~explore data transformations and feature relevance, (2)~identify model behavior and decision boundaries, and (3)~determine the reason for misclassifications.
Explainable Artificial Intelligence (XAI) has gained significant attention recently as the demand for transparency and interpretability of machine learning models has increased. In particular, XAI for time series data has become increasingly important in finance, healthcare, and climate science. However, evaluating the quality of explanations, such as attributions provided by XAI techniques, remains challenging. This paper provides an in-depth analysis of using perturbations to evaluate attributions extracted from time series models. A perturbation analysis involves systematically modifying the input data and evaluating the impact on the attributions generated by the XAI method. We apply this approach to several state-of-the-art XAI techniques and evaluate their performance on three time series classification datasets. Our results demonstrate that the perturbation analysis approach can effectively evaluate the quality of attributions and provide insights into the strengths and limitations of XAI techniques. Such an approach can guide the selection of XAI methods for time series data, e.g., focusing on return time rather than precision, and facilitate the development of more reliable and interpretable machine learning models for time series analysis.
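The perturbation analysis described above can be illustrated with a minimal sketch. It is not the paper's evaluation code: the linear toy model, the input-times-gradient attribution, the zero replacement value, and all function names are assumptions chosen so the example stays self-contained. The idea shown is that perturbing the time points with the highest attribution should change the prediction more than perturbing the same number of random time points.

```python
import numpy as np

def predict_proba(x, w, b=0.0):
    """Toy stand-in for a trained classifier: logistic output of a linear model."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def input_times_gradient(x, w):
    """Attribution for the toy model; the gradient of the logit w.r.t. x is w."""
    return x * w

def perturb_top_k(x, attribution, k, value=0.0):
    """Replace the k time points with the largest absolute attribution."""
    idx = np.argsort(-np.abs(attribution))[:k]
    perturbed = x.copy()
    perturbed[idx] = value
    return perturbed

def perturb_random(x, k, rng, value=0.0):
    """Baseline: replace k randomly chosen time points."""
    idx = rng.choice(x.shape[0], size=k, replace=False)
    perturbed = x.copy()
    perturbed[idx] = value
    return perturbed

rng = np.random.default_rng(0)
length, k = 50, 5
w = np.zeros(length)
w[10:15] = 2.0  # the toy model only looks at time points 10..14
x = rng.normal(loc=1.0, scale=0.1, size=length)

p_orig = predict_proba(x, w)
attr = input_times_gradient(x, w)
p_attr = predict_proba(perturb_top_k(x, attr, k), w)
p_rand = np.mean([predict_proba(perturb_random(x, k, rng), w) for _ in range(100)])

drop_attr = p_orig - p_attr  # prediction change after attribution-guided perturbation
drop_rand = p_orig - p_rand  # average prediction change after random perturbation
print(f"attribution-guided drop: {drop_attr:.3f}, random drop: {drop_rand:.3f}")
```

In this toy setting the attribution-guided perturbation removes exactly the time points the model relies on, so its drop is much larger than the random baseline. With a real model, the replacement value and the number of perturbed points are design choices that can change the outcome.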
Neural networks grow vastly in size to tackle more sophisticated tasks. In many cases, such large networks are not deployable on particular hardware and need to be reduced in size. Pruning techniques shrink deep neural networks to smaller sizes while decreasing their performance as little as possible. However, such pruning algorithms are often hard to understand merely by applying them, and they do not include domain knowledge, which can work against user goals. We propose ViNNPruner, a visual interactive pruning application that implements state-of-the-art pruning algorithms and gives users the option to prune manually based on their knowledge. We show how the application facilitates gaining insights into automatic pruning algorithms and semi-automatically pruning oversized networks to make them more efficient using interactive visualizations.
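As a rough illustration of what an automatic pruning step does, the sketch below applies unstructured magnitude pruning to a single weight matrix and then layers a manual decision on top. Magnitude pruning is used here only as a common baseline; the abstract does not state which algorithms ViNNPruner implements, and the function name `magnitude_prune` and the example shapes are hypothetical.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Returns the pruned weights and a boolean mask (True = weight kept).
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(round(sparsity * weights.size))  # number of weights to remove
    mask = np.ones(weights.size, dtype=bool)
    if k > 0:
        flat = np.abs(weights).ravel()
        # indices of the k smallest magnitudes (order among them is irrelevant)
        prune_idx = np.argpartition(flat, k - 1)[:k]
        mask[prune_idx] = False
    mask = mask.reshape(weights.shape)
    return weights * mask, mask

rng = np.random.default_rng(0)
layer = rng.normal(size=(4, 8))  # hypothetical dense layer: 4 units, 8 inputs

pruned, mask = magnitude_prune(layer, sparsity=0.5)

# Manual pruning based on user knowledge: additionally drop unit 0 entirely.
manual_mask = mask.copy()
manual_mask[0, :] = False
manually_pruned = layer * manual_mask
print("kept weights (automatic):", int(mask.sum()), "of", mask.size)
```

Keeping the mask separate from the weights makes it straightforward to combine automatic and manual decisions, and to reapply the mask if the network is fine-tuned afterwards.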
Attributions are a common local explanation technique for deep learning models on single samples as they are easily extractable and demonstrate the relevance of input values. In many cases, heatmaps visualize such attributions for samples, for instance, on images. However, heatmaps are not always the ideal visualization to explain certain model decisions for other data types. In this review, we focus on attribution visualizations for time series. We collect attribution heatmap visualizations and some alternatives, discuss their advantages and disadvantages, and give a short position on future opportunities for attributions and explanations for time series.
Time series forecasting is a demanding task, ranging from weather to failure forecasting, with black-box models achieving state-of-the-art performance. However, understanding and debugging such models is not guaranteed. We propose TS-MULE, a local surrogate model explanation method specialized for time series that extends the LIME approach. Our extended LIME works with various ways to segment and perturb the time series data. In our extension, we present six sampling segmentation approaches for time series to improve the quality of surrogate attributions and demonstrate their performance on three deep learning model architectures and three common multivariate time series datasets.
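The general idea of a LIME-style surrogate for time series can be sketched as follows. This is not the TS-MULE implementation: uniform windows stand in for the segmentation approaches, replacing a segment with the series mean stands in for the perturbation strategies, and the names `uniform_segments`, `lime_time_series`, and `toy_model` are hypothetical. The sketch samples on/off patterns over segments, queries the model on the perturbed series, and fits a proximity-weighted linear model whose coefficients serve as segment attributions.

```python
import numpy as np

def uniform_segments(length, n_segments):
    """Assign each time point to one of n_segments contiguous, near-equal windows."""
    return np.arange(length) * n_segments // length

def lime_time_series(x, predict_fn, n_segments=10, n_samples=500,
                     kernel_width=0.25, seed=0):
    """Fit a weighted linear surrogate over segment on/off indicators."""
    rng = np.random.default_rng(seed)
    seg_ids = uniform_segments(len(x), n_segments)
    # Interpretable representation: 1 = keep segment, 0 = replace segment.
    z = rng.integers(0, 2, size=(n_samples, n_segments))
    z[0] = 1  # always include the unperturbed series
    replacement = np.full_like(x, x.mean())  # assumed perturbation: series mean
    preds = np.empty(n_samples)
    for i in range(n_samples):
        keep = z[i, seg_ids].astype(bool)
        preds[i] = predict_fn(np.where(keep, x, replacement))
    # Proximity kernel: samples with fewer replaced segments get larger weights.
    dist = 1.0 - z.mean(axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # Weighted least squares with an intercept column.
    design = np.hstack([np.ones((n_samples, 1)), z])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], preds * sw, rcond=None)
    return coef[1:], seg_ids

# Toy black box (assumption): the output depends only on time points 30..39.
def toy_model(series):
    return series[30:40].sum()

x = np.zeros(100)
x[30:40] = 1.0
segment_attr, seg_ids = lime_time_series(x, toy_model)
print("most relevant segment:", int(np.argmax(np.abs(segment_attr))))
```

Because the toy model reads only the fourth window, the surrogate assigns essentially all relevance to segment index 3. For real models, the choice of segmentation and replacement value can change the attributions noticeably, which is the motivation the abstract gives for comparing several segmentation approaches.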
Decision explanations of machine learning black-box models are often generated by applying Explainable AI (XAI) techniques. However, many proposed XAI methods produce unverified outputs. Evaluation and verification are usually achieved with a visual interpretation by humans on individual images or text. In this preregistration, we propose an empirical study and benchmark framework to apply attribution methods for neural networks developed for images and text data on time series. We present a methodology to automatically evaluate and rank attribution techniques on time series using perturbation methods to identify reliable approaches.
Explainable Artificial Intelligence (XAI) methods are typically deployed to explain and debug black-box machine learning models. However, most proposed XAI methods are black boxes themselves and designed for images. Thus, they rely on visual interpretability to evaluate and prove explanations. In this work, we apply XAI methods previously used in the image and text domains to time series. We present a methodology to test and evaluate various XAI methods on time series by introducing new verification techniques that incorporate the temporal dimension. We further conduct preliminary experiments to assess the quality of selected XAI method explanations with various verification methods on a range of datasets and inspect quality metrics on the results. We demonstrate that in our initial experiments, SHAP works robustly for all models, but others like DeepLIFT, LRP, and Saliency Maps work better with specific architectures.
We propose a framework for interactive and explainable machine learning that enables users to (1) understand machine learning models; (2) diagnose model limitations using different explainable AI methods; as well as (3) refine and optimize the models. Our framework combines an iterative XAI pipeline with eight global monitoring and steering mechanisms, including quality monitoring, provenance tracking, model comparison, and trust building. To operationalize the framework, we present explAIner, a visual analytics system for interactive and explainable machine learning that instantiates all phases of the suggested pipeline within the commonly used TensorBoard environment. We performed a user study with nine participants across different expertise levels to examine their perception of our workflow and to collect suggestions to fill the gap between our system and framework. The evaluation confirms that our tightly integrated system leads to an informed machine learning process while disclosing opportunities for further extensions.