Abstract:Decision-tree ensembles are a cornerstone of predictive modeling, and SHAP is a standard framework for interpreting their predictions. Among its variants, Background SHAP offers high accuracy by modeling missing features using a background dataset. Historically, this approach did not scale well, as the time complexity for explaining n instances using m background samples included an O(mn) component. Recent methods such as Woodelf and PLTreeSHAP reduce this to O(m+n), but introduce a preprocessing bottleneck that grows as 3^D with tree depth D, making them impractical for deep trees. We address this limitation with WoodelfHD, a Woodelf extension that reduces the 3^D factor to 2^D. The key idea is a Strassen-like multiplication scheme that exploits the structure of Woodelf matrices, reducing matrix-vector multiplication from O(k^2) to O(k*log(k)) via a fully vectorized, non-recursive implementation. In addition, we merge path nodes with identical features, reducing cache size and memory usage. When running on standard environments, WoodelfHD enables exact Background SHAP computation for trees with depths up to 21, where previous methods fail due to excessive memory usage. For ensembles of depths 12 and 15, it achieves speedups of 33x and 162x, respectively, over the state-of-the-art.




Abstract:SHapley Additive exPlanations (SHAP) is a key tool for interpreting decision tree ensembles by assigning contribution values to features. It is widely used in finance, advertising, medicine, and other domains. Two main approaches to SHAP calculation exist: Path-Dependent SHAP, which leverages the tree structure for efficiency, and Background SHAP, which uses a background dataset to estimate feature distributions. We introduce WOODELF, a SHAP algorithm that integrates decision trees, game theory, and Boolean logic into a unified framework. For each consumer, WOODELF constructs a pseudo-Boolean formula that captures their feature values, the structure of the decision tree ensemble, and the entire background dataset. It then leverages this representation to compute Background SHAP in linear time. WOODELF can also compute Path-Dependent SHAP, Shapley interaction values, Banzhaf values, and Banzhaf interaction values. WOODELF is designed to run efficiently on CPU and GPU hardware alike. Available via the WOODELF Python package, it is implemented using NumPy, SciPy, and CuPy without relying on custom C++ or CUDA code. This design enables fast performance and seamless integration into existing frameworks, supporting large-scale computation of SHAP and other game-theoretic values in practice. For example, on a dataset with 3,000,000 rows, 5,000,000 background samples, and 127 features, WOODELF computed all Background Shapley values in 162 seconds on CPU and 16 seconds on GPU - compared to 44 minutes required by the best method on any hardware platform, representing 16x and 165x speedups, respectively.