Alert button
Picture for Marcelo Arenas

Marcelo Arenas

Alert button

Foundations of Symbolic Languages for Model Interpretability

Oct 05, 2021
Marcelo Arenas, Daniel Baez, Pablo Barceló, Jorge Pérez, Bernardo Subercaseaux

Figure 1 for Foundations of Symbolic Languages for Model Interpretability
Figure 2 for Foundations of Symbolic Languages for Model Interpretability
Figure 3 for Foundations of Symbolic Languages for Model Interpretability
Figure 4 for Foundations of Symbolic Languages for Model Interpretability

Several queries and scores have recently been proposed to explain individual predictions over ML models. Given the need for flexible, reliable, and easy-to-apply interpretability methods for ML models, we foresee the need for developing declarative languages to naturally specify different explainability queries. We do this in a principled way by rooting such a language in a logic, called FOIL, that allows for expressing many simple but important explainability queries, and might serve as a core for more expressive interpretability languages. We study the computational complexity of FOIL queries over two classes of ML models often deemed to be easily interpretable: decision trees and OBDDs. Since the number of possible inputs for an ML model is exponential in its dimension, the tractability of the FOIL evaluation problem is delicate but can be achieved by either restricting the structure of the models or the fragment of FOIL being evaluated. We also present a prototype implementation of FOIL wrapped in a high-level declarative language and perform experiments showing that such a language can be used in practice.

* Accepted as Spotlight for NeurIPS'2021 
Viaarxiv icon

Querying in the Age of Graph Databases and Knowledge Graphs

Jun 25, 2021
Marcelo Arenas, Claudio Gutierrez, Juan F. Sequeda

Figure 1 for Querying in the Age of Graph Databases and Knowledge Graphs
Figure 2 for Querying in the Age of Graph Databases and Knowledge Graphs

Graphs have become the best way we know of representing knowledge. The computing community has investigated and developed the support for managing graphs by means of digital technology. Graph databases and knowledge graphs surface as the most successful solutions to this program. The goal of this document is to provide a conceptual map of the data management tasks underlying these developments, paying particular attention to data models and query languages for graphs.

Viaarxiv icon

On the Complexity of SHAP-Score-Based Explanations: Tractability via Knowledge Compilation and Non-Approximability Results

Apr 16, 2021
Marcelo Arenas, Pablo Barceló, Leopoldo Bertossi, Mikaël Monet

Figure 1 for On the Complexity of SHAP-Score-Based Explanations: Tractability via Knowledge Compilation and Non-Approximability Results

In Machine Learning, the $\mathsf{SHAP}$-score is a version of the Shapley value that is used to explain the result of a learned model on a specific entity by assigning a score to every feature. While in general computing Shapley values is an intractable problem, we prove a strong positive result stating that the $\mathsf{SHAP}$-score can be computed in polynomial time over deterministic and decomposable Boolean circuits. Such circuits are studied in the field of Knowledge Compilation and generalize a wide range of Boolean circuits and binary decision diagrams classes, including binary decision trees and Ordered Binary Decision Diagrams (OBDDs). We also establish the computational limits of the SHAP-score by observing that computing it over a class of Boolean models is always polynomially as hard as the model counting problem for that class. This implies that both determinism and decomposability are essential properties for the circuits that we consider. It also implies that computing $\mathsf{SHAP}$-scores is intractable as well over the class of propositional formulas in DNF. Based on this negative result, we look for the existence of fully-polynomial randomized approximation schemes (FPRAS) for computing $\mathsf{SHAP}$-scores over such class. In contrast to the model counting problem for DNF formulas, which admits an FPRAS, we prove that no such FPRAS exists for the computation of $\mathsf{SHAP}$-scores. Surprisingly, this negative result holds even for the class of monotone formulas in DNF. These techniques can be further extended to prove another strong negative result: Under widely believed complexity assumptions, there is no polynomial-time algorithm that checks, given a monotone DNF formula $\varphi$ and features $x,y$, whether the $\mathsf{SHAP}$-score of $x$ in $\varphi$ is smaller than the $\mathsf{SHAP}$-score of $y$ in $\varphi$.

* 52 pages, including 48 pages of main text. This is an extended version of the AAAI conference paper "The Tractability of SHAP-Score-Based Explanations over Deterministic and Decomposable Boolean Circuits" (arXiv:2007.14045), with additional results 
Viaarxiv icon

The Tractability of SHAP-scores over Deterministic and Decomposable Boolean Circuits

Jul 28, 2020
Marcelo Arenas, Pablo Barceló Leopoldo Bertossi, Mikaël Monet

Scores based on Shapley values are currently widely used for providing explanations to classification results over machine learning models. A prime example of this corresponds to the influential SHAP-score, a version of the Shapley value in which the contribution of a set $S$ of features from a given entity $\mathbf{e}$ over a model $M$ is defined as the expected value in $M$ of the set of entities $\mathbf{e}'$ that coincide with $\mathbf{e}$ over all features in $S$. While in general computing Shapley values is a computationally intractable problem, it has recently been claimed that the SHAP-score can be computed in polynomial time over the class of decision trees. In this paper, we provide a proof of a stronger result over Boolean models: the SHAP-score can be computed in polynomial time over deterministic and decomposable Boolean circuits, also known as tractable probabilistic circuits. Such circuits encompass a wide range of Boolean circuits and binary decision diagrams classes, including binary decision trees and Ordered Binary Decision Diagrams (OBDDs). Moreover, we establish the computational limits of the notion of SHAP-score by showing that computing it over a class of Boolean models is always (polynomially) as hard as the model counting problem for this class (under some mild condition). This implies, for instance, that computing the SHAP-score for DNF propositional formulae is a #P-hard problem, and, thus, that determinism is essential for the circuits that we consider.

* 13 pages, including 2 pages of references 
Viaarxiv icon

Exchanging OWL 2 QL Knowledge Bases

Jul 01, 2013
Marcelo Arenas, Elena Botoeva, Diego Calvanese, Vladislav Ryzhikov

Figure 1 for Exchanging OWL 2 QL Knowledge Bases

Knowledge base exchange is an important problem in the area of data exchange and knowledge representation, where one is interested in exchanging information between a source and a target knowledge base connected through a mapping. In this paper, we study this fundamental problem for knowledge bases and mappings expressed in OWL 2 QL, the profile of OWL 2 based on the description logic DL-Lite_R. More specifically, we consider the problem of computing universal solutions, identified as one of the most desirable translations to be materialized, and the problem of computing UCQ-representations, which optimally capture in a target TBox the information that can be extracted from a source TBox and a mapping by means of unions of conjunctive queries. For the former we provide a novel automata-theoretic technique, and complexity results that range from NP to EXPTIME, while for the latter we show NLOGSPACE-completeness.

Viaarxiv icon