Abstract:Ensemble models are widely recognized in the ML community for their limited interpretability. For instance, while a single decision tree is considered interpretable, ensembles of trees (e.g., boosted trees) are often treated as black-boxes. Despite this folklore recognition, there remains a lack of rigorous mathematical understanding of what particularly makes an ensemble (un)-interpretable, including how fundamental factors like the (1) *number*, (2) *size*, and (3) *type* of base models influence its interpretability. In this work, we seek to bridge this gap by applying concepts from computational complexity theory to study the challenges of generating explanations for various ensemble configurations. Our analysis uncovers nuanced complexity patterns influenced by various factors. For example, we demonstrate that under standard complexity assumptions like P$\neq$NP, interpreting ensembles remains intractable even when base models are of constant size. Surprisingly, the complexity changes drastically with the number of base models: small ensembles of decision trees are efficiently interpretable, whereas interpreting ensembles with even a constant number of linear models remains intractable. We believe that our findings provide a more robust foundation for understanding the interpretability of ensembles, emphasizing the benefits of examining it through a computational complexity lens.
Abstract:Decision trees are a fundamental tool in machine learning for representing, classifying, and generalizing data. It is desirable to construct ``small'' decision trees, by minimizing either the \textit{size} ($s$) or the \textit{depth} $(d)$ of the \textit{decision tree} (\textsc{DT}). Recently, the parameterized complexity of \textsc{Decision Tree Learning} has attracted a lot of attention. We consider a generalization of \textsc{Decision Tree Learning} where given a \textit{classification instance} $E$ and an integer $t$, the task is to find a ``small'' \textsc{DT} that disagrees with $E$ in at most $t$ examples. We consider two problems: \textsc{DTSO} and \textsc{DTDO}, where the goal is to construct a \textsc{DT} minimizing $s$ and $d$, respectively. We first establish that both \textsc{DTSO} and \textsc{DTDO} are W[1]-hard when parameterized by $s+\delta_{max}$ and $d+\delta_{max}$, respectively, where $\delta_{max}$ is the maximum number of features in which two differently labeled examples can differ. We complement this result by showing that these problems become \textsc{FPT} if we include the parameter $t$. We also consider the kernelization complexity of these problems and establish several positive and negative results for both \textsc{DTSO} and \textsc{DTDO}.
Abstract:Challenge the champ tournaments are one of the simplest forms of competition, where a (initially selected) champ is repeatedly challenged by other players. If a player beats the champ, then that player is considered the new (current) champ. Each player in the competition challenges the current champ once in a fixed order. The champ of the last round is considered the winner of the tournament. We investigate a setting where players can be bribed to lower their winning probability against the initial champ. The goal is to maximize the probability of the initial champ winning the tournament by bribing the other players, while not exceeding a given budget for the bribes. Mattei et al. [Journal of Applied Logic, 2015] showed that the problem can be solved in pseudo-polynomial time, and that it is in XP when parameterized by the number of players. We show that the problem is weakly NP-hard and W[1]-hard when parameterized by the number of players. On the algorithmic side, we show that the problem is fixed-parameter tractable when parameterized either by the number of different bribe values or the number of different probability values. To this end, we establish several results that are of independent interest. In particular, we show that the product knapsack problem is W[1]-hard when parameterized by the number of items in the knapsack, and that constructive bribery for cup tournaments is W[1]-hard when parameterized by the number of players. Furthermore, we present a novel way of designing mixed integer linear programs, ensuring optimal solutions where all variables are integers.
Abstract:We study a generalization of the standard approval-based model of participatory budgeting (PB), in which voters are providing approval ballots over a set of predefined projects and -- in addition to a global budget limit, there are several groupings of the projects, each group with its own budget limit. We study the computational complexity of identifying project bundles that maximize voter satisfaction while respecting all budget limits. We show that the problem is generally intractable and describe efficient exact algorithms for several special cases, including instances with only few groups and instances where the group structure is close to be hierarchical, as well as efficient approximation algorithms. Our results could allow, e.g., municipalities to hold richer PB processes that are thematically and geographically inclusive.