Abstract:Vine copula models have become highly popular and practical tools for modelling multivariate probability distributions due to their flexibility in modelling different kinds of dependences between the random variables involved. However, their flexibility comes with the drawback of a high-dimensional parameter space. To tackle this problem, truncated vine copulas were introduced by Kurowicka (2010) (Gaussian case) and Brechmann and Czado (2013) (general case). Truncated vine copulas contain conditionally independent pair copulas after the truncation level. So far, in the general case, truncated vine constructing algorithms started from the lowest tree in order to encode the largest dependences in the lower trees. The novelty of this paper starts from the observation that a truncated vine is determined by the first tree after the truncation level (see Kovács and Szántai (2017)). This paper introduces a new score for fitting truncated vines to given data, called the Weight of the truncated vine. Then we propose a completely new methodology for constructing truncated vines. We prove theorems which motivate this new approach. While earlier algorithms did not use conditional independences, we give algorithms for constructing and encoding truncated vines which do exploit them. Finally, we illustrate the algorithms on real datasets and compare the results with well-known methods included in R packages. Our method generally compare favorably to previously known methods.
Abstract:In this paper we introduce the so-called Generalized Naive Bayes structure as an extension of the Naive Bayes structure. We give a new greedy algorithm that finds a good fitting Generalized Naive Bayes (GNB) probability distribution. We prove that this fits the data at least as well as the probability distribution determined by the classical Naive Bayes (NB). Then, under a not very restrictive condition, we give a second algorithm for which we can prove that it finds the optimal GNB probability distribution, i.e. best fitting structure in the sense of KL divergence. Both algorithms are constructed to maximize the information content and aim to minimize redundancy. Based on these algorithms, new methods for feature selection are introduced. We discuss the similarities and differences to other related algorithms in terms of structure, methodology, and complexity. Experimental results show, that the algorithms introduced outperform the related algorithms in many cases.
Abstract:A simple graph on $n$ vertices may contain a lot of maximum cliques. But how many can it potentially contain? We will show that the maximum number of maximum cliques is taken over so-called cliqueful graphs, more specifically, later we will show that it is taken over saturated composite cliqueful graphs, if $n \ge 15$. Using this we will show that the graph that contains $3^{\lfloor n/3 \rfloor}c$ maxcliques has the most number of maxcliques on $n$ vertices, where $c\in\{1,\frac{4}{3},2\}$, depending on $n \text{ mod } 3$.



Abstract:Vine copulas can efficiently model a large portion of probability distributions. This paper focuses on a more thorough understanding of their structures. We are building on well-known existing constructions to represent vine copulas with graphs as well as matrices. The graph representations include the regular, cherry and chordal graph sequence structures, which we show equivalence between. Importantly we also show that when a perfect elimination ordering of a vine structure is given, then it can always be uniquely represented with a matrix. O. M. N\'apoles has shown a way to represent them in a matrix, and we algorithmify this previous approach, while also showing a new method for constructing such a matrix, through cherry tree sequences. Lastly, we prove that these two matrix-building algorithms are equivalent if the same perfect elimination ordering is being used.