Abstract: Geometry-grounded learning asks models to respect the structure of the problem domain rather than treating observations as arbitrary vectors. Motivated by this view, we revisit a classical but underused primitive for comparing datasets: linear relations between two data matrices, expressed via the co-span constraint $Ax = By = z$ in a shared ambient space. To operationalize this comparison, we use the generalized singular value decomposition (GSVD) as a joint coordinate system for two subspaces. In particular, we exploit the GSVD form $A = HCU$, $B = HSV$ with $C^{\top}C + S^{\top}S = I$, which separates shared from dataset-specific directions through the diagonal structure of $(C, S)$. From these factors we derive an interpretable *angle score* $\theta(z) \in [0, \pi/2]$ for a sample $z$, quantifying whether $z$ is explained relatively more by $A$, more by $B$, or comparably by both. The primary role of $\theta(z)$ is as a *per-sample geometric diagnostic*. We illustrate the behavior of the score on MNIST through angle distributions and representative GSVD directions. A binary classifier derived from $\theta(z)$ is presented as an illustrative application of the score as an interpretable diagnostic tool.
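As a short worked step on the constraint above, assume $C$ and $S$ are square and diagonal with nonnegative entries $c_i$ and $s_i$ (the index $i$ and the per-direction angles $\theta_i$ are notation introduced here for illustration, not taken from the paper). The constraint then places each pair $(c_i, s_i)$ on the unit circle, so every GSVD direction carries an angle:

$$
C^{\top}C + S^{\top}S = I \;\Longrightarrow\; c_i^2 + s_i^2 = 1 \;\Longrightarrow\; c_i = \cos\theta_i,\quad s_i = \sin\theta_i,\quad \theta_i \in [0, \pi/2].
$$

Under this reading, $\theta_i$ near $0$ marks a direction carried mostly by $A$, $\theta_i$ near $\pi/2$ one carried mostly by $B$, and $\theta_i$ near $\pi/4$ one carried comparably by both. This covers only the per-direction angles; the abstract does not state how the per-sample score is assembled from the factors.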
Abstract: Community detection, the task of partitioning the nodes of a network into disjoint communities, is a fundamental problem in data science. We present a game-theoretic perspective on the Constant Potts Model (CPM) for community detection, emphasizing its efficiency, robustness, and accuracy. Efficiency: We reinterpret CPM as a potential hedonic game by decomposing its global Hamiltonian into local utility functions, where the local utility gain of each agent matches the corresponding increase in global utility. Leveraging this equivalence, we prove that local optimization of the CPM objective via better-response dynamics converges in pseudo-polynomial time to an equilibrium partition. Robustness: We introduce and relate two stability criteria: a strict criterion based on a novel notion of robustness, which requires nodes to simultaneously maximize neighbors and minimize non-neighbors within their communities, and a relaxed criterion based on a utility function that is a weighted sum of these two objectives, controlled by a resolution parameter. Accuracy: In community tracking scenarios, where initial partitions are used to bootstrap the Leiden algorithm with partial ground-truth information, our experiments reveal that robust partitions yield higher accuracy in recovering ground-truth communities.
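The potential-game reading of CPM can be illustrated with a minimal Python sketch. It assumes an unweighted graph and the common form of the CPM quality, internal edges minus $\gamma$ times the number of node pairs per community. The local utility used here (neighbors in the community minus $\gamma$ times the other members) is an assumed decomposition, not one confirmed by the abstract, and the names `utility`, `cpm_potential`, and `better_response` are hypothetical.

```python
from collections import Counter


def utility(adj, part, v, c, gamma):
    """Assumed local utility of node v if it sat in community c.

    neighbors_in_c - gamma * (other members of c), which equals
    (1 - gamma) * neighbors - gamma * non_neighbors inside c.
    """
    members = [u for u in part if part[u] == c and u != v]
    neighbors_in_c = sum(1 for u in members if u in adj[v])
    return neighbors_in_c - gamma * len(members)


def cpm_potential(adj, part, gamma):
    """Assumed global CPM quality: sum over communities of e_c - gamma * C(n_c, 2)."""
    sizes = Counter(part.values())
    # Each internal edge is seen from both endpoints, hence the division by 2.
    internal = sum(1 for u in adj for w in adj[u] if part[u] == part[w]) // 2
    return internal - gamma * sum(n * (n - 1) / 2 for n in sizes.values())


def better_response(adj, gamma, max_rounds=1000):
    """Let nodes take strictly improving moves until none exists (or rounds run out)."""
    part = {v: i for i, v in enumerate(adj)}  # start from singleton communities
    for _ in range(max_rounds):
        moved = False
        for v in adj:
            current = utility(adj, part, v, part[v], gamma)
            fresh = max(part.values()) + 1  # unused label: lets v become a singleton
            candidates = sorted({part[u] for u in adj[v]}) + [fresh]
            for c in candidates:
                if c != part[v] and utility(adj, part, v, c, gamma) > current:
                    part[v] = c  # any strictly better community counts as a better response
                    moved = True
                    break
        if not moved:
            break
    return part


# Toy graph (illustrative only): two triangles joined by the single edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
partition = better_response(adj, gamma=0.5)
print(partition, cpm_potential(adj, partition, 0.5))
```

On this toy graph the dynamics settle on the two triangles as communities. Because each move changes the global quality by exactly the mover's utility gain under the assumed decomposition, every strictly improving move raises the potential, which is what rules out cycling in this sketch.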