Abstract: Visual abstract reasoning problems hold immense importance in the field of image processing. Both Bongard-Logo and Raven's Progressive Matrices (RPM) belong to this domain. Bongard-Logo is an image clustering reasoning task, while RPM involves reasoning about image progression patterns. This paper introduces Valen, a novel baseline model in the family of probability-highlighting models. Valen achieves strong performance on both RPM and Bongard-Logo problems, offering a versatile solution. Examining the underlying mechanism of probability-highlighting solvers, we find that they approximate the solution to a reasoning problem instance as a distribution delineated by primary and auxiliary samples. We therefore argue that the learning objective of these solvers is not the distribution of correct solutions but one jointly defined by primary and auxiliary samples. To bridge this discrepancy, we introduce the Tine method, an adversarial-learning-based approach that helps Valen estimate a solution distribution closer to the correct one, although it suffers from issues such as unstable training. Reflecting on Tine, we model the sample distribution of reasoning problems as a mixture of Gaussian distributions, which leads to the Funny method. Funny effectively enables Valen to capture the true form of the correct solution distribution. Furthermore, we design the SBR method to model the distribution of progressive-pattern representations in a similar way. Overall, the Funny, Tine, and SBR methods significantly improve Valen's performance, providing new ideas and methods for studying visual abstract reasoning problems.
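The Funny method models the sample distribution of a reasoning problem as a mixture of Gaussian distributions. The abstract does not specify the parameterization, so the following is only a generic sketch of scoring with a Gaussian mixture. It assumes diagonal-covariance components over fixed-length representation vectors and takes the mixture parameters as given rather than learned. The names `gmm_log_density` and `pick_answer`, and the rule of choosing the highest-density candidate, are hypothetical illustrations, not the paper's implementation.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log-density of vector x under a Gaussian with diagonal covariance `var`."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_density(x, weights, means, variances):
    """log p(x) = log sum_k w_k * N(x; mu_k, diag(var_k)).

    Uses the log-sum-exp trick so that very small component densities
    do not underflow to zero.
    """
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def pick_answer(candidates, weights, means, variances):
    """Hypothetical selection rule: index of the candidate with the highest mixture density."""
    scores = [gmm_log_density(c, weights, means, variances) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)
```

Under this sketch, a candidate whose representation falls in a high-density region of the mixture scores higher. How Valen actually derives component weights, means, and variances from primary and auxiliary samples is not described in the abstract.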
Abstract: Abstract reasoning problems challenge the perceptual and cognitive abilities of AI algorithms, demanding pattern discernment and inductive reasoning that go beyond explicit image features. This study introduces PMoC, a probability model tailored to the Bongard-Logo problem, which achieves high reasoning accuracy by constructing independent probability models. We also present Pose-Transformer, an enhanced Transformer-Encoder designed for complex abstract reasoning tasks, including Bongard-Logo, RAVEN, I-RAVEN, and PGM. Inspired by the pose matrices of capsule networks, Pose-Transformer learns positional information, which sharpens its focus on local positional relationships when processing image data. Integrating it with PMoC further improves reasoning accuracy. Our approach effectively addresses the reasoning difficulties caused by positional changes of abstract entities, outperforming previous models on the OIG and D3$\times$3 subsets of RAVEN and on PGM. This research contributes to advancing AI's capabilities in abstract reasoning and cognitive pattern recognition.
Abstract: This paper makes significant progress in abstract reasoning, particularly on Raven's Progressive Matrices (RPM) and Bongard-Logo problems. We propose the D2C approach, which redefines conceptual boundaries in these domains and bridges the gap between high-level concepts and their low-dimensional representations. Building on D2C, we introduce the D3C method for Bongard-Logo problems, which significantly improves reasoning accuracy by estimating the distributions of image representations and measuring the Sinkhorn distance between them. To improve computational efficiency, we introduce the D3C-cos variant, which provides an efficient and accurate solution for RPM problems by constraining distances between distributions. We also present Lico-Net, a network that combines D3C and D3C-cos and achieves state-of-the-art performance in both problem-solving and interpretability. Finally, we extend our approach to D4C, which employs adversarial strategies to further refine conceptual boundaries and yields notable improvements on both RPM and Bongard-Logo problems. Overall, our contributions offer a new perspective and practical solutions for the field of abstract reasoning.
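D3C compares estimated distributions of image representations using the Sinkhorn distance. The abstract does not say how those distributions are estimated, so the sketch below is only a generic illustration of the Sinkhorn distance itself. It uses entropy-regularized optimal transport between two uniformly weighted point clouds that stand in for sampled representations, with squared Euclidean cost. The function name, `epsilon`, and `n_iters` are assumptions chosen for illustration, not values from the paper.

```python
import math


def sinkhorn_distance(xs, ys, epsilon=0.1, n_iters=200):
    """Entropy-regularized OT cost <P, C> between two uniform empirical distributions.

    xs, ys: lists of equal-length vectors (illustrative stand-ins for sampled
    image representations). C is the squared Euclidean cost matrix and P is
    the transport plan found by Sinkhorn's alternating scaling iterations.
    """
    n, m = len(xs), len(ys)
    a = [1.0 / n] * n  # uniform mass on each point of xs
    b = [1.0 / m] * m  # uniform mass on each point of ys
    cost = [[sum((p - q) ** 2 for p, q in zip(x, y)) for y in ys] for x in xs]
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale so that row sums of P = diag(u) K diag(v) match a ...
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        # ... then rescale so that its column sums match b.
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return sum(
        u[i] * kernel[i][j] * v[j] * cost[i][j] for i in range(n) for j in range(m)
    )
```

As `epsilon` shrinks, the result approaches the unregularized optimal transport cost, at the price of slower convergence and weaker numerical stability. Log-domain updates are the usual remedy when very small values of `epsilon` are needed.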
Abstract: Abstract reasoning problems pose significant challenges to artificial intelligence algorithms, demanding cognitive capabilities beyond those required for perception tasks. This study introduces the Triple-CFN approach to tackle the Bongard-Logo problem, achieving notable reasoning accuracy by implicitly reorganizing the concept space of conflicting instances. With appropriate modifications, the Triple-CFN paradigm also proves effective for the RPM problem, yielding competitive results. To further improve performance on RPM, we develop the Meta Triple-CFN network, which explicitly structures the problem space while maintaining interpretability of progressive patterns. We attribute the success of Meta Triple-CFN to its paradigm of modeling the conceptual space, which is equivalent to normalizing reasoning information. Based on this idea, we introduce the Re-space layer, which improves the performance of both Meta Triple-CFN and Triple-CFN. This paper aims to advance machine intelligence by exploring innovative network designs for abstract reasoning problems, paving the way for further breakthroughs in this domain.