Abstract: Large Language Models (LLMs) have been successfully used in many natural-language tasks and applications, including text generation and AI chatbots. They are also a promising new technology for concept-oriented deep learning (CODL). The prerequisite, however, is that LLMs understand concepts and ensure conceptual consistency. We discuss these in this paper, as well as major uses of LLMs for CODL, including concept extraction from text, concept graph extraction from text, and concept learning. Human knowledge consists of both symbolic (conceptual) knowledge and embodied (sensory) knowledge. Text-only LLMs, however, can represent only symbolic (conceptual) knowledge. Multimodal LLMs, on the other hand, are capable of representing the full range (conceptual and sensory) of human knowledge. We discuss conceptual understanding in visual-language LLMs, the most important type of multimodal LLMs, and major uses of them for CODL, including concept extraction from images, concept graph extraction from images, and concept learning. While uses of LLMs for CODL are valuable standalone, they are particularly valuable as part of LLM applications such as AI chatbots.
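As an illustration of concept and concept graph extraction from text with an LLM, the following is a minimal sketch using the openai Python package; the prompt wording, JSON schema, and model name are illustrative assumptions, not the approach evaluated in the paper.

import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_concept_graph(text: str, model: str = "gpt-4o") -> dict:
    # Ask the LLM for concepts and (source, relation, target) triples as JSON.
    prompt = (
        "Extract the key concepts from the text below, then list relations "
        "between them as (source, relation, target) triples. "
        'Answer as JSON: {"concepts": [...], "relations": [[s, r, t], ...]}.\n\n'
        f"Text: {text}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

graph = extract_concept_graph("Quantum kernels embed data into a Hilbert space.")
print(graph["concepts"], graph["relations"])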
Abstract: As part of the recent research effort on quantum natural language processing (QNLP), variational quantum sentence classifiers (VQSCs) have been implemented and supported in lambeq / DisCoPy, based on the DisCoCat model of sentence meaning. We discuss VQSCs in some detail, including category theory, DisCoCat for modeling sentences as string diagrams, and DisCoPy for encoding string diagrams as parameterized quantum circuits. Many NLP tasks, however, require the handling of text consisting of multiple sentences, which is not supported in lambeq / DisCoPy. A good example is sentiment classification of customer feedback or product reviews. We discuss three potential approaches to variational quantum text classifiers (VQTCs), in line with VQSCs. The first is a weighted bag-of-sentences approach, which treats text as a group of independent sentences with task-specific sentence weighting. The second is a coreference resolution approach, which treats text as a consolidation of its member sentences with the coreferences among them resolved. Both approaches are based on the DisCoCat model and should be implementable in lambeq / DisCoPy. The third approach, on the other hand, is based on the DisCoCirc model, which considers both the ordering of sentences and the interaction of words in composing text meaning from word and sentence meanings. DisCoCirc makes a fundamental modification to DisCoCat: a sentence in DisCoCirc updates the meanings of words, whereas all meanings are static in DisCoCat. It is not clear whether DisCoCirc can be implemented in lambeq / DisCoPy without breaking DisCoCat.
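To make the VQSC pipeline concrete, the following is a minimal sketch using the public lambeq API; the parser, ansatz, and qubit assignments are illustrative choices, not the specific configuration discussed above.

from lambeq import AtomicType, BobcatParser, IQPAnsatz

parser = BobcatParser()

# DisCoCat: model the sentence as a string diagram.
diagram = parser.sentence2diagram("Alice loves Bob")

# Encode the string diagram as a parameterized quantum circuit,
# assigning one qubit to the noun type and one to the sentence type.
ansatz = IQPAnsatz({AtomicType.NOUN: 1, AtomicType.SENTENCE: 1}, n_layers=2)
circuit = ansatz(diagram)

circuit.draw()  # the circuit's free parameters are trained variationally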
Abstract: Quantum kernel methods, i.e., kernel methods with quantum kernels, offer distinct advantages as a hybrid quantum-classical approach to quantum machine learning (QML), including applicability to Noisy Intermediate-Scale Quantum (NISQ) devices and usage for solving all types of machine learning problems. Kernel methods rely on the notion of similarity between points in a higher (possibly infinite) dimensional feature space. For machine learning, the notion of similarity assumes that points close in the feature space should be close in the machine learning task space. In this paper, we discuss the use of variational quantum kernels with task-specific quantum metric learning to generate optimal quantum embeddings (a.k.a. quantum feature encodings) that are specific to machine learning tasks. Such task-specific optimal quantum embeddings, which implicitly support feature selection, are valuable not only to quantum kernel methods, improving their performance, but also to non-kernel QML methods based on parameterized quantum circuits (PQCs), as pretrained embeddings and for transfer learning. This further demonstrates the quantum utility, and the quantum advantage (with classically intractable quantum embeddings), of quantum kernel methods.
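The following is a minimal sketch, in PennyLane, of one way task-specific quantum metric learning can be realized: a variational quantum embedding defines a fidelity kernel whose parameters are trained by maximizing kernel-target alignment on labeled data. The circuit structure, sizes, and optimizer settings are assumptions, not the specific embeddings discussed above.

import pennylane as qml
from pennylane import numpy as np

n_qubits = 3
dev = qml.device("default.qubit", wires=n_qubits)

def embedding(x, weights):
    # Variational (trainable) quantum feature encoding.
    qml.AngleEmbedding(x, wires=range(n_qubits))
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))

@qml.qnode(dev)
def kernel_circuit(x1, x2, weights):
    embedding(x1, weights)
    qml.adjoint(embedding)(x2, weights)
    return qml.probs(wires=range(n_qubits))

def kernel(x1, x2, weights):
    return kernel_circuit(x1, x2, weights)[0]  # fidelity |<phi(x2)|phi(x1)>|^2

def target_alignment(X, y, weights):
    # Alignment between the kernel matrix and the ideal label kernel y y^T.
    K = qml.math.stack([qml.math.stack([kernel(a, b, weights) for b in X]) for a in X])
    Y = np.outer(y, y)
    return qml.math.sum(K * Y) / (qml.math.sqrt(qml.math.sum(K * K)) * np.linalg.norm(Y))

weights = np.random.uniform(0, np.pi, size=qml.StronglyEntanglingLayers.shape(2, n_qubits),
                            requires_grad=True)
X = np.random.random((6, n_qubits))
y = np.array([1, -1, 1, -1, 1, -1])

opt = qml.GradientDescentOptimizer(stepsize=0.2)
for _ in range(10):
    weights = opt.step(lambda w: -target_alignment(X, y, w), weights)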
Abstract: Quantum machine learning (QML) is the use of quantum computing for the computation of machine learning algorithms. With the prevalence and importance of classical data, a hybrid quantum-classical approach to QML is called for. Parameterized quantum circuits (PQCs), and particularly quantum kernel PQCs, are generally used in the hybrid approach to QML. In this paper, we discuss some important aspects of PQCs with quantum kernels, including PQCs, quantum kernels, quantum kernels with quantum advantage, and the trainability of quantum kernels. We conclude that quantum kernels with hybrid kernel methods, a.k.a. quantum kernel methods, offer distinct advantages as a hybrid approach to QML. Not only do they apply to Noisy Intermediate-Scale Quantum (NISQ) devices, but they can also be used to solve all types of machine learning problems, including regression, classification, clustering, and dimensionality reduction. Furthermore, beyond quantum utility, quantum advantage can be attained if the quantum kernels, i.e., the quantum feature encodings, are classically intractable.
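As a concrete illustration of the hybrid pattern, the following is a minimal sketch (in PennyLane, with scikit-learn) of a quantum kernel method: a fixed PQC feature encoding defines a fidelity kernel that is handed to a classical support vector machine. The qubit count, encoding choice, and toy data are assumptions.

import pennylane as qml
from pennylane import numpy as np
from sklearn.svm import SVC

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def kernel_circuit(x1, x2):
    # Quantum feature encoding U(x1) followed by the adjoint encoding of x2.
    qml.AngleEmbedding(x1, wires=range(n_qubits))
    qml.adjoint(qml.AngleEmbedding)(x2, wires=range(n_qubits))
    return qml.probs(wires=range(n_qubits))

def quantum_kernel(x1, x2):
    # Fidelity |<phi(x2)|phi(x1)>|^2 = probability of the all-zeros outcome.
    return kernel_circuit(x1, x2)[0]

def kernel_matrix(A, B):
    return np.array([[quantum_kernel(a, b) for b in B] for a in A])

X = np.random.random((8, n_qubits))
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])

svm = SVC(kernel="precomputed").fit(kernel_matrix(X, X), y)
predictions = svm.predict(kernel_matrix(X, X))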
Abstract: Deep learning for molecular science has so far mainly focused on 2D molecular graphs. Recently, however, there has been work to extend it to 3D molecular geometry, due to its scientific significance and critical importance in real-world applications. The 3D distance-geometric graph representation (DG-GR) adopts a unified scheme (distance) for representing the geometry of 3D graphs. It is invariant to rotation and translation of the graph, and it reflects pair-wise node interactions and their generally local nature, which is particularly relevant for 3D molecular geometry. To facilitate the incorporation of 3D molecular geometry in deep learning for molecular science, we adopt the new graph attention network with dynamic attention (GATv2) for use with DG-GR and propose the 3D distance-geometric graph attention network (DG-GAT). GATv2 is a great fit for DG-GR since its attention can vary by node and by distance between nodes. Experimental results of DG-GAT for the ESOL and FreeSolv datasets show major improvement (31% and 38%, respectively) over those of the standard graph convolutional network based on 2D molecular graphs. The same is true for the QM9 dataset. Our work demonstrates the utility and value of DG-GAT for deep learning based on 3D molecular geometry.
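The following is a minimal sketch (not the DG-GAT implementation evaluated above) of the core idea using PyTorch Geometric's GATv2Conv, with pairwise 3D distances supplied as edge features so that attention can vary by node and by distance; layer sizes and the toy molecule are assumptions.

import torch
from torch_geometric.data import Data
from torch_geometric.nn import GATv2Conv, global_mean_pool

class DistanceGeometricGAT(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        # edge_dim=1: each edge carries a single scalar, the 3D distance.
        self.conv1 = GATv2Conv(in_dim, hidden_dim, edge_dim=1)
        self.conv2 = GATv2Conv(hidden_dim, hidden_dim, edge_dim=1)
        self.readout = torch.nn.Linear(hidden_dim, out_dim)

    def forward(self, data):
        x = torch.relu(self.conv1(data.x, data.edge_index, data.edge_attr))
        x = torch.relu(self.conv2(x, data.edge_index, data.edge_attr))
        batch = getattr(data, "batch", None)
        if batch is None:
            batch = torch.zeros(x.size(0), dtype=torch.long)
        return self.readout(global_mean_pool(x, batch))

# Toy 3-atom graph: node features, bidirectional edges, distances as edge_attr.
x = torch.randn(3, 16)
edge_index = torch.tensor([[0, 1, 1, 2, 0, 2], [1, 0, 2, 1, 2, 0]])
edge_attr = torch.tensor([[1.1], [1.1], [1.5], [1.5], [2.4], [2.4]])
data = Data(x=x, edge_index=edge_index, edge_attr=edge_attr)

model = DistanceGeometricGAT(in_dim=16, hidden_dim=64, out_dim=1)
prediction = model(data)  # e.g., a predicted solubility value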
Abstract: Dual embodied-symbolic concept representations are the foundation for deep learning and symbolic AI integration. We discuss the use of dual embodied-symbolic concept representations for molecular graph representation learning, specifically with exemplar-based contrastive self-supervised learning (SSL). The embodied representations are learned from molecular graphs, and the symbolic representations are learned from the corresponding chemical knowledge graph (KG). We use the chemical KG to enhance molecular graphs with symbolic (semantic) knowledge and to generate their augmented molecular graphs. We treat a molecular graph and its semantically augmented molecular graph as exemplars of the same semantic class, and use such pairs as positive pairs in exemplar-based contrastive SSL.
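The following is a minimal sketch of the contrastive step just described: embeddings of a molecular graph and of its KG-augmented version are treated as a positive pair under an NT-Xent (InfoNCE) loss. The graph encoder is left abstract, and the batch size, embedding size, and temperature are assumptions.

import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.1):
    # z1[i] and z2[i] embed a molecular graph and its semantically augmented
    # version (same semantic class), so they form a positive pair.
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)            # 2N x d
    sim = z @ z.t() / temperature             # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))         # exclude self-similarity
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# z_graph = encoder(molecular_graphs); z_aug = encoder(kg_augmented_graphs)
z_graph, z_aug = torch.randn(32, 128), torch.randn(32, 128)
loss = nt_xent(z_graph, z_aug)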
Abstract: Motivated by recent findings from cognitive neural science, we advocate the use of a dual-level model for concept representations: the embodied level consists of concept-oriented feature representations, and the symbolic level consists of concept graphs. Embodied concept representations are modality-specific and exist in the form of feature vectors in a feature space. Symbolic concept representations, on the other hand, are amodal and language-specific, and exist in the form of word / knowledge-graph embeddings in a concept / knowledge space. The human conceptual system comprises both embodied representations and symbolic representations, which typically interact to drive conceptual processing. As such, we further advocate the use of dual embodied-symbolic concept representations for deep learning. To demonstrate their usage and value, we discuss two important use cases: embodied-symbolic knowledge distillation for few-shot class incremental learning, and embodied-symbolic fused representation for image-text matching. Dual embodied-symbolic concept representations are the foundation for deep learning and symbolic AI integration. We discuss two important examples of such integration: scene graph generation with knowledge graph bridging, and multimodal knowledge graphs.
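As one illustration of an embodied-symbolic fused representation (e.g., for image-text matching), the following is a minimal sketch in PyTorch that projects an embodied feature vector and a symbolic knowledge-graph embedding into a shared space and combines them with a learned gate; the dimensions and the gating choice are assumptions, not the fusion discussed above.

import torch
import torch.nn as nn

class DualConceptFusion(nn.Module):
    def __init__(self, embodied_dim, symbolic_dim, fused_dim):
        super().__init__()
        self.embodied_proj = nn.Linear(embodied_dim, fused_dim)
        self.symbolic_proj = nn.Linear(symbolic_dim, fused_dim)
        self.gate = nn.Sequential(nn.Linear(2 * fused_dim, fused_dim), nn.Sigmoid())

    def forward(self, embodied_feat, symbolic_emb):
        e = self.embodied_proj(embodied_feat)   # embodied level: feature vector
        s = self.symbolic_proj(symbolic_emb)    # symbolic level: KG embedding
        g = self.gate(torch.cat([e, s], dim=-1))
        return g * e + (1 - g) * s              # learned per-dimension mix

fusion = DualConceptFusion(embodied_dim=2048, symbolic_dim=200, fused_dim=512)
fused = fusion(torch.randn(4, 2048), torch.randn(4, 200))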
Abstract: Humans are capable of learning new concepts from only a few (labeled) exemplars, incrementally and continually. This happens within the context that we can differentiate among the exemplars, and between the exemplars and large amounts of other data (unlabeled and labeled). This suggests that, in human learning, supervised learning of concepts based on exemplars takes place within the larger context of contrastive self-supervised learning (CSSL) based on unlabeled and labeled data. We discuss extending CSSL (1) to be based mainly on exemplars and only secondarily on data augmentation, and (2) to apply to both unlabeled data (a large amount is available in general) and labeled data (a few exemplars can be obtained with valuable supervised knowledge). A major benefit of the extensions is that exemplar-based CSSL, with supervised finetuning, supports few-shot class incremental learning (CIL). Specifically, we discuss exemplar-based CSSL including: nearest-neighbor CSSL, neighborhood CSSL with supervised pretraining, and exemplar CSSL with supervised finetuning. We further discuss using exemplar-based CSSL to facilitate few-shot learning and, in particular, few-shot CIL.
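As an illustration of nearest-neighbor CSSL, the following is a minimal sketch in the spirit of NNCLR: the anchor embedding is swapped for its nearest exemplar in a support set of past embeddings, which then serves as the positive against the other view. The support-set size, dimensions, and temperature are assumptions.

import torch
import torch.nn.functional as F

def nn_contrastive_loss(z_anchor, z_positive, support, temperature=0.1):
    z_anchor = F.normalize(z_anchor, dim=1)
    z_positive = F.normalize(z_positive, dim=1)
    support = F.normalize(support, dim=1)

    # Replace each anchor with its nearest exemplar in the support set.
    nn_idx = (z_anchor @ support.t()).argmax(dim=1)
    nn_anchor = support[nn_idx]

    logits = nn_anchor @ z_positive.t() / temperature   # N x N similarities
    targets = torch.arange(z_anchor.size(0))             # positives on the diagonal
    return F.cross_entropy(logits, targets)

support_set = torch.randn(1024, 128)   # queue of previously seen embeddings
loss = nn_contrastive_loss(torch.randn(32, 128), torch.randn(32, 128), support_set)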
Abstract: Concept-oriented deep learning (CODL) is a general approach to meet the future challenges for deep learning: (1) learning with little or no external supervision, (2) coping with test examples that come from a different distribution than the training examples, and (3) integrating deep learning with symbolic AI. In CODL, as in human learning, concept representations are learned based on concept exemplars. Contrastive self-supervised learning (CSSL) provides a promising approach to do so, since it: (1) uses data-driven associations, to get away from semantic labels, (2) supports incremental and continual learning, to get away from (large) fixed datasets, and (3) accommodates emergent objectives, to get away from fixed objectives (tasks). We discuss major aspects of concept representation learning using CSSL. These include dual-level concept representations, CSSL for feature representations, exemplar similarity measures and self-supervised relational reasoning, incremental and continual CSSL, and contrastive self-supervised concept (class) incremental learning. The discussion leverages recent findings from cognitive neural science and CSSL.
Abstract: Bayesian neural networks provide a direct and natural way to extend standard deep neural networks to support probabilistic deep learning through the use of probabilistic layers that, traditionally, encode weight (and bias) uncertainty. In particular, hybrid Bayesian neural networks utilize standard deterministic layers together with a few probabilistic layers judiciously positioned in the networks for uncertainty estimation. A major aspect and benefit of Bayesian inference is that priors, in principle, provide the means to encode prior knowledge for use in inference and prediction. However, it is difficult to specify priors on weights since the weights have no intuitive interpretation. Further, the relationships of priors on weights to the functions computed by networks are difficult to characterize. In contrast, functions are intuitive to interpret and are direct, since they map inputs to outputs. Therefore, it is natural to specify priors on functions to encode prior knowledge, and to use them in inference and prediction based on functions. To support this, we propose hybrid Bayesian neural networks with functional probabilistic layers that encode function (and activation) uncertainty. We discuss their foundations in functional Bayesian inference, functional variational inference, sparse Gaussian processes, and sparse variational Gaussian processes. We further perform a few proof-of-concept experiments using GPflux, a new library that provides Gaussian process layers and supports their use with deterministic Keras layers to form hybrid neural network and Gaussian process models.
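The following is a minimal sketch of such a hybrid model, following the GPflux Keras-integration pattern: deterministic Keras layers feed a sparse variational Gaussian process layer that encodes functional uncertainty. Layer sizes, inducing-point counts, and constructor details are assumptions, not the paper's experimental setup.

import gpflow
import gpflux
import numpy as np
import tensorflow as tf

num_data, input_dim, latent_dim = 500, 10, 8
X = np.random.randn(num_data, input_dim)
Y = np.random.randn(num_data, 1)

# Sparse variational GP layer acting on the deterministic latent features.
kernel = gpflux.helpers.construct_basic_kernel(
    gpflow.kernels.SquaredExponential(), output_dim=1)
inducing_vars = gpflux.helpers.construct_basic_inducing_variables(
    num_inducing=50, input_dim=latent_dim, output_dim=1)
gp_layer = gpflux.layers.GPLayer(
    kernel, inducing_vars, num_data=num_data, num_latent_gps=1)

# Hybrid network: deterministic layers followed by a functional probabilistic layer.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(input_dim,)),
    tf.keras.layers.Dense(latent_dim, activation="relu"),
    gp_layer,
])
model.compile(
    loss=gpflux.losses.LikelihoodLoss(gpflow.likelihoods.Gaussian()),
    optimizer="adam")
model.fit(X, Y, epochs=5, verbose=0)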