Yang Yuan

AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts

Feb 12, 2024

Yifan Zhang, Yifan Luo, Yang Yuan, Andrew Chi-Chih Yao

Abstract: To improve language models' proficiency in mathematical reasoning via continual pretraining, we introduce a novel strategy that leverages base language models for autonomous data selection. Departing from conventional supervised fine-tuning or trained classifiers with human-annotated data, our approach utilizes meta-prompted language models as zero-shot verifiers to autonomously evaluate and select high-quality mathematical content, and we release the curated open-source AutoMathText dataset encompassing over 200GB of data. To demonstrate the efficacy of our method, we continually pretrained a 7B-parameter Mistral language model on the AutoMathText dataset, achieving substantial improvements in downstream performance on the MATH dataset with a token amount reduced by orders of magnitude compared to previous continual pretraining works. Our method showcases a twofold increase in pretraining token efficiency compared to baselines, underscoring the potential of our approach in enhancing models' mathematical reasoning capabilities. The AutoMathText dataset is available at https://huggingface.co/datasets/math-ai/AutoMathText. The code is available at https://github.com/yifanzhang-pro/AutoMathText.
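
As a rough illustration of the zero-shot verifier idea in the abstract above, the sketch below assumes the base model exposes a logit for a "YES" answer and a logit for a "NO" answer to a meta-prompt, turns the pair into a score with a two-way softmax, and keeps documents whose score clears a cutoff. The scoring rule, the function names, the logit values, and the 0.6 threshold are all illustrative assumptions, not details stated in the abstract.

```python
import math

def lm_score(logit_yes: float, logit_no: float) -> float:
    """Two-way softmax over the YES/NO logits (assumed scoring rule)."""
    m = max(logit_yes, logit_no)  # subtract the max for numerical stability
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    return e_yes / (e_yes + e_no)

def select_documents(scored_docs, threshold=0.6):
    """Keep documents whose score is at least `threshold` (hypothetical cutoff)."""
    return [doc for doc, yes, no in scored_docs if lm_score(yes, no) >= threshold]

# Hypothetical (text, logit_yes, logit_no) triples standing in for model output.
docs = [
    ("Proof that sqrt(2) is irrational", 2.0, -1.0),
    ("Celebrity gossip roundup", -3.0, 1.5),
    ("Borderline forum post", 0.0, 0.0),
]
selected = select_documents(docs)
```

Because the score is continuous, the same numbers could also be used to rank documents rather than to filter them with a hard cutoff.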

Information Flow in Self-Supervised Learning

Oct 15, 2023

Zhiquan Tan, Jingqin Yang, Weiran Huang, Yang Yuan, Yifan Zhang

Abstract: In this paper, we provide a comprehensive toolbox for understanding and enhancing self-supervised learning (SSL) methods through the lens of matrix information theory. Specifically, by leveraging the principles of matrix mutual information and joint entropy, we offer a unified analysis for both contrastive and feature-decorrelation-based methods. Furthermore, we propose the matrix variational masked auto-encoder (M-MAE) method, grounded in matrix information theory, as an enhancement to masked image modeling. The empirical evaluations underscore the effectiveness of M-MAE compared with the state-of-the-art methods, including a 3.9% improvement in linear probing ViT-Base, and a 1% improvement in fine-tuning ViT-Large, both on ImageNet.

MatChat: A Large Language Model and Application Service Platform for Materials Science

Oct 11, 2023

Ziyi Chen, Fankai Xie, Meng Wan, Yang Yuan, Miao Liu, Zongguo Wang, Sheng Meng, Yangang Wang

Abstract: The prediction of chemical synthesis pathways plays a pivotal role in materials science research. Challenges, such as the complexity of synthesis pathways and the lack of comprehensive datasets, currently hinder our ability to predict these chemical processes accurately. However, recent advancements in generative artificial intelligence (GAI), including automated text generation and question-answering systems, coupled with fine-tuning techniques, have facilitated the deployment of large-scale AI models tailored to specific domains. In this study, we harness the power of the LLaMA2-7B model and enhance it through a learning process that incorporates 13,878 pieces of structured material knowledge data. This specialized AI model, named MatChat, focuses on predicting inorganic material synthesis pathways. MatChat exhibits remarkable proficiency in generating and reasoning with knowledge in materials science. Although MatChat requires further refinement to meet the diverse material design needs, this research undeniably highlights its impressive reasoning capabilities and innovative potential in the field of materials science. MatChat is now accessible online and open for use, with both the model and its application framework available as open source. This study establishes a robust foundation for collaborative innovation in the integration of generative AI in materials science.

Cumulative Reasoning with Large Language Models

Aug 25, 2023

Yifan Zhang, Jingqin Yang, Yang Yuan, Andrew Chi-Chih Yao

Abstract: While language models are powerful and versatile, they often fail to address highly complex problems. This is because solving complex problems requires deliberate thinking, which has been only minimally guided during training. In this paper, we propose a new method called Cumulative Reasoning (CR), which employs language models in a cumulative and iterative manner to emulate human thought processes. By decomposing tasks into smaller components, CR streamlines the problem-solving process, rendering it both more manageable and effective. For logical inference tasks, CR consistently outperforms existing methods with an improvement of up to 9.3%, and achieves the astonishing accuracy of 98.04% on the curated FOLIO wiki dataset. In the context of the Game of 24, CR achieves an accuracy of 98%, which signifies a substantial enhancement of 24% over the previous state-of-the-art method. Finally, on the MATH dataset, we establish new state-of-the-art results with 58.0% overall accuracy, surpassing the previous best approach by a margin of 4.2%, and achieving 43% relative improvement on the hardest level 5 problems (22.4% to 32.1%). Code is available at https://github.com/iiis-ai/cumulative-reasoning.
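
For readers unfamiliar with the Game of 24 benchmark named in the abstract above, the task asks whether four numbers can be combined with +, -, * and / to make 24. The sketch below is a plain brute-force checker for that task, not the Cumulative Reasoning method itself; `solvable_24` is a hypothetical helper name, and exact fractions are used so that division does not introduce rounding error.

```python
from fractions import Fraction
from itertools import combinations

def solvable_24(numbers, target=24):
    """Return True if the numbers can be combined with + - * / to hit target."""
    def search(values):
        if len(values) == 1:
            return values[0] == target
        # Pick any two values, replace them with one result, and recurse.
        for i, j in combinations(range(len(values)), 2):
            a, b = values[i], values[j]
            rest = [v for k, v in enumerate(values) if k not in (i, j)]
            candidates = {a + b, a - b, b - a, a * b}
            if b != 0:
                candidates.add(a / b)
            if a != 0:
                candidates.add(b / a)
            if any(search(rest + [c]) for c in candidates):
                return True
        return False
    return search([Fraction(n) for n in numbers])
```

For example, `solvable_24([4, 7, 8, 8])` succeeds via (7 - 8 / 8) * 4, while `solvable_24([1, 1, 1, 1])` does not.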

Kernel-SSL: Kernel KL Divergence for Self-Supervised Learning

May 30, 2023

Yifan Zhang, Zhiquan Tan, Jingqin Yang, Yang Yuan

Abstract: Contrastive learning usually compares one positive anchor sample with many negative samples to perform Self-Supervised Learning (SSL). Alternatively, non-contrastive learning, as exemplified by methods like BYOL, SimSiam, and Barlow Twins, accomplishes SSL without the explicit use of negative samples. Inspired by the existing analysis for contrastive learning, we provide a reproducing kernel Hilbert space (RKHS) understanding of many existing non-contrastive learning methods. Subsequently, we propose a novel loss function, Kernel-SSL, which directly optimizes the mean embedding and the covariance operator within the RKHS. In experiments, our method Kernel-SSL outperforms state-of-the-art methods by a large margin on ImageNet under the linear evaluation setting. Specifically, with 100 epochs of pre-training, our method outperforms SimCLR by 4.6%.

RelationMatch: Matching In-batch Relationships for Semi-supervised Learning

May 17, 2023

Yifan Zhang, Jingqin Yang, Zhiquan Tan, Yang Yuan

Abstract: Semi-supervised learning has achieved notable success by leveraging very little labeled data and exploiting the wealth of information derived from unlabeled data. However, existing algorithms usually focus on aligning predictions on paired data points augmented from an identical source, and overlook the inter-point relationships within each batch. This paper introduces a novel method, RelationMatch, which exploits in-batch relationships with a matrix cross-entropy (MCE) loss function. Through the application of MCE, our proposed method consistently surpasses the performance of established state-of-the-art methods, such as FixMatch and FlexMatch, across a variety of vision datasets. Notably, we observed a substantial enhancement of 15.21% in accuracy over FlexMatch on the STL-10 dataset using only 40 labels. Moreover, we apply MCE to supervised learning scenarios, and observe consistent improvements as well.

On Uni-Modal Feature Learning in Supervised Multi-Modal Learning

May 03, 2023

Chenzhuang Du, Jiaye Teng, Tingle Li, Yichen Liu, Tianyuan Yuan, Yue Wang, Yang Yuan, Hang Zhao

Abstract: We abstract the features (i.e., learned representations) of multi-modal data into 1) uni-modal features, which can be learned from uni-modal training, and 2) paired features, which can only be learned from cross-modal interactions. Multi-modal models are expected to benefit from cross-modal interactions on the basis of ensuring uni-modal feature learning. However, recent supervised multi-modal late-fusion training approaches still suffer from insufficient learning of uni-modal features on each modality. We prove that this phenomenon does hurt the model's generalization ability. To this end, we propose to choose a targeted late-fusion learning method for the given supervised multi-modal task from Uni-Modal Ensemble (UME) and the proposed Uni-Modal Teacher (UMT), according to the distribution of uni-modal and paired features. We demonstrate that, under a simple guiding strategy, we can achieve comparable results to other complex late-fusion or intermediate-fusion methods on various multi-modal datasets, including VGG-Sound, Kinetics-400, UCF101, and ModelNet40.

Contrastive Learning Is Spectral Clustering On Similarity Graph

Mar 27, 2023

Zhiquan Tan, Yifan Zhang, Jingqin Yang, Yang Yuan

Abstract: Contrastive learning is a powerful self-supervised learning method, but we have a limited theoretical understanding of how it works and why it works. In this paper, we prove that contrastive learning with the standard InfoNCE loss is equivalent to spectral clustering on the similarity graph. Using this equivalence as the building block, we extend our analysis to the CLIP model and rigorously characterize how similar multi-modal objects are embedded together. Motivated by our theoretical insights, we introduce the kernel mixture loss, incorporating novel kernel functions that outperform the standard Gaussian kernel on several vision datasets.

* We express our gratitude to the anonymous reviewers for their valuable feedback
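
To make the InfoNCE loss named in the abstract above concrete, here is a minimal pure-Python sketch of its usual form: for each anchor, the negative log of the softmax probability assigned to its positive pair. It assumes a square similarity matrix whose diagonal holds the positive pairs; the function name and the default temperature of 0.5 are illustrative choices, and the sketch does not reproduce the paper's spectral clustering analysis or its kernel mixture loss.

```python
import math

def info_nce(sim, temperature=0.5):
    """Mean InfoNCE loss for a square similarity matrix.

    sim[i][j] is the similarity between anchor i and candidate j;
    the diagonal entries sim[i][i] are the positive pairs.
    """
    losses = []
    for i, row in enumerate(sim):
        scaled = [s / temperature for s in row]
        m = max(scaled)  # log-sum-exp trick for numerical stability
        log_denominator = m + math.log(sum(math.exp(s - m) for s in scaled))
        losses.append(log_denominator - scaled[i])
    return sum(losses) / len(losses)
```

When every similarity is equal, the loss reduces to the log of the number of candidates, and it drops as the diagonal entries grow relative to the off-diagonal ones.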

A Categorical Framework of General Intelligence

Mar 08, 2023

Yang Yuan

Abstract: Can machines think? Since Alan Turing asked this question in 1950, nobody has been able to give a direct answer, due to the lack of solid mathematical foundations for general intelligence. In this paper, we introduce a categorical framework towards this goal, consisting of four components: the sensor, world category, planner with objectives, and actor. By leveraging category theory, many important notions in general intelligence can be rigorously defined and analyzed. For instance, we introduce the concept of self-state awareness as a categorical analogy for self-consciousness and provide algorithms for learning and evaluating it. For communication with other agents, we propose to use diagrams that capture the exact representation of the context, instead of using natural languages. Additionally, we demonstrate that by designing the objectives as the output of a function over the self-state, the model's human-friendliness is guaranteed. Most importantly, our framework naturally introduces various constraints based on categorical invariance that can serve as the alignment signals for training a model that fits into the framework.

Succinct Representations for Concepts

Mar 01, 2023

Yang Yuan

Abstract: Foundation models like ChatGPT have demonstrated remarkable performance on various tasks. However, for many questions, they may produce false answers that look accurate. How do we train the model to precisely understand the concepts? In this paper, we introduce succinct representations of concepts based on category theory. Such a representation yields concept-wise invariance properties under various tasks, resulting in a new learning algorithm that can provably and accurately learn complex concepts or fix misconceptions. Moreover, by recursively expanding the succinct representations, one can generate a hierarchical decomposition, and manually verify the concept by individually examining each part inside the decomposition.