Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Meng Jiang

Distance Recomputator and Topology Reconstructor for Graph Neural Networks

Jun 25, 2024

Dong Liu, Meng Jiang

Figure 1 for Distance Recomputator and Topology Reconstructor for Graph Neural Networks

Figure 2 for Distance Recomputator and Topology Reconstructor for Graph Neural Networks

Figure 3 for Distance Recomputator and Topology Reconstructor for Graph Neural Networks

Figure 4 for Distance Recomputator and Topology Reconstructor for Graph Neural Networks

Abstract:This paper introduces novel methodologies, the Distance Recomputator and Topology Reconstructor, aimed at enhancing Graph Neural Networks (GNNs). The Distance Recomputator dynamically recalibrates node distances within k-hop neighborhoods using a dynamic encoding scheme, thereby improving the accuracy and adaptability of node representations. Concurrently, the Topology Reconstructor adjusts local graph structures based on computed "similarity distances," optimizing network configurations for improved learning outcomes. These methods address the limitations of static node representations and fixed aggregation schemes in traditional GNNs, offering a more nuanced approach to modeling complex and dynamic graph topologies. Furthermore, our experimental evaluations demonstrate significant performance advantages over existing methods across various benchmark datasets. The proposed Distance Recomputator and Topology Reconstructor not only enhance node relationship modeling accuracy but also optimize information aggregation efficiency through an asynchronous aggregation mechanism. This approach proves particularly effective in scenarios involving dynamic or large-scale graphs, showcasing the methods' robustness and applicability in real-world graph learning tasks.

Via

Access Paper or Ask Questions

GraphSnapShot: Graph Machine Learning Acceleration with Fast Storage and Retrieval

Jun 25, 2024

Dong Liu, Roger Waleffe, Meng Jiang, Shivaram Venkataraman

Figure 1 for GraphSnapShot: Graph Machine Learning Acceleration with Fast Storage and Retrieval

Figure 2 for GraphSnapShot: Graph Machine Learning Acceleration with Fast Storage and Retrieval

Figure 3 for GraphSnapShot: Graph Machine Learning Acceleration with Fast Storage and Retrieval

Figure 4 for GraphSnapShot: Graph Machine Learning Acceleration with Fast Storage and Retrieval

Abstract:In our recent research, we have developed a framework called GraphSnapShot, which has been proven an useful tool for graph learning acceleration. GraphSnapShot is a framework for fast cache, storage, retrieval and computation for graph learning. It can quickly store and update the local topology of graph structure and allows us to track patterns in the structure of graph networks, just like take snapshots of the graphs. In experiments, GraphSnapShot shows efficiency, it can achieve up to 30% training acceleration and 73% memory reduction for lossless graph ML training compared to current baselines such as dgl.This technique is particular useful for large dynamic graph learning tasks such as social media analysis and recommendation systems to process complex relationships between entities.

Via

Access Paper or Ask Questions

Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning

Jun 17, 2024

Zhihan Zhang, Zhenwen Liang, Wenhao Yu, Dian Yu, Mengzhao Jia, Dong Yu, Meng Jiang

Figure 1 for Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning

Figure 2 for Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning

Figure 3 for Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning

Figure 4 for Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning

Abstract:Supervised fine-tuning enhances the problem-solving abilities of language models across various mathematical reasoning tasks. To maximize such benefits, existing research focuses on broadening the training set with various data augmentation techniques, which is effective for standard single-round question-answering settings. Our work introduces a novel technique aimed at cultivating a deeper understanding of the training problems at hand, enhancing performance not only in standard settings but also in more complex scenarios that require reflective thinking. Specifically, we propose reflective augmentation, a method that embeds problem reflection into each training instance. It trains the model to consider alternative perspectives and engage with abstractions and analogies, thereby fostering a thorough comprehension through reflective reasoning. Extensive experiments validate the achievement of our aim, underscoring the unique advantages of our method and its complementary nature relative to existing augmentation techniques.

Via

Access Paper or Ask Questions

Learning Molecular Representation in a Cell

Jun 17, 2024

Gang Liu, Srijit Seal, John Arevalo, Zhenwen Liang, Anne E. Carpenter, Meng Jiang, Shantanu Singh

Figure 1 for Learning Molecular Representation in a Cell

Figure 2 for Learning Molecular Representation in a Cell

Figure 3 for Learning Molecular Representation in a Cell

Figure 4 for Learning Molecular Representation in a Cell

Abstract:Predicting drug efficacy and safety in vivo requires information on biological responses (e.g., cell morphology and gene expression) to small molecule perturbations. However, current molecular representation learning methods do not provide a comprehensive view of cell states under these perturbations and struggle to remove noise, hindering model generalization. We introduce the Information Alignment (InfoAlign) approach to learn molecular representations through the information bottleneck method in cells. We integrate molecules and cellular response data as nodes into a context graph, connecting them with weighted edges based on chemical, biological, and computational criteria. For each molecule in a training batch, InfoAlign optimizes the encoder's latent representation with a minimality objective to discard redundant structural information. A sufficiency objective decodes the representation to align with different feature spaces from the molecule's neighborhood in the context graph. We demonstrate that the proposed sufficiency objective for alignment is tighter than existing encoder-based contrastive methods. Empirically, we validate representations from InfoAlign in two downstream tasks: molecular property prediction against up to 19 baseline methods across four datasets, plus zero-shot molecule-morphology matching.

* 21 pages, 8 tables, 7 figures

Via

Access Paper or Ask Questions

Personalized Pieces: Efficient Personalized Large Language Models through Collaborative Efforts

Jun 15, 2024

Zhaoxuan Tan, Zheyuan Liu, Meng Jiang

Abstract:Personalized large language models (LLMs) aim to tailor interactions, content, and recommendations to individual user preferences. While parameter-efficient fine-tuning (PEFT) methods excel in performance and generalization, they are costly and limit communal benefits when used individually. To this end, we introduce Personalized Pieces (Per-Pcs), a framework that allows users to safely share and assemble personalized PEFT efficiently with collaborative efforts. Per-Pcs involves selecting sharers, breaking their PEFT into pieces, and training gates for each piece. These pieces are added to a pool, from which target users can select and assemble personalized PEFT using their history data. This approach preserves privacy and enables fine-grained user modeling without excessive storage and computation demands. Experimental results show Per-Pcs outperforms non-personalized and PEFT retrieval baselines, offering performance comparable to OPPU with significantly lower resource use across six tasks. Further analysis highlights Per-Pcs's robustness concerning sharer count and selection strategy, pieces sharing ratio, and scalability in computation time and storage space. Per-Pcs's modularity promotes safe sharing, making LLM personalization more efficient, effective, and widely accessible through collaborative efforts.

Via

Access Paper or Ask Questions

Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices

Jun 06, 2024

Ruiyang Qin, Dancheng Liu, Zheyu Yan, Zhaoxuan Tan, Zixuan Pan, Zhenge Jia, Meng Jiang, Ahmed Abbasi, Jinjun Xiong, Yiyu Shi

Figure 1 for Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices

Figure 2 for Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices

Figure 3 for Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices

Figure 4 for Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices

Abstract:The scaling laws have become the de facto guidelines for designing large language models (LLMs), but they were studied under the assumption of unlimited computing resources for both training and inference. As LLMs are increasingly used as personalized intelligent assistants, their customization (i.e., learning through fine-tuning) and deployment onto resource-constrained edge devices will become more and more prevalent. An urging but open question is how a resource-constrained computing environment would affect the design choices for a personalized LLM. We study this problem empirically in this work. In particular, we consider the tradeoffs among a number of key design factors and their intertwined impacts on learning efficiency and accuracy. The factors include the learning methods for LLM customization, the amount of personalized data used for learning customization, the types and sizes of LLMs, the compression methods of LLMs, the amount of time afforded to learn, and the difficulty levels of the target use cases. Through extensive experimentation and benchmarking, we draw a number of surprisingly insightful guidelines for deploying LLMs onto resource-constrained devices. For example, an optimal choice between parameter learning and RAG may vary depending on the difficulty of the downstream task, the longer fine-tuning time does not necessarily help the model, and a compressed LLM may be a better choice than an uncompressed LLM to learn from limited personalized data.

* Under review

Via

Access Paper or Ask Questions

Large Language Models Can Self-Correct with Minimal Effort

May 23, 2024

Zhenyu Wu, Qingkai Zeng, Zhihan Zhang, Zhaoxuan Tan, Chao Shen, Meng Jiang

Figure 1 for Large Language Models Can Self-Correct with Minimal Effort

Figure 2 for Large Language Models Can Self-Correct with Minimal Effort

Figure 3 for Large Language Models Can Self-Correct with Minimal Effort

Figure 4 for Large Language Models Can Self-Correct with Minimal Effort

Abstract:Intrinsic self-correct was a method that instructed large language models (LLMs) to verify and correct their responses without external feedback. Unfortunately, the study concluded that the LLMs could not self-correct reasoning yet. We find that a simple yet effective verification method can unleash inherent capabilities of the LLMs. That is to mask a key condition in the question, add the current response to construct a verification question, and predict the condition to verify the response. The condition can be an entity in an open-domain question or a numeric value in a math question, which requires minimal effort (via prompting) to identify. We propose an iterative verify-then-correct framework to progressively identify and correct (probably) false responses, named ProCo. We conduct experiments on three reasoning tasks. On average, ProCo, with GPT-3.5-Turbo as the backend LLM, yields $+6.8$ exact match on four open-domain question answering datasets, $+14.1$ accuracy on three arithmetic reasoning datasets, and $+9.6$ accuracy on a commonsense reasoning dataset, compared to Self-Correct.

* Work in Progress

Via

Access Paper or Ask Questions

Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training

Apr 26, 2024

Mengzhao Jia, Zhihan Zhang, Wenhao Yu, Fangkai Jiao, Meng Jiang

Abstract:Open-source multimodal large language models (MLLMs) excel in various tasks involving textual and visual inputs but still struggle with complex multimodal mathematical reasoning, lagging behind proprietary models like GPT-4V(ision) and Gemini-Pro. Although fine-tuning with intermediate steps (i.e., rationales) elicits some mathematical reasoning skills, the resulting models still fall short in visual comprehension due to inadequate visual-centric supervision, which leads to inaccurate interpretation of math figures. To address this issue, we propose a two-step training pipeline VCAR, which emphasizes the Visual Comprehension training in Addition to mathematical Reasoning learning. It first improves the visual comprehension ability of MLLMs through the visual description generation task, followed by another training step on generating rationales with the assistance of descriptions. Experimental results on two popular benchmarks demonstrate that VCAR substantially outperforms baseline methods solely relying on rationale supervision, especially on problems with high visual demands.

Via

Access Paper or Ask Questions

AI-Enhanced Cognitive Behavioral Therapy: Deep Learning and Large Language Models for Extracting Cognitive Pathways from Social Media Texts

Apr 17, 2024

Meng Jiang, Yi Jing Yu, Qing Zhao, Jianqiang Li, Changwei Song, Hongzhi Qi, Wei Zhai, Dan Luo, Xiaoqin Wang, Guanghui Fu(+1 more)

Figure 1 for AI-Enhanced Cognitive Behavioral Therapy: Deep Learning and Large Language Models for Extracting Cognitive Pathways from Social Media Texts

Figure 2 for AI-Enhanced Cognitive Behavioral Therapy: Deep Learning and Large Language Models for Extracting Cognitive Pathways from Social Media Texts

Figure 3 for AI-Enhanced Cognitive Behavioral Therapy: Deep Learning and Large Language Models for Extracting Cognitive Pathways from Social Media Texts

Figure 4 for AI-Enhanced Cognitive Behavioral Therapy: Deep Learning and Large Language Models for Extracting Cognitive Pathways from Social Media Texts

Abstract:Cognitive Behavioral Therapy (CBT) is an effective technique for addressing the irrational thoughts stemming from mental illnesses, but it necessitates precise identification of cognitive pathways to be successfully implemented in patient care. In current society, individuals frequently express negative emotions on social media on specific topics, often exhibiting cognitive distortions, including suicidal behaviors in extreme cases. Yet, there is a notable absence of methodologies for analyzing cognitive pathways that could aid psychotherapists in conducting effective interventions online. In this study, we gathered data from social media and established the task of extracting cognitive pathways, annotating the data based on a cognitive theoretical framework. We initially categorized the task of extracting cognitive pathways as a hierarchical text classification with four main categories and nineteen subcategories. Following this, we structured a text summarization task to help psychotherapists quickly grasp the essential information. Our experiments evaluate the performance of deep learning and large language models (LLMs) on these tasks. The results demonstrate that our deep learning method achieved a micro-F1 score of 62.34% in the hierarchical text classification task. Meanwhile, in the text summarization task, GPT-4 attained a Rouge-1 score of 54.92 and a Rouge-2 score of 30.86, surpassing the experimental deep learning model's performance. However, it may suffer from an issue of hallucination. We have made all models and codes publicly available to support further research in this field.

Via

Access Paper or Ask Questions

Instructing Large Language Models to Identify and Ignore Irrelevant Conditions

Mar 19, 2024

Zhenyu Wu, Chao Shen, Meng Jiang

Abstract:Math word problem (MWP) solving requires generating a reasoning path based on a given problem description that often contains irrelevant conditions. Existing chain-of-thought (CoT) prompting methods elicited multi-step reasoning abilities of large language models (LLMs) to solve MWPs. However, they were seriously confused by the irrelevant conditions, resulting in low accuracy. In this paper, we propose a novel approach named I$^3$C that instructs LLMs to identify and ignore irrelevant conditions. It identifies a set of irrelevant condition candidates that have a weak semantic relevance with the question. Then it prompts LLMs to verify the irrelevant conditions. Lastly it instructs the LLMs with the verification on relevant and irrelevant conditions to avoid confusion and improve reasoning paths. Moreover, we propose to select (problem, reasoning paths) pairs as demonstrations to enhance I$^3$C with few-shot reasoning. We develop I$^3$C-Select that selects the most confusing problems based on the semantic relevance measurement. We conduct extensive experiments on eight MWP datasets. I$^3$C can be combined with any CoT prompting methods to improve the performance of solving MWPs. Notably, with GPT-3.5-Turbo and I$^3$C-Select, we achieve an accuracy of 96.0 and 94.1 on GSM-IC2-1K and GSM-ICM-1K, respectively, significantly outperforming the state-of-the-art few-shot prompting method Complex-CoT by +11.7 and +11.1. Our implementation is made publicly available at https://wzy6642.github.io/I3C.github.io/.

* NAACL 2024 - Camera Ready

Via

Access Paper or Ask Questions