Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xiang Zhang

Victor

LLM and GNN are Complementary: Distilling LLM for Multimodal Graph Learning

Jun 03, 2024

Junjie Xu, Zongyu Wu, Minhua Lin, Xiang Zhang, Suhang Wang

Figure 1 for LLM and GNN are Complementary: Distilling LLM for Multimodal Graph Learning

Figure 2 for LLM and GNN are Complementary: Distilling LLM for Multimodal Graph Learning

Figure 3 for LLM and GNN are Complementary: Distilling LLM for Multimodal Graph Learning

Figure 4 for LLM and GNN are Complementary: Distilling LLM for Multimodal Graph Learning

Abstract:Recent progress in Graph Neural Networks (GNNs) has greatly enhanced the ability to model complex molecular structures for predicting properties. Nevertheless, molecular data encompasses more than just graph structures, including textual and visual information that GNNs do not handle well. To bridge this gap, we present an innovative framework that utilizes multimodal molecular data to extract insights from Large Language Models (LLMs). We introduce GALLON (Graph Learning from Large Language Model Distillation), a framework that synergizes the capabilities of LLMs and GNNs by distilling multimodal knowledge into a unified Multilayer Perceptron (MLP). This method integrates the rich textual and visual data of molecules with the structural analysis power of GNNs. Extensive experiments reveal that our distilled MLP model notably improves the accuracy and efficiency of molecular property predictions.

Via

Access Paper or Ask Questions

Medformer: A Multi-Granularity Patching Transformer for Medical Time-Series Classification

May 24, 2024

Yihe Wang, Nan Huang, Taida Li, Yujun Yan, Xiang Zhang

Figure 1 for Medformer: A Multi-Granularity Patching Transformer for Medical Time-Series Classification

Figure 2 for Medformer: A Multi-Granularity Patching Transformer for Medical Time-Series Classification

Figure 3 for Medformer: A Multi-Granularity Patching Transformer for Medical Time-Series Classification

Figure 4 for Medformer: A Multi-Granularity Patching Transformer for Medical Time-Series Classification

Abstract:Medical time series data, such as Electroencephalography (EEG) and Electrocardiography (ECG), play a crucial role in healthcare, such as diagnosing brain and heart diseases. Existing methods for medical time series classification primarily rely on handcrafted biomarkers extraction and CNN-based models, with limited exploration of transformers tailored for medical time series. In this paper, we introduce Medformer, a multi-granularity patching transformer tailored specifically for medical time series classification. Our method incorporates three novel mechanisms to leverage the unique characteristics of medical time series: cross-channel patching to leverage inter-channel correlations, multi-granularity embedding for capturing features at different scales, and two-stage (intra- and inter-granularity) multi-granularity self-attention for learning features and correlations within and among granularities. We conduct extensive experiments on five public datasets under both subject-dependent and challenging subject-independent setups. Results demonstrate Medformer's superiority over 10 baselines, achieving top averaged ranking across five datasets on all six evaluation metrics. These findings underscore the significant impact of our method on healthcare applications, such as diagnosing Myocardial Infarction, Alzheimer's, and Parkinson's disease. We release the source code at \url{https://github.com/DL4mHealth/Medformer}.

* 20pages (14 pages main paper + 6 pages supplementary materials)

Via

Access Paper or Ask Questions

UnitNorm: Rethinking Normalization for Transformers in Time Series

May 24, 2024

Nan Huang, Christian Kümmerle, Xiang Zhang

Abstract:Normalization techniques are crucial for enhancing Transformer models' performance and stability in time series analysis tasks, yet traditional methods like batch and layer normalization often lead to issues such as token shift, attention shift, and sparse attention. We propose UnitNorm, a novel approach that scales input vectors by their norms and modulates attention patterns, effectively circumventing these challenges. Grounded in existing normalization frameworks, UnitNorm's effectiveness is demonstrated across diverse time series analysis tasks, including forecasting, classification, and anomaly detection, via a rigorous evaluation on 6 state-of-the-art models and 10 datasets. Notably, UnitNorm shows superior performance, especially in scenarios requiring robust attention mechanisms and contextual comprehension, evidenced by significant improvements by up to a 1.46 decrease in MSE for forecasting, and a 4.89% increase in accuracy for classification. This work not only calls for a reevaluation of normalization strategies in time series Transformers but also sets a new direction for enhancing model performance and stability. The source code is available at https://anonymous.4open.science/r/UnitNorm-5B84.

Via

Access Paper or Ask Questions

Seal-Tools: Self-Instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmark

May 14, 2024

Mengsong Wu, Tong Zhu, Han Han, Chuanyuan Tan, Xiang Zhang, Wenliang Chen

Abstract:This paper presents a new tool learning dataset Seal-Tools, which contains self-instruct API-like tools. Seal-Tools not only offers a large number of tools, but also includes instances which demonstrate the practical application of tools. Seeking to generate data on a large scale while ensuring reliability, we propose a self-instruct method to generate tools and instances, allowing precise control over the process. Moreover, our Seal-Tools contains hard instances that call multiple tools to complete the job, among which some are nested tool callings. For precise and comprehensive evaluation, we use strict format control and design three metrics from different dimensions. Therefore, Seal-Tools can serve as a new benchmark to evaluate the tool-calling ability of LLMs. Finally, we evaluate several prevalent LLMs and our finetuned model on Seal-Tools. The results show that current systems are far from perfect. The code, data and experiment results are available at https://github.com/fairyshine/Seal-Tools .

* 14 pages, 10 figures

Via

Access Paper or Ask Questions

Super-Resolving Blurry Images with Events

May 11, 2024

Chi Zhang, Mingyuan Lin, Xiang Zhang, Chenxu Jiang, Lei Yu

Figure 1 for Super-Resolving Blurry Images with Events

Figure 2 for Super-Resolving Blurry Images with Events

Figure 3 for Super-Resolving Blurry Images with Events

Figure 4 for Super-Resolving Blurry Images with Events

Abstract:Super-resolution from motion-blurred images poses a significant challenge due to the combined effects of motion blur and low spatial resolution. To address this challenge, this paper introduces an Event-based Blurry Super Resolution Network (EBSR-Net), which leverages the high temporal resolution of events to mitigate motion blur and improve high-resolution image prediction. Specifically, we propose a multi-scale center-surround event representation to fully capture motion and texture information inherent in events. Additionally, we design a symmetric cross-modal attention module to fully exploit the complementarity between blurry images and events. Furthermore, we introduce an intermodal residual group composed of several residual dense Swin Transformer blocks, each incorporating multiple Swin Transformer layers and a residual connection, to extract global context and facilitate inter-block feature aggregation. Extensive experiments show that our method compares favorably against state-of-the-art approaches and achieves remarkable performance.

Via

Access Paper or Ask Questions

Can LLMs substitute SQL? Comparing Resource Utilization of Querying LLMs versus Traditional Relational Databases

Apr 12, 2024

Xiang Zhang, Khatoon Khedri, Reza Rawassizadeh

Figure 1 for Can LLMs substitute SQL? Comparing Resource Utilization of Querying LLMs versus Traditional Relational Databases

Figure 2 for Can LLMs substitute SQL? Comparing Resource Utilization of Querying LLMs versus Traditional Relational Databases

Figure 3 for Can LLMs substitute SQL? Comparing Resource Utilization of Querying LLMs versus Traditional Relational Databases

Figure 4 for Can LLMs substitute SQL? Comparing Resource Utilization of Querying LLMs versus Traditional Relational Databases

Abstract:Large Language Models (LLMs) can automate or substitute different types of tasks in the software engineering process. This study evaluates the resource utilization and accuracy of LLM in interpreting and executing natural language queries against traditional SQL within relational database management systems. We empirically examine the resource utilization and accuracy of nine LLMs varying from 7 to 34 Billion parameters, including Llama2 7B, Llama2 13B, Mistral, Mixtral, Optimus-7B, SUS-chat-34B, platypus-yi-34b, NeuralHermes-2.5-Mistral-7B and Starling-LM-7B-alpha, using a small transaction dataset. Our findings indicate that using LLMs for database queries incurs significant energy overhead (even small and quantized models), making it an environmentally unfriendly approach. Therefore, we advise against replacing relational databases with LLMs due to their substantial resource utilization.

* 13 pages, 2 figures, 5 tables

Via

Access Paper or Ask Questions

From Discrete to Continuous: Deep Fair Clustering With Transferable Representations

Mar 24, 2024

Xiang Zhang

Abstract:We consider the problem of deep fair clustering, which partitions data into clusters via the representations extracted by deep neural networks while hiding sensitive data attributes. To achieve fairness, existing methods present a variety of fairness-related objective functions based on the group fairness criterion. However, these works typically assume that the sensitive attributes are discrete and do not work for continuous sensitive variables, such as the proportion of the female population in an area. Besides, the potential of the representations learned from clustering tasks to improve performance on other tasks is ignored by existing works. In light of these limitations, we propose a flexible deep fair clustering method that can handle discrete and continuous sensitive attributes simultaneously. Specifically, we design an information bottleneck style objective function to learn fair and clustering-friendly representations. Furthermore, we explore for the first time the transferability of the extracted representations to other downstream tasks. Unlike existing works, we impose fairness at the representation level, which could guarantee fairness for the transferred task regardless of clustering results. To verify the effectiveness of the proposed method, we perform extensive experiments on datasets with discrete and continuous sensitive attributes, demonstrating the advantage of our method in comparison with state-of-the-art methods.

Via

Access Paper or Ask Questions

In-Hand Following of Deformable Linear Objects Using Dexterous Fingers with Tactile Sensing

Mar 19, 2024

Mingrui Yu, Boyuan Liang, Xiang Zhang, Xinghao Zhu, Xiang Li, Masayoshi Tomizuka

Figure 1 for In-Hand Following of Deformable Linear Objects Using Dexterous Fingers with Tactile Sensing

Figure 2 for In-Hand Following of Deformable Linear Objects Using Dexterous Fingers with Tactile Sensing

Figure 3 for In-Hand Following of Deformable Linear Objects Using Dexterous Fingers with Tactile Sensing

Figure 4 for In-Hand Following of Deformable Linear Objects Using Dexterous Fingers with Tactile Sensing

Abstract:Most research on deformable linear object (DLO) manipulation assumes rigid grasping. However, beyond rigid grasping and re-grasping, in-hand following is also an essential skill that humans use to dexterously manipulate DLOs, which requires continuously changing the grasp point by in-hand sliding while holding the DLO to prevent it from falling. Achieving such a skill is very challenging for robots without using specially designed but not versatile end-effectors. Previous works have attempted using generic parallel grippers, but their robustness is unsatisfactory owing to the conflict between following and holding, which is hard to balance with a one-degree-of-freedom gripper. In this work, inspired by how humans use fingers to follow DLOs, we explore the usage of a generic dexterous hand with tactile sensing to imitate human skills and achieve robust in-hand DLO following. To enable the hardware system to function in the real world, we develop a framework that includes Cartesian-space arm-hand control, tactile-based in-hand 3-D DLO pose estimation, and task-specific motion design. Experimental results demonstrate the significant superiority of our method over using parallel grippers, as well as its great robustness, generalizability, and efficiency.

Via

Access Paper or Ask Questions

Self-Supervised Learning for Time Series: Contrastive or Generative?

Mar 14, 2024

Ziyu Liu, Azadeh Alavi, Minyi Li, Xiang Zhang

Abstract:Self-supervised learning (SSL) has recently emerged as a powerful approach to learning representations from large-scale unlabeled data, showing promising results in time series analysis. The self-supervised representation learning can be categorized into two mainstream: contrastive and generative. In this paper, we will present a comprehensive comparative study between contrastive and generative methods in time series. We first introduce the basic frameworks for contrastive and generative SSL, respectively, and discuss how to obtain the supervision signal that guides the model optimization. We then implement classical algorithms (SimCLR vs. MAE) for each type and conduct a comparative analysis in fair settings. Our results provide insights into the strengths and weaknesses of each approach and offer practical recommendations for choosing suitable SSL methods. We also discuss the implications of our findings for the broader field of representation learning and propose future research directions. All the code and data are released at \url{https://github.com/DL4mHealth/SSL_Comparison}.

* Published at the AI4TS Workshop, IJCAI 2023

Via

Access Paper or Ask Questions

Bayesian Diffusion Models for 3D Shape Reconstruction

Mar 11, 2024

Haiyang Xu, Yu Lei, Zeyuan Chen, Xiang Zhang, Yue Zhao, Yilin Wang, Zhuowen Tu

Figure 1 for Bayesian Diffusion Models for 3D Shape Reconstruction

Figure 2 for Bayesian Diffusion Models for 3D Shape Reconstruction

Figure 3 for Bayesian Diffusion Models for 3D Shape Reconstruction

Figure 4 for Bayesian Diffusion Models for 3D Shape Reconstruction

Abstract:We present Bayesian Diffusion Models (BDM), a prediction algorithm that performs effective Bayesian inference by tightly coupling the top-down (prior) information with the bottom-up (data-driven) procedure via joint diffusion processes. We show the effectiveness of BDM on the 3D shape reconstruction task. Compared to prototypical deep learning data-driven approaches trained on paired (supervised) data-labels (e.g. image-point clouds) datasets, our BDM brings in rich prior information from standalone labels (e.g. point clouds) to improve the bottom-up 3D reconstruction. As opposed to the standard Bayesian frameworks where explicit prior and likelihood are required for the inference, BDM performs seamless information fusion via coupled diffusion processes with learned gradient computation networks. The specialty of our BDM lies in its capability to engage the active and effective information exchange and fusion of the top-down and bottom-up processes where each itself is a diffusion process. We demonstrate state-of-the-art results on both synthetic and real-world benchmarks for 3D shape reconstruction.

* Accepted by CVPR 2024

Via

Access Paper or Ask Questions