Wei Wang

School of Physics and Astronomy, Shanghai Jiao Tong University, State Key Laboratory of Dark Matter Physics, Shanghai Jiao Tong University, Tsung-Dao Lee Institute, Shanghai Jiao Tong University

Ultra-High-Definition Restoration: New Benchmarks and A Dual Interaction Prior-Driven Solution

Jun 19, 2024

Liyan Wang, Cong Wang, Jinshan Pan, Weixiang Zhou, Xiaoran Sun, Wei Wang, Zhixun Su

Abstract: Ultra-High-Definition (UHD) image restoration has attracted remarkable attention due to its practical demand. In this paper, we construct UHD snow and rain benchmarks, named UHD-Snow and UHD-Rain, to remedy the deficiency in this field. UHD-Snow and UHD-Rain are established by simulating the physical processes of snow and rain, and each benchmark contains 3200 degraded/clear image pairs of 4K resolution. Furthermore, we propose an effective UHD image restoration solution by considering gradient and normal priors in model design thanks to these priors' spatial and detail contributions. Specifically, our method contains two branches: (a) feature fusion and reconstruction branch in high-resolution space and (b) prior feature interaction branch in low-resolution space. The former learns high-resolution features and fuses prior-guided low-resolution features to reconstruct clear images, while the latter utilizes normal and gradient priors to mine useful spatial features and detail features to guide high-resolution recovery better. To better utilize these priors, we introduce single prior feature interaction and dual prior feature interaction, where the former respectively fuses normal and gradient priors with high-resolution features to enhance prior ones, while the latter calculates the similarity between enhanced prior ones and further exploits dual guided filtering to boost the feature interaction of dual priors. We conduct experiments on both new and existing public datasets and demonstrate the state-of-the-art performance of our method on UHD image low-light enhancement, UHD image desnowing, and UHD image deraining. The source codes and benchmarks are available at https://github.com/wlydlut/UHDDIP.
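
As a rough illustration of what a gradient prior in low-resolution space might look like, the sketch below average-pools a grayscale image and computes an unnormalized central-difference gradient magnitude. The abstract does not say how the gradient prior is extracted, so the functions `downsample` and `gradient_magnitude`, the pooling choice, and the border handling are all assumptions made for illustration, not the authors' implementation.

```python
def downsample(image, factor):
    """Average-pool a 2D grayscale image (list of rows) by an integer factor."""
    h, w = len(image) // factor, len(image[0]) // factor
    return [
        [
            sum(image[y * factor + dy][x * factor + dx]
                for dy in range(factor) for dx in range(factor)) / factor ** 2
            for x in range(w)
        ]
        for y in range(h)
    ]


def gradient_magnitude(image):
    """Unnormalized central-difference gradient magnitude of a 2D image."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clamp neighbour indices at the border (replicate padding).
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


# Toy 4x4 image with a vertical edge between columns 1 and 2.
edge_image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
full_res_prior = gradient_magnitude(edge_image)
low_res_prior = gradient_magnitude(downsample(edge_image, 2))
```

The gradient map responds only near the edge, which is the kind of detail cue the abstract attributes to the gradient prior; the normal prior would need a separate estimator and is not sketched here.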

Stealth edits for provably fixing or attacking large language models

Jun 18, 2024

Oliver J. Sutton, Qinghua Zhou, Wei Wang, Desmond J. Higham, Alexander N. Gorban, Alexander Bastounis, Ivan Y. Tyukin

Abstract: We reveal new methods and the theoretical foundations of techniques for editing large language models. We also show how the new theory can be used to assess the editability of models and to expose their susceptibility to previously unknown malicious attacks. Our theoretical approach shows that a single metric (a specific measure of the intrinsic dimensionality of the model's features) is fundamental to predicting the success of popular editing approaches, and reveals new bridges between disparate families of editing methods. We collectively refer to these approaches as stealth editing methods, because they aim to directly and inexpensively update a model's weights to correct the model's responses to known hallucinating prompts without otherwise affecting the model's behaviour and without requiring retraining. By carefully applying the insight gleaned from our theoretical investigation, we are able to introduce a new network block -- named a jet-pack block -- which is optimised for highly selective model editing, uses only standard network operations, and can be inserted into existing networks. The intrinsic dimensionality metric also determines the vulnerability of a language model to a stealth attack: a small change to a model's weights which changes its response to a single attacker-chosen prompt. Stealth attacks do not require access to or knowledge of the model's training data, therefore representing a potent yet previously unrecognised threat to redistributed foundation models. They are computationally simple enough to be implemented in malware in many cases. Extensive experimental results illustrate and support the method and its theoretical underpinnings. Demos and source code for editing language models are available at https://github.com/qinghua-zhou/stealth-edits.

* 24 pages, 9 figures. Open source implementation: https://github.com/qinghua-zhou/stealth-edits

A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery

Jun 16, 2024

Yu Zhang, Xiusi Chen, Bowen Jin, Sheng Wang, Shuiwang Ji, Wei Wang, Jiawei Han

Abstract: In many scientific fields, large language models (LLMs) have revolutionized the way text and other modalities of data (e.g., molecules and proteins) are handled, achieving superior performance in various applications and augmenting the scientific discovery process. Nevertheless, previous surveys on scientific LLMs often concentrate on one to two fields or a single modality. In this paper, we aim to provide a more holistic view of the research landscape by unveiling cross-field and cross-modal connections between scientific LLMs regarding their architectures and pre-training techniques. To this end, we comprehensively survey over 250 scientific LLMs, discuss their commonalities and differences, and summarize pre-training datasets and evaluation tasks for each field and modality. Moreover, we investigate how LLMs have been deployed to benefit scientific discovery. Resources related to this survey are available at https://github.com/yuzhimanhua/Awesome-Scientific-Language-Models.

* 33 pages (GitHub: https://github.com/yuzhimanhua/Awesome-Scientific-Language-Models)

Teaching Large Language Models to Express Knowledge Boundary from Their Own Signals

Jun 16, 2024

Lida Chen, Zujie Liang, Xintao Wang, Jiaqing Liang, Yanghua Xiao, Feng Wei, Jinglei Chen, Zhenghong Hao, Bing Han, Wei Wang

Abstract: Large language models (LLMs) have achieved great success, but their occasional content fabrication, or hallucination, limits their practical application. Hallucination arises because LLMs struggle to admit ignorance due to inadequate training on knowledge boundaries. We call it a limitation of LLMs that they cannot accurately express their knowledge boundary, answering questions they know while admitting ignorance to questions they do not know. In this paper, we aim to teach LLMs to recognize and express their knowledge boundary, so they can reduce hallucinations caused by fabricating when they do not know. We propose CoKE, which first probes LLMs' knowledge boundary via internal confidence given a set of questions, and then leverages the probing results to elicit the expression of the knowledge boundary. Extensive experiments show CoKE helps LLMs express knowledge boundaries, answering known questions while declining unknown ones, significantly improving in-domain and out-of-domain performance.

Bayesian Intervention Optimization for Causal Discovery

Jun 16, 2024

Yuxuan Wang, Mingzhou Liu, Xinwei Sun, Wei Wang, Yizhou Wang

Abstract: Causal discovery is crucial for understanding complex systems and informing decisions. While observational data can uncover causal relationships under certain assumptions, it often falls short, making active interventions necessary. Current methods, such as Bayesian and graph-theoretical approaches, do not prioritize decision-making and often rely on ideal conditions or information gain, which is not directly related to hypothesis testing. We propose a novel Bayesian optimization-based method inspired by Bayes factors that aims to maximize the probability of obtaining decisive and correct evidence. Our approach uses observational data to estimate causal models under different hypotheses, evaluates potential interventions pre-experimentally, and iteratively updates priors to refine interventions. We demonstrate the effectiveness of our method through various experiments. Our contributions provide a robust framework for efficient causal discovery through active interventions, enhancing the practical application of theoretical advancements.
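
The idea of scoring an intervention by the probability of decisive and correct evidence can be illustrated with a toy Monte Carlo sketch. Everything specific here is an assumption made for illustration: two simple Gaussian hypotheses (H0: the intervention leaves the mean of Y at 0; H1: it shifts the mean by a known `effect`), a decisiveness threshold of log 10 on the Bayes factor, and a plain comparison of two candidate interventions in place of Bayesian optimization. It is not the authors' algorithm.

```python
import math
import random


def gaussian_logpdf(x, mean, std):
    """Log density of N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def log_bayes_factor(samples, mean_h1, mean_h0, std):
    """log BF_10 = log p(data | H1) - log p(data | H0) for i.i.d. Gaussian data."""
    return sum(
        gaussian_logpdf(y, mean_h1, std) - gaussian_logpdf(y, mean_h0, std)
        for y in samples
    )


def prob_decisive_correct(effect, noise_std, n_samples,
                          threshold=math.log(10), n_sim=2000, seed=0):
    """Monte Carlo estimate of P(log BF_10 > threshold) when H1 is true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        # Simulate the outcome of the experiment under H1.
        data = [rng.gauss(effect, noise_std) for _ in range(n_samples)]
        if log_bayes_factor(data, effect, 0.0, noise_std) > threshold:
            hits += 1
    return hits / n_sim


# Hypothetical candidates: a stronger intervention shifts Y more under H1.
candidates = {"weak intervention": 0.2, "strong intervention": 1.0}
scores = {
    name: prob_decisive_correct(effect, noise_std=1.0, n_samples=10)
    for name, effect in candidates.items()
}
best = max(scores, key=scores.get)
print(scores, best)
```

The score is computed before any real experiment is run, which mirrors the pre-experimental evaluation of interventions described in the abstract.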

CliBench: Multifaceted Evaluation of Large Language Models in Clinical Decisions on Diagnoses, Procedures, Lab Tests Orders and Prescriptions

Jun 14, 2024

Mingyu Derek Ma, Chenchen Ye, Yu Yan, Xiaoxuan Wang, Peipei Ping, Timothy S Chang, Wei Wang

Abstract: The integration of Artificial Intelligence (AI), especially Large Language Models (LLMs), into the clinical diagnosis process offers significant potential to improve the efficiency and accessibility of medical care. While LLMs have shown some promise in the medical domain, their application in clinical diagnosis remains underexplored, especially in real-world clinical practice, where highly sophisticated, patient-specific decisions need to be made. Current evaluations of LLMs in this field are often narrow in scope, focusing on specific diseases or specialties and employing simplified diagnostic tasks. To bridge this gap, we introduce CliBench, a novel benchmark developed from the MIMIC IV dataset, offering a comprehensive and realistic assessment of LLMs' capabilities in clinical diagnosis. This benchmark not only covers diagnoses from a diverse range of medical cases across various specialties but also incorporates tasks of clinical significance: treatment procedure identification, lab test ordering and medication prescriptions. Supported by structured output ontologies, CliBench enables a precise and multi-granular evaluation, offering an in-depth understanding of LLMs' capabilities on diverse clinical tasks of desired granularity. We conduct a zero-shot evaluation of leading LLMs to assess their proficiency in clinical decision-making. Our preliminary results shed light on the potential and limitations of current LLMs in clinical settings, providing valuable insights for future advancements in LLM-powered healthcare.

* Project page: https://clibench.github.io

MolX: Enhancing Large Language Models for Molecular Learning with A Multi-Modal Extension

Jun 10, 2024

Khiem Le, Zhichun Guo, Kaiwen Dong, Xiaobao Huang, Bozhao Nan, Roshni Iyer, Xiangliang Zhang, Olaf Wiest, Wei Wang, Nitesh V. Chawla

Abstract: Recently, Large Language Models (LLMs) with their strong task-handling capabilities have shown remarkable advancements across a spectrum of fields, moving beyond natural language understanding. However, their proficiency within the chemistry domain remains restricted, especially in solving professional molecule-related tasks. This challenge is attributed to their inherent limitations in comprehending molecules using only common textual representations, i.e., SMILES strings. In this study, we seek to enhance the ability of LLMs to comprehend molecules by designing and equipping them with a multi-modal external module, namely MolX. In particular, instead of directly using a SMILES string to represent a molecule, we utilize specific encoders to extract fine-grained features from both SMILES string and 2D molecular graph representations for feeding into an LLM. Moreover, a human-defined molecular fingerprint is incorporated to leverage its embedded domain knowledge. Then, to establish an alignment between MolX and the LLM's textual input space, the whole model, in which the LLM is frozen, is pre-trained with a versatile strategy including a diverse set of tasks. Extensive experimental evaluations demonstrate that our proposed method only introduces a small number of trainable parameters while outperforming baselines on various downstream molecule-related tasks ranging from molecule-to-text translation to retrosynthesis, with and without fine-tuning the LLM.

Lurking in the shadows: Unveiling Stealthy Backdoor Attacks against Personalized Federated Learning

Jun 10, 2024

Xiaoting Lyu, Yufei Han, Wei Wang, Jingkai Liu, Yongsheng Zhu, Guangquan Xu, Jiqiang Liu, Xiangliang Zhang

Abstract: Federated Learning (FL) is a collaborative machine learning technique where multiple clients work together with a central server to train a global model without sharing their private data. However, the distribution shift across non-IID datasets of clients poses a challenge to this one-model-fits-all method, hindering the ability of the global model to effectively adapt to each client's unique local data. To address this challenge, personalized FL (PFL) is designed to allow each client to create personalized local models tailored to their private data. While extensive research has scrutinized backdoor risks in FL, these risks remain underexplored in PFL applications. In this study, we delve deep into the vulnerabilities of PFL to backdoor attacks. Our analysis showcases a tale of two cities. On the one hand, the personalization process in PFL can dilute the backdoor poisoning effects injected into the personalized local models. Furthermore, PFL systems can also deploy both server-end and client-end defense mechanisms to strengthen the barrier against backdoor attacks. On the other hand, our study shows that PFL fortified with these defense methods may offer a false sense of security. We propose PFedBA, a stealthy and effective backdoor attack strategy applicable to PFL systems. PFedBA ingeniously aligns the backdoor learning task with the main learning task of PFL by optimizing the trigger generation process. Our comprehensive experiments demonstrate the effectiveness of PFedBA in seamlessly embedding triggers into personalized local models. PFedBA yields outstanding attack performance across 10 state-of-the-art PFL algorithms, defeating 6 existing defense mechanisms. Our study sheds light on the subtle yet potent backdoor threats to PFL systems, urging the community to bolster defenses against emerging backdoor challenges.

* Accepted by Usenix Security 2024

M2NO: Multiresolution Operator Learning with Multiwavelet-based Algebraic Multigrid Method

Jun 07, 2024

Zhihao Li, Zhilu Lai, Xiaobo Wang, Wei Wang

Abstract: Solving partial differential equations (PDEs) effectively necessitates a multi-scale approach, particularly critical in high-dimensional scenarios characterized by increasing grid points or resolution. Traditional methods often fail to capture the detailed features necessary for accurate modeling, presenting a significant challenge in scientific computing. In response, we introduce the Multiwavelet-based Algebraic Multigrid Neural Operator (M2NO), a novel deep learning framework that synergistically combines multiwavelet transformations and algebraic multigrid (AMG) techniques. By exploiting the inherent similarities between these two approaches, M2NO overcomes their individual limitations and enhances precision and flexibility across various PDE benchmarks. Employing Multiresolution Analysis (MRA) with high-pass and low-pass filters, the model executes hierarchical decomposition to accurately delineate both global trends and localized details within PDE solutions, supporting adaptive data representation at multiple scales. M2NO also automates node selection and adeptly manages complex boundary conditions through its multiwavelet-based operators. Extensive evaluations on a diverse array of PDE datasets with different boundary conditions confirm M2NO's superior performance. Furthermore, M2NO excels in handling high-resolution and super-resolution tasks, consistently outperforming competing models and demonstrating robust adaptability in complex computational scenarios.
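
The hierarchical decomposition with low-pass and high-pass filters mentioned above can be sketched with the scalar Haar wavelet, the simplest instance of multiresolution analysis. This is an illustrative stand-in only: M2NO is described as using multiwavelets together with algebraic multigrid, neither of which is reproduced here, and the function names are hypothetical. The sketch assumes the signal length is divisible by 2 raised to the number of levels.

```python
import math

_S = 1 / math.sqrt(2)  # orthonormal Haar scaling


def haar_step(signal):
    """One level: low-pass (pairwise sums) and high-pass (pairwise differences)."""
    low = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    return low, high


def haar_decompose(signal, levels):
    """Return the coarsest approximation and the detail coefficients per level."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, high = haar_step(approx)
        details.append(high)  # finest level first
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose by undoing the levels from coarsest to finest."""
    current = list(approx)
    for high in reversed(details):
        restored = []
        for a, d in zip(current, high):
            restored.extend([(a + d) * _S, (a - d) * _S])
        current = restored
    return current


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 0.0, 2.0]
approx, details = haar_decompose(signal, levels=3)
restored = haar_reconstruct(approx, details)
```

The coarse approximation carries the global trend and the per-level details carry localized variation, and the transform is exactly invertible up to floating-point error.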

FourierKAN-GCF: Fourier Kolmogorov-Arnold Network -- An Effective and Efficient Feature Transformation for Graph Collaborative Filtering

Jun 04, 2024

Jinfeng Xu, Zheyu Chen, Jinze Li, Shuo Yang, Wei Wang, Xiping Hu, Edith C.-H. Ngai

Abstract: Graph Collaborative Filtering (GCF) has achieved state-of-the-art performance for recommendation tasks. However, most GCF structures simplify the feature transformation and nonlinear operation during message passing in the graph convolution network (GCN). We revisit these two components and discover that part of the feature transformation and nonlinear operation during message passing in GCN can improve the representation of GCF, but increases the difficulty of training. In this work, we propose a simple and effective graph-based recommendation model called FourierKAN-GCF. Specifically, it utilizes a novel Fourier Kolmogorov-Arnold Network (KAN) to replace the multilayer perceptron (MLP) as a part of the feature transformation during message passing in GCN, which improves the representation power of GCF and is easy to train. We further employ message dropout and node dropout strategies to improve the representation power and robustness of the model. Extensive experiments on two public datasets demonstrate the superiority of FourierKAN-GCF over most state-of-the-art methods. The implementation code is available at https://github.com/Jinfeng-Xu/FKAN-GCF.
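
A Fourier-series feature transformation of the kind the abstract describes can be sketched in a few lines. The formulation below, where each output is a sum over inputs and frequencies of learnable cosine and sine terms, is the commonly used Fourier KAN form and is assumed here; the class name `FourierKANLayer`, the initialization scale, and the omission of a bias term are illustrative choices, not taken from the linked repository. Training, message passing, and dropout are out of scope.

```python
import math
import random


class FourierKANLayer:
    """y_j = sum_i sum_{k=1..K} a[j][i][k] * cos(k * x_i) + b[j][i][k] * sin(k * x_i)."""

    def __init__(self, in_dim, out_dim, grid_size, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim * grid_size)  # assumed init scale
        self.in_dim, self.out_dim, self.grid_size = in_dim, out_dim, grid_size
        self.cos_coef = self._init(rng, scale)
        self.sin_coef = self._init(rng, scale)

    def _init(self, rng, scale):
        # Coefficient tensor indexed as [output][input][frequency - 1].
        return [
            [[rng.gauss(0.0, scale) for _ in range(self.grid_size)]
             for _ in range(self.in_dim)]
            for _ in range(self.out_dim)
        ]

    def forward(self, x):
        """Transform one feature vector of length in_dim into length out_dim."""
        out = []
        for j in range(self.out_dim):
            total = 0.0
            for i, xi in enumerate(x):
                for k in range(1, self.grid_size + 1):
                    total += self.cos_coef[j][i][k - 1] * math.cos(k * xi)
                    total += self.sin_coef[j][i][k - 1] * math.sin(k * xi)
            out.append(total)
        return out


layer = FourierKANLayer(in_dim=3, out_dim=2, grid_size=4)
embedding = [0.1, -0.5, 2.0]
transformed = layer.forward(embedding)
```

Because every basis function has an integer frequency, the output is periodic in each input with period 2π, so inputs are usually kept in a bounded range.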
