Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhiqiang Shen

Northeastern University, Shenyang, China, Key Laboratory of Intelligent Computing in Medical Image, Shenyang, China

Empowering Graph Invariance Learning with Deep Spurious Infomax

Jul 13, 2024

Tianjun Yao, Yongqiang Chen, Zhenhao Chen, Kai Hu, Zhiqiang Shen, Kun Zhang

Figure 1 for Empowering Graph Invariance Learning with Deep Spurious Infomax

Figure 2 for Empowering Graph Invariance Learning with Deep Spurious Infomax

Figure 3 for Empowering Graph Invariance Learning with Deep Spurious Infomax

Figure 4 for Empowering Graph Invariance Learning with Deep Spurious Infomax

Abstract:Recently, there has been a surge of interest in developing graph neural networks that utilize the invariance principle on graphs to generalize the out-of-distribution (OOD) data. Due to the limited knowledge about OOD data, existing approaches often pose assumptions about the correlation strengths of the underlying spurious features and the target labels. However, this prior is often unavailable and will change arbitrarily in the real-world scenarios, which may lead to severe failures of the existing graph invariance learning methods. To bridge this gap, we introduce a novel graph invariance learning paradigm, which induces a robust and general inductive bias. The paradigm is built upon the observation that the infomax principle encourages learning spurious features regardless of spurious correlation strengths. We further propose the EQuAD framework that realizes this learning paradigm and employs tailored learning objectives that provably elicit invariant features by disentangling them from the spurious features learned through infomax. Notably, EQuAD shows stable and enhanced performance across different degrees of bias in synthetic datasets and challenging real-world datasets up to $31.76\%$. Our code is available at \url{https://github.com/tianyao-aka/EQuAD}.

* ICML2024 camera-ready version

Via

Access Paper or Ask Questions

FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation

Jul 09, 2024

Liqun Ma, Mingjie Sun, Zhiqiang Shen

Figure 1 for FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation

Figure 2 for FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation

Figure 3 for FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation

Figure 4 for FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation

Abstract:This work presents a Fully BInarized Large Language Model (FBI-LLM), demonstrating for the first time how to train a large-scale binary language model from scratch (not the partial binary or ternary LLM like BitNet b1.58) to match the performance of its full-precision counterparts (e.g., FP16 or BF16) in transformer-based LLMs. It achieves this by employing an autoregressive distillation (AD) loss with maintaining equivalent model dimensions (130M, 1.3B, 7B) and training data volume as regular LLM pretraining, while delivering competitive results in terms of perplexity and task-specific effectiveness. Intriguingly, by analyzing the training trajectory, we find that the pretrained weight is not necessary for training binarized LLMs from scratch. This research encourages a new computational framework and may facilitate the future design of specialized hardware tailored for fully 1-bit LLMs. We make all models, code, and training dataset fully accessible and transparent to support further research (Code: https://github.com/LiqunMa/FBI-LLM. Model: https://huggingface.co/LiqunMa/).

* Github at https://github.com/LiqunMa/FBI-LLM

Via

Access Paper or Ask Questions

Self-Paced Sample Selection for Barely-Supervised Medical Image Segmentation

Jul 07, 2024

Junming Su, Zhiqiang Shen, Peng Cao, Jinzhu Yang, Osmar R. Zaiane

Figure 1 for Self-Paced Sample Selection for Barely-Supervised Medical Image Segmentation

Figure 2 for Self-Paced Sample Selection for Barely-Supervised Medical Image Segmentation

Figure 3 for Self-Paced Sample Selection for Barely-Supervised Medical Image Segmentation

Figure 4 for Self-Paced Sample Selection for Barely-Supervised Medical Image Segmentation

Abstract:The existing barely-supervised medical image segmentation (BSS) methods, adopting a registration-segmentation paradigm, aim to learn from data with very few annotations to mitigate the extreme label scarcity problem. However, this paradigm poses a challenge: pseudo-labels generated by image registration come with significant noise. To address this issue, we propose a self-paced sample selection framework (SPSS) for BSS. Specifically, SPSS comprises two main components: 1) self-paced uncertainty sample selection (SU) for explicitly improving the quality of pseudo labels in the image space, and 2) self-paced bidirectional feature contrastive learning (SC) for implicitly improving the quality of pseudo labels through enhancing the separability between class semantics in the feature space. Both SU and SC are trained collaboratively in a self-paced learning manner, ensuring that SPSS can learn from high-quality pseudo labels for BSS. Extensive experiments on two public medical image segmentation datasets demonstrate the effectiveness and superiority of SPSS over the state-of-the-art. Our code is release at https://github.com/SuuuJM/SPSS.

* Accepted to MICCAI 2024

Via

Access Paper or Ask Questions

Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

Jun 28, 2024

Sukmin Yun, Haokun Lin, Rusiru Thushara, Mohammad Qazim Bhat, Yongxin Wang, Zutao Jiang, Mingkai Deng, Jinhong Wang, Tianhua Tao, Junbo Li(+7 more)

Figure 1 for Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

Figure 2 for Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

Figure 3 for Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

Figure 4 for Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

Abstract:Multimodal large language models (MLLMs) have shown impressive success across modalities such as image, video, and audio in a variety of understanding and generation tasks. However, current MLLMs are surprisingly poor at understanding webpage screenshots and generating their corresponding HTML code. To address this problem, we propose Web2Code, a benchmark consisting of a new large-scale webpage-to-code dataset for instruction tuning and an evaluation framework for the webpage understanding and HTML code translation abilities of MLLMs. For dataset construction, we leverage pretrained LLMs to enhance existing webpage-to-code datasets as well as generate a diverse pool of new webpages rendered into images. Specifically, the inputs are webpage images and instructions, while the responses are the webpage's HTML code. We further include diverse natural language QA pairs about the webpage content in the responses to enable a more comprehensive understanding of the web content. To evaluate model performance in these tasks, we develop an evaluation framework for testing MLLMs' abilities in webpage understanding and web-to-code generation. Extensive experiments show that our proposed dataset is beneficial not only to our proposed tasks but also in the general visual domain, while previous datasets result in worse performance. We hope our work will contribute to the development of general MLLMs suitable for web-based content generation and task automation. Our data and code will be available at https://github.com/MBZUAI-LLM/web2code.

* Website at https://mbzuai-llm.github.io/webpage2code/

Via

Access Paper or Ask Questions

Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena

Jun 11, 2024

Aidar Myrzakhan, Sondos Mahmoud Bsharat, Zhiqiang Shen

Figure 1 for Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena

Figure 2 for Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena

Figure 3 for Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena

Figure 4 for Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena

Abstract:Multiple-choice questions (MCQ) are frequently used to assess large language models (LLMs). Typically, an LLM is given a question and selects the answer deemed most probable after adjustments for factors like length. Unfortunately, LLMs may inherently favor certain answer choice IDs, such as A/B/C/D, due to inherent biases of priori unbalanced probabilities, influencing the prediction of answers based on these IDs. Previous research has introduced methods to reduce this ''selection bias'' by simply permutating options on a few test samples and applying to new ones. Another problem of MCQ is the lottery ticket choice by ''random guessing''. The LLM does not learn particular knowledge, but the option is guessed correctly. This situation is especially serious for those small-scale LLMs. To address them, a more thorough approach involves shifting from MCQ to open-style questions, which can fundamentally eliminate selection bias and random guessing issues. However, transitioning causes its own set of challenges in (1) identifying suitable open-style questions and (2) validating the correctness of LLM open-style responses against human-annotated ground-truths. This work aims to tackle these significant difficulties, and establish a new LLM evaluation benchmark through entirely open-style questions. Consequently, we introduce the Open-LLM-Leaderboard to track various LLMs' performance and reflect true capability of them, such as GPT-4o/4/3.5, Claude 3, Gemini, etc. Our code and dataset are available at https://github.com/VILA-Lab/Open-LLM-Leaderboard.

* Code and dataset are available at https://github.com/VILA-Lab/Open-LLM-Leaderboard

Via

Access Paper or Ask Questions

Rethinking Barely-Supervised Segmentation from an Unsupervised Domain Adaptation Perspective

May 16, 2024

Zhiqiang Shen, Peng Cao, Junming Su, Jinzhu Yang, Osmar R. Zaiane

Abstract:This paper investigates an extremely challenging problem, barely-supervised medical image segmentation (BSS), where the training dataset comprises limited labeled data with only single-slice annotations and numerous unlabeled images. Currently, state-of-the-art (SOTA) BSS methods utilize a registration-based paradigm, depending on image registration to propagate single-slice annotations into volumetric pseudo labels for constructing a complete labeled set. However, this paradigm has a critical limitation: the pseudo labels generated by image registration are unreliable and noisy. Motivated by this, we propose a new perspective: training a model using only single-annotated slices as the labeled set without relying on image registration. To this end, we formulate BSS as an unsupervised domain adaptation (UDA) problem. Specifically, we first design a novel noise-free labeled data construction algorithm (NFC) for slice-to-volume labeled data synthesis, which may result in a side effect: domain shifts between the synthesized images and the original images. Then, a frequency and spatial mix-up strategy (FSX) is further introduced to mitigate the domain shifts for UDA. Extensive experiments demonstrate that our method provides a promising alternative for BSS. Remarkably, the proposed method with only one labeled slice achieves an 80.77% dice score on left atrial segmentation, outperforming the SOTA by 61.28%. The code will be released upon the publication of this paper.

Via

Access Paper or Ask Questions

Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization

May 15, 2024

Kai Hu, Weichen Yu, Tianjun Yao, Xiang Li, Wenhe Liu, Lijun Yu, Yining Li, Kai Chen, Zhiqiang Shen, Matt Fredrikson

Figure 1 for Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization

Figure 2 for Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization

Figure 3 for Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization

Figure 4 for Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization

Abstract:Recent research indicates that large language models (LLMs) are susceptible to jailbreaking attacks that can generate harmful content. This paper introduces a novel token-level attack method, Adaptive Dense-to-Sparse Constrained Optimization (ADC), which effectively jailbreaks several open-source LLMs. Our approach relaxes the discrete jailbreak optimization into a continuous optimization and progressively increases the sparsity of the optimizing vectors. Consequently, our method effectively bridges the gap between discrete and continuous space optimization. Experimental results demonstrate that our method is more effective and efficient than existing token-level methods. On Harmbench, our method achieves state of the art attack success rate on seven out of eight LLMs. Code will be made available. Trigger Warning: This paper contains model behavior that can be offensive in nature.

Via

Access Paper or Ask Questions

Elucidating the Design Space of Dataset Condensation

Apr 21, 2024

Shitong Shao, Zikai Zhou, Huanran Chen, Zhiqiang Shen

Figure 1 for Elucidating the Design Space of Dataset Condensation

Figure 2 for Elucidating the Design Space of Dataset Condensation

Figure 3 for Elucidating the Design Space of Dataset Condensation

Figure 4 for Elucidating the Design Space of Dataset Condensation

Abstract:Dataset condensation, a concept within data-centric learning, efficiently transfers critical attributes from an original dataset to a synthetic version, maintaining both diversity and realism. This approach significantly improves model training efficiency and is adaptable across multiple application areas. Previous methods in dataset condensation have faced challenges: some incur high computational costs which limit scalability to larger datasets (e.g., MTT, DREAM, and TESLA), while others are restricted to less optimal design spaces, which could hinder potential improvements, especially in smaller datasets (e.g., SRe2L, G-VBSM, and RDED). To address these limitations, we propose a comprehensive design framework that includes specific, effective strategies like implementing soft category-aware matching and adjusting the learning rate schedule. These strategies are grounded in empirical evidence and theoretical backing. Our resulting approach, Elucidate Dataset Condensation (EDC), establishes a benchmark for both small and large-scale dataset condensation. In our testing, EDC achieves state-of-the-art accuracy, reaching 48.6% on ImageNet-1k with a ResNet-18 model at an IPC of 10, which corresponds to a compression ratio of 0.78%. This performance exceeds those of SRe2L, G-VBSM, and RDED by margins of 27.3%, 17.2%, and 6.6%, respectively.

Via

Access Paper or Ask Questions

TransLinkGuard: Safeguarding Transformer Models Against Model Stealing in Edge Deployment

Apr 17, 2024

Qinfeng Li, Zhiqiang Shen, Zhenghan Qin, Yangfan Xie, Xuhong Zhang, Tianyu Du, Jianwei Yin

Figure 1 for TransLinkGuard: Safeguarding Transformer Models Against Model Stealing in Edge Deployment

Figure 2 for TransLinkGuard: Safeguarding Transformer Models Against Model Stealing in Edge Deployment

Figure 3 for TransLinkGuard: Safeguarding Transformer Models Against Model Stealing in Edge Deployment

Figure 4 for TransLinkGuard: Safeguarding Transformer Models Against Model Stealing in Edge Deployment

Abstract:Proprietary large language models (LLMs) have been widely applied in various scenarios. Additionally, deploying LLMs on edge devices is trending for efficiency and privacy reasons. However, edge deployment of proprietary LLMs introduces new security challenges: edge-deployed models are exposed as white-box accessible to users, enabling adversaries to conduct effective model stealing (MS) attacks. Unfortunately, existing defense mechanisms fail to provide effective protection. Specifically, we identify four critical protection properties that existing methods fail to simultaneously satisfy: (1) maintaining protection after a model is physically copied; (2) authorizing model access at request level; (3) safeguarding runtime reverse engineering; (4) achieving high security with negligible runtime overhead. To address the above issues, we propose TransLinkGuard, a plug-and-play model protection approach against model stealing on edge devices. The core part of TransLinkGuard is a lightweight authorization module residing in a secure environment, e.g., TEE. The authorization module can freshly authorize each request based on its input. Extensive experiments show that TransLinkGuard achieves the same security protection as the black-box security guarantees with negligible overhead.

* arXiv admin note: text overlap with arXiv:2310.07152 by other authors

Via

Access Paper or Ask Questions

Self-supervised Dataset Distillation: A Good Compression Is All You Need

Apr 11, 2024

Muxin Zhou, Zeyuan Yin, Shitong Shao, Zhiqiang Shen

Figure 1 for Self-supervised Dataset Distillation: A Good Compression Is All You Need

Figure 2 for Self-supervised Dataset Distillation: A Good Compression Is All You Need

Figure 3 for Self-supervised Dataset Distillation: A Good Compression Is All You Need

Figure 4 for Self-supervised Dataset Distillation: A Good Compression Is All You Need

Abstract:Dataset distillation aims to compress information from a large-scale original dataset to a new compact dataset while striving to preserve the utmost degree of the original data informational essence. Previous studies have predominantly concentrated on aligning the intermediate statistics between the original and distilled data, such as weight trajectory, features, gradient, BatchNorm, etc. In this work, we consider addressing this task through the new lens of model informativeness in the compression stage on the original dataset pretraining. We observe that with the prior state-of-the-art SRe$^2$L, as model sizes increase, it becomes increasingly challenging for supervised pretrained models to recover learned information during data synthesis, as the channel-wise mean and variance inside the model are flatting and less informative. We further notice that larger variances in BN statistics from self-supervised models enable larger loss signals to update the recovered data by gradients, enjoying more informativeness during synthesis. Building on this observation, we introduce SC-DD, a simple yet effective Self-supervised Compression framework for Dataset Distillation that facilitates diverse information compression and recovery compared to traditional supervised learning schemes, further reaps the potential of large pretrained models with enhanced capabilities. Extensive experiments are conducted on CIFAR-100, Tiny-ImageNet and ImageNet-1K datasets to demonstrate the superiority of our proposed approach. The proposed SC-DD outperforms all previous state-of-the-art supervised dataset distillation methods when employing larger models, such as SRe$^2$L, MTT, TESLA, DC, CAFE, etc., by large margins under the same recovery and post-training budgets. Code is available at https://github.com/VILA-Lab/SRe2L/tree/main/SCDD/.

Via

Access Paper or Ask Questions