Shitong Shao

Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching

Nov 29, 2023
Shitong Shao, Zeyuan Yin, Muxin Zhou, Xindong Zhang, Zhiqiang Shen

The lightweight "local-match-global" matching introduced by SRe2L successfully creates a distilled dataset with comprehensive information on the full 224x224 ImageNet-1k. However, this one-sided approach is restricted to a particular backbone, layer, and statistic, which limits the generalization of the distilled dataset. We suggest that sufficiently many and varied "local-match-global" matchings are more precise and effective than a single one and can create a distilled dataset with richer information and better generalization. We call this perspective "generalized matching" and propose Generalized Various Backbone and Statistical Matching (G-VBSM), which aims to create a synthetic dataset with densities, ensuring consistency with the complete dataset across various backbones, layers, and statistics. As experimentally demonstrated, G-VBSM is the first algorithm to obtain strong performance across both small-scale and large-scale datasets. Specifically, G-VBSM achieves 38.7% on CIFAR-100 with a 128-width ConvNet, 47.6% on Tiny-ImageNet with ResNet18, and 31.4% on the full 224x224 ImageNet-1k with ResNet18, under images per class (IPC) of 10, 50, and 10, respectively. These results surpass all SOTA methods by margins of 3.9%, 6.5%, and 10.1%, respectively.
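
As a rough illustration of the statistical matching idea in this abstract, the sketch below compares the per-channel mean and variance of synthetic-batch features with stored full-dataset statistics, then averages that loss over several backbones. It is a pure-Python toy under stated assumptions: the function names, the squared-error form, and the uniform averaging over backbones are hypothetical and are not taken from G-VBSM itself.

```python
from statistics import fmean, pvariance


def batch_stats(features):
    """Per-channel mean and (population) variance of a batch of feature vectors."""
    channels = list(zip(*features))  # transpose: one tuple per channel
    return [fmean(c) for c in channels], [pvariance(c) for c in channels]


def matching_loss(synthetic_feats, global_mean, global_var):
    """Squared distance between batch statistics and stored global statistics."""
    mean, var = batch_stats(synthetic_feats)
    mean_term = sum((m - g) ** 2 for m, g in zip(mean, global_mean))
    var_term = sum((v - g) ** 2 for v, g in zip(var, global_var))
    return mean_term + var_term


def generalized_matching_loss(per_backbone):
    """Average the matching loss over backbones (assumed uniform weighting).

    per_backbone: list of (synthetic_feats, global_mean, global_var) tuples,
    one tuple per hypothetical backbone or layer.
    """
    return sum(matching_loss(*item) for item in per_backbone) / len(per_backbone)
```

In this toy setting, adding more (backbone, layer, statistic) entries to `per_backbone` simply adds more constraints that the synthetic batch has to satisfy at once.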

Catch-Up Distillation: You Only Need to Train Once for Accelerating Sampling

May 21, 2023
Shitong Shao, Xu Dai, Shouyi Yin, Lujun Li, Huanran Chen, Yang Hu

Diffusion Probabilistic Models (DPMs) have made impressive advancements in various machine learning domains. However, achieving high-quality synthetic samples typically requires a large number of sampling steps, which impedes real-time sample synthesis. Traditional accelerated sampling algorithms based on knowledge distillation rely on pre-trained model weights and discrete time-step scenarios, necessitating additional training sessions to achieve their goals. To address these issues, we propose Catch-Up Distillation (CUD), which encourages the current-moment output of the velocity estimation model to "catch up" with its previous-moment output. Specifically, CUD adjusts the original Ordinary Differential Equation (ODE) training objective to align the current-moment output with both the ground-truth label and the previous-moment output, utilizing Runge-Kutta-based multi-step alignment distillation for precise ODE estimation while preventing asynchronous updates. Furthermore, we investigate the design space for CUD under continuous time-step scenarios and analyze how to determine suitable strategies. To demonstrate CUD's effectiveness, we conduct thorough ablation and comparison experiments on CIFAR-10, MNIST, and ImageNet-64. On CIFAR-10, we obtain an FID of 2.80 by sampling in 15 steps under one-session training and a new state-of-the-art FID of 3.37 by sampling in one step with additional training. The latter result required only 62w iterations with a batch size of 128, in contrast to Consistency Distillation, which required 210w iterations with a larger batch size of 256. Our code is released at https://anonymous.4open.science/r/Catch-Up-Distillation-E31F.
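
The two-term objective described in this abstract (match the ground-truth label and also the previous-moment output) can be sketched as follows. This is only a minimal pure-Python illustration: the mean-squared-error form, the `weight` parameter, and the function names are assumptions, and the Runge-Kutta-based multi-step alignment is omitted entirely.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def catch_up_loss(current_out, ground_truth, previous_out, weight=1.0):
    """Toy two-term objective.

    current_out:  model output at the current time step
    ground_truth: regression target for the current time step
    previous_out: model output at the previous time step, treated as a
                  fixed target here (hypothetical choice)
    weight:       hypothetical trade-off between the two terms
    """
    return mse(current_out, ground_truth) + weight * mse(current_out, previous_out)
```

For example, `catch_up_loss([1.0, 2.0], [1.0, 4.0], [0.0, 2.0], weight=0.5)` combines a ground-truth term of 2.0 with a catch-up term of 0.5 scaled by 0.5.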

Black-box Source-free Domain Adaptation via Two-stage Knowledge Distillation

May 13, 2023
Shuai Wang, Daoan Zhang, Zipei Yan, Shitong Shao, Rui Li

Source-free domain adaptation aims to adapt deep neural networks using only pre-trained source models and target data. However, access to the source model still carries a risk of leaking the source data, which could compromise patient privacy. In this paper, we study a challenging but practical problem: black-box source-free domain adaptation, where only the outputs of the source model and the target data are available. We propose a simple but effective two-stage knowledge distillation method. In Stage I, we train the target model from scratch with soft pseudo-labels generated by the source model in a knowledge distillation manner. In Stage II, we initialize another model as the new student model to avoid the error accumulation caused by noisy pseudo-labels. We feed weakly augmented images to the teacher model to guide the learning of the student model. Our method is simple and flexible, and achieves surprising results on three cross-domain segmentation tasks.
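
Training with soft pseudo-labels, as in Stage I above, is commonly implemented as a cross-entropy between the source model's output probabilities and the student's prediction. The sketch below shows that loss for a single pixel or sample in pure Python; the function names are hypothetical, and the exact loss used in the paper may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_loss(student_logits, teacher_probs):
    """Cross-entropy between a soft pseudo-label and the student prediction.

    teacher_probs stands in for the black-box source model's output; only
    these probabilities are needed, not the source model's weights.
    """
    student_probs = softmax(student_logits)
    return -sum(p * math.log(q) for p, q in zip(teacher_probs, student_probs))
```

The loss is smaller when the student agrees with the soft label, so `soft_label_loss([2.0, 0.0], [0.9, 0.1])` is lower than `soft_label_loss([0.0, 2.0], [0.9, 0.1])`.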

* 10 pages, 3 figures

PELE scores: Pelvic X-ray Landmark Detection by Pelvis Extraction and Enhancement

May 07, 2023
Zhen Huang, Han Li, Shitong Shao, Heqin Zhu, Huijie Hu, Zhiwei Cheng, Jianji Wang, S. Kevin Zhou

The pelvis, the lower part of the trunk, supports and balances the trunk. Landmark detection from a pelvic X-ray (PXR) facilitates downstream analysis and computer-assisted diagnosis and treatment of pelvic diseases. Although PXRs have the advantages of low radiation and reduced cost compared to computed tomography (CT) images, their 2D superposition of the 3D pelvis and surrounding tissue confuses clinical decision-making. In this paper, we propose a PELvis Extraction (PELE) module that utilizes 3D prior anatomical knowledge from CT to guide the isolation of the pelvis from PXRs, thereby eliminating the influence of soft tissue. We conduct an extensive evaluation on two public datasets and one private dataset, totaling 850 PXRs. The experimental results show that the proposed PELE module significantly improves the accuracy of PXR landmark detection and achieves state-of-the-art performance on several benchmark metrics, thus better serving downstream tasks.

* 13 pages, 5 figures 

DiffuseExpand: Expanding dataset for 2D medical image segmentation using diffusion models

Apr 26, 2023
Shitong Shao, Xiaohan Yuan, Zhen Huang, Ziming Qiu, Shuai Wang, Kevin Zhou

Dataset expansion can effectively alleviate the data scarcity in medical image segmentation that arises from privacy concerns and labeling difficulties. However, existing expansion algorithms still face great challenges because they cannot guarantee the diversity of synthesized images with paired segmentation masks. In recent years, Diffusion Probabilistic Models (DPMs) have shown powerful image synthesis performance, even better than Generative Adversarial Networks. Based on this insight, we propose an approach called DiffuseExpand for expanding datasets for 2D medical image segmentation using DPMs, which first samples a variety of masks from Gaussian noise to ensure diversity, and then synthesizes images to ensure the alignment of images and masks. After that, DiffuseExpand selects high-quality samples to further enhance the effectiveness of data expansion. Our comparison and ablation experiments on the COVID-19 and CGMH Pelvis datasets demonstrate the effectiveness of DiffuseExpand. Our code is released at https://anonymous.4open.science/r/DiffuseExpand.

* 10 pages, 5 figures 

AIIR-MIX: Multi-Agent Reinforcement Learning Meets Attention Individual Intrinsic Reward Mixing Network

Feb 19, 2023
Wei Li, Weiyan Liu, Shitong Shao, Shiyi Huang

Deducing the contribution of each agent and assigning the corresponding reward to it is a crucial problem in cooperative Multi-Agent Reinforcement Learning (MARL). Previous studies try to resolve the issue by designing an intrinsic reward function, but in these studies the intrinsic reward is simply added to the environment reward, which makes the performance of their MARL frameworks unsatisfactory. We propose a novel method named Attention Individual Intrinsic Reward Mixing Network (AIIR-MIX) in MARL, with the following contributions: (a) we construct a novel intrinsic reward network based on the attention mechanism to make teamwork more effective; (b) we propose a mixing network that can combine intrinsic and extrinsic rewards non-linearly and dynamically in response to changing conditions of the environment. We compare AIIR-MIX with many State-Of-The-Art (SOTA) MARL methods on battle games in StarCraft II. The results demonstrate that AIIR-MIX performs admirably and outperforms current advanced methods on average test win rate. To validate the effectiveness of AIIR-MIX, we conduct additional ablation studies. The results show that AIIR-MIX can dynamically assign each agent a real-time intrinsic reward in accordance with its actual contribution.

Learning What You Should Learn

Dec 11, 2022
Shitong Shao, Huanran Chen, Zhen Huang, Linrui Gong, Shuai Wang, Xinxiao Wu

In real teaching scenarios, an excellent teacher always teaches what he (or she) is good at but the student is not. This method gives the student the best assistance in making up for his (or her) weaknesses and becoming a good one overall. Inspired by this, we introduce the approach into the knowledge distillation framework and propose a data-based distillation method named "Teaching what you Should Teach (TST)". Specifically, TST contains a neural network-based data augmentation module with an a priori bias, which helps find what the teacher is good at while the student is not by learning magnitudes and probabilities to generate suitable samples. By training the data augmentation module and the generalized distillation paradigm in turn, a student model with excellent generalization ability can be created. To verify the effectiveness of TST, we conduct extensive comparative experiments on object recognition (CIFAR-100 and ImageNet-1k), detection (MS-COCO), and segmentation (Cityscapes) tasks. As experimentally demonstrated, TST achieves state-of-the-art performance on almost all teacher-student pairs. Furthermore, we conduct intriguing studies of TST, including how to solve the performance degradation caused by a stronger teacher and what magnitudes and probabilities are needed for the distillation framework.

* 13 pages, 4 figures 

Bootstrap Generalization Ability from Loss Landscape Perspective

Sep 18, 2022
Huanran Chen, Shitong Shao, Ziyi Wang, Zirui Shang, Jin Chen, Xiaofeng Ji, Xinxiao Wu

Domain generalization aims to learn a model that generalizes well to an unseen test dataset, i.e., out-of-distribution data whose distribution differs from that of the training dataset. To address domain generalization in computer vision, we introduce loss landscape theory into this field. Specifically, we bootstrap the generalization ability of the deep learning model from the loss landscape perspective in four aspects: backbone, regularization, training paradigm, and learning rate. We verify the proposed theory on the NICO++, PACS, and VLCS datasets through extensive ablation studies and visualizations. In addition, we apply this theory in the ECCV 2022 NICO Challenge and achieve 3rd place without using any domain-invariant methods.

* 18 pages, 4 figures 