Person re-identification (Re-ID) has achieved great success in the supervised scenario. However, it is difficult to directly transfer the supervised model to arbitrary unseen domains due to the model overfitting to the seen source domains. In this paper, we aim to tackle the generalizable multi-source person Re-ID task (i.e., there are multiple available source domains, and the testing domain is unseen during training) from the data augmentation perspective, thus we put forward a novel method, termed MixNorm, which consists of domain-aware mix-normalization (DMN) and domain-ware center regularization (DCR). Different from the conventional data augmentation, the proposed domain-aware mix-normalization to enhance the diversity of features during training from the normalization view of the neural network, which can effectively alleviate the model overfitting to the source domains, so as to boost the generalization capability of the model in the unseen domain. To better learn the domain-invariant model, we further develop the domain-aware center regularization to better map the produced diverse features into the same space. Extensive experiments on multiple benchmark datasets validate the effectiveness of the proposed method and show that the proposed method can outperform the state-of-the-art methods. Besides, further analysis also reveals the superiority of the proposed method.
Aiming to generalize the model trained in source domains to unseen target domains, domain generalization (DG) has attracted lots of attention recently. The key issue of DG is how to prevent overfitting to the observed source domains because target domain is unavailable during training. We investigate that overfitting not only causes the inferior generalization ability to unseen target domains but also leads unstable prediction in the test stage. In this paper, we observe that both sampling multiple tasks in training stage and generating augmented images in test stage largely benefit generalization performance. Thus, by treating tasks and images as different views, we propose a novel multi-view DG framework. Specifically, in training stage, to enhance generalization ability, we develop a multi-view regularized meta-learning algorithm that employs multiple tasks to produce a suitable optimization direction during updating model. In test stage, to alleviate unstable prediction, we utilize multiple augmented images to yield multi-view prediction, which significantly promotes model reliability via fusing the results of different views of a test image. Extensive experiments on three benchmark datasets validate our method outperforms several state-of-the-art approaches.
For medical image segmentation, imagine if a model was only trained using MR images in source domain, how about its performance to directly segment CT images in target domain? This setting, namely generalizable cross-modality segmentation, owning its clinical potential, is much more challenging than other related settings, e.g., domain adaptation. To achieve this goal, we in this paper propose a novel dual-normalization module by leveraging the augmented source-similar and source-dissimilar images during our generalizable segmentation. To be specific, given a single source domain, aiming to simulate the possible appearance change in unseen target domains, we first utilize a nonlinear transformation to augment source-similar and source-dissimilar images. Then, to sufficiently exploit these two types of augmentations, our proposed dual-normalization based model employs a shared backbone yet independent batch normalization layer for separate normalization. Afterwards, we put forward a style-based selection scheme to automatically choose the appropriate path in the test stage. Extensive experiments on three publicly available datasets, i.e., BraTS, Cross-Modality Cardiac and Abdominal Multi-Organ dataset, have demonstrated that our method outperforms other state-of-the-art domain generalization methods.
By training a model on multiple observed source domains, domain generalization aims to generalize well to arbitrary unseen target domains without further training. Existing works mainly focus on learning domain-invariant features to improve the generalization ability. However, since target domain is not available during training, previous methods inevitably suffer from overfitting in source domains. To tackle this issue, we develop an effective dropout-based framework to enlarge the region of the model's attention, which can effectively mitigate the overfitting problem. Particularly, different from the typical dropout scheme, which normally conducts the dropout on the fixed layer, first, we randomly select one layer, and then we randomly select its channels to conduct dropout. Besides, we leverage the progressive scheme to add the ratio of the dropout during training, which can gradually boost the difficulty of training model to enhance the robustness of the model. Moreover, to further alleviate the impact of the overfitting issue, we leverage the augmentation schemes on image-level and feature-level to yield a strong baseline model. We conduct extensive experiments on multiple benchmark datasets, which show our method can outperform the state-of-the-art methods.
Domain generalization (DG) has attracted much attention in person re-identification (ReID) recently. It aims to make a model trained on multiple source domains generalize to an unseen target domain. Although achieving promising progress, existing methods usually need the source domains to be labeled, which could be a significant burden for practical ReID tasks. In this paper, we turn to investigate unsupervised domain generalization for ReID, by assuming that no label is available for any source domains. To address this challenging setting, we propose a simple and efficient domain-specific adaptive framework, and realize it with an adaptive normalization module designed upon the batch and instance normalization techniques. In doing so, we successfully yield reliable pseudo-labels to implement training and also enhance the domain generalization capability of the model as required. In addition, we show that our framework can even be applied to improve person ReID under the settings of supervised domain generalization and unsupervised domain adaptation, demonstrating competitive performance with respect to relevant methods. Extensive experimental study on benchmark datasets is conducted to validate the proposed framework. A significance of our work lies in that it shows the potential of unsupervised domain generalization for person ReID and sets a strong baseline for the further research on this topic.
In semi-supervised medical image segmentation, most previous works draw on the common assumption that higher entropy means higher uncertainty. In this paper, we investigate a novel method of estimating uncertainty. We observe that, when assigned different misclassification costs in a certain degree, if the segmentation result of a pixel becomes inconsistent, this pixel shows a relative uncertainty in its segmentation. Therefore, we present a new semi-supervised segmentation model, namely, conservative-radical network (CoraNet in short) based on our uncertainty estimation and separate self-training strategy. In particular, our CoraNet model consists of three major components: a conservative-radical module (CRM), a certain region segmentation network (C-SN), and an uncertain region segmentation network (UC-SN) that could be alternatively trained in an end-to-end manner. We have extensively evaluated our method on various segmentation tasks with publicly available benchmark datasets, including CT pancreas, MR endocardium, and MR multi-structures segmentation on the ACDC dataset. Compared with the current state of the art, our CoraNet has demonstrated superior performance. In addition, we have also analyzed its connection with and difference from conventional methods of uncertainty estimation in semi-supervised medical image segmentation.
With the goal of directly generalizing trained models to unseen target domains, domain generalization (DG), a newly proposed learning paradigm, has attracted considerable attention. Previous DG models usually require a sufficient quantity of annotated samples from observed source domains during training. In this paper, we relax this requirement about full annotation and investigate semi-supervised domain generalization (SSDG) where only one source domain is fully annotated along with the other domains totally unlabeled in the training process. With the challenges of tackling the domain gap between observed source domains and predicting unseen target domains, we propose a novel deep framework via joint domain-aware labels and dual-classifier to produce high-quality pseudo-labels. Concretely, to predict accurate pseudo-labels under domain shift, a domain-aware pseudo-labeling module is developed. Also, considering inconsistent goals between generalization and pseudo-labeling: former prevents overfitting on all source domains while latter might overfit the unlabeled source domains for high accuracy, we employ a dual-classifier to independently perform pseudo-labeling and domain generalization in the training process. Extensive results on publicly available DG benchmark datasets show the efficacy of our proposed SSDG method compared to the well-designed baselines and the state-of-the-art semi-supervised learning methods.
Few-shot learning, especially few-shot image classification, has received increasing attention and witnessed significant advances in recent years. Some recent studies implicitly show that many generic techniques or ``tricks'', such as data augmentation, pre-training, knowledge distillation, and self-supervision, may greatly boost the performance of a few-shot learning method. Moreover, different works may employ different software platforms, different training schedules, different backbone architectures and even different input image sizes, making fair comparisons difficult and practitioners struggle with reproducibility. To address these situations, we propose a comprehensive library for few-shot learning (LibFewShot) by re-implementing seventeen state-of-the-art few-shot learning methods in a unified framework with the same single codebase in PyTorch. Furthermore, based on LibFewShot, we provide comprehensive evaluations on multiple benchmark datasets with multiple backbone architectures to evaluate common pitfalls and effects of different training tricks. In addition, given the recent doubts on the necessity of meta- or episodic-training mechanism, our evaluation results show that such kind of mechanism is still necessary especially when combined with pre-training. We hope our work can not only lower the barriers for beginners to work on few-shot learning but also remove the effects of the nontrivial tricks to facilitate intrinsic research on few-shot learning. The source code is available from https://github.com/RL-VIG/LibFewShot.
Accurate image segmentation plays a crucial role in medical image analysis, yet it faces great challenges of various shapes, diverse sizes, and blurry boundaries. To address these difficulties, square kernel-based encoder-decoder architecture has been proposed and widely used, but its performance remains still unsatisfactory. To further cope with these challenges, we present a novel double-branch encoder architecture. Our architecture is inspired by two observations: 1) Since the discrimination of features learned via square convolutional kernels needs to be further improved, we propose to utilize non-square vertical and horizontal convolutional kernels in the double-branch encoder, so features learned by the two branches can be expected to complement each other. 2) Considering that spatial attention can help models to better focus on the target region in a large-sized image, we develop an attention loss to further emphasize the segmentation on small-sized targets. Together, the above two schemes give rise to a novel double-branch encoder segmentation framework for medical image segmentation, namely Crosslink-Net. The experiments validate the effectiveness of our model on four datasets. The code is released at https://github.com/Qianyu1226/Crosslink-Net.
In this paper, we investigate if we could make the self-training -- a simple but popular framework -- work better for semi-supervised segmentation. Since the core issue in semi-supervised setting lies in effective and efficient utilization of unlabeled data, we notice that increasing the diversity and hardness of unlabeled data is crucial to performance improvement. Being aware of this fact, we propose to adopt the most plain self-training scheme coupled with appropriate strong data augmentations on unlabeled data (namely ST) for this task, which surprisingly outperforms previous methods under various settings without any bells and whistles. Moreover, to alleviate the negative impact of the wrongly pseudo labeled images, we further propose an advanced self-training framework (namely ST++), that performs selective re-training via selecting and prioritizing the more reliable unlabeled images. As a result, the proposed ST++ boosts the performance of semi-supervised model significantly and surpasses existing methods by a large margin on the Pascal VOC 2012 and Cityscapes benchmark. Overall, we hope this straightforward and simple framework will serve as a strong baseline or competitor for future works. Code is available at https://github.com/LiheYoung/ST-PlusPlus.