Federated learning enables collaboratively training machine learning models on decentralized data. The three types of heterogeneous natures that is data, model, and objective bring about unique challenges to the canonical federated learning algorithm (FederatedAveraging), where one shared model is produced by and for all clients. First, due to the Non-IIDness of data, the global shared model may perform worse than local models that solely trained on their private data; Second, clients may need to design their own model because of different communication and computing abilities of devices, which is also private property that should be protected; Third, the objective of achieving consensus throughout the training process will compromise the personalities of clients. In this work, we present a novel federated learning paradigm, named Federated Mutual Leaning (FML), dealing with the three heterogeneities. FML allows clients designing their customized models and training independently, thus the Non-IIDness of data is no longer a bug but a feature that clients can be personally served better. Local customized models can benefit from collaboratively training without compromising personalities. Global model does not have to be an out-of-the-box (OOTB) product but a meta-learner which requires local adaptation for new participants. The experiments show that FML can achieve better performance, robustness and communication efficiency than alternatives.
One fundamental problem in the learning treatment effect from observational data is confounder identification and balancing. Most of the previous methods realized confounder balancing by treating all observed variables as confounders, ignoring the identification of confounders and non-confounders. In general, not all the observed variables are confounders which are the common causes of both the treatment and the outcome, some variables only contribute to the treatment and some contribute to the outcome. Balancing those non-confounders would generate additional bias for treatment effect estimation. By modeling the different relations among variables, treatment and outcome, we propose a synergistic learning framework to 1) identify and balance confounders by learning decomposed representation of confounders and non-confounders, and simultaneously 2) estimate the treatment effect in observational studies via counterfactual inference. Our empirical results demonstrate that the proposed method can precisely identify and balance confounders, while the estimation of the treatment effect performs better than the state-of-the-art methods on both synthetic and real-world datasets.
Despite success on a wide range of problems related to vision, generative adversarial networks (GANs) can suffer from inferior performance due to unstable training, especially for text generation. we propose a new variational GAN training framework which enjoys superior training stability. Our approach is inspired by a connection of GANs and reinforcement learning under a variational perspective. The connection leads to (1) probability ratio clipping that regularizes generator training to prevent excessively large updates, and (2) a sample re-weighting mechanism that stabilizes discriminator training by downplaying bad-quality fake samples. We provide theoretical analysis on the convergence of our approach. By plugging the training approach in diverse state-of-the-art GAN architectures, we obtain significantly improved performance over a range of tasks, including text generation, text style transfer, and image generation.
This paper presents Prototypical Contrastive Learning (PCL), an unsupervised representation learning method that addresses the fundamental limitations of instance-wise contrastive learning. PCL not only learns low-level features for the task of instance discrimination, but more importantly, it implicitly encodes semantic structures of the data into the learned embedding space. Specifically, we introduce prototypes as latent variables to help find the maximum-likelihood estimation of the network parameters in an Expectation-Maximization framework. We iteratively perform E-step as finding the distribution of prototypes via clustering and M-step as optimizing the network via contrastive learning. We propose ProtoNCE loss, a generalized version of the InfoNCE loss for contrastive learning, which encourages representations to be closer to their assigned prototypes. PCL achieves state-of-the-art results on multiple unsupervised representation learning benchmarks, with >10% accuracy improvement in low-resource transfer tasks.
Pedestrian attribute recognition is an important multi-label classification problem. Although the convolutional neural networks are prominent in learning discriminative features from images, the data imbalance in multi-label setting for fine-grained tasks remains an open problem. In this paper, we propose a new re-sampling algorithm called: data augmentation imbalance (DAI) to explicitly enhance the ability to discriminate the fewer attributes via increasing the proportion of labels accounting for a small part. Fundamentally, by applying over-sampling and under-sampling on the multi-label dataset at the same time, the thought of robbing the rich attributes and helping the poor makes a significant contribution to DAI. Extensive empirical evidence shows that our DAI algorithm achieves state-of-the-art results, based on pedestrian attribute datasets, i.e. standard PA-100K and PETA datasets.
The task of crowd counting is extremely challenging due to complicated difficulties, especially the huge variation in vision scale. Previous works tend to adopt a naive concatenation of multi-scale information to tackle it, while the scale shifts between the feature maps are ignored. In this paper, we propose a novel Hierarchical Scale Recalibration Network (HSRNet), which addresses the above issues by modeling rich contextual dependencies and recalibrating multiple scale-associated information. Specifically, a Scale Focus Module (SFM) first integrates global context into local features by modeling the semantic inter-dependencies along channel and spatial dimensions sequentially. In order to reallocate channel-wise feature responses, a Scale Recalibration Module (SRM) adopts a step-by-step fusion to generate final density maps. Furthermore, we propose a novel Scale Consistency loss to constrain that the scale-associated outputs are coherent with groundtruth of different scales. With the proposed modules, our approach can ignore various noises selectively and focus on appropriate crowd scales automatically. Extensive experiments on crowd counting datasets (ShanghaiTech, MALL, WorldEXPO'10, and UCSD) show that our HSRNet can deliver superior results over all state-of-the-art approaches. More remarkably, we extend experiments on an extra vehicle dataset, whose results indicate that the proposed model is generalized to other applications.
Disease diagnosis on chest X-ray images is a challenging multi-label classification task. Previous works generally classify the diseases independently on the input image without considering any correlation among diseases. However, such correlation actually exists, for example, Pleural Effusion is more likely to appear when Pneumothorax is present. In this work, we propose a Disease Diagnosis Graph Convolutional Network (DD-GCN) that presents a novel view of investigating the inter-dependency among different diseases by using a dynamic learnable adjacency matrix in graph structure to improve the diagnosis accuracy. To learn more natural and reliable correlation relationship, we feed each node with the image-level individual feature map corresponding to each type of disease. To our knowledge, our method is the first to build a graph over the feature maps with a dynamic adjacency matrix for correlation learning. To further deal with a practical issue of incomplete labels, DD-GCN also utilizes an adaptive loss and a curriculum learning strategy to train the model on incomplete labels. Experimental results on two popular chest X-ray (CXR) datasets show that our prediction accuracy outperforms state-of-the-arts, and the learned graph adjacency matrix establishes the correlation representations of different diseases, which is consistent with expert experience. In addition, we apply an ablation study to demonstrate the effectiveness of each component in DD-GCN.