Daniel L Rubin

An Experimental Study of Data Heterogeneity in Federated Learning Methods for Medical Imaging

Jul 18, 2021

Liangqiong Qu, Niranjan Balachandar, Daniel L Rubin

Abstract: Federated learning enables multiple institutions to collaboratively train machine learning models on their local data in a privacy-preserving way. However, its distributed nature often leads to significant heterogeneity in data distributions across institutions. In this paper, we investigate the deleterious impact of a taxonomy of data heterogeneity regimes on federated learning methods, including quantity skew, label distribution skew, and imaging acquisition skew. We show that performance degrades as the degree of data heterogeneity increases. We present several mitigation strategies to overcome the performance drops caused by data heterogeneity: a weighted average for data quantity skew, and a weighted loss and batch normalization averaging for label distribution skew. These optimizations improve the ability of federated learning methods to handle heterogeneity across institutions and provide valuable guidance for deploying federated learning in real clinical applications.
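
The abstract names a weighted average for quantity skew and a weighted loss for label distribution skew. As an illustration only, the sketch below shows a FedAvg-style average of client parameters weighted by local sample count, plus inverse-frequency class weights for a loss. The function names are hypothetical, inverse-frequency weighting is an assumed choice that may differ from the paper's exact scheme, and batch normalization averaging is not shown.

```python
from collections import Counter


def weighted_average(client_params, client_sizes):
    """Average per-client parameter vectors, weighting each client by its sample count.

    Assumption: parameters are flattened into equal-length lists of floats.
    """
    total = sum(client_sizes)
    avg = [0.0] * len(client_params[0])
    for params, size in zip(client_params, client_sizes):
        for i, p in enumerate(params):
            avg[i] += p * size / total  # larger institutions contribute more
    return avg


def inverse_frequency_weights(labels):
    """Per-class loss weights proportional to 1 / class frequency.

    Scaled so that the average weight over all samples is 1.
    (Hypothetical helper; one common choice among several.)
    """
    counts = Counter(labels)
    n_classes = len(counts)
    total = len(labels)
    return {c: total / (n_classes * k) for c, k in counts.items()}
```

For two clients holding 100 and 300 samples, `weighted_average([[1.0, 2.0], [3.0, 4.0]], [100, 300])` gives `[2.5, 3.5]`, whereas a plain mean would give `[2.0, 3.0]` and overweight the smaller client.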

The unreasonable effectiveness of Batch-Norm statistics in addressing catastrophic forgetting across medical institutions

Nov 16, 2020

Sharut Gupta, Praveer Singh, Ken Chang, Mehak Aggarwal, Nishanth Arun, Liangqiong Qu, Katharina Hoebel, Jay Patel, Mishka Gidwani, Ashwin Vaswani(+2 more)

Abstract: Model brittleness is a primary concern when deploying deep learning models in medical settings, owing to inter-institution variations, such as patient demographics, and intra-institution variations, such as multiple scanner types. Simply training on the combined datasets is fraught with data privacy limitations, while fine-tuning the model on subsequent institutions after training it on the original institution decreases performance on the original dataset, a phenomenon called catastrophic forgetting. In this paper, we investigate the trade-off between model refinement and retention of previously learned knowledge, and we address catastrophic forgetting for the assessment of mammographic breast density. More specifically, we propose a simple yet effective approach that adapts elastic weight consolidation (EWC) using the global batch normalization (BN) statistics of the original dataset. The results of this study provide guidance for the deployment of clinical deep learning models where continuous learning is needed for domain expansion.

* Accepted as an oral presentation at Machine Learning for Health (ML4H) at NeurIPS 2020 (extended abstract); 6 pages and 4 figures
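
The approach above builds on elastic weight consolidation, which adds a quadratic penalty that keeps parameters important to the original institution close to their earlier values: (λ/2) Σᵢ Fᵢ (θᵢ − θ*ᵢ)². The sketch below shows only this standard EWC penalty with a diagonal empirical Fisher estimate. It does not reproduce the paper's use of the original dataset's global BN statistics, and all names are hypothetical.

```python
def empirical_fisher_diag(per_sample_grads):
    """Diagonal empirical Fisher estimate: mean of squared per-sample gradients.

    Computed on the original institution's data, at the parameters theta*.
    """
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def ewc_penalty(params, anchor_params, fisher_diag, lam):
    """EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i) ** 2."""
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher_diag)
    )


def ewc_loss(task_loss, params, anchor_params, fisher_diag, lam):
    """Loss on the new institution plus the penalty anchoring the old solution."""
    return task_loss + ewc_penalty(params, anchor_params, fisher_diag, lam)
```

A larger `lam` favors retention of the original institution's knowledge over refinement on the new one, which is the trade-off the abstract describes.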

Institutionally Distributed Deep Learning Networks

Sep 10, 2017

Ken Chang, Niranjan Balachandar, Carson K Lam, Darvin Yi, James M Brown, Andrew Beers, Bruce R Rosen, Daniel L Rubin, Jayashree Kalpathy-Cramer

Abstract: Deep learning has become a promising approach for automated medical diagnosis. When medical data samples are limited, collaboration among multiple institutions is necessary to achieve high algorithm performance. However, sharing patient data is often restricted by technical, legal, or ethical concerns. In such cases, sharing a deep learning model is a more attractive alternative, but the best way to do so is unclear. In this study, we simulate the dissemination of deep learning models across four institutions using various heuristics and compare the results with a deep learning model trained on centrally hosted patient data. The heuristics investigated include ensembling single-institution models, single weight transfer, and cyclical weight transfer. We evaluated these approaches for image classification on three independent image collections (retinal fundus photos, mammography, and ImageNet). We find that cyclical weight transfer yielded the performance (testing accuracy = 77.3%) closest to that of the model trained on centrally hosted patient data (testing accuracy = 78.7%). We also found that the performance of the cyclical weight transfer heuristic improves with more frequent weight transfer.
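
Cyclical weight transfer, as described above, trains the model at one institution, passes the weights to the next, and repeats the cycle. The sketch below illustrates only that control flow on an assumed toy problem: a 1-D linear model fitted by plain SGD to synthetic, noiseless data y = 2x + 1, split across four hypothetical institutions with different input ranges. The paper used deep networks on images; everything here (names, data, learning rate) is illustrative.

```python
def sgd_epoch(w, b, data, lr):
    """One pass of plain SGD on squared error for a 1-D linear model y ~ w*x + b."""
    for x, y in data:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return w, b


def cyclical_weight_transfer(institutions, cycles, epochs_per_visit, lr=0.05):
    """Train at each institution in turn, handing (w, b) to the next; repeat.

    Fewer epochs_per_visit means more frequent weight transfer.
    Only the weights move between institutions, never the data.
    """
    w, b = 0.0, 0.0
    for _ in range(cycles):
        for data in institutions:
            for _ in range(epochs_per_visit):
                w, b = sgd_epoch(w, b, data, lr)
    return w, b


# Hypothetical toy data: four institutions covering different input ranges.
INSTITUTIONS = [
    [(x, 2 * x + 1) for x in xs]
    for xs in ([0.0, 0.5, 1.0], [1.0, 1.5, 2.0], [-1.0, -0.5, 0.0], [0.2, 0.8, 1.4])
]
```

With enough cycles, `cyclical_weight_transfer(INSTITUTIONS, cycles=200, epochs_per_visit=1)` returns weights close to `(2.0, 1.0)` even though no institution ever sees another institution's samples.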

Piecewise convexity of artificial neural networks

Dec 28, 2016

Blaine Rister, Daniel L Rubin

Abstract: Although artificial neural networks have shown great promise in applications including computer vision and speech recognition, there remains considerable practical and theoretical difficulty in optimizing their parameters. The seemingly unreasonable success of gradient descent methods in minimizing these non-convex functions remains poorly understood. In this work we offer some theoretical guarantees for networks with piecewise affine activation functions, which have in recent years become the norm. We prove three main results. First, the network is piecewise convex as a function of the input data. Second, the network, considered as a function of the parameters in a single layer with all others held constant, is again piecewise convex. Third, the network as a function of all its parameters is piecewise multi-convex, a generalization of biconvexity. From here we characterize the local minima and stationary points of the training objective, showing that they minimize certain subsets of the parameter space. We then analyze the performance of two optimization algorithms on multi-convex problems: gradient descent, and a method that repeatedly solves a number of convex sub-problems. We prove necessary convergence conditions for the first algorithm and both necessary and sufficient conditions for the second, after introducing regularization to the objective. Finally, we remark on the remaining difficulty of the global optimization problem. Under the squared error objective, we show that by varying the training data, a single rectifier neuron admits local minima arbitrarily far apart, both in objective value and in parameter space.
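
The first result can be illustrated in its simplest form. In a ReLU network, fixing the activation pattern fixes which hidden units pass their input through, so on each such region the output is an affine function of the input, and affine functions are convex. The sketch assumes a two-layer ReLU network with scalar output; the function names and example weights are hypothetical, and the paper's formal definition of piecewise convexity may be more general than this illustration.

```python
def hidden_preactivations(x, W1, b1):
    """Pre-activations W1 x + b1 of the hidden layer."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W1, b1)]


def relu_net(x, W1, b1, w2, b2):
    """Two-layer ReLU network with scalar output: w2 . relu(W1 x + b1) + b2."""
    return sum(v * max(0.0, h) for v, h in zip(w2, hidden_preactivations(x, W1, b1))) + b2


def activation_pattern(x, W1, b1):
    """Which hidden units are active (pre-activation > 0) at input x."""
    return tuple(h > 0 for h in hidden_preactivations(x, W1, b1))


def local_affine_map(pattern, W1, b1, w2, b2):
    """Coefficients (a, c) such that the net equals a . x + c on this pattern's region."""
    n_in = len(W1[0])
    a = [sum(v * row[j] for v, row, on in zip(w2, W1, pattern) if on) for j in range(n_in)]
    c = sum(v * bi for v, bi, on in zip(w2, b1, pattern) if on) + b2
    return a, c


# Hypothetical example weights: 2 inputs, 3 hidden units.
W1 = [[1.0, -1.0], [0.5, 2.0], [-1.0, 0.0]]
b1 = [0.0, -1.0, 0.5]
w2 = [1.0, -2.0, 3.0]
b2 = 0.25
```

Each activation region is an intersection of half-spaces and is therefore convex, so restricting the network to one region gives an affine (hence convex) function on a convex set. For example, the inputs `[0.3, 0.1]` and `[0.35, 0.12]` share a pattern, and the network agrees with the same affine map at both.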
