We study practical data characteristics underlying federated learning, where non-i.i.d. data from clients have sparse features, and a certain client's local data normally involves only a small part of the full model, called a submodel. Due to data sparsity, the classical federated averaging (FedAvg) algorithm or its variants will be severely slowed down, because when updating the global model, each client's zero update of the full model excluding its submodel is inaccurately aggregated. Therefore, we propose federated submodel averaging (FedSubAvg), ensuring that the expectation of the global update of each model parameter is equal to the average of the local updates of the clients who involve it. We theoretically proved the convergence rate of FedSubAvg by deriving an upper bound under a new metric called the element-wise gradient norm. In particular, this new metric can characterize the convergence of federated optimization over sparse data, while the conventional metric of squared gradient norm used in FedAvg and its variants cannot. We extensively evaluated FedSubAvg over both public and industrial datasets. The evaluation results demonstrate that FedSubAvg significantly outperforms FedAvg and its variants.
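The aggregation property stated above can be illustrated with a minimal sketch. It is a deterministic simplification, not the FedSubAvg algorithm itself: the abstract states the property in expectation and does not describe client sampling or local training. The function names, the `involved` mask, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np

def plain_average(updates):
    """FedAvg-style aggregation: average over all clients, zero updates included."""
    return np.mean(np.asarray(updates, dtype=float), axis=0)

def submodel_average(updates, involved):
    """Average each parameter only over the clients whose submodel involves it."""
    updates = np.asarray(updates, dtype=float)
    involved = np.asarray(involved, dtype=bool)
    counts = involved.sum(axis=0)  # number of involved clients per parameter
    totals = np.where(involved, updates, 0.0).sum(axis=0)
    # Parameters that no client involves keep a zero update.
    return np.divide(totals, counts, out=np.zeros_like(totals), where=counts > 0)

# Hypothetical toy data: 3 clients, 3 parameters.
# Parameter 0 is involved by everyone, parameter 1 by one client, parameter 2 by nobody.
updates = np.array([[0.9, 0.0, 0.0],
                    [1.1, 0.4, 0.0],
                    [1.0, 0.0, 0.0]])
involved = np.array([[True, False, False],
                     [True, True, False],
                     [True, False, False]])

fedavg_update = plain_average(updates)
subavg_update = submodel_average(updates, involved)
```

In this toy case the rarely involved parameter 1 receives an update of 0.4 under submodel-aware averaging, but only 0.4/3 under plain averaging, because the two clients that do not involve it contribute zeros.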
Federated learning (FL) trains a machine learning model on mobile devices in a distributed manner using each device's private data and computing resources. A critical issue is to evaluate individual users' contributions so that (1) users' effort in model training can be compensated with proper incentives and (2) malicious and low-quality users can be detected and removed. The state-of-the-art solutions require a representative test dataset for evaluation, but such a dataset is often unavailable and hard to synthesize. In this paper, we propose a method called Pairwise Correlated Agreement (PCA), based on the idea of peer prediction, to evaluate user contribution in FL without a test dataset. PCA achieves this using the statistical correlation of the model parameters uploaded by users. We then apply PCA to design (1) a new federated learning algorithm called Fed-PCA, and (2) a new incentive mechanism that guarantees truthfulness. We evaluate the performance of PCA and Fed-PCA using the MNIST dataset and a large industrial product recommendation dataset. The results demonstrate that Fed-PCA outperforms the canonical FedAvg algorithm and other baseline methods in accuracy, and at the same time, PCA effectively incentivizes users to behave truthfully.
Most existing studies on adaptive submodular optimization focus on the average case, i.e., their objective is to find a policy that maximizes the expected utility over a known distribution of realizations. However, a policy that has good average-case performance may have very poor performance under the worst-case realization. In this study, we propose to study two variants of adaptive submodular optimization problems, namely, worst-case adaptive submodular maximization and robust submodular maximization. The first problem aims to find a policy that maximizes the worst-case utility, and the second aims to find a policy, if any, that achieves both near-optimal average-case utility and worst-case utility simultaneously. We introduce a new class of stochastic functions, called \emph{worst-case submodular functions}. For the worst-case adaptive submodular maximization problem subject to a $p$-system constraint, we develop an adaptive worst-case greedy policy that achieves a $\frac{1}{p+1}$ approximation ratio against the optimal worst-case utility if the utility function is worst-case submodular. For the robust adaptive submodular maximization problem subject to a cardinality constraint, if the utility function is both worst-case submodular and adaptive submodular, we develop a hybrid adaptive policy that achieves an approximation close to $1-e^{-\frac{1}{2}}$ under both the worst-case and average-case settings simultaneously. We also describe several applications of our theoretical results, including pool-based active learning, stochastic submodular set cover and adaptive viral marketing.
Running machine learning algorithms on large and rapidly growing volumes of data is often computationally expensive. One common trick to reduce the size of a data set, and thus reduce the computational cost of machine learning algorithms, is \emph{probability sampling}. It creates a sampled data set by including each data point from the original data set with a known probability. Although the benefit of running machine learning algorithms on the reduced data set is obvious, one major concern is that the performance of the solution obtained from samples might be much worse than that of the optimal solution when using the full data set. In this paper, we examine the performance loss caused by probability sampling in the context of adaptive submodular maximization. We consider the simplest probability sampling method, which selects each data point independently with probability $r\in[0,1]$. We define the sampling gap as the largest ratio between the optimal solution obtained from the full data set and the optimal solution obtained from the samples, over independence systems. Our main contribution is to show that if the utility function is policywise submodular, then for a given sampling rate $r$, the sampling gap is both upper bounded and lower bounded by $1/r$. One immediate implication of our result is that if we can find an $\alpha$-approximation solution based on a sampled data set (which is sampled at sampling rate $r$), then this solution achieves an $\alpha r$ approximation ratio for the original problem when using the full data set. We also show that policywise submodularity can be found in a wide range of real-world applications, including pool-based active learning and adaptive viral marketing.
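As a small illustration of the sampling scheme and the resulting ratio, the sketch below keeps each data point independently with probability $r$ and computes the $\alpha r$ guarantee that follows from the $1/r$ sampling gap. The function names and the example values ($\alpha = 0.5$, $r = 0.25$) are assumptions chosen for illustration; the sketch does not implement any adaptive policy.

```python
import random

def probability_sample(data, r, seed=None):
    """Keep each data point independently with probability r."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    rng = random.Random(seed)
    # rng.random() is in [0, 1), so r = 1 keeps everything and r = 0 keeps nothing.
    return [x for x in data if rng.random() < r]

def full_data_ratio(alpha, r):
    """Approximation ratio on the full data set implied by a 1/r sampling gap."""
    return alpha * r

# Hypothetical example: sample a quarter of 1000 points in expectation.
sample = probability_sample(range(1000), r=0.25, seed=0)

# An alpha-approximation on the sample gives an (alpha * r)-approximation overall.
ratio = full_data_ratio(alpha=0.5, r=0.25)
```

With these illustrative values, a $0.5$-approximation computed on the sample corresponds to a $0.5 \times 0.25 = 0.125$ approximation on the full data set.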
In this paper, we study the non-monotone adaptive submodular maximization problem subject to a knapsack constraint. The input of our problem is a set of items, where each item has a particular state drawn from a known prior distribution. However, the state of an item is initially unknown; one must select an item in order to reveal its state. Moreover, each item has a fixed cost. There is a utility function which is defined over items and states. Our objective is to sequentially select a group of items to maximize the expected utility subject to a knapsack constraint. Although the cardinality-constrained, as well as the more general matroid-constrained, adaptive submodular maximization problem has been well studied in the literature, whether there exists a constant approximation solution for the knapsack-constrained adaptive submodular maximization problem remains an open problem. We fill this gap by proposing the first constant approximation solution. In particular, our main contribution is to develop a sampling-based randomized algorithm that achieves a $\frac{1}{10}$ approximation for maximizing an adaptive submodular function subject to a knapsack constraint.
In this paper, we study the problem of maximizing the difference between an adaptive submodular (revenue) function and a non-negative modular (cost) function under the adaptive setting. The input of our problem is a set of $n$ items, where each item has a particular state drawn from some known prior distribution $p$. The revenue function $g$ is defined over items and states, and the cost function $c$ is defined over items, i.e., each item has a fixed cost. The state of each item is unknown initially; one must select an item in order to observe its realized state. A policy $\pi$ specifies which item to pick next based on the observations made so far. Denote by $g_{avg}(\pi)$ the expected revenue of $\pi$ and let $c_{avg}(\pi)$ denote the expected cost of $\pi$. Our objective is to identify the best policy $\pi^o\in \arg\max_{\pi}g_{avg}(\pi)-c_{avg}(\pi)$ under a $k$-cardinality constraint. Since our objective function can take on both negative and positive values, the existing results of submodular maximization may not be applicable. To overcome this challenge, we develop a series of effective solutions with performance guarantees. For the case when $g$ is adaptive monotone and adaptive submodular, we develop an effective policy $\pi^l$ such that $g_{avg}(\pi^l) - c_{avg}(\pi^l) \geq (1-\frac{1}{e}-\epsilon)g_{avg}(\pi^o) - c_{avg}(\pi^o)$, using only $O(n\epsilon^{-2}\log \epsilon^{-1})$ value oracle queries. For the case when $g$ is adaptive submodular, we present a randomized policy $\pi^r$ such that $g_{avg}(\pi^r) - c_{avg}(\pi^r) \geq \frac{1}{e}g_{avg}(\pi^o) - c_{avg}(\pi^o)$.
Automatic CT segmentation of the proximal femur is crucial for the diagnosis and risk stratification of orthopedic diseases; however, current methods for femur CT segmentation mainly rely on manual interactive segmentation, which is time-consuming and has limitations in both accuracy and reproducibility. In this study, we proposed an approach based on deep learning for the automatic extraction of the periosteal and endosteal contours of the proximal femur in order to differentiate cortical and trabecular bone compartments. A three-dimensional (3D) end-to-end fully convolutional neural network, which can better combine the information between neighboring slices and produce more accurate segmentation results, was developed for our segmentation task. A total of 100 subjects aged from 50 to 87 years with 24,399 slices of proximal femur CT images were enrolled in this study. The separation of cortical and trabecular bone derived from the QCT software MIAF-Femur was used as the segmentation reference. We randomly divided the whole dataset into a training set with 85 subjects for 10-fold cross-validation and a test set with 15 subjects for evaluating the performance of the models. Two models with the same network structure were trained, and they achieved a Dice similarity coefficient (DSC) of 97.87% and 96.49% for the periosteal and endosteal contours, respectively. To verify the performance of our model for femoral segmentation, we measured the volumes of different parts of the femur and compared them with the ground truth; the relative errors between the predicted results and the ground truth were all less than 5%. The method demonstrated strong potential for clinical use, including hip fracture risk prediction and finite element analysis.
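For reference, the Dice similarity coefficient between a predicted mask $P$ and a reference mask $R$ is $2|P \cap R| / (|P| + |R|)$. The sketch below computes it for binary 3D masks; the function name, the toy volumes, and the convention of returning 1.0 when both masks are empty are assumptions for illustration and are not taken from the study.

```python
import numpy as np

def dice_coefficient(pred, ref):
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0
    return 2.0 * np.logical_and(pred, ref).sum() / denom

# Hypothetical toy volumes: an 8-voxel prediction inside a 12-voxel reference.
pred = np.zeros((4, 4, 4), dtype=bool)
pred[1:3, 1:3, 1:3] = True
ref = np.zeros((4, 4, 4), dtype=bool)
ref[1:3, 1:3, 1:4] = True

dsc = dice_coefficient(pred, ref)  # 2 * 8 / (8 + 12) = 0.8
```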
Federated learning allows mobile clients to jointly train a global model without sending their private data to a central server. Although extensive work has studied the performance guarantees of the global model, it is still unclear how each individual client influences the collaborative training process. In this work, we defined a novel notion, called {\em Fed-Influence}, to quantify this influence in terms of model parameters, and proposed an effective and efficient estimation algorithm. In particular, our design satisfies several desirable properties: (1) it requires neither retraining nor retracing, adding only linear computational overhead to clients and the server; (2) it strictly maintains the tenet of federated learning, without revealing any client's local data; and (3) it works well on both convex and non-convex loss functions and does not require the final model to be optimal. Empirical results on a synthetic dataset and the FEMNIST dataset show that our estimation method can approximate Fed-Influence with small bias. Further, we demonstrated an application of client-level model debugging.