In cancer research, overall survival and progression-free survival are often analyzed with the Cox model. To estimate the parameters of the model accurately, sufficient data and, more importantly, sufficient events need to be observed. In practice, this is often a problem. Merging data sets from different medical centers may help, but this is not always possible due to strict privacy legislation and logistical difficulties. Recently, the Bayesian Federated Inference (BFI) strategy for generalized linear models was proposed. With this strategy, the statistical analyses are performed in the local centers where the data were collected (or stored), and only the inference results are combined into a single estimated model; merging the data is not necessary. The BFI methodology aims to compute, from the separate inference results in the local centers, what would have been obtained if the analysis had been based on the merged data sets. In this paper we generalize the BFI methodology, as initially developed for generalized linear models, to survival models. In simulation studies and real data analyses the methodology performs very well: the results obtained with the BFI methodology are very similar to those obtained by analyzing the merged data. An R package for performing the analyses is available.
To estimate the parameters of a regression model accurately, the sample size must be large enough relative to the number of candidate predictors in the model. In practice, sufficient data are often lacking, which can lead to overfitting of the model and, as a consequence, unreliable predictions of the outcomes of new patients. Pooling data sets collected in different (medical) centers would alleviate this problem, but is often not feasible due to privacy regulations or logistical problems. An alternative route is to analyze the local data in the centers separately and combine the statistical inference results with the Bayesian Federated Inference (BFI) methodology. The aim of this approach is to compute, from the inference results in the separate centers, what would have been found if the statistical analysis had been performed on the combined data. We explain the methodology under homogeneity and under heterogeneity across the populations in the separate centers, and give real-life examples to illustrate it. The proposed methodology is shown to perform very well. An R package that performs all the calculations has been developed and is illustrated in this paper. The mathematical details are given in the Appendix.
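The idea of combining local inference results, rather than data, can be illustrated with a minimal Python sketch. It is not the R package's interface, and the function names `local_fit` and `combine` are hypothetical. The sketch assumes a one-parameter linear model with known unit noise variance and the same zero-mean Gaussian prior (precision `lam`) in every center and in the combined model. Under these assumptions each center shares only its estimate and the curvature of its log posterior, and the combination reproduces the pooled-data estimate exactly; for generalized linear or survival models the agreement would be expected to be approximate rather than exact.

```python
import random


def local_fit(xs, ys, lam):
    """MAP estimate and posterior curvature for y = theta * x + noise.

    Assumes unit noise variance and a zero-mean Gaussian prior with
    precision lam on theta (illustrative assumptions, not from the paper).
    """
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a = sxx + lam      # second derivative of the negative log posterior
    theta = sxy / a    # MAP estimate
    return theta, a


def combine(fits, lam):
    """Combine local (theta, a) pairs into one estimate.

    Every center included the prior once, so (L - 1) copies of the prior
    precision are subtracted to count it only once in the combined model.
    """
    a = sum(a_l for _, a_l in fits) - (len(fits) - 1) * lam
    theta = sum(a_l * theta_l for theta_l, a_l in fits) / a
    return theta, a


# Simulate three centers of different sizes (synthetic data).
random.seed(0)
lam = 0.5
centers = []
for n in (20, 35, 50):
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [1.5 * x + random.gauss(0, 1) for x in xs]
    centers.append((xs, ys))

# Each center shares only (theta_l, a_l), never its raw data.
fits = [local_fit(xs, ys, lam) for xs, ys in centers]
theta_fed, a_fed = combine(fits, lam)

# Reference: the same analysis on the merged data.
all_x = [x for xs, _ in centers for x in xs]
all_y = [y for _, ys in centers for y in ys]
theta_pooled, a_pooled = local_fit(all_x, all_y, lam)

print(abs(theta_fed - theta_pooled) < 1e-9)
```

The subtraction of `(L - 1) * lam` in `combine` is the step that distinguishes this from a naive precision-weighted average: without it, the prior would be counted once per center.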