Abstract:Demographic attributes are universally present in electronic health records and serve as vital predictors in clinical risk stratification and treatment decisions. Despite their significance, these attributes are often relegated to auxiliary roles in model design, with limited attention has been given to learning their representations. This study proposes a General Demographic Pre-trained (GDP) model as a foundational representation framework tailored to age and gender. The model is pre-trained and evaluated using datasets with diverse diseases and population compositions from different geographic regions. The GDP architecture explores combinations of ordering strategies and encoding methods to transform tabular demographic inputs into latent embeddings. Experimental results demonstrate that sequential ordering substantially improves model performance in discrimination, calibration, and the corresponding information gain at each decision tree split, particularly in diseases where age and gender contribute significantly to risk stratification. Even in datasets where demographic attributes hold relatively low predictive value, GDP enhances the representational importance, increasing their influence in downstream gradient boosting models. The findings suggest that foundational models for tabular demographic attributes can generalize across tasks and populations, offering a promising direction for improving predictive performance in healthcare applications.
Abstract:Objective: Improving geographical access remains a key issue in determining the sufficiency of regional medical resources during health policy design. However, patient choices can be the result of the complex interactivity of various factors. The aim of this study is to propose a deep neural network approach to model the complex decision of patient choice in travel distance to access care, which is an important indicator for policymaking in allocating resources. Method: We used the 4-year nationwide insurance data of Taiwan and accumulated the possible features discussed in earlier literature. This study proposes the use of a convolutional neural network (CNN)-based framework to make predictions. The model performance was tested against other machine learning methods. The proposed framework was further interpreted using Integrated Gradients (IG) to analyze the feature weights. Results: We successfully demonstrated the effectiveness of using a CNN-based framework to predict the travel distance of patients, achieving an accuracy of 0.968, AUC of 0.969, sensitivity of 0.960, and specificity of 0.989. The CNN-based framework outperformed all other methods. In this research, the IG weights are potentially explainable; however, the relationship does not correspond to known indicators in public health, similar to common consensus. Conclusions: Our results demonstrate the feasibility of the deep learning-based travel distance prediction model. It has the potential to guide policymaking in resource allocation.