Abstract:Diabetes has emerged as a significant global health issue, especially with the increasing number of cases in many countries. This trend Underlines the need for a greater emphasis on early detection and proactive management to avert or mitigate the severe health complications of this disease. Over recent years, machine learning algorithms have shown promising potential in predicting diabetes risk and are beneficial for practitioners. Objective: This study highlights the prediction capabilities of statistical and non-statistical machine learning methods over Diabetes risk classification in 768 samples from the Pima Indians Diabetes Database. It consists of the significant demographic and clinical features of age, body mass index (BMI) and blood glucose levels that greatly depend on the vulnerability against Diabetes. The experimentation assesses the various types of machine learning algorithms in terms of accuracy and effectiveness regarding diabetes prediction. These algorithms include Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbors, Naive Bayes, Support Vector Machine, Gradient Boosting and Neural Network Models. The results show that the Neural Network algorithm gained the highest predictive accuracy with 78,57 %, and then the Random Forest algorithm had the second position with 76,30 % accuracy. These findings show that machine learning techniques are not just highly effective. Still, they also can potentially act as early screening tools in predicting Diabetes within a data-driven fashion with valuable information on who is more likely to get affected. In addition, this study can help to realize the potential of machine learning for timely intervention over the longer term, which is a step towards reducing health outcomes and disease burden attributable to Diabetes on healthcare systems
Abstract:Vesicoureteral reflux (VUR) is traditionally assessed using subjective grading systems, which introduces variability in diagnosis. This study investigates the use of machine learning to improve diagnostic consistency by analyzing voiding cystourethrogram (VCUG) images. A total of 113 VCUG images were reviewed, with expert grading of VUR severity. Nine image-based features were selected to train six predictive models: Logistic Regression, Decision Tree, Gradient Boosting, Neural Network, and Stochastic Gradient Descent. The models were evaluated using leave-one-out cross-validation. Analysis identified deformation patterns in the renal calyces as key indicators of high-grade VUR. All models achieved accurate classifications with no false positives or negatives. High sensitivity to subtle image patterns characteristic of different VUR grades was confirmed by substantial Area Under the Curve (AUC) values. The results suggest that machine learning can offer an objective and standardized alternative to current subjective VUR assessments. These findings highlight renal calyceal deformation as a strong predictor of severe cases. Future research should aim to expand the dataset, refine imaging features, and improve model generalizability for broader clinical use.
Abstract:This study conducts an empirical examination of MLP networks investigated through a rigorous methodical experimentation process involving three diverse datasets: TinyFace, Heart Disease, and Iris. Study Overview: The study includes three key methods: a) a baseline training using the default settings for the Multi-Layer Perceptron (MLP), b) feature selection using Genetic Algorithm (GA) based refinement c) Principal Component Analysis (PCA) based dimension reduction. The results show important information on how such techniques affect performance. While PCA had showed benefits in low-dimensional and noise-free datasets GA consistently increased accuracy in complex datasets by accurately identifying critical features. Comparison reveals that feature selection and dimensionality reduction play interdependent roles in enhancing MLP performance. The study contributes to the literature on feature engineering and neural network parameter optimization, offering practical guidelines for a wide range of machine learning tasks
Abstract:Oral cancer presents a formidable challenge in oncology, necessitating early diagnosis and accurate prognosis to enhance patient survival rates. Recent advancements in machine learning and data mining have revolutionized traditional diagnostic methodologies, providing sophisticated and automated tools for differentiating between benign and malignant oral lesions. This study presents a comprehensive review of cutting-edge data mining methodologies, including Neural Networks, K-Nearest Neighbors (KNN), Support Vector Machines (SVM), and ensemble learning techniques, specifically applied to the diagnosis and prognosis of oral cancer. Through a rigorous comparative analysis, our findings reveal that Neural Networks surpass other models, achieving an impressive classification accuracy of 93,6 % in predicting oral cancer. Furthermore, we underscore the potential benefits of integrating feature selection and dimensionality reduction techniques to enhance model performance. These insights underscore the significant promise of advanced data mining techniques in bolstering early detection, optimizing treatment strategies, and ultimately improving patient outcomes in the realm of oral oncology.
Abstract:Dental provider classification plays a crucial role in optimizing healthcare resource allocation and policy planning. Effective categorization of providers, such as standard rendering providers and safety net clinic (SNC) providers, enhances service delivery to underserved populations. This study aimed to evaluate the performance of machine learning models in classifying dental providers using a 2018 dataset. A dataset of 24,300 instances with 20 features was analyzed, including beneficiary and service counts across fee-for-service (FFS), Geographic Managed Care, and Pre-Paid Health Plans. Providers were categorized by delivery system and patient age groups (0-20 and 21+). Despite 38.1% missing data, multiple machine learning algorithms were tested, including k-Nearest Neighbors (kNN), Decision Trees, Support Vector Machines (SVM), Stochastic Gradient Descent (SGD), Random Forest, Neural Networks, and Gradient Boosting. A 10-fold cross-validation approach was applied, and models were evaluated using AUC, classification accuracy (CA), F1-score, precision, and recall. Neural Networks achieved the highest AUC (0.975) and CA (94.1%), followed by Random Forest (AUC: 0.948, CA: 93.0%). These models effectively handled imbalanced data and complex feature interactions, outperforming traditional classifiers like Logistic Regression and SVM. Advanced machine learning techniques, particularly ensemble and deep learning models, significantly enhance dental workforce classification. Their integration into healthcare analytics can improve provider identification and resource distribution, benefiting underserved populations.
Abstract:This study investigates the application of machine learning (ML) models for classifying dental providers into two categories - standard rendering providers and safety net clinic (SNC) providers - using a 2018 dataset of 24,300 instances with 20 features. The dataset, characterized by high missing values (38.1%), includes service counts (preventive, treatment, exams), delivery systems (FFS, managed care), and beneficiary demographics. Feature ranking methods such as information gain, Gini index, and ANOVA were employed to identify critical predictors, revealing treatment-related metrics (TXMT_USER_CNT, TXMT_SVC_CNT) as top-ranked features. Twelve ML models, including k-Nearest Neighbors (kNN), Decision Trees, Support Vector Machines (SVM), Stochastic Gradient Descent (SGD), Random Forest, Neural Networks, and Gradient Boosting, were evaluated using 10-fold cross-validation. Classification accuracy was tested across incremental feature subsets derived from rankings. The Neural Network achieved the highest accuracy (94.1%) using all 20 features, followed by Gradient Boosting (93.2%) and Random Forest (93.0%). Models showed improved performance as more features were incorporated, with SGD and ensemble methods demonstrating robustness to missing data. Feature ranking highlighted the dominance of treatment service counts and annotation codes in distinguishing provider types, while demographic variables (AGE_GROUP, CALENDAR_YEAR) had minimal impact. The study underscores the importance of feature selection in enhancing model efficiency and accuracy, particularly in imbalanced healthcare datasets. These findings advocate for integrating feature-ranking techniques with advanced ML algorithms to optimize dental provider classification, enabling targeted resource allocation for underserved populations.