Background: Postpartum urinary incontinence (PUI) is a common issue among postnatal women. Previous studies identified potential related variables, but lacked analysis on certain intrinsic and extrinsic patient variables during pregnancy. Objective: The study aims to evaluate the most influential variables in PUI using machine learning, focusing on intrinsic, extrinsic, and combined variable groups. Methods: Data from 93 pregnant women were analyzed using machine learning and oversampling techniques. Four key variables were predicted: occurrence, frequency, intensity of urinary incontinence, and stress urinary incontinence. Results: Models using extrinsic variables were most accurate, with 70% accuracy for urinary incontinence, 77% for frequency, 71% for intensity, and 93% for stress urinary incontinence. Conclusions: The study highlights extrinsic variables as significant predictors of PUI issues. This suggests that PUI prevention might be achievable through healthy habits during pregnancy, although further research is needed for confirmation.
* Digital Health, Volume 8, 2022, 20552076221111289
Gradient Boost Decision Trees (GBDT) is a powerful additive model based on tree ensembles. Its nature makes GBDT a black-box model even though there are multiple explainable artificial intelligence (XAI) models obtaining information by reinterpreting the model globally and locally. Each tree of the ensemble is a transparent model itself but the final outcome is the result of a sum of these trees and it is not easy to clarify. In this paper, a feature contribution method for GBDT is developed. The proposed method takes advantage of the GBDT architecture to calculate the contribution of each feature using the residue of each node. This algorithm allows to calculate the sequence of node decisions given a prediction. Theoretical proofs and multiple experiments have been carried out to demonstrate the performance of our method which is not only a local explicability model for the GBDT algorithm but also a unique option that reflects GBDTs internal behavior. The proposal is aligned to the contribution of characteristics having impact in some artificial intelligence problems such as ethical analysis of Artificial Intelligence (AI) and comply with the new European laws such as the General Data Protection Regulation (GDPR) about the right to explain and nondiscrimination.
* Information Sciences, Volume 589, 2022, Pages 199-212
This work addresses the performance comparison between four clustering techniques with the objective of achieving strong hybrid models in supervised learning tasks. A real dataset from a bio-climatic house named Sotavento placed on experimental wind farm and located in Xermade (Lugo) in Galicia (Spain) has been collected. Authors have chosen the thermal solar generation system in order to study how works applying several cluster methods followed by a regression technique to predict the output temperature of the system. With the objective of defining the quality of each clustering method two possible solutions have been implemented. The first one is based on three unsupervised learning metrics (Silhouette, Calinski-Harabasz and Davies-Bouldin) while the second one, employs the most common error measurements for a regression algorithm such as Multi Layer Perceptron.
Background: Eating disorders are increasingly prevalent, and social networks offer valuable information. Objective: Our goal was to identify efficient machine learning models for categorizing tweets related to eating disorders. Methods: Over three months, we collected tweets about eating disorders. A 2,000-tweet subset was labeled for: (1) being written by individuals with eating disorders, (2) promoting eating disorders, (3) informativeness, and (4) scientific content. Both traditional machine learning and deep learning models were employed for classification, assessing accuracy, F1 score, and computational time. Results: From 1,058,957 collected tweets, transformer-based bidirectional encoder representations achieved the highest F1 scores (71.1%-86.4%) across all four categories. Conclusions: Transformer-based models outperform traditional techniques in classifying eating disorder-related tweets, though they require more computational resources.
* JMIR Medical Informatics, Volume 10, Issue 2, 2022, ID e34492
Social networks are vital for information sharing, especially in the health sector for discussing diseases and treatments. These platforms, however, often feature posts as brief texts, posing challenges for Artificial Intelligence (AI) in understanding context. We introduce a novel hybrid approach combining community-maintained knowledge graphs (like Wikidata) with deep learning to enhance the categorization of social media posts. This method uses advanced entity recognizers and linkers (like Falcon 2.0) to connect short post entities to knowledge graphs. Knowledge graph embeddings (KGEs) and contextualized word embeddings (like BERT) are then employed to create rich, context-based representations of these posts. Our focus is on the health domain, particularly in identifying posts related to eating disorders (e.g., anorexia, bulimia) to aid healthcare providers in early diagnosis. We tested our approach on a dataset of 2,000 tweets about eating disorders, finding that merging word embeddings with knowledge graph information enhances the predictive models' reliability. This methodology aims to assist health experts in spotting patterns indicative of mental disorders, thereby improving early detection and accurate diagnosis for personalized medicine.
* Semantic Web, Volume 4, Issue 5, pp. 873-892, 2023
Cardiovascular diseases state as one of the greatest risks of death for the general population. Late detection in heart diseases highly conditions the chances of survival for patients. Age, sex, cholesterol level, sugar level, heart rate, among other factors, are known to have an influence on life-threatening heart problems, but, due to the high amount of variables, it is often difficult for an expert to evaluate each patient taking this information into account. In this manuscript, the authors propose using deep learning methods, combined with feature augmentation techniques for evaluating whether patients are at risk of suffering cardiovascular disease. The results of the proposed methods outperform other state of the art methods by 4.4%, leading to a precision of a 90%, which presents a significant improvement, even more so when it comes to an affliction that affects a large population.
* Multimedia Tools and Applications, Volume 82, pp. 31759 - 31773,
Parkinson's disease is easy to diagnose when it is advanced, but it is very difficult to diagnose in its early stages. Early diagnosis is essential to be able to treat the symptoms. It impacts on daily activities and reduces the quality of life of both the patients and their families and it is also the second most prevalent neurodegenerative disorder after Alzheimer in people over the age of 60. Most current studies on the prediction of Parkinson's severity are carried out in advanced stages of the disease. In this work, the study analyzes a set of variables that can be easily extracted from voice analysis, making it a very non-intrusive technique. In this paper, a method based on different deep learning techniques is proposed with two purposes. On the one hand, to find out if a person has severe or non-severe Parkinson's disease, and on the other hand, to determine by means of regression techniques the degree of evolution of the disease in a given patient. The UPDRS (Unified Parkinson's Disease Rating Scale) has been used by taking into account both the motor and total labels, and the best results have been obtained using a mixed multi-layer perceptron (MLP) that classifies and regresses at the same time and the most important features of the data obtained are taken as input, using an autoencoder. A success rate of 99.15% has been achieved in the problem of predicting whether a person suffers from severe Parkinson's disease or non-severe Parkinson's disease. In the degree of disease involvement prediction problem case, a MSE (Mean Squared Error) of 0.15 has been obtained. Using a full deep learning pipeline for data preprocessing and classification has proven to be very promising in the field Parkinson's outperforming the state-of-the-art proposals.
* Multimedia Tools and Applications, Volume 83, pages 6077-6092,
This study proposes a method based on fully convolutional neural networks (FCNs) to identify migratory birds from their songs, with the objective of recognizing which birds pass through certain areas and at what time. To determine the best FCN architecture, extensive experimentation was conducted through a grid search, exploring the optimal depth, width, and activation function of the network. The results showed that the optimal number of filters is 400 in the widest layer, with 4 convolutional blocks with maxpooling and an adaptive activation function. The proposed FCN offers a significant advantage over other techniques, as it can recognize the sound of a bird in audio of any length with an accuracy greater than 85%. Furthermore, due to its architecture, the network can detect more than one species from audio and can carry out near-real-time sound recognition. Additionally, the proposed method is lightweight, making it ideal for deployment and use in IoT devices. The study also presents a comparative analysis of the proposed method against other techniques, demonstrating an improvement of over 67% in the best-case scenario. These findings contribute to advancing the field of bird sound recognition and provide valuable insights into the practical application of FCNs in real-world scenarios.
* Applied Intelligence, Volume 53, July 2023, pp. 23287 - 23300
Tree ensemble algorithms as RandomForest and GradientBoosting are currently the dominant methods for modeling discrete or tabular data, however, they are unable to perform a hierarchical representation learning from raw data as NeuralNetworks does thanks to its multi-layered structure, which is a key feature for DeepLearning problems and modeling unstructured data. This limitation is due to the fact that tree algorithms can not be trained with back-propagation because of their mathematical nature. However, in this work, we demonstrate that the mathematical formulation of bagging and boosting can be combined together to define a graph-structured-tree-ensemble algorithm with a distributed representation learning process between trees naturally (without using back-propagation). We call this novel approach Distributed Gradient Boosting Forest (DGBF) and we demonstrate that both RandomForest and GradientBoosting can be expressed as particular graph architectures of DGBT. Finally, we see that the distributed learning outperforms both RandomForest and GradientBoosting in 7 out of 9 datasets.
* Applied Intelligence, Volume 53, July 2023, pages 22991-23003
Aim: To study the existence of subgroups by exploring the similarities between the attributes of the nodes of the groups, in relation to diet and gender and, to analyse the connectivity between groups based on aspects of similarities between them through SNA and artificial intelligence techniques. Methods: 235 students from 5 different educational centres participate in this study between March and December 2015. Data analysis carried out is divided into two blocks: social network analysis and unsupervised machine learning techniques. As for the social network analysis, the Girvan-Newman technique was applied to find the best number of cohesive groups within each of the friendship networks of the different classes analysed. Results: After applying Girvan-Newman in the three classes, the best division into clusters was respectively 2 for classroom A, 7 for classroom B and 6 for classroom C. There are significant differences between the groups and the gender and diet variables. After applying K-means using population diet as an input variable, a K-means clustering of 2 clusters for class A, 3 clusters for class B and 3 clusters for class C is obtained. Conclusion: Adolescents form subgroups within their classrooms. Subgroup cohesion is defined by the fact that nodes share similarities in aspects that influence obesity, they share attributes related to food quality and gender. The concept of homophily, related to SNA, justifies our results. Artificial intelligence techniques together with the application of the Girvan-Newman provide robustness to the structural analysis of similarities and cohesion between subgroups.
* Plos One, Volume 18, Issue 8, August 2023, e0289553