Pandemics, with their profound societal and economic impacts, pose significant threats to global health, mortality rates, economic stability, and political landscapes. In response to these challenges, numerous studies have employed spatio-temporal models to enhance our understanding and management of these complex phenomena. These spatio-temporal models can be roughly divided into two main spatial categories: norm-based and graph-based. Norm-based models are usually more accurate and easier to model but are more computationally intensive and require more data to fit. On the other hand, graph-based models are less accurate and harder to model but are less computationally intensive and require fewer data to fit. As such, ideally, one would like to use a graph-based model while preserving the representation accuracy obtained by the norm-based model. In this study, we explore the ability to transform from norm-based to graph-based spatial representation for these models. We first show no analytical mapping between the two exists, requiring one to use approximation numerical methods instead. We introduce a novel framework for this task together with twelve possible implementations using a wide range of heuristic optimization approaches. Our findings show that by leveraging agent-based simulations and heuristic algorithms for the graph node's location and population's spatial walk dynamics approximation one can use graph-based spatial representation without losing much of the model's accuracy and expressiveness. We investigate our framework for three real-world cases, achieving 94\% accuracy preservation, on average. Moreover, an analysis of synthetic cases shows the proposed framework is relatively robust for changes in both spatial and temporal properties.
Large Language Models (LLMs) are capable of generating text that is similar to or surpasses human quality. However, it is unclear whether LLMs tend to exhibit distinctive linguistic styles akin to how human authors do. Through a comprehensive linguistic analysis, we compare the vocabulary, Part-Of-Speech (POS) distribution, dependency distribution, and sentiment of texts generated by three of the most popular LLMS today (GPT-3.5, GPT-4, and Bard) to diverse inputs. The results point to significant linguistic variations which, in turn, enable us to attribute a given text to its LLM origin with a favorable 88\% accuracy using a simple off-the-shelf classification model. Theoretical and practical implications of this intriguing finding are discussed.
Large Language Models (LLMs), exemplified by ChatGPT, have significantly reshaped text generation, particularly in the realm of writing assistance. While ethical considerations underscore the importance of transparently acknowledging LLM use, especially in scientific communication, genuine acknowledgment remains infrequent. A potential avenue to encourage accurate acknowledging of LLM-assisted writing involves employing automated detectors. Our evaluation of four cutting-edge LLM-generated text detectors reveals their suboptimal performance compared to a simple ad-hoc detector designed to identify abrupt writing style changes around the time of LLM proliferation. We contend that the development of specialized detectors exclusively dedicated to LLM-assisted writing detection is necessary. Such detectors could play a crucial role in fostering more authentic recognition of LLM involvement in scientific communication, addressing the current challenges in acknowledgment practices.
Background: Postoperative nausea and vomiting (PONV) is a frequently observed complication in patients undergoing surgery under general anesthesia. Moreover, it is a frequent cause of distress and dissatisfaction during the early postoperative period. The tools used for predicting PONV at present have not yielded satisfactory results. Therefore, prognostic tools for the prediction of early and delayed PONV were developed in this study with the aim of achieving satisfactory predictive performance. Methods: The retrospective data of adult patients admitted to the post-anesthesia care unit after undergoing surgical procedures under general anesthesia at the Sheba Medical Center, Israel, between September 1, 2018, and September 1, 2023, were used in this study. An ensemble model of machine learning algorithms trained on the data of 54848 patients was developed. The k-fold cross-validation method was used followed by splitting the data to train and test sets that optimally preserve the sociodemographic features of the patients, such as age, sex, and smoking habits, using the Bee Colony algorithm. Findings: Among the 54848 patients, early and delayed PONV were observed in 2706 (4.93%) and 8218 (14.98%) patients, respectively. The proposed PONV prediction tools could correctly predict early and delayed PONV in 84.0% and 77.3% of cases, respectively, outperforming the second-best PONV prediction tool (Koivuranta score) by 13.4% and 12.9%, respectively. Feature importance analysis revealed that the performance of the proposed prediction tools aligned with previous clinical knowledge, indicating their utility. Interpretation: The machine learning-based tools developed in this study enabled improved PONV prediction, thereby facilitating personalized care and improved patient outcomes.
In the realm of machine and deep learning regression tasks, the role of effective feature engineering (FE) is pivotal in enhancing model performance. Traditional approaches of FE often rely on domain expertise to manually design features for machine learning models. In the context of deep learning models, the FE is embedded in the neural network's architecture, making it hard for interpretation. In this study, we propose to integrate symbolic regression (SR) as an FE process before a machine learning model to improve its performance. We show, through extensive experimentation on synthetic and real-world physics-related datasets, that the incorporation of SR-derived features significantly enhances the predictive capabilities of both machine and deep learning regression models with 34-86% root mean square error (RMSE) improvement in synthetic datasets and 4-11.5% improvement in real-world datasets. In addition, as a realistic use-case, we show the proposed method improves the machine learning performance in predicting superconducting critical temperatures based on Eliashberg theory by more than 20% in terms of RMSE. These results outline the potential of SR as an FE component in data-driven models.
Zoonotic disease transmission between animals and humans is a growing risk and the agricultural context acts as a likely point of transition, with individual heterogeneity acting as an important contributor. Thus, understanding the dynamics of disease spread in the wildlife-livestock interface is crucial for mitigating these risks of transmission. Specifically, the interactions between pigeons and in-door cows at dairy farms can lead to significant disease transmission and economic losses for farmers; putting livestock, adjacent human populations, and other wildlife species at risk. In this paper, we propose a novel spatio-temporal multi-pathogen model with continuous spatial movement. The model expands on the Susceptible-Exposed-Infected-Recovered-Dead (SEIRD) framework and accounts for both within-species and cross-species transmission of pathogens, as well as the exploration-exploitation movement dynamics of pigeons, which play a critical role in the spread of infection agents. In addition to model formulation, we also implement it as an agent-based simulation approach and use empirical field data to investigate different biologically realistic scenarios, evaluating the effect of various parameters on the epidemic spread. Namely, in agreement with theoretical expectations, the model predicts that the heterogeneity of the pigeons' movement dynamics can drastically affect both the magnitude and stability of outbreaks. In addition, joint infection by multiple pathogens can have an interactive effect unobservable in single-pathogen SIR models, reflecting a non-intuitive inhibition of the outbreak. Our findings highlight the impact of heterogeneity in host behavior on their pathogens and allow realistic predictions of outbreak dynamics in the multi-pathogen wildlife-livestock interface with consequences to zoonotic diseases in various systems.
Lung cancer is a leading cause of cancer-related deaths worldwide. The spread of the disease from its primary site to other parts of the lungs, known as metastasis, significantly impacts the course of treatment. Early identification of metastatic lesions is crucial for prompt and effective treatment, but conventional imaging techniques have limitations in detecting small metastases. In this study, we develop a bioclinical model for predicting the spatial spread of lung cancer's metastasis using a three-dimensional computed tomography (CT) scan. We used a three-layer biological model of cancer spread to predict locations with a high probability of metastasis colonization. We validated the bioclinical model on real-world data from 10 patients, showing promising 74% accuracy in the metastasis location prediction. Our study highlights the potential of the combination of biophysical and ML models to advance the way that lung cancer is diagnosed and treated, by providing a more comprehensive understanding of the spread of the disease and informing treatment decisions.
The aim of this study was to employ machine learning algorithms based on sensor behavior data for (1) early-onset detection of digital dermatitis (DD); and (2) DD prediction in dairy cows. With the ultimate goal to set-up early warning tools for DD prediction, which would than allow a better monitoring and management of DD under commercial settings, resulting in a decrease of DD prevalence and severity, while improving animal welfare. A machine learning model that is capable of predicting and detecting digital dermatitis in cows housed under free-stall conditions based on behavior sensor data has been purposed and tested in this exploratory study. The model for DD detection on day 0 of the appearance of the clinical signs has reached an accuracy of 79%, while the model for prediction of DD 2 days prior to the appearance of the first clinical signs has reached an accuracy of 64%. The proposed machine learning models could help to develop a real-time automated tool for monitoring and diagnostic of DD in lactating dairy cows, based on behavior sensor data under conventional dairy environments. Results showed that alterations in behavioral patterns at individual levels can be used as inputs in an early warning system for herd management in order to detect variances in health of individual cows.
Online academic profiles are used by scholars to reflect a desired image to their online audience. In Google Scholar, scholars can select a subset of co-authors for presentation in a central location on their profile using a social feature called the Co-authroship panel. In this work, we examine whether scientometrics and reciprocality can explain the observed selections. To this end, we scrape and thoroughly analyze a novel set of 120,000 Google Scholar profiles, ranging across four disciplines and various academic institutions. Our results suggest that scholars tend to favor co-authors with higher scientometrics over others for inclusion in their co-authorship panels. Interestingly, as one's own scientometrics are higher, the tendency to include co-authors with high scientometrics is diminishing. Furthermore, we find that reciprocality is central to explaining scholars' selections.
There is a critical need to develop and validate non-invasive animal-based indicators of affective states in livestock species, in order to integrate them into on-farm assessment protocols, potentially via the use of precision livestock farming (PLF) tools. One such promising approach is the use of vocal indicators. The acoustic structure of vocalizations and their functions were extensively studied in important livestock species, such as pigs, horses, poultry and goats, yet cattle remain understudied in this context to date. Cows were shown to produce two types vocalizations: low-frequency calls (LF), produced with the mouth closed, or partially closed, for close distance contacts and open mouth emitted high-frequency calls (HF), produced for long distance communication, with the latter considered to be largely associated with negative affective states. Moreover, cattle vocalizations were shown to contain information on individuality across a wide range of contexts, both negative and positive. Nowadays, dairy cows are facing a series of negative challenges and stressors in a typical production cycle, making vocalizations during negative affective states of special interest for research. One contribution of this study is providing the largest to date pre-processed (clean from noises) dataset of lactating adult multiparous dairy cows during negative affective states induced by visual isolation challenges. Here we present two computational frameworks - deep learning based and explainable machine learning based, to classify high and low-frequency cattle calls, and individual cow voice recognition. Our models in these two frameworks reached 87.2% and 89.4% accuracy for LF and HF classification, with 68.9% and 72.5% accuracy rates for the cow individual identification, respectively.