We introduce an online mathematical framework for survival analysis, allowing real-time adaptation to dynamic environments and censored data. This framework enables the estimation of event time distributions through an optimal second-order online convex optimization algorithm, the Online Newton Step (ONS). This approach, previously unexplored, presents substantial advantages, including explicit algorithms with non-asymptotic convergence guarantees. Moreover, we analyze the selection of ONS hyperparameters, which depends on the exp-concavity property and has a significant influence on the regret bound. We propose a stochastic approach that guarantees logarithmic stochastic regret for ONS. Additionally, we introduce an adaptive aggregation method that ensures robustness in hyperparameter selection while maintaining fast regret bounds. The findings of this paper extend beyond the survival analysis field and are relevant to any setting characterized by poor exp-concavity and unstable ONS behavior. Finally, these claims are illustrated by simulation experiments.
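To make the second-order update behind ONS concrete, here is a minimal sketch for a single scalar parameter under squared loss on a bounded interval. The loss, the domain, and the hyperparameter values (`gamma`, `eps`) are illustrative assumptions for exposition only, not the paper's survival-analysis setting or its tuned hyperparameters.

```python
def online_newton_step(stream, gamma=0.5, eps=1.0, lo=-1.0, hi=1.0):
    """Scalar ONS on the squared loss l_t(theta) = (theta - x_t)**2.

    The update theta <- theta - g / (gamma * A), with A accumulating
    squared gradients, is the 1-D analogue of the Newton-like ONS step.
    """
    theta, a = 0.0, eps
    for x in stream:
        g = 2.0 * (theta - x)            # gradient of the current loss
        a += g * g                        # accumulate curvature statistic
        theta -= g / (gamma * a)          # second-order (Newton-like) step
        theta = min(hi, max(lo, theta))   # project back onto [lo, hi]
    return theta
```

On a constant stream the iterate settles quickly near the stream value, reflecting the fast (logarithmic-regret) behavior ONS is known for on exp-concave losses; the choice of `gamma` relative to the exp-concavity constant is exactly the hyperparameter sensitivity the abstract discusses.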
To solve problems in the Mobility-on-Demand (MoD) domain, we often need to connect vehicle plans into plans spanning a longer time horizon, a process we call plan chaining. As we show in this work, plan chaining can be used to reduce the size of MoD providers' fleets (the fleet-sizing problem) and also to reduce the total driven distance by providing high-quality vehicle dispatching solutions in MoD systems. Recently, a solution using this principle was proposed for the fleet-sizing problem. However, that method does not consider the time flexibility of the plans: plans are fixed in time and cannot be delayed, even though time flexibility is an essential property of all vehicle routing problems with time windows. This work presents a new plan chaining formulation that allows delays within the time windows, together with a solution method for solving it. Moreover, we prove that the proposed plan chaining method is optimal, and we analyze its complexity. Finally, we list some practical applications and perform a demonstration for one of them: a new heuristic vehicle dispatching method for solving the static dial-a-ride problem. The demonstration results show that our proposed method provides a better solution than the two heuristic baselines for the majority of instances that cannot be solved optimally. At the same time, its computational time requirements are not the largest among the compared methods. We therefore conclude that the proposed optimal chaining method is not only theoretically sound but also practically applicable.
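The core feasibility rule for chaining with time-window delays can be sketched as follows. This is a deliberately simple greedy heuristic, not the paper's provably optimal formulation; the `Plan` fields, the constant `travel` time, and the first-fit chain assignment are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    earliest: float   # earliest allowed start (time window opens)
    latest: float     # latest allowed start (time window closes)
    duration: float

def greedy_fleet_size(plans, travel=1.0):
    """Chain plans greedily; each resulting chain needs one vehicle.

    A plan can extend a chain if the vehicle can reach it no later than
    its latest allowed start; if the vehicle arrives after the plan's
    earliest start, the plan is delayed within its time window.
    """
    finish = []  # completion time of the last plan in each chain
    for p in sorted(plans, key=lambda q: q.earliest):
        for i, f in enumerate(finish):
            if f + travel <= p.latest:               # chaining feasible
                start = max(p.earliest, f + travel)  # delay within window
                finish[i] = start + p.duration
                break
        else:
            finish.append(p.earliest + p.duration)   # open a new chain
    return len(finish)
```

For example, a plan startable in [3, 5] can follow one finishing at time 1 (arriving at 2, waiting until 3), so one vehicle suffices; two plans fixed at time 0 cannot be chained and need two vehicles. The paper's contribution is precisely that its chaining method finds the optimal solution rather than relying on a first-fit heuristic like this one.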
This study proposes a method based on fully convolutional neural networks (FCNs) to identify migratory birds from their songs, with the objective of recognizing which birds pass through certain areas and at what time. To determine the best FCN architecture, extensive experimentation was conducted through a grid search over the depth, width, and activation function of the network. The results showed that the optimal number of filters is 400 in the widest layer, arranged in four convolutional blocks with max pooling and an adaptive activation function. The proposed FCN offers a significant advantage over other techniques, as it can recognize the sound of a bird in audio of any length with an accuracy greater than 85%. Furthermore, due to its architecture, the network can detect more than one species in an audio recording and can carry out near-real-time sound recognition. Additionally, the proposed method is lightweight, making it ideal for deployment and use on IoT devices. The study also presents a comparative analysis of the proposed method against other techniques, demonstrating an improvement of over 67% in the best-case scenario. These findings contribute to advancing the field of bird sound recognition and provide valuable insights into the practical application of FCNs in real-world scenarios.
We propose a novel nonparametric sequential test for composite hypotheses on the means of multiple data streams. Our proposed method, \emph{peeking with expectation-based averaged capital} (PEAK), builds upon the testing-as-betting framework and provides a non-asymptotic $\alpha$-level test at any stopping time. PEAK is computationally tractable and efficiently rejects hypotheses that are incorrect across all potential distributions satisfying our nonparametric assumption, enabling joint composite hypothesis testing on multiple streams of data. We numerically validate our theoretical findings on best-arm identification and threshold identification problems in the bandit setting, illustrating the computational efficiency of our method relative to state-of-the-art testing methods.
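The testing-as-betting idea underlying PEAK can be illustrated with a single-stream, fixed-bet sketch for $H_0\colon \mu \le \mu_0$ with observations in $[0,1]$: a bettor's wealth grows when observations exceed $\mu_0$, and $H_0$ is rejected once wealth reaches $1/\alpha$, which controls the type-I error at level $\alpha$ over any stopping time by Ville's inequality. The constant bet `lam` and the single stream are illustrative simplifications of PEAK's averaged-capital construction over multiple streams.

```python
def betting_test(stream, mu0=0.5, lam=0.5, alpha=0.05):
    """Reject H0: mean <= mu0 once betting wealth reaches 1/alpha.

    Observations are assumed to lie in [0, 1], and lam must satisfy
    0 <= lam < 1/mu0 so that wealth stays nonnegative.
    """
    wealth = 1.0
    for t, x in enumerate(stream, start=1):
        wealth *= 1.0 + lam * (x - mu0)   # bet on x exceeding mu0
        if wealth >= 1.0 / alpha:
            return t                       # reject H0 at time t
    return None                            # H0 never rejected
```

Under the null boundary (observations at exactly `mu0`) the wealth never grows and the test never rejects, while a stream with a clearly larger mean is rejected after a short, data-dependent number of steps.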
The use of machine-learning techniques has grown in numerous research areas. It is now also widely used in statistics, including official statistics, both for data collection (e.g. satellite imagery, web scraping and text mining, data cleaning, integration and imputation) and for data analysis. However, the usage of these methods in survey sampling, including small area estimation, is still very limited. Therefore, we propose a predictor supported by these algorithms which can be used to predict any population or subpopulation characteristic based on cross-sectional and longitudinal data. Machine learning methods have already been shown to be very powerful in identifying and modelling complex and nonlinear relationships between variables, which means that they have very good properties in the case of strong departures from the classic assumptions. We therefore analyse the performance of our proposal under a different set-up that, in our opinion, is of greater importance in real-life surveys: small departures from the assumed model. We show that our proposal is a good alternative in this case as well, even in comparison with methods that are optimal under the model. What is more, we propose a method for estimating the accuracy of machine learning predictors, making it possible to compare their accuracy with classic methods, where accuracy is measured as in survey sampling practice. Solving this problem is indicated in the literature as one of the key issues in the integration of these approaches. The simulation studies are based on a real, longitudinal dataset, freely available from the Polish Local Data Bank, where we consider the problem of predicting subpopulation characteristics in the last period while "borrowing strength" from other subpopulations and time periods.
Epilepsy is a prevalent neurological disorder that affects approximately 1% of the global population. Around 30-40% of patients do not respond to pharmacological treatment, leading to a significant negative impact on their quality of life. Closed-loop deep brain stimulation (DBS) is a promising treatment for individuals who do not respond to medical therapy. To achieve effective seizure control, algorithms play an important role in identifying relevant electrographic biomarkers from local field potentials (LFPs) to determine the optimal stimulation timing. In this regard, detecting and classifying events from ongoing brain activity, while achieving low power through computationally inexpensive implementations, represents a major challenge in the field. To address this challenge, we present two lightweight algorithms, ZdensityRODE and AMPDE, for identifying relevant events from LFPs via semantic segmentation, which extracts different levels of information from the LFP and the relevant events within it. The algorithms' performance was validated, as a case study, against epileptiform activity induced by 4-aminopyridine in mouse hippocampus-cortex (CTX) slices and recorded via microelectrode array. The ZdensityRODE algorithm achieved 93% precision and recall for ictal event detection and 42% precision for interictal event detection, while the AMPDE algorithm attained 96% precision and 90% recall for ictal event detection and 54% precision for interictal event detection. While initially trained specifically for the detection of ictal activity, these algorithms can be fine-tuned for improved interictal detection, aiming at seizure prediction. Our results suggest that these algorithms can effectively capture epileptiform activity; their lightweight design opens new possibilities for real-time seizure detection and for seizure prediction and control.
Link prediction is a crucial task in graph machine learning, where the goal is to infer missing or future links within a graph. Traditional approaches leverage heuristic methods based on widely observed connectivity patterns, offering broad applicability and generalizability without the need for model training. Despite their utility, these methods are limited by their reliance on human-derived heuristics and lack the adaptability of data-driven approaches. Conversely, parametric link predictors excel at automatically learning connectivity patterns from data and achieve state-of-the-art performance, but fall short of transferring directly across different graphs: adapting to a target graph requires the cost of extensive training and hyperparameter optimization. In this work, we introduce the Universal Link Predictor (UniLP), a novel model that combines the generalizability of heuristic approaches with the pattern-learning capabilities of parametric models. UniLP is designed to autonomously identify connectivity patterns across diverse graphs, ready for immediate application to any unseen graph dataset without targeted training. We address the challenge of conflicting connectivity patterns, arising from the unique distributions of different graphs, through the use of in-context learning (ICL). This approach allows UniLP to dynamically adjust to various target graphs based on contextual demonstrations, thereby avoiding negative transfer. Through rigorous experimentation, we demonstrate UniLP's effectiveness in adapting to new, unseen graphs at test time, showcasing its ability to match or even outperform parametric models that have been fine-tuned for specific datasets. Our findings highlight UniLP's potential to set a new standard in link prediction, combining the strengths of heuristic and parametric methods in a single, versatile framework.
Image segmentation in total knee arthroplasty is crucial for precise preoperative planning and accurate implant positioning, leading to improved surgical outcomes and patient satisfaction. The biggest challenges of image segmentation in total knee arthroplasty include accurately delineating complex anatomical structures, dealing with image artifacts and noise, and developing robust algorithms that can handle the anatomical variations and pathologies commonly encountered in patients. The potential of machine learning for image segmentation in total knee arthroplasty lies in its ability to improve segmentation accuracy, automate the process, and provide real-time assistance to surgeons, leading to enhanced surgical planning, implant placement, and patient outcomes. This paper proposes a methodology for robust, real-time total knee arthroplasty image segmentation using deep learning. The deep learning model, trained on a large dataset, demonstrates outstanding performance in accurately segmenting both the implanted femur and tibia, achieving a mean Average Precision (mAP) of 88.83 against the ground truth while also achieving a real-time segmentation speed of 20 frames per second (fps). We have introduced a novel methodology for segmenting implanted-knee fluoroscopic or X-ray images that achieves remarkable accuracy and speed, paving the way for various potential extended applications.
Surgery for brain cancer is a major problem in neurosurgery. The diffuse infiltration of these tumors into the surrounding normal brain makes their accurate identification by the naked eye difficult. Since surgery is the common treatment for brain cancer, an accurate radical resection of the tumor leads to improved survival rates for patients. However, identifying the tumor boundaries during surgery is challenging. Hyperspectral imaging is a noncontact, non-ionizing and non-invasive technique suitable for medical diagnosis. This study presents the development of a novel classification method that takes into account the spatial and spectral characteristics of hyperspectral images to help neurosurgeons accurately determine the tumor boundaries during resection, avoiding excessive excision of normal tissue or unintentionally leaving residual tumor. The algorithm proposed in this study consists of a hybrid framework that combines both supervised and unsupervised machine learning methods. To evaluate the proposed approach, five in vivo hyperspectral images of the surface of brains affected by glioblastoma, from five different patients, have been used. The final classification maps obtained have been analyzed and validated by specialists. These preliminary results are promising, yielding an accurate delineation of the tumor area.
In this work, we introduce a novel paradigm called Simulated Overparametrization (SOP). SOP merges the computational efficiency of compact models with the advanced learning capabilities of overparameterized models. SOP proposes a unique approach to model training and inference, in which a model with a significantly larger number of parameters is trained in such a way that a smaller, efficient subset of these parameters is used for the actual computation at inference. Building upon this framework, we present a novel, architecture-agnostic algorithm called "majority kernels", which integrates seamlessly with predominant architectures, including Transformer models. Majority kernels enable the simulated training of overparameterized models, resulting in performance gains across architectures and tasks. Furthermore, our approach adds minimal wall-clock overhead at training time. The proposed approach shows strong performance on a wide variety of datasets and models, even outperforming strong baselines such as combinatorial optimization methods based on submodular optimization.