Artur Dubrawski

Carnegie Mellon University, Auton Lab, The Robotics Institute, Pittsburgh, USA

Automatic Cannulation of Femoral Vessels in a Porcine Shock Model

Jun 17, 2025

Nico Zevallos, Cecilia G. Morales, Andrew Orekhov, Tejas Rane, Hernando Gomez, Francis X. Guyette, Michael R. Pinsky, John Galeotti, Artur Dubrawski, Howie Choset

Abstract:Rapid and reliable vascular access is critical in trauma and critical care. Central vascular catheterization enables high-volume resuscitation, hemodynamic monitoring, and advanced interventions like ECMO and REBOA. While peripheral access is common, central access is often necessary but requires specialized ultrasound-guided skills, posing challenges in prehospital settings. The complexity arises from deep target vessels and the precision needed for needle placement. Traditional techniques, like the Seldinger method, demand expertise to avoid complications. Despite its importance, ultrasound-guided central access is underutilized due to limited field expertise. While autonomous needle insertion has been explored for peripheral vessels, only semi-autonomous methods exist for femoral access. This work advances toward full automation, integrating robotic ultrasound for minimally invasive emergency procedures. Our key contribution is the successful femoral vein and artery cannulation in a porcine hemorrhagic shock model.

* Hamlyn Symposium on Medical Robotics 2025
* 2 pages, 2 figures, conference

Frame-Level Real-Time Assessment of Stroke Rehabilitation Exercises from Video-Level Labeled Data: Task-Specific vs. Foundation Models

Jun 04, 2025

Gonçalo Mesquita, Ana Rita Cóias, Artur Dubrawski, Alexandre Bernardino

Abstract:The growing demands of stroke rehabilitation have increased the need for solutions to support autonomous exercising. Virtual coaches can provide real-time exercise feedback from video data, helping patients improve motor function and stay engaged. However, training real-time motion analysis systems demands frame-level annotations, which are time-consuming and costly to obtain. In this work, we present a framework that learns to classify individual frames from video-level annotations for real-time assessment of compensatory motions in rehabilitation exercises. We use a gradient-based technique and a pseudo-label selection method to create frame-level pseudo-labels for training a frame-level classifier. We leverage pre-trained task-specific models (Action Transformer, SkateFormer) and a foundation model (MOMENT) for pseudo-label generation, aiming to improve generalization to new patients. To validate the approach, we use the SERE dataset with 18 post-stroke patients performing five rehabilitation exercises annotated for compensatory motions. MOMENT achieves better video-level assessment results (AUC = 73%), outperforming the baseline LSTM (AUC = 58%). The Action Transformer, with the Integrated Gradient technique, leads to better outcomes (AUC = 72%) for frame-level assessment, outperforming the baseline trained with ground truth frame-level labeling (AUC = 69%). We show that our proposed approach with pre-trained models enhances model generalization ability and facilitates the customization to new patients, reducing the demands of data labeling.
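The abstract does not spell out the pseudo-label selection rule, so the following is only a minimal sketch of the general idea: per-frame attribution scores (for example from a gradient-based method) are normalized, confident frames receive a pseudo-label, and ambiguous frames are left unlabeled. The function name `frame_pseudo_labels`, the `low`/`high` thresholds, and the rule that every frame of a normal video is labeled normal are all assumptions made for illustration, not the authors' method.

```python
from typing import List, Optional


def frame_pseudo_labels(
    attributions: List[float],
    video_label: int,
    low: float = 0.2,   # hypothetical threshold: at or below this, frame is "normal"
    high: float = 0.8,  # hypothetical threshold: at or above this, frame is "compensation"
) -> List[Optional[int]]:
    """Turn per-frame attribution scores into frame-level pseudo-labels.

    Returns 1 (compensation), 0 (normal), or None (too ambiguous to use).
    """
    if video_label == 0:
        # Assumption: a video labeled normal contains only normal frames.
        return [0] * len(attributions)

    lo, hi = min(attributions), max(attributions)
    span = hi - lo
    labels: List[Optional[int]] = []
    for a in attributions:
        # Min-max normalize so thresholds are comparable across videos.
        score = (a - lo) / span if span > 0 else 0.5
        if score >= high:
            labels.append(1)
        elif score <= low:
            labels.append(0)
        else:
            labels.append(None)  # dropped from frame-level training
    return labels


if __name__ == "__main__":
    print(frame_pseudo_labels([0.0, 0.1, 0.5, 0.9, 1.0], video_label=1))
```

Frames marked `None` would simply be excluded when training the frame-level classifier, which is one common way to keep noisy pseudo-labels out of the training set.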

TimeSeriesGym: A Scalable Benchmark for (Time Series) Machine Learning Engineering Agents

May 19, 2025

Yifu Cai, Xinyu Li, Mononito Goswami, Michał Wiliński, Gus Welter, Artur Dubrawski

Abstract:We introduce TimeSeriesGym, a scalable benchmarking framework for evaluating Artificial Intelligence (AI) agents on time series machine learning engineering challenges. Existing benchmarks lack scalability, focus narrowly on model building in well-defined settings, and evaluate only a limited set of research artifacts (e.g., CSV submission files). To make AI agent benchmarking more relevant to the practice of machine learning engineering, our framework scales along two critical dimensions. First, recognizing that effective ML engineering requires a range of diverse skills, TimeSeriesGym incorporates challenges from diverse sources spanning multiple domains and tasks. We design challenges to evaluate both isolated capabilities (including data handling, understanding research repositories, and code translation) and their combinations, and rather than addressing each challenge independently, we develop tools that support designing multiple challenges at scale. Second, we implement evaluation mechanisms for multiple research artifacts, including submission files, code, and models, using both precise numeric measures and more flexible LLM-based evaluation approaches. This dual strategy balances objective assessment with contextual judgment. Although our initial focus is on time series applications, our framework can be readily extended to other data modalities, broadly enhancing the comprehensiveness and practical utility of agentic AI evaluation. We open-source our benchmarking framework to facilitate future research on the ML engineering capabilities of AI agents.

* Open source code available at https://github.com/moment-timeseries-foundation-model/TimeSeriesGym. YC, XL, MG, and MW contributed equally and should be considered joint first authors.

Investigating Compositional Reasoning in Time Series Foundation Models

Feb 09, 2025

Willa Potosnak, Cristian Challu, Mononito Goswami, Kin G. Olivares, Michał Wiliński, Nina Żukowska, Artur Dubrawski

Abstract:Large pre-trained time series foundation models (TSFMs) have demonstrated promising zero-shot performance across a wide range of domains. However, a question remains: Do TSFMs succeed solely by memorizing training patterns, or do they possess the ability to reason? While reasoning is a topic of great interest in the study of Large Language Models (LLMs), it is undefined and largely unexplored in the context of TSFMs. In this work, inspired by language modeling literature, we formally define compositional reasoning in forecasting and distinguish it from in-distribution generalization. We evaluate the reasoning and generalization capabilities of 23 popular deep learning forecasting models on multiple synthetic and real-world datasets. Additionally, through controlled studies, we systematically examine which design choices in TSFMs contribute to improved reasoning abilities. Our study yields key insights into the impact of TSFM architecture design on compositional reasoning and generalization. We find that patch-based Transformers have the best reasoning performance, closely followed by residualized MLP-based architectures, which are 97% less computationally complex in terms of FLOPs and 86% smaller in terms of the number of trainable parameters. Interestingly, in some zero-shot out-of-distribution scenarios, these models can outperform moving average and exponential smoothing statistical baselines trained on in-distribution data. Only a few design choices, such as the tokenization method, had a significant (negative) impact on Transformer model performance.
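The paper's formal definition of compositional reasoning is not reproduced in the abstract. As a rough illustration only, the sketch below builds a test series by summing components that would each be seen separately during training, and implements the two statistical baselines the abstract names (moving average and simple exponential smoothing). All function names, periods, and parameter values here are illustrative assumptions.

```python
import math
from typing import List


def sinusoid(n: int, period: float, amplitude: float = 1.0) -> List[float]:
    """A single basis component: one sinusoid of the given period."""
    return [amplitude * math.sin(2 * math.pi * t / period) for t in range(n)]


def compose(*series: List[float]) -> List[float]:
    """Element-wise sum of components, giving an unseen composition."""
    return [sum(vals) for vals in zip(*series)]


def moving_average_forecast(history: List[float], window: int, horizon: int) -> List[float]:
    """Flat forecast equal to the mean of the last `window` observations."""
    level = sum(history[-window:]) / window
    return [level] * horizon


def ses_forecast(history: List[float], alpha: float, horizon: int) -> List[float]:
    """Simple exponential smoothing: flat forecast at the final smoothed level."""
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


def mae(actual: List[float], predicted: List[float]) -> float:
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Components that would appear separately in training (assumed setup).
    a, b = sinusoid(120, period=12), sinusoid(120, period=30)
    test = compose(a, b)  # out-of-distribution composition
    history, future = test[:96], test[96:]
    print("MA  MAE:", mae(future, moving_average_forecast(history, 12, 24)))
    print("SES MAE:", mae(future, ses_forecast(history, 0.3, 24)))
```

A forecasting model under study would be scored on the same held-out `future` segment, so its error can be compared directly against these two baselines.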

Multimodal Structure Preservation Learning

Oct 29, 2024

Chang Liu, Jieshi Chen, Lee H. Harrison, Artur Dubrawski

Abstract:When selecting data to build machine learning models in practical applications, factors such as availability, acquisition cost, and discriminatory power are crucial considerations. Different data modalities often capture unique aspects of the underlying phenomenon, making their utilities complementary. On the other hand, some sources of data host structural information that is key to their value. Hence, the utility of one data type can sometimes be enhanced by matching the structure of another. We propose Multimodal Structure Preservation Learning (MSPL) as a novel method of learning data representations that leverages the clustering structure provided by one data modality to enhance the utility of data from another modality. We demonstrate the effectiveness of MSPL in uncovering latent structures in synthetic time series data and recovering clusters from whole genome sequencing and antimicrobial resistance data using mass spectrometry data in support of epidemiology applications. The results show that MSPL can imbue the learned features with external structures and help reap the beneficial synergies occurring across disparate data modalities.

TimeSeriesExam: A time series understanding exam

Oct 18, 2024

Yifu Cai, Arjun Choudhry, Mononito Goswami, Artur Dubrawski

Abstract:Large Language Models (LLMs) have recently demonstrated a remarkable ability to model time series data. These capabilities can be partly explained if LLMs understand basic time series concepts. However, our knowledge of what these models understand about time series data remains relatively limited. To address this gap, we introduce TimeSeriesExam, a configurable and scalable multiple-choice question exam designed to assess LLMs across five core time series understanding categories: pattern recognition, noise understanding, similarity analysis, anomaly detection, and causality analysis. TimeSeriesExam comprises over 700 questions, procedurally generated using 104 carefully curated templates and iteratively refined to balance difficulty and their ability to discriminate good from bad models. We test 7 state-of-the-art LLMs on the TimeSeriesExam and provide the first comprehensive evaluation of their time series understanding abilities. Our results suggest that closed-source models such as GPT-4 and Gemini understand simple time series concepts significantly better than their open-source counterparts, while all models struggle with complex concepts such as causality analysis. We believe that the ability to programmatically generate questions is fundamental to assessing and improving LLMs' ability to understand and reason about time series data.

* Accepted at NeurIPS'24 Time Series in the Age of Large Models Workshop
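To make the idea of procedurally generating exam questions from templates concrete, here is a minimal sketch of a single hypothetical template for trend recognition. It is not taken from TimeSeriesExam; the function name, the question wording, the slope values, and the noise level are assumptions chosen for illustration.

```python
import random
from typing import Dict


def make_trend_question(seed: int, length: int = 50) -> Dict[str, object]:
    """Generate one multiple-choice question from a hypothetical trend template."""
    rng = random.Random(seed)  # seeded so each question is reproducible
    slope = rng.choice([-0.5, 0.0, 0.5])
    # Linear trend plus small Gaussian noise (noise level is an assumption).
    series = [slope * t + rng.gauss(0, 0.1) for t in range(length)]
    answer = {-0.5: "decreasing", 0.0: "no trend", 0.5: "increasing"}[slope]
    options = ["increasing", "decreasing", "no trend"]
    rng.shuffle(options)  # avoid a fixed position for the correct answer
    return {
        "question": "What is the overall trend of the time series?",
        "series": series,
        "options": options,
        "answer": answer,
    }


if __name__ == "__main__":
    q = make_trend_question(seed=7)
    print(q["question"], q["options"], "->", q["answer"])
```

Because each question is a function of its seed, a template like this can produce arbitrarily many variants, and difficulty can be tuned by changing the slope or noise parameters.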

Implicit Reasoning in Deep Time Series Forecasting

Sep 18, 2024

Willa Potosnak, Cristian Challu, Mononito Goswami, Michał Wiliński, Nina Żukowska, Artur Dubrawski

Abstract:Recently, time series foundation models have shown promising zero-shot forecasting performance on time series from a wide range of domains. However, it remains unclear whether their success stems from a true understanding of temporal dynamics or simply from memorizing the training data. While implicit reasoning in language models has been studied, similar evaluations for time series models have been largely unexplored. This work takes an initial step toward assessing the reasoning abilities of deep time series forecasting models. We find that certain linear, MLP-based, and patch-based Transformer models generalize effectively in systematically orchestrated out-of-distribution scenarios, suggesting underexplored reasoning capabilities beyond simple pattern memorization.

Bifurcation Identification for Ultrasound-driven Robotic Cannulation

Sep 10, 2024

Cecilia G. Morales, Dhruv Srikanth, Jack H. Good, Keith A. Dufendach, Artur Dubrawski

Abstract:In trauma and critical care settings, rapid and precise intravascular access is key to patients' survival. Our research aims at ensuring this access, even when skilled medical personnel are not readily available. Vessel bifurcations are anatomical landmarks that can guide the safe placement of catheters or needles during medical procedures. Although ultrasound is advantageous in navigating anatomical landmarks in emergency scenarios due to its portability and safety, to our knowledge no existing algorithm can autonomously extract vessel bifurcations using ultrasound images. This is primarily due to the limited availability of ground truth data, in particular, data from live subjects, needed for training and validating reliable models. Researchers often resort to using data from anatomical phantoms or simulations. We introduce BIFURC, Bifurcation Identification for Ultrasound-driven Robot Cannulation, a novel algorithm that identifies vessel bifurcations and provides optimal needle insertion sites for an autonomous robotic cannulation system. BIFURC integrates expert knowledge with deep learning techniques to efficiently detect vessel bifurcations within the femoral region and can be trained on a limited amount of in-vivo data. We evaluated our algorithm using a medical phantom as well as real-world experiments involving live pigs. In all cases, BIFURC consistently identified bifurcation points and needle insertion locations in alignment with those identified by expert clinicians.

* IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2024

A SAT-based approach to rigorous verification of Bayesian networks

Aug 02, 2024

Ignacy Stępka, Nicholas Gisolfi, Artur Dubrawski

Abstract:Recent advancements in machine learning have accelerated its widespread adoption across various real-world applications. However, in safety-critical domains, the deployment of machine learning models is riddled with challenges due to their complexity, lack of interpretability, and absence of formal guarantees regarding their behavior. In this paper, we introduce a verification framework tailored for Bayesian networks, designed to address these drawbacks. Our framework comprises two key components: (1) a two-step compilation and encoding scheme that translates Bayesian networks into Boolean logic literals, and (2) formal verification queries that leverage these literals to verify various properties encoded as constraints. Specifically, we introduce two verification queries: if-then rules (ITR) and feature monotonicity (FMO). We benchmark the efficiency of our verification scheme and demonstrate its practical utility in real-world scenarios.

* Workshop on Explainable and Robust AI for Industry 4.0 & 5.0 (X-RAI) at European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (2024)
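The abstract does not describe the encoding itself, but the logic behind an if-then rule query can be sketched in general terms: a rule "if A then C" holds for a model exactly when the formula (model AND A AND NOT C) is unsatisfiable. The toy example below checks this by exhaustive enumeration, which stands in for a real SAT solver; the `verify_if_then` helper and the small smoke/heat/alarm model are hypothetical and are not the authors' Bayesian network encoding.

```python
from itertools import product
from typing import Callable, Dict, List, Optional

Assignment = Dict[str, bool]
Formula = Callable[[Assignment], bool]


def verify_if_then(
    variables: List[str],
    model: Formula,
    antecedent: Formula,
    consequent: Formula,
) -> Optional[Assignment]:
    """Return a counterexample to "if antecedent then consequent", or None.

    The rule is verified when no assignment satisfies
    model AND antecedent AND NOT consequent.
    """
    for values in product([False, True], repeat=len(variables)):
        a = dict(zip(variables, values))
        if model(a) and antecedent(a) and not consequent(a):
            return a  # satisfying assignment found: the rule is violated
    return None  # unsatisfiable: the rule holds for every model state


if __name__ == "__main__":
    names = ["smoke", "heat", "alarm"]
    # Toy model constraint: the alarm fires exactly when smoke or heat is present.
    toy_model = lambda a: a["alarm"] == (a["smoke"] or a["heat"])

    print(verify_if_then(names, toy_model, lambda a: a["smoke"], lambda a: a["alarm"]))
    print(verify_if_then(names, toy_model, lambda a: a["alarm"], lambda a: a["smoke"]))
```

Enumeration is exponential in the number of variables, which is why a practical framework would hand the same unsatisfiability question to a SAT solver instead.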

Enhanced Uncertainty Estimation in Ultrasound Image Segmentation with MSU-Net

Jul 31, 2024

Rohini Banerjee, Cecilia G. Morales, Artur Dubrawski

Abstract:Efficient intravascular access in trauma and critical care significantly impacts patient outcomes. However, the availability of skilled medical personnel in austere environments is often limited. Autonomous robotic ultrasound systems can aid in needle insertion for medication delivery and support non-experts in such tasks. Despite advances in autonomous needle insertion, inaccuracies in vessel segmentation predictions pose risks. Understanding the uncertainty of predictive models in ultrasound imaging is crucial for assessing their reliability. We introduce MSU-Net, a novel multistage approach for training an ensemble of U-Nets to yield accurate ultrasound image segmentation maps. We demonstrate substantial improvements, 18.1% over a single Monte Carlo U-Net, enhancing uncertainty evaluations, model transparency, and trustworthiness. By highlighting areas of model certainty, MSU-Net can guide safe needle insertions, empowering non-experts to accomplish such tasks.

* Accepted for the 5th International Workshop of Advances in Simplifying Medical UltraSound (ASMUS), held in conjunction with MICCAI 2024, the 27th International Conference on Medical Image Computing and Computer Assisted Intervention
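As a generic illustration of how an ensemble of segmentation models can expose uncertainty, the sketch below averages per-pixel foreground probabilities across ensemble members and computes the binary entropy of the mean. This is a standard ensemble uncertainty measure and is not MSU-Net's multistage training procedure, which the abstract does not detail; the function name and the nested-list data layout are assumptions.

```python
import math
from typing import List, Tuple

ProbMap = List[List[float]]  # rows x cols of foreground probabilities


def ensemble_uncertainty(prob_maps: List[ProbMap]) -> Tuple[ProbMap, ProbMap]:
    """Return (mean probability map, per-pixel binary entropy in bits)."""
    n = len(prob_maps)
    rows, cols = len(prob_maps[0]), len(prob_maps[0][0])

    # Average each pixel's foreground probability over ensemble members.
    mean = [
        [sum(m[r][c] for m in prob_maps) / n for c in range(cols)]
        for r in range(rows)
    ]

    def binary_entropy(p: float) -> float:
        # Entropy is zero when the ensemble is fully certain.
        if p <= 0.0 or p >= 1.0:
            return 0.0
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    entropy = [[binary_entropy(p) for p in row] for row in mean]
    return mean, entropy


if __name__ == "__main__":
    member_a = [[1.0, 0.0]]
    member_b = [[1.0, 1.0]]
    mean, entropy = ensemble_uncertainty([member_a, member_b])
    print(mean)     # members agree on pixel 0, disagree on pixel 1
    print(entropy)  # high entropy marks the pixel where they disagree
```

Pixels with low entropy are those where the ensemble agrees, which is the kind of signal a needle-insertion system could use to prefer regions of high model certainty.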
