Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Edward Kim

Think Fast and Far: Long-Horizon Online POMDP Planning via Rapid State Sampling

Jun 03, 2026

Yuanchu Liang, Edward Kim, J. Arden Knoll, Wil Thomason, Zachary Kingston, Lydia E. Kavraki, Hanna Kurniawati

Abstract:Partially Observable Markov Decision Processes (POMDPs) are a general and principled framework for motion planning under uncertainty. Despite tremendous improvement in the scalability of POMDP solvers, long-horizon POMDPs remain difficult to solve. To alleviate the difficulty, this paper proposes a new approximate online POMDP solver, called Reference-Based Online POMDP Planning via Rapid State Space Sampling (ROP-RAS3). ROP-RAS3 uses novel extremely fast sampling-based motion planning techniques to sample the state space and generate a diverse set of macro actions online, which are then used to bias belief-space sampling and infer high-quality policies without requiring exhaustive enumeration of the action space -- a fundamental constraint for modern online POMDP solvers. ROP-RAS3 converges to a near-optimal reference-based solution at a rate that depends on the number of sampled actions, rather than the size of the action space. ROP-RAS3 is evaluated on various long-horizon POMDPs with up to 3000 lookahead steps and 35-dimensional state spaces, where the state, action and observation spaces can be continuous, discrete, or a hybrid of discrete and continuous. Although the reference-based optimal solution may not be the same as the optimal POMDP solution, empirical results indicate that in all of these problems, in terms of success rate, ROP-RAS3 outperforms other state-of-the-art methods by up to multiple folds. We also demonstrate the capability of our approach on a physical robot demonstration. This work extends the theory and empirical results of our ISRR24 paper. Code can be found at \texttt{https://github.com/RDLLab/ROPRAS3}.

* @inproceedings{Liang2026Thinking, title = {Think Fast and Far: Long-Horizon Online POMDP Planning via Rapid State Sampling}, author = {Yuanchu Liang and Edward Kim and J.Arden Knoll and Wil Thomason and Zachary Kingston and Lydia E. Kavraki and Hanna Kurniawati}, year = 2026, booktitle = {International Journal of Robotics Research (to appear)} }

Via

Access Paper or Ask Questions

A Wearable ECG Device for Differentiating Hypertrophic Cardiomyopathy from Acquired Left Ventricular Hypertrophy

Apr 14, 2026

Jiachen Li, Hanyu Zhu, Edward Kim, Shihao Li, Katherine Cavanaugh, Arpan Patel, Sovik De Sirkar, Mauricio Hong, Wei Li, Dongmei Chen

Abstract:Hypertrophic Cardiomyopathy (HCM) is a genetic heart disease affecting approximately 1 in 500 people and is the leading cause of sudden cardiac death in young athletes. Current diagnostic methods -- cardiovascular magnetic resonance (CMR), echocardiography, and genetic testing -- are limited by high costs, operator dependency, or insufficient accuracy, while standard electrocardiogram (ECG) analysis cannot reliably distinguish HCM from acquired left ventricular hypertrophy (LVH). This paper presents a wearable ECG device paired with a classification algorithm that differentiates HCM from acquired LVH using ECG signals alone. The portable device integrates a 3-lead electrode system, an AD8232 signal conditioning module, an Arduino Nano 33 BLE microcontroller, and a lithium polymer battery. The algorithm extracts two quantitative indices -- HCM Index~1 and HCM Index~2 -- from each heartbeat and classifies patients via dual statistical thresholds. Validation on 483 LVH patients (PhysioNet) and 29 HCM patients (digitized clinical records) yields 75.86\% sensitivity, 99.17\% specificity, and an F1-score of 80.00\%. Leave-one-out cross-validation confirms generalizability, with cross-validated sensitivity of 72.41\%, specificity of 98.96\%, and F1-score of 76.36\% (95\% confidence intervals reported). A digitization confound analysis demonstrates that the classification is driven by physiological cardiac features rather than data source artifacts. A simulated device acquisition chain analysis confirms that the wearable hardware's signal characteristics are compatible with the classification algorithm. The system offers a promising tool for affordable HCM screening in resource-limited settings.

Via

Access Paper or Ask Questions

Modality-Aware and Anatomical Vector-Quantized Autoencoding for Multimodal Brain MRI

Apr 06, 2026

Mingjie Li, Edward Kim, Yue Zhao, Ehsan Adeli, Kilian M. Pohl

Abstract:Learning a robust Variational Autoencoder (VAE) is a fundamental step for many deep learning applications in medical image analysis, such as MRI synthesizes. Existing brain VAEs predominantly focus on single-modality data (i.e., T1-weighted MRI), overlooking the complementary diagnostic value of other modalities like T2-weighted MRIs. Here, we propose a modality-aware and anatomically grounded 3D vector-quantized VAE (VQ-VAE) for reconstructing multi-modal brain MRIs. Called NeuroQuant, it first learns a shared latent representation across modalities using factorized multi-axis attention, which can capture relationships between distant brain regions. It then employs a dual-stream 3D encoder that explicitly separates the encoding of modality-invariant anatomical structures from modality-dependent appearance. Next, the anatomical encoding is discretized using a shared codebook and combined with modality-specific appearance features via Feature-wise Linear Modulation (FiLM) during the decoding phase. This entire approach is trained using a joint 2D/3D strategy in order to account for the slice-based acquisition of 3D MRI data. Extensive experiments on two multi-modal brain MRI datasets demonstrate that NeuroQuant achieves superior reconstruction fidelity compared to existing VAEs, enabling a scalable foundation for downstream generative modeling and cross-modal brain image analysis.

* CVPR Fingdings track

Via

Access Paper or Ask Questions

Conformal Prediction for Risk-Controlled Medical Entity Extraction Across Clinical Domains

Mar 01, 2026

Manil Shrestha, Edward Kim

Abstract:Large Language Models (LLMs) are increasingly used for medical entity extraction, yet their confidence scores are often miscalibrated, limiting safe deployment in clinical settings. We present a conformal prediction framework that provides finite-sample coverage guarantees for LLM-based extraction across two clinical domains. First, we extract structured entities from 1,000 FDA drug labels across eight sections using GPT-4.1, verified via FactScore-based atomic statement evaluation (97.7\% accuracy over 128,906 entities). Second, we extract radiological entities from MIMIC-CXR reports using the RadGraph schema with GPT-4.1 and Llama-4-Maverick, evaluated against physician annotations (entity F1: 0.81 to 0.84). Our central finding is that miscalibration direction reverses across domains: on well-structured FDA labels, models are underconfident, requiring modest conformal thresholds ($τ\approx 0.06$), while on free-text radiology reports, models are overconfident, demanding strict thresholds ($τ$ up to 0.99). Despite this heterogeneity, conformal prediction achieves target coverage ($\geq 90\%$) in both settings with manageable rejection rates (9--13\%). These results demonstrate that calibration is not a global model property but depends on document structure, extraction category, and model architecture, motivating domain-specific conformal calibration for safe clinical deployment.

Via

Access Paper or Ask Questions

Partially Observable Reference Policy Programming: Solving POMDPs Sans Numerical Optimisation

Jul 16, 2025

Edward Kim, Hanna Kurniawati

Abstract:This paper proposes Partially Observable Reference Policy Programming, a novel anytime online approximate POMDP solver which samples meaningful future histories very deeply while simultaneously forcing a gradual policy update. We provide theoretical guarantees for the algorithm's underlying scheme which say that the performance loss is bounded by the average of the sampling approximation errors rather than the usual maximum, a crucial requirement given the sampling sparsity of online planning. Empirical evaluations on two large-scale problems with dynamically evolving environments -- including a helicopter emergency scenario in the Corsica region requiring approximately 150 planning steps -- corroborate the theoretical results and indicate that our solver considerably outperforms current online benchmarks.

* 8 pages, 2 tables, 3 figures. To be presented at International Joint Conference on Artificial Intelligence 2025

Via

Access Paper or Ask Questions

Command A: An Enterprise-Ready Large Language Model

Apr 01, 2025

Team Cohere, Aakanksha, Arash Ahmadian, Marwan Ahmed, Jay Alammar, Yazeed Alnumay, Sophia Althammer, Arkady Arkhangorodsky, Viraat Aryabumi, Dennis Aumiller(+216 more)

Figure 1 for Command A: An Enterprise-Ready Large Language Model

Figure 2 for Command A: An Enterprise-Ready Large Language Model

Figure 3 for Command A: An Enterprise-Ready Large Language Model

Figure 4 for Command A: An Enterprise-Ready Large Language Model

Abstract:In this report we describe the development of Command A, a powerful large language model purpose-built to excel at real-world enterprise use cases. Command A is an agent-optimised and multilingual-capable model, with support for 23 languages of global business, and a novel hybrid architecture balancing efficiency with top of the range performance. It offers best-in-class Retrieval Augmented Generation (RAG) capabilities with grounding and tool use to automate sophisticated business processes. These abilities are achieved through a decentralised training approach, including self-refinement algorithms and model merging techniques. We also include results for Command R7B which shares capability and architectural similarities to Command A. Weights for both models have been released for research purposes. This technical report details our original training pipeline and presents an extensive evaluation of our models across a suite of enterprise-relevant tasks and public benchmarks, demonstrating excellent performance and efficiency.

* 55 pages

Via

Access Paper or Ask Questions

Structured Extraction of Real World Medical Knowledge using LLMs for Summarization and Search

Dec 16, 2024

Edward Kim, Manil Shrestha, Richard Foty, Tom DeLay, Vicki Seyfert-Margolis

Abstract:Creation and curation of knowledge graphs can accelerate disease discovery and analysis in real-world data. While disease ontologies aid in biological data annotation, codified categories (SNOMED-CT, ICD10, CPT) may not capture patient condition nuances or rare diseases. Multiple disease definitions across data sources complicate ontology mapping and disease clustering. We propose creating patient knowledge graphs using large language model extraction techniques, allowing data extraction via natural language rather than rigid ontological hierarchies. Our method maps to existing ontologies (MeSH, SNOMED-CT, RxNORM, HPO) to ground extracted entities. Using a large ambulatory care EHR database with 33.6M patients, we demonstrate our method through the patient search for Dravet syndrome, which received ICD10 recognition in October 2020. We describe our construction of patient-specific knowledge graphs and symptom-based patient searches. Using confirmed Dravet syndrome ICD10 codes as ground truth, we employ LLM-based entity extraction to characterize patients in grounded ontologies. We then apply this method to identify Beta-propeller protein-associated neurodegeneration (BPAN) patients, demonstrating real-world discovery where no ground truth exists.

* 10 pages, 3 figures, Work published in 4th Workshop on Knowledge Graphs and Big Data (In Conjunction with IEEE Big Data 2024)

Via

Access Paper or Ask Questions

Scaling Long-Horizon Online POMDP Planning via Rapid State Space Sampling

Nov 11, 2024

Yuanchu Liang, Edward Kim, Wil Thomason, Zachary Kingston, Hanna Kurniawati, Lydia E. Kavraki

Figure 1 for Scaling Long-Horizon Online POMDP Planning via Rapid State Space Sampling

Figure 2 for Scaling Long-Horizon Online POMDP Planning via Rapid State Space Sampling

Figure 3 for Scaling Long-Horizon Online POMDP Planning via Rapid State Space Sampling

Figure 4 for Scaling Long-Horizon Online POMDP Planning via Rapid State Space Sampling

* 16 pages, 4 tables, 1 figure. To be presented at the International Symposium of Robotics Research 2024

Via

Access Paper or Ask Questions

Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements

Oct 22, 2024

Isamu Isozaki, Manil Shrestha, Rick Console, Edward Kim

Figure 1 for Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements

Figure 2 for Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements

Figure 3 for Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements

Figure 4 for Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements

Abstract:Hacking poses a significant threat to cybersecurity, inflicting billions of dollars in damages annually. To mitigate these risks, ethical hacking, or penetration testing, is employed to identify vulnerabilities in systems and networks. Recent advancements in large language models (LLMs) have shown potential across various domains, including cybersecurity. However, there is currently no comprehensive, open, end-to-end automated penetration testing benchmark to drive progress and evaluate the capabilities of these models in security contexts. This paper introduces a novel open benchmark for LLM-based automated penetration testing, addressing this critical gap. We first evaluate the performance of LLMs, including GPT-4o and Llama 3.1-405B, using the state-of-the-art PentestGPT tool. Our findings reveal that while Llama 3.1 demonstrates an edge over GPT-4o, both models currently fall short of performing fully automated, end-to-end penetration testing. Next, we advance the state-of-the-art and present ablation studies that provide insights into improving the PentestGPT tool. Our research illuminates the challenges LLMs face in each aspect of Pentesting, e.g. enumeration, exploitation, and privilege escalation. This work contributes to the growing body of knowledge on AI-assisted cybersecurity and lays the foundation for future research in automated penetration testing using large language models.

* Main Paper 1-9 pages, Supplementary Materials: 10-17, 13 figures

Via

Access Paper or Ask Questions

Secure Multiparty Generative AI

Sep 27, 2024

Manil Shrestha, Yashodha Ravichandran, Edward Kim

Figure 1 for Secure Multiparty Generative AI

Figure 2 for Secure Multiparty Generative AI

Figure 3 for Secure Multiparty Generative AI

Figure 4 for Secure Multiparty Generative AI

Abstract:As usage of generative AI tools skyrockets, the amount of sensitive information being exposed to these models and centralized model providers is alarming. For example, confidential source code from Samsung suffered a data leak as the text prompt to ChatGPT encountered data leakage. An increasing number of companies are restricting the use of LLMs (Apple, Verizon, JPMorgan Chase, etc.) due to data leakage or confidentiality issues. Also, an increasing number of centralized generative model providers are restricting, filtering, aligning, or censoring what can be used. Midjourney and RunwayML, two of the major image generation platforms, restrict the prompts to their system via prompt filtering. Certain political figures are restricted from image generation, as well as words associated with women's health care, rights, and abortion. In our research, we present a secure and private methodology for generative artificial intelligence that does not expose sensitive data or models to third-party AI providers. Our work modifies the key building block of modern generative AI algorithms, e.g. the transformer, and introduces confidential and verifiable multiparty computations in a decentralized network to maintain the 1) privacy of the user input and obfuscation to the output of the model, and 2) introduce privacy to the model itself. Additionally, the sharding process reduces the computational burden on any one node, enabling the distribution of resources of large generative AI processes across multiple, smaller nodes. We show that as long as there exists one honest node in the decentralized computation, security is maintained. We also show that the inference process will still succeed if only a majority of the nodes in the computation are successful. Thus, our method offers both secure and verifiable computation in a decentralized network.

Via

Access Paper or Ask Questions