Abstract: Large Language Models (LLMs) have shown promise in structured prediction tasks, including regression, but existing approaches primarily focus on point estimates and lack a systematic comparison across different methods. We investigate probabilistic regression using LLMs for unstructured inputs, addressing challenging text-to-distribution prediction tasks such as price estimation, where both nuanced text understanding and uncertainty quantification are critical. We propose a novel quantile regression approach that enables LLMs to produce full predictive distributions, improving upon traditional point estimates. Through extensive experiments across three diverse price prediction datasets, we demonstrate that a Mistral-7B model fine-tuned with quantile heads significantly outperforms traditional approaches for both point and distributional estimation, as measured by three established metrics each for prediction accuracy and distributional calibration. Our systematic comparison of LLM approaches, model architectures, training approaches, and data scaling reveals that Mistral-7B consistently outperforms encoder architectures, embedding-based methods, and few-shot learning methods. Our experiments also reveal the effectiveness of LLM-assisted label correction in achieving human-level accuracy without systematic bias. Our curated datasets are made available at https://github.com/vnik18/llm-price-quantile-reg/ to support future research.
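To illustrate the idea behind quantile-based prediction, here is a minimal sketch of the pinball (quantile) loss, the standard objective for quantile regression. The abstract does not state which loss, quantile levels, or head design the authors use, so the function name `pinball_loss`, the three levels, and the example prices are all assumptions made for illustration only.

```python
def pinball_loss(y_true, y_pred, quantiles):
    """Mean pinball loss over all samples and quantile levels.

    y_true:    list of n target values (e.g. prices)
    y_pred:    list of n lists, each holding one prediction per quantile level
    quantiles: list of k levels in (0, 1), assumed here for illustration
    """
    total = 0.0
    count = 0
    for y, preds in zip(y_true, y_pred):
        for q, p in zip(quantiles, preds):
            diff = y - p
            # Under-prediction (diff > 0) costs q * diff;
            # over-prediction (diff < 0) costs (1 - q) * |diff|.
            total += max(q * diff, (q - 1.0) * diff)
            count += 1
    return total / count


# Hypothetical example: one item with true price 10 and predicted
# 10th, 50th, and 90th percentiles of 8, 10, and 12.
levels = [0.1, 0.5, 0.9]
example_loss = pinball_loss([10.0], [[8.0, 10.0, 12.0]], levels)
```

Because the penalty is asymmetric, a high quantile level punishes under-prediction more than over-prediction, which pushes each predicted quantile toward the corresponding point of the target distribution.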
Abstract: Responding rapidly to a patient who is demonstrating signs of imminent clinical deterioration is a basic tenet of patient care. This tenet gave rise to a patient safety intervention philosophy known as a Rapid Response System (RRS), whereby a patient who meets a pre-determined set of criteria for imminent clinical deterioration is immediately assessed and treated, with the goal of mitigating the deterioration and preventing intensive care unit (ICU) transfer, cardiac arrest, or death. While RRSs have been widely adopted, multiple systematic reviews have failed to find evidence of their effectiveness. Typically, RRS criteria are simple, expert (consensus) defined rules that identify significant physiologic abnormalities or are based on clinical observation. If one can find a pattern in the patient's data earlier than the onset of the physiologic derangement manifest in the current criteria, intervention strategies might be more effective. In this paper, we apply machine learning to electronic medical records (EMR) to infer whether patients are at risk for clinical deterioration. Our models are more sensitive and offer greater advance prediction time than the rule-based methods currently used in hospitals. Our results warrant further testing in the field; if successful, hospitals can integrate our approach into their existing IT systems and use the alerts generated by the model to prevent ICU transfer, cardiac arrest, or death, or to reduce the ICU length of stay.