Abstract: Clinical communication skills are critical in medical education, yet practicing and assessing them at scale is challenging. Although LLM-powered clinical scenario simulations have shown promise in enhancing medical students' clinical practice, providing automated and scalable evaluation that follows nuanced physician judgment remains difficult. This paper combines fuzzy logic with Large Language Models (LLMs) and proposes LLM-as-a-Fuzzy-Judge to address the challenge of aligning automated evaluation of medical students' clinical skills with physicians' subjective preferences. LLM-as-a-Fuzzy-Judge is an approach in which an LLM is fine-tuned to evaluate medical students' utterances within student-AI patient conversation scripts, based on human annotations over four fuzzy sets: Professionalism, Medical Relevance, Ethical Behavior, and Contextual Distraction. The methodology begins with data collection from an LLM-powered medical education system and data annotation based on multidimensional fuzzy sets, followed by prompt engineering and supervised fine-tuning (SFT) of pre-trained LLMs on these human annotations. The results show that LLM-as-a-Fuzzy-Judge achieves over 80% accuracy, exceeding 90% on major criteria, effectively leveraging fuzzy logic and LLMs to deliver interpretable, human-aligned assessment. This work suggests the viability of combining fuzzy logic and LLMs to align automated evaluation with human preferences, advances automated evaluation in medical education, and supports more robust assessment and judgment practices. The GitHub repository of this work is available at https://github.com/2sigmaEdTech/LLMAsAJudge
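As a rough illustration of the fuzzy-set scoring described in this abstract, the sketch below shows one way membership degrees for the four criteria could be aggregated into an overall verdict for a student utterance. It is a minimal sketch, not the authors' pipeline: the `judge_utterance` placeholder, the example scores, and the aggregation rule are all assumptions standing in for the fine-tuned LLM described in the paper.

```python
from dataclasses import dataclass

# Fuzzy-set criteria named in the abstract.
CRITERIA = ["Professionalism", "Medical Relevance", "Ethical Behavior", "Contextual Distraction"]


@dataclass
class FuzzyJudgement:
    """Membership degrees in [0, 1], one per fuzzy set, for a single utterance."""
    scores: dict

    def overall(self) -> float:
        # Illustrative aggregation (an assumption): the first three memberships
        # count in the utterance's favour, contextual distraction counts against it.
        positive = [self.scores[c] for c in CRITERIA[:3]]
        distraction = self.scores["Contextual Distraction"]
        return (sum(positive) / 3) * (1.0 - distraction)


def judge_utterance(utterance: str) -> FuzzyJudgement:
    """Placeholder for the fine-tuned LLM call; a real implementation would
    prompt the SFT model to emit one membership degree per fuzzy set."""
    return FuzzyJudgement(scores={
        "Professionalism": 0.9,
        "Medical Relevance": 0.8,
        "Ethical Behavior": 1.0,
        "Contextual Distraction": 0.1,
    })


if __name__ == "__main__":
    verdict = judge_utterance("Can you describe when the chest pain started?")
    print(verdict.scores, round(verdict.overall(), 2))
```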
Abstract: This study demonstrates methods for detecting negations in a sentence by evaluating the lexical structure of the text via word-sense disambiguation. The proposed framework examines the unique features of the various expressions within a text to resolve the contextual usage of all tokens and decipher the effect of negation on sentiment analysis. Popular expression detectors skip this important step, thereby neglecting root words caught in the web of negation and making text classification difficult for machine learning and sentiment analysis. This study adopts a Natural Language Processing (NLP) approach to discover and antonymize negated words for better accuracy in text classification, using a knowledge base provided by the NLP library WordHoard. Early results show that our initial analysis improves on traditional sentiment analysis, which sometimes neglects negations or assigns an inverse polarity score. The SentiWordNet analyzer was improved by 35%, the VADER analyzer by 20%, and TextBlob by 6%.
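The sketch below illustrates the general antonymization idea: detect a negation cue, swap the negated word for an antonym, and re-score the rewritten text. It substitutes NLTK's WordNet for the WordHoard knowledge base used in the study, and the fixed cue list and one-word lookahead are simplifying assumptions rather than the paper's word-sense disambiguation logic.

```python
from typing import Optional

import nltk
from nltk.corpus import wordnet as wn
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# A small, assumed set of negation cues for illustration only.
NEGATION_CUES = {"not", "no", "never", "n't", "hardly"}


def antonym_of(word: str) -> Optional[str]:
    """Return the first WordNet antonym of `word`, if one exists."""
    for synset in wn.synsets(word):
        for lemma in synset.lemmas():
            if lemma.antonyms():
                return lemma.antonyms()[0].name()
    return None


def resolve_negations(text: str) -> str:
    """Rewrite 'not X' as the antonym of X whenever an antonym is known."""
    tokens = text.lower().split()
    out, skip = [], False
    for i, tok in enumerate(tokens):
        if skip:
            skip = False
            continue
        if tok in NEGATION_CUES and i + 1 < len(tokens):
            antonym = antonym_of(tokens[i + 1])
            if antonym:
                out.append(antonym)
                skip = True
                continue
        out.append(tok)
    return " ".join(out)


if __name__ == "__main__":
    nltk.download("wordnet", quiet=True)
    nltk.download("vader_lexicon", quiet=True)

    analyzer = SentimentIntensityAnalyzer()
    raw = "the service was not good"
    rewritten = resolve_negations(raw)  # e.g. "the service was bad"
    print(rewritten, analyzer.polarity_scores(rewritten)["compound"])
```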
Abstract: This study examines machine learning methods used in crisis management. Analyzing detected patterns from a crisis involves the collection and evaluation of historical or near-real-time datasets through automated means. This paper uses the meta-review method to analyze scientific literature that applied machine learning techniques to evaluate human actions during crises. Selected studies were condensed into themes and emerging trends through a systematic literature evaluation of published works accessed from three scholarly databases. Results show that social media data was the most prominent source in the evaluated articles, appearing in 27% of them, followed by disaster management, health (COVID), and crisis informatics, among other themes. Additionally, supervised machine learning was the predominant method, applied in 69% of the studies, and classification stood out among machine learning tasks with 41% usage. The algorithms that played major roles were Support Vector Machines, Neural Networks, Naive Bayes, and Random Forest, with 23%, 16%, 15%, and 12% contributions, respectively.