Thomas Yang

What's in a Name? Morphological Shortcuts by LLMs in Pharmacology

Jun 04, 2026

Kaijie Mo, Thomas Yang, Chantal Shaib, Qing Yao, William Rudman, Ramez Kouzy, Kanishka Misra, Byron C. Wallace, Junyi Jessy Li

Abstract: The morphological form of a word can often give cues to its meaning, but purely relying on these mappings can lead to overgeneralization in high-stakes domains. In the medical domain, for instance, LLMs can confidently reason about fictitious drugs from their affixes alone (e.g., wugcillin) and generate plausible-looking clinical content. We present a behavioral and mechanistic study of LLM "affix heuristics" in pharmacology. Using fictitious drug names built from real affixes, we show that affix signals alone elicit class-level pharmacological responses. We introduce a framework for identifying whether a model's drug semantics are driven mainly by the affix, the stem, or the drug name as a whole. Applied across 653 drugs, our framework reveals that models often induce drug meaning primarily through affix cues, yet rarely explicitly indicate this reliance, and sometimes incorrectly conflate properties among affix-sharing drugs. Activation patching across models further localizes this behavior to early-mid layers. These findings show that morphological shortcuts pose a subtle but measurable risk to safety.
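To make the idea of a fictitious drug name built from a real affix concrete, here is a minimal Python sketch. It is not the authors' framework: the affix table, the class labels (common textbook associations), and the helper names `make_fictitious_drug` and `affix_only_guess` are all assumptions introduced for illustration. It only shows how a nonce stem plus a real suffix, such as wug + cillin, yields a name whose suffix alone suggests a drug class.

```python
from typing import Optional

# Hypothetical affix table. The class labels are common textbook
# associations, not data taken from the paper.
AFFIX_CLASSES = {
    "cillin": "penicillin-type antibiotic",
    "olol": "beta blocker",
    "pril": "ACE inhibitor",
    "statin": "HMG-CoA reductase inhibitor",
}


def make_fictitious_drug(stem: str, affix: str) -> str:
    """Join a nonce stem with a known suffix, e.g. 'wug' + 'cillin'."""
    if affix not in AFFIX_CLASSES:
        raise KeyError(f"unknown affix: {affix}")
    return stem.lower() + affix


def affix_only_guess(name: str) -> Optional[str]:
    """Return the class implied by the longest matching suffix, or None.

    This mimics the shortcut described in the abstract: the stem is
    ignored entirely, so any name ending in a known suffix gets a class.
    """
    lowered = name.lower()
    matches = [affix for affix in AFFIX_CLASSES if lowered.endswith(affix)]
    if not matches:
        return None
    return AFFIX_CLASSES[max(matches, key=len)]


if __name__ == "__main__":
    fake = make_fictitious_drug("wug", "cillin")
    print(fake, "->", affix_only_guess(fake))
```

A suffix-only lookup like this assigns a class to any name with a matching ending, whether or not the drug exists, which is the kind of overgeneralization the abstract warns about.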

* 22 pages

PILOT: A Data-Free Continual Learning Approach for Real-Time Semantic Segmentation via Boundary Guidance

May 26, 2026

Yujing Zhou, Prashant Shekhar, Thomas Yang, Yongxin Liu

Abstract: Real-time semantic segmentation models offer an excellent balance between accuracy and inference speed. However, deploying these models in dynamic real-world environments often requires the ability to learn novel classes incrementally without retraining on the entire dataset. This capability is known as continual learning. Standard fine-tuning methods in deep learning often fail here due to catastrophic forgetting, where the model learns new information but forgets previously learned classes. To address this, the current paper proposes a novel continual learning framework tailored for PIDNet, a widely cited state-of-the-art real-time semantic segmentation model. Our method, PILOT (Parallel Incremental Learning Over Time), introduces a real-time and lightweight strategy by implementing a parallel Derivative-branch (D-branch) designed to capture the high-frequency boundary information of novel classes while freezing the trained parameters of the original segmentation network. This setup allows the model to adapt to new semantic categories while preserving the knowledge of previously learned classes. By using only data associated with the new class, our model significantly reduces training overhead. Experimental results demonstrate that our approach successfully segments new classes while maintaining high mean Intersection over Union (mIoU) on the original base classes, thereby comfortably outperforming all major continual learning approaches in this domain. Overall, PILOT is shown to effectively mitigate catastrophic forgetting with minimal impact on inference latency, thus maintaining real-time performance.
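The abstract reports results as mean Intersection over Union (mIoU). As a reminder of what that metric measures, here is a minimal pure-Python sketch of the standard definition: per-class intersection divided by union, averaged over the classes that appear. The function name `mean_iou` and the flat-list input format are assumptions made for illustration; this is not the paper's evaluation code.

```python
def mean_iou(pred, target, num_classes):
    """Compute mIoU for two equal-length sequences of integer class ids.

    Classes absent from both prediction and ground truth are skipped so
    that they do not distort the average.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    ious = []
    for c in range(num_classes):
        # Pixels where both agree on class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either one says class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)

    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    prediction = [0, 0, 1, 1]
    ground_truth = [0, 1, 1, 1]
    print(mean_iou(prediction, ground_truth, num_classes=2))
```

In a continual learning evaluation, one would typically compute this separately over the base classes and the newly added classes, so that forgetting on the former is visible alongside accuracy on the latter.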

Zero-Bias Deep Learning for Accurate Identification of Internet of Things (IoT) Devices

Aug 27, 2020

Yongxin Liu, Jian Wang, Jianqiang Li, Houbing Song, Thomas Yang, Shuteng Niu, Zhong Ming

Figures 1-4 for Zero-Bias Deep Learning for Accurate Identification of Internet of Things (IoT) Devices

Abstract: The Internet of Things (IoT) provides applications and services that would otherwise not be possible. However, the open nature of IoT makes it vulnerable to cybersecurity threats, in particular identity spoofing attacks, in which an adversary passively listens to existing radio communications and then mimics the identity of legitimate devices to conduct malicious activities. Existing solutions employ cryptographic signatures to verify the trustworthiness of received information. In prevalent IoT deployments, secret keys for cryptography can potentially be disclosed, disabling the verification mechanism. Non-cryptographic device verification is therefore needed to ensure trustworthy IoT. In this paper, we propose an enhanced deep learning framework for IoT device identification using physical layer signals. Specifically, we enable our framework to report unseen IoT devices and introduce the zero-bias layer to deep neural networks to increase robustness and interpretability. We have evaluated the effectiveness of the proposed framework using real data from ADS-B (Automatic Dependent Surveillance-Broadcast), an application of IoT in aviation. The proposed framework has the potential to be applied to accurate identification of IoT devices in a variety of IoT applications and services. Code and data are available on IEEE Dataport.

* IEEE Internet of Things Journal, 2020
* Accepted for publication by IEEE IoTJ in August 2020. Data and code are hosted at: [1] Yongxin Liu, Jian Wang, Houbing Song, Shuteng Niu, Thomas Yang, "A 24-hour signal recording dataset with labels for cybersecurity and IoT", IEEE Dataport, 2020. [Online]. Available: http://dx.doi.org/10.21227/gt9v-kz32. Accessed: Aug. 27, 2020
