Nick Williams

Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders

May 13, 2026

William Lehn-Schiøler, Magnus Ruud Kjær, Rahul Thapa, Magnus Guldberg Pedersen, Anton Storgaard Mosquera, Nick Williams, Radu Gatej, Tue Lehn-Schiøler, Sándor Beniczky, Sadasivan Puthusserypady (+2 more)

Abstract: EEG foundation models achieve state-of-the-art clinical performance, yet the internal computations driving their predictions remain opaque, which is a barrier to clinical trust. We apply TopK Sparse Autoencoders (SAEs) across three architecturally distinct EEG transformers (SleepFM, REVE, and LaBraM) to extract sparse feature dictionaries from their embeddings. By grounding these features in a clinical taxonomy (abnormality, age, sex, and medication), we benchmark monosemanticity and entanglement across architectures. A single hyperparameter procedure, driven by an intrinsic dictionary health audit, transfers robustly across all three architectures. Via concept steering, we introduce a "target vs. off-target" probe area metric to quantify steering selectivity and reveal three operational regimes: selectively steerable, encoded but entangled, and non-encoded. This framework exposes critical representational failures: "wrecking-ball" interventions that collapse global model performance, and clinical entanglements, such as age-pathology confounding, where it is impossible to suppress one concept without corrupting the other. Finally, a spectral decoder maps these interventions back to the amplitude spectrum, translating latent manipulations into physiologically interpretable frequency signatures, such as pathological slow-wave suppression and $\alpha$-band restoration.

* Preprint. 14 pages, 7 figures, 4 tables
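
For readers unfamiliar with the TopK Sparse Autoencoders named in the abstract above, the following is a minimal sketch of a generic TopK SAE forward pass. It is not the authors' implementation: the function names, the toy weights, and the choice to apply a ReLU to the kept units are all assumptions made for illustration. The idea is that an embedding is encoded into a wide latent vector, only the `k` largest pre-activations are kept, and the sparse code is decoded back into the embedding space.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def topk_activation(pre: Vector, k: int) -> Vector:
    """Keep the k largest pre-activations and zero out the rest."""
    keep = sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k]
    out = [0.0] * len(pre)
    for i in keep:
        # Assumption: kept units also pass through a ReLU.
        out[i] = max(pre[i], 0.0)
    return out


def sae_forward(
    x: Vector, w_enc: Matrix, b_enc: Vector,
    w_dec: Matrix, b_dec: Vector, k: int,
) -> Tuple[Vector, Vector]:
    """Return the sparse code z and the reconstruction x_hat."""
    pre = [p + b for p, b in zip(matvec(w_enc, x), b_enc)]
    z = topk_activation(pre, k)
    x_hat = [r + b for r, b in zip(matvec(w_dec, z), b_dec)]
    return z, x_hat


# Toy example: a 2-dimensional "embedding" and a 3-feature dictionary.
x = [1.0, 2.0]
w_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical encoder weights
b_enc = [0.0, 0.0, 0.0]
w_dec = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]     # hypothetical decoder weights
b_dec = [0.0, 0.0]

z, x_hat = sae_forward(x, w_enc, b_enc, w_dec, b_dec, k=1)
print(z)      # only one feature is active
print(x_hat)  # reconstruction from that single feature
```

With `k=1` only the third dictionary feature fires, so the reconstruction is built from a single decoder column. In a trained SAE, each such column would be a candidate feature to relate to a clinical concept.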


Too sick for surveillance: Can federal HIV service data improve federal HIV surveillance efforts?

Apr 20, 2023

Nick Williams

Abstract: Introduction: The value of integrating federal HIV services data with HIV surveillance is currently unknown. Upstream and complete case capture is essential in preventing future HIV transmission. Methods: This study integrated Ryan White, Social Security Disability Insurance, Medicare, Children's Health Insurance Program, and Medicaid demographic aggregates from 2005 to 2018 for people living with HIV and compared them with Centers for Disease Control and Prevention HIV surveillance by demographic aggregate. Surveillance Unknown, Service Known (SUSK) candidate aggregates were identified as those where service volumes exceeded surveillance volumes. A distribution approach and a deep learning model series were used to identify SUSK candidate aggregates where surveillance cases exceeded service cases in aggregate. Results: Medicare had the most candidate SUSK aggregates. Medicaid may have candidate SUSK aggregates where cases approach parity with surveillance. Deep learning was able to detect candidate SUSK aggregates even where surveillance cases exceeded service cases. Conclusions: Integration of CMS case-level records with HIV surveillance records can increase case discovery and life course model quality, especially for cases who die after seeking HIV services but before they become surveillance cases. The ethical implications of both the availability and reuse of clinical HIV data without the knowledge and consent of the persons described remain an opportunity for the development of big data ethics in public health research. Future work should develop big data ethics to support researchers and assure their subjects that information which describes them is not misused.

* 17 pages, 3 figures
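
The first SUSK screening rule in the abstract above (service volume exceeding surveillance volume for the same demographic aggregate) can be sketched as a simple comparison. This is an illustration only, not the study's code: the aggregate keys, the counts, and the decision to skip aggregates missing from the surveillance data are all assumptions, and the numbers below are invented.

```python
from typing import Dict, List, Tuple

# Hypothetical aggregate key: (demographic group label, year).
AggregateKey = Tuple[str, int]


def susk_candidates(
    services: Dict[AggregateKey, int],
    surveillance: Dict[AggregateKey, int],
) -> List[Tuple[AggregateKey, int]]:
    """Return aggregates where service counts exceed surveillance counts.

    Each result is (key, excess), sorted by excess in descending order.
    Assumption: aggregates absent from the surveillance data are skipped,
    because there is nothing to compare them against.
    """
    candidates = []
    for key, service_count in services.items():
        if key not in surveillance:
            continue
        excess = service_count - surveillance[key]
        if excess > 0:
            candidates.append((key, excess))
    return sorted(candidates, key=lambda item: item[1], reverse=True)


# Invented toy counts, for illustration only.
services = {
    ("group_a", 2010): 120,
    ("group_b", 2010): 80,
    ("group_c", 2010): 50,   # no surveillance counterpart, skipped
    ("group_d", 2010): 70,
}
surveillance = {
    ("group_a", 2010): 100,
    ("group_b", 2010): 95,
    ("group_d", 2010): 40,
}

for key, excess in susk_candidates(services, surveillance):
    print(key, excess)
```

This rule cannot flag aggregates where surveillance counts exceed service counts, which is the case the abstract assigns to the distribution approach and the deep learning models.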


Conclusions from a NAIVE Bayes Operator Predicting the Medicare 2011 Transaction Data Set

Feb 20, 2014

Nick Williams

Abstract: Introduction: The United States Federal Government operates one of the world's largest medical insurance programs, Medicare, to ensure payment for clinical services for the elderly, illegal aliens and those without the ability to pay for their care directly. This paper evaluates the Medicare 2011 Transaction Data Set, which details the transfer of funds from Medicare to private and public clinical care facilities for specific clinical services for the operational year 2011. Methods: Data mining was conducted to establish the relationships between reported and computed transaction values in the data set to better understand the drivers of Medicare transactions at a programmatic level. Results: The models averaged 88 for average model accuracy and 38 for average Kappa during training. Some reported classes are highly independent of the available data, as their predictability remains stable regardless of redaction of supporting and contradictory evidence. DRG or procedure type appears to be unpredictable from the available financial transaction values. Conclusions: Overlay hypotheses, such as charges being driven by the volume served or DRG being related to charges or payments, are readily false in this analysis despite 28 million Americans being billed through Medicare in 2011 and the program distributing over 70 billion in this transaction set alone. It may be impossible to predict the dependencies and data structures of the payer of last resort without data from payers of first and second resort. Political concerns about Medicare would be better served by focusing on these first and second order payer systems, as what Medicare costs is not dependent on Medicare itself.

* 8 pages, 7 figures
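
The abstract above reports both accuracy and Kappa, and the gap between the two is easier to read with a worked example. The sketch below assumes the statistic is Cohen's kappa, which the abstract does not state, and uses invented labels. Kappa rescales observed agreement $p_o$ by the agreement expected from the label frequencies alone, $p_e$, as $\kappa = (p_o - p_e) / (1 - p_e)$.

```python
from collections import Counter
from typing import Sequence


def accuracy(truth: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def cohens_kappa(truth: Sequence[str], pred: Sequence[str]) -> float:
    """Cohen's kappa: agreement beyond what label frequencies predict."""
    n = len(truth)
    p_o = accuracy(truth, pred)
    truth_counts = Counter(truth)
    pred_counts = Counter(pred)
    # Chance agreement: probability both pick the same label independently.
    p_e = sum(truth_counts[c] * pred_counts[c] for c in truth_counts) / (n * n)
    if p_e == 1.0:
        # Degenerate case (a single label everywhere); kappa is undefined,
        # and returning 0.0 here is a convention chosen for this sketch.
        return 0.0
    return (p_o - p_e) / (1.0 - p_e)


# Invented labels, for illustration only.
truth = ["a", "a", "b", "b"]
pred = ["a", "b", "b", "b"]

print(accuracy(truth, pred))      # 0.75
print(cohens_kappa(truth, pred))  # 0.5
```

Here three of four predictions are correct, but half of that agreement would be expected by chance, so kappa is lower than accuracy. A classifier with high accuracy and a much lower kappa may be leaning on class frequencies more than on the input features.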
