Abstract:This discussion paper reflects on how quantitative approaches to historical linguistics interact with dataset properties. Drawing on two worked examples, we examine English data using quad-based concept modelling of Early Modern English discourse in EEBO-TCP (c. 1470s-1690s; 765M words) alongside SynFlow analysis of scientific writing in Royal Society Corpus 6.0.4 (1750-1799; drawn from a 78.6M-token open corpus). Through parallel comparison, the paper explores how each approach operationalises concepts, the data assumptions they entail, and the diachronic interpretations they support. We argue that comparative methodological reflection clarifies the limits of purely lexical, frequency-based approaches and highlights how dataset structure shapes the kinds of semantic change that quantitative methods can reliably detect.
Abstract:This discussion paper re-examines SemEval-2020 Task 1, the most influential shared benchmark for lexical semantic change detection, through a three-part evaluative framework: operationalisation, data quality, and benchmark design. First, at the level of operationalisation, we argue that the benchmark models semantic change mainly as gain, loss, or redistribution of discrete senses. While practical for annotation and evaluation, this framing is too narrow to capture gradual, constructional, collocational, and discourse-level change. Also, the gold labels are outcomes of annotation decisions, clustering procedures, and threshold settings, which could potentially limit the validity of the task. Second, at the level of data quality, we show that the benchmark is affected by substantial corpus and preprocessing problems, including OCR noise, malformed characters, truncated sentences, inconsistent lemmatisation, POS-tagging errors, and missed targets. These issues can distort model behaviour, complicate linguistic analysis, and reduce reproducibility. Third, at the level of bench-mark design, we argue the small curated target sets and limited language coverage reduce realism and increase statistical uncertainty. Taken together, these limitations suggest that the benchmark should be treated as a useful but partial test bed rather than a definitive measure of progress. We therefore call for future datasets and shared tasks to adopt broader theories of semantic change, document pre-processing transparently, expand cross-linguistic coverage, and use more realistic evaluation settings. Such steps are necessary for more valid, interpretable, and generalisable progress in lexical semantic change detection
Abstract:The majority of contemporary computational methods for lexical semantic change (LSC) detection are based on neural embedding distributional representations. Although these models perform well on LSC benchmarks, their results are often difficult to interpret. We explore an alternative approach that relies solely on frame semantics. We show that this method is effective for detecting semantic change and can even outperform many distributional semantic models. Finally, we present a detailed quantitative and qualitative analysis of its predictions, demonstrating that they are both plausible and highly interpretable
Abstract:Most modern computational approaches to lexical semantic change detection (LSC) rely on embedding-based distributional word representations with neural networks. Despite the strong performance on LSC benchmarks, they are often opaque. We investigate an alternative method which relies purely on dependency co-occurrence patterns of words. We demonstrate that it is effective for semantic change detection and even outperforms a number of distributional semantic models. We provide an in-depth quantitative and qualitative analysis of the predictions, showing that they are plausible and interpretable.