Khonzoda Umarova

DHPLT: large-scale multilingual diachronic corpora and word representations for semantic change modelling

Feb 12, 2026

Mariia Fedorova, Andrey Kutuzov, Khonzoda Umarova

Abstract: In this resource paper, we present DHPLT, an open collection of diachronic corpora in 41 diverse languages. DHPLT is based on the web-crawled HPLT datasets; we use web crawl timestamps as an approximate signal of document creation time. The collection covers three time periods: 2011-2015, 2020-2021, and 2024-present (1 million documents per time period for each language). We additionally provide pre-computed word type and token embeddings and lexical substitutions for our chosen target words, while leaving other researchers free to select their own target words from the same datasets. DHPLT aims to fill the current lack of multilingual diachronic corpora for semantic change modelling (beyond a dozen high-resource languages). It opens the way for a variety of new experimental setups in this field. All the resources described in this paper are available at https://data.hplt-project.org/three/diachronic/, sorted by language.

* LChange'26 workshop at the EACL 2026 conference

How Problematic Writer-AI Interactions (Rather than Problematic AI) Hinder Writers' Idea Generation

Mar 14, 2025

Khonzoda Umarova, Talia Wise, Zhuoer Lyu, Mina Lee, Qian Yang

Abstract: Writing about a subject enriches writers' understanding of that subject. This cognitive benefit of writing -- known as constructive learning -- is essential to how students learn in various disciplines. However, does this benefit persist when students write with generative AI writing assistants? Prior research suggests the answer varies based on the type of AI, e.g., auto-complete systems tend to hinder ideation, while assistants that pose Socratic questions facilitate it. This paper adds another perspective. Through a case study, we demonstrate that the impact of genAI on students' idea development depends not only on the AI but also on the students and, crucially, on the interactions between them. Students who proactively explored ideas gained new ideas from writing, regardless of whether they used auto-complete or Socratic AI assistants. Those who engaged in prolonged, mindless copyediting developed few ideas even with a Socratic AI. These findings suggest opportunities in designing AI writing assistants, not merely by creating more thought-provoking AI, but also by fostering more thought-provoking writer-AI interactions.
