Hakyung Sung

Parser agreement and disagreement in L2 Korean UD: Implications for human-in-the-loop annotation

May 07, 2026

Hakyung Sung, Gyu-Ho Shin

Abstract: We propose a simplified human-in-the-loop workflow for second language (L2) Korean morphosyntactic annotation by leveraging agreement between two domain-adapted parsers. We first evaluate whether parser agreement can serve as a proxy for annotation correctness by comparing it with independent human judgments. The results show strong correspondence between parser and human judgments, supporting the feasibility of semi-automatic L2-Korean UD annotation. Further analysis demonstrates that parser disagreements cluster in linguistically predictable domains such as grammatical-relation distinctions and clause-boundary ambiguity. While many disagreement cases are tractable for iterative model refinement, others reflect deeper representational challenges inherent in parsing and tagging L2-Korean corpora.

* To be published in the 20th Linguistic Annotation Workshop
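The agreement-based routing described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the `Token` type, the function names, the compared fields (UPOS, HEAD, DEPREL), and the example parses are all assumptions made for illustration. The idea is that sentences on which two parsers fully agree could be accepted automatically, while any disagreement sends the sentence to a human annotator.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """One parsed token (hypothetical, reduced to four CoNLL-U fields)."""
    form: str
    upos: str
    head: int      # 1-based index of the head token, 0 for root
    deprel: str


def disagreements(parse_a: List[Token], parse_b: List[Token]) -> List[int]:
    """Return 0-based indices of tokens where the two parses differ
    in UPOS, HEAD, or DEPREL."""
    if len(parse_a) != len(parse_b):
        raise ValueError("parses must share the same tokenization")
    return [
        i
        for i, (a, b) in enumerate(zip(parse_a, parse_b))
        if (a.upos, a.head, a.deprel) != (b.upos, b.head, b.deprel)
    ]


def agreement_rate(parse_a: List[Token], parse_b: List[Token]) -> float:
    """Fraction of tokens on which both parsers agree."""
    if not parse_a:
        return 1.0
    return 1 - len(disagreements(parse_a, parse_b)) / len(parse_a)


def route(parse_a: List[Token], parse_b: List[Token]) -> str:
    """Accept automatically on full agreement, otherwise ask a human."""
    return "auto-accept" if not disagreements(parse_a, parse_b) else "human-review"


# Invented example: the parsers differ on a grammatical relation (obl vs. obj).
parse_a = [
    Token("저는", "PRON", 3, "nsubj"),
    Token("학교에", "NOUN", 3, "obl"),
    Token("갑니다", "VERB", 0, "root"),
]
parse_b = [
    Token("저는", "PRON", 3, "nsubj"),
    Token("학교에", "NOUN", 3, "obj"),
    Token("갑니다", "VERB", 0, "root"),
]

print(disagreements(parse_a, parse_b))  # indices of tokens needing review
print(route(parse_a, parse_b))
```

Comparing whole-token triples keeps the check strict; a looser variant could compare only HEAD and DEPREL if tagging differences are handled separately.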

Comparing human and LLM proofreading in L2 writing: Impact on lexical and syntactic features

Jun 11, 2025

Hakyung Sung, Karla Csuros, Min-Chang Sung

Abstract: This study examines the lexical and syntactic interventions of human and LLM proofreading aimed at improving overall intelligibility in identical second language writings, and evaluates the consistency of outcomes across three LLMs (ChatGPT-4o, Llama3.1-8b, Deepseek-r1-8b). Findings show that both human and LLM proofreading enhance bigram lexical features, which may contribute to better coherence and contextual connectedness between adjacent words. However, LLM proofreading exhibits a more generative approach, extensively reworking vocabulary and sentence structures, such as employing more diverse and sophisticated vocabulary and incorporating a greater number of adjective modifiers in noun phrases. The proofreading outcomes are highly consistent in major lexical and syntactic features across the three models.

UD-KSL Treebank v1.3: A semi-automated framework for aligning XPOS-extracted units with UPOS tags

Jun 11, 2025

Hakyung Sung, Gyu-Ho Shin, Chanyoung Lee, You Kyung Sung, Boo Kyung Jung

Abstract: The present study extends recent work on Universal Dependencies annotations for second-language (L2) Korean by introducing a semi-automated framework that identifies morphosyntactic constructions from XPOS sequences and aligns those constructions with corresponding UPOS categories. We also broaden the existing L2-Korean corpus by annotating 2,998 new sentences from argumentative essays. To evaluate the impact of XPOS-UPOS alignments, we fine-tune L2-Korean morphosyntactic analysis models on datasets both with and without these alignments, using two NLP toolkits. Our results indicate that the aligned dataset not only improves consistency across annotation layers but also enhances morphosyntactic tagging and dependency-parsing accuracy, particularly in cases of limited annotated data.
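As a rough illustration of what checking XPOS-UPOS consistency might look like, the sketch below flags tokens whose UPOS does not match the category expected from the first morpheme tag of a `+`-joined XPOS sequence. It is a simplified stand-in, not the framework from the paper: the `EXPECTED_UPOS` table, the first-tag heuristic, and the sample tokens are assumptions chosen for illustration, and a real alignment would need construction-level rules.

```python
from typing import List, Optional, Tuple

# Hypothetical mapping from the first morpheme tag of a Sejong-style XPOS
# sequence to the UPOS expected for the whole unit (illustrative only).
EXPECTED_UPOS = {
    "NNG": "NOUN",
    "NNP": "PROPN",
    "NP": "PRON",
    "VV": "VERB",
    "VA": "ADJ",
    "MAG": "ADV",
}


def expected_upos(xpos: str) -> Optional[str]:
    """Look up the expected UPOS from the first tag of a '+'-joined XPOS."""
    first_tag = xpos.split("+")[0]
    return EXPECTED_UPOS.get(first_tag)


def find_mismatches(
    tokens: List[Tuple[str, str, str]]
) -> List[Tuple[str, str, str, str]]:
    """Return (form, xpos, upos, expected) for tokens whose UPOS conflicts
    with the expectation; tokens with unknown first tags are skipped."""
    mismatches = []
    for form, xpos, upos in tokens:
        expected = expected_upos(xpos)
        if expected is not None and expected != upos:
            mismatches.append((form, xpos, upos, expected))
    return mismatches


# Invented sample: the last token carries a deliberately wrong UPOS.
tokens = [
    ("학교에", "NNG+JKB", "NOUN"),
    ("갑니다", "VV+EF", "ADV"),
]

for row in find_mismatches(tokens):
    print(row)
```

Flagged tokens could then be reviewed by an annotator rather than corrected automatically, since the first-tag heuristic will misjudge derived forms.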

Second language Korean Universal Dependency treebank v1.2: Focus on data augmentation and annotation scheme refinement

Mar 18, 2025

Hakyung Sung, Gyu-Ho Shin

Abstract: We expand the second language (L2) Korean Universal Dependencies (UD) treebank with 5,454 manually annotated sentences. The annotation guidelines are also revised to better align with the UD framework. Using this enhanced treebank, we fine-tune three Korean language models and evaluate their performance on in-domain and out-of-domain L2-Korean datasets. The results show that fine-tuning significantly improves their performance across various metrics, thus highlighting the importance of using well-tailored L2 datasets for fine-tuning first-language-based, general-purpose language models for the morphosyntactic analysis of L2 data.
