Haihua Pan

Learning Constituent Headedness

Mar 16, 2026

Zeyao Qi, Yige Chen, KyungTae Lim, Haihua Pan, Jungyeul Park

Abstract: Headedness is widely used as an organizing device in syntactic analysis, yet constituency treebanks rarely encode it explicitly and most processing pipelines recover it procedurally via percolation rules. We treat this notion of constituent headedness as an explicit representational layer and learn it as a supervised prediction task over aligned constituency and dependency annotations, inducing supervision by defining each constituent head as the dependency span head. On aligned English and Chinese data, the resulting models achieve near-ceiling intrinsic accuracy and substantially outperform Collins-style rule-based percolation. Predicted heads yield comparable parsing accuracy under head-driven binarization, consistent with the induced binary training targets being largely equivalent across head choices, while increasing the fidelity of deterministic constituency-to-dependency conversion and transferring across resources and languages under simple label-mapping interfaces.
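The "dependency span head" idea in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `span_head`, the 1-based token indexing, the toy sentence, and the choice to return `None` when a span has no unique external-headed token are all assumptions made for illustration. The sketch takes the head of a constituent span to be the one token in the span whose dependency head lies outside the span (or is the root).

```python
from typing import List, Optional


def span_head(heads: List[int], start: int, end: int) -> Optional[int]:
    """Return the 1-based index of the span's head token, or None.

    heads[i - 1] is the dependency head of token i (0 means root).
    The span covers tokens start..end inclusive (1-based).
    Assumption: a span has a head only if exactly one of its tokens
    attaches to something outside the span.
    """
    candidates = [
        tok
        for tok in range(start, end + 1)
        if not (start <= heads[tok - 1] <= end)
    ]
    return candidates[0] if len(candidates) == 1 else None


# Hypothetical toy sentence: "The cat sat on the mat"
# The->cat, cat->sat, sat->ROOT, on->mat, the->mat, mat->sat
heads = [2, 3, 0, 6, 6, 3]

np_head = span_head(heads, 1, 2)      # "The cat"
pp_head = span_head(heads, 4, 6)      # "on the mat"
clause_head = span_head(heads, 1, 6)  # whole sentence
no_head = span_head(heads, 2, 4)      # "cat sat on" is not a constituent
```

With these toy annotations, the noun phrase is headed by "cat", the prepositional phrase by "mat" (the attachment style in this example treats the preposition as a dependent), and the full clause by "sat". A span with more than one externally attached token gets no head under this sketch; how the paper treats such cases is not stated in the abstract.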

Parsing Through Boundaries in Chinese Word Segmentation

Mar 29, 2025

Yige Chen, Zelong Li, Changbing Yang, Cindy Zhang, Amandisa Cady, Ai Ka Lee, Zejiao Zeng, Haihua Pan, Jungyeul Park

Abstract: Chinese word segmentation is a foundational task in natural language processing (NLP), with far-reaching effects on syntactic analysis. Unlike alphabetic languages such as English, Chinese lacks explicit word boundaries, making segmentation both necessary and inherently ambiguous. This study highlights the intricate relationship between word segmentation and syntactic parsing, providing a clearer understanding of how different segmentation strategies shape dependency structures in Chinese. Focusing on the Chinese GSD treebank, we analyze multiple word boundary schemes, each reflecting distinct linguistic and computational assumptions, and examine how they influence the resulting syntactic structures. To support detailed comparison, we introduce an interactive web-based visualization tool that displays parsing outcomes across segmentation methods.

* Submitted to ACL 2025 System Demonstration
