Abstract: While AI-assisted writing has been widely reported to improve essay quality, its impact on the structural diversity of student thinking remains unexplored. Analyzing 6,875 essays across five conditions (Human-only, AI-only, and three Human+AI prompt strategies), we provide the first empirical evidence of a Quality-Homogenization Tradeoff, in which substantial quality gains co-occur with significant homogenization. The effect is dimension-specific: cohesion architecture lost 70-78% of its variance, whereas perspective plurality diversified. Convergence-target analysis further revealed that AI-augmented essays were pulled toward AI structural patterns yet deviated significantly from the Human-AI axis, indicating simultaneous partial replacement and partial emergence. Crucially, prompt specificity reversed homogenization into diversification for argument depth, demonstrating that homogenization is not an intrinsic property of AI but a function of interaction design.
Abstract: As Large Language Models (LLMs) have become capable of effortlessly generating high-quality text, traditional quality-focused writing assessment is losing its significance. If the essential goal of education is to foster critical thinking and original perspectives, assessment must likewise shift its paradigm from quality to originality. This study proposes Argument Rarity-based Originality Assessment (AROA), a framework for automatically evaluating the argumentative originality of student essays. AROA defines originality as rarity within a reference corpus and evaluates it through four complementary components: structural rarity, claim rarity, evidence rarity, and cognitive depth. The framework quantifies the rarity of each component using density estimation and integrates them with a quality-adjustment mechanism, thereby treating quality and originality as independent evaluation axes. Experiments on human-written and AI-generated essays revealed a strong negative correlation between quality and claim rarity, demonstrating a quality-originality trade-off in which higher-quality texts tend to rely on typical claim patterns. Furthermore, while AI essays matched human essays in structural complexity, their claim rarity was substantially lower, indicating that LLMs can reproduce the form of argumentation but are limited in the originality of its content.
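To make the density-estimation idea behind rarity scoring concrete, the following is a minimal sketch, not the AROA implementation: it assumes each essay component (e.g. a claim) is represented as an embedding vector, and scores rarity as the negative log of a leave-one-out Gaussian kernel density estimate over the reference corpus. The function name, bandwidth, and data are all illustrative assumptions.

```python
import numpy as np

def rarity_scores(embeddings: np.ndarray, bandwidth: float = 0.5) -> np.ndarray:
    """Illustrative rarity score: negative log of a leave-one-out Gaussian
    KDE, so essays in sparse regions of the corpus score as more 'rare'."""
    n, _ = embeddings.shape
    # Pairwise squared distances between all embeddings.
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    # Gaussian kernel values; zero the diagonal for leave-one-out estimation.
    kernel = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    np.fill_diagonal(kernel, 0.0)
    density = kernel.sum(axis=1) / (n - 1)
    return -np.log(density + 1e-12)  # epsilon guards against log(0)

# Toy corpus: 50 'typical' essay embeddings, with one shifted outlier.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(50, 8))
corpus[0] += 5.0  # an atypical (original) essay
scores = rarity_scores(corpus)
```

Under this sketch the shifted essay sits in a low-density region, so it receives the highest rarity score; a full system would additionally combine such scores across the four components and apply the quality adjustment described above.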