Yupei Ren

Towards Comprehensive Argument Analysis in Education: Dataset, Tasks, and Method

May 17, 2025

Yupei Ren, Xinyi Zhou, Ning Zhang, Shangqing Zhao, Man Lan, Xiaopeng Bai

Abstract: Argument mining has garnered increasing attention over the years, with the recent advancement of Large Language Models (LLMs) further propelling this trend. However, current argument relations remain relatively simplistic and foundational, struggling to capture the full scope of argument information, particularly when it comes to representing complex argument structures in real-world scenarios. To address this limitation, we propose 14 fine-grained relation types from both vertical and horizontal dimensions, thereby capturing the intricate interplay between argument components for a thorough understanding of argument structure. On this basis, we conducted extensive experiments on three tasks: argument component detection, relation prediction, and automated essay grading. Additionally, we explored the impact of writing quality on argument component detection and relation prediction, as well as the connections between discourse relations and argumentative features. The findings highlight the importance of fine-grained argumentative annotations for argumentative writing quality assessment and encourage multi-dimensional argument analysis.

* Accepted to ACL 2025; 13 pages, 3 figures

Fùxì: A Benchmark for Evaluating Language Models on Ancient Chinese Text Understanding and Generation

Mar 20, 2025

Shangqing Zhao, Yuhao Zhou, Yupei Ren, Zhe Chen, Chenghao Jia, Fang Zhe, Zhaogaung Long, Shu Liu, Man Lan

Abstract: Ancient Chinese text processing presents unique challenges for large language models (LLMs) due to its distinct linguistic features, complex structural constraints, and rich cultural context. While existing benchmarks have primarily focused on evaluating comprehension through multiple-choice questions, there remains a critical gap in assessing models' generative capabilities in classical Chinese. We introduce Fùxì, a comprehensive benchmark that evaluates both understanding and generation capabilities across 21 diverse tasks. Our benchmark distinguishes itself through three key contributions: (1) balanced coverage of both comprehension and generation tasks, including novel tasks like poetry composition and couplet completion, (2) specialized evaluation metrics designed specifically for classical Chinese text generation, combining rule-based verification with fine-tuned LLM evaluators, and (3) a systematic assessment framework that considers both linguistic accuracy and cultural authenticity. Through extensive evaluation of state-of-the-art LLMs, we reveal significant performance gaps between understanding and generation tasks, with models achieving promising results in comprehension but struggling considerably in generation tasks, particularly those requiring deep cultural knowledge and adherence to classical formats. Our findings highlight the current limitations in ancient Chinese text processing and provide insights for future model development. The benchmark, evaluation toolkit, and baseline results are publicly available to facilitate research in this domain.

* Work in progress
