Abstract:Evaluating text revision in scientific writing remains a challenge, as traditional metrics such as ROUGE and BERTScore primarily focus on similarity rather than capturing meaningful improvements. In this work, we analyse and identify the limitations of these metrics and explore alternative evaluation methods that better align with human judgments. We first conduct a manual annotation study to assess the quality of different revisions. Then, we investigate reference-free evaluation metrics from related NLP domains. Additionally, we examine LLM-as-a-judge approaches, analysing their ability to assess revisions with and without a gold reference. Our results show that LLMs effectively assess instruction-following but struggle with correctness, while domain-specific metrics provide complementary insights. We find that a hybrid approach combining LLM-as-a-judge evaluation and task-specific metrics offers the most reliable assessment of revision quality.
Abstract:Revision is a crucial step in scientific writing, where authors refine their work to improve clarity, structure, and academic quality. Existing approaches to automated writing assistance often focus on sentence-level revisions, which fail to capture the broader context needed for effective modification. In this paper, we explore the impact of shifting from sentence-level to paragraph-level scope for the task of scientific text revision. The paragraph level definition of the task allows for more meaningful changes, and is guided by detailed revision instructions rather than general ones. To support this task, we introduce ParaRev, the first dataset of revised scientific paragraphs with an evaluation subset manually annotated with revision instructions. Our experiments demonstrate that using detailed instructions significantly improves the quality of automated revisions compared to general approaches, no matter the model or the metric considered.
Abstract:Writing a scientific article is a challenging task as it is a highly codified genre. Good writing skills are essential to properly convey ideas and results of research work. Since the majority of scientific articles are currently written in English, this exercise is all the more difficult for non-native English speakers as they additionally have to face language issues. This article aims to provide an overview of text revision in writing assistance in the scientific domain. We will examine the specificities of scientific writing, including the format and conventions commonly used in research articles. Additionally, this overview will explore the various types of writing assistance tools available for text revision. Despite the evolution of the technology behind these tools through the years, from rule-based approaches to deep neural-based ones, challenges still exist (tools' accessibility, limited consideration of the context, inexplicit use of discursive information, etc.)