Bohao Wu

Explain Less, Understand More: Jargon Detection via Personalized Parameter-Efficient Fine-tuning

May 22, 2025

Bohao Wu, Qingyun Wang, Yue Guo

Abstract: Personalizing jargon detection and explanation is essential for making technical documents accessible to readers with diverse disciplinary backgrounds. However, tailoring models to individual users typically requires substantial annotation effort and computational resources because of user-specific fine-tuning. To address this, we present a systematic study of personalized jargon detection, focusing on methods that are both efficient and scalable for real-world deployment. We explore two personalization strategies: (1) lightweight fine-tuning using Low-Rank Adaptation (LoRA) on open-source models, and (2) personalized prompting, which tailors model behavior at inference time without retraining. To reflect realistic constraints, we also investigate hybrid approaches that combine limited annotated data with unsupervised user background signals. Our personalized LoRA model outperforms GPT-4 by 21.4% in F1 score and exceeds the best-performing oracle baseline by 8.3%. Remarkably, our method achieves comparable performance using only 10% of the annotated training data, demonstrating its practicality for resource-constrained settings. Our study is the first to systematically explore efficient, low-resource personalization of jargon detection using open-source language models, offering a practical path toward scalable, user-adaptive NLP systems.
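For readers unfamiliar with LoRA, the following is a minimal sketch of the general low-rank update idea in plain Python. It is not the authors' implementation: the layer sizes, the rank, the `alpha` scaling value, and the helper names (`matmul`, `lora_forward`, `count_params`) are all assumptions chosen for illustration. The frozen weight matrix is left untouched, and only two small matrices are treated as trainable.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_forward(x, w_frozen, lora_a, lora_b, alpha, rank):
    """Compute x @ W + (alpha / rank) * (x @ A @ B) without forming W + A @ B."""
    base = matmul(x, w_frozen)
    update = matmul(matmul(x, lora_a), lora_b)
    scale = alpha / rank
    return [[b + scale * u for b, u in zip(b_row, u_row)]
            for b_row, u_row in zip(base, update)]

def count_params(matrix):
    """Number of scalar entries in a matrix."""
    return len(matrix) * len(matrix[0])

random.seed(0)
d_in, d_out, rank = 64, 64, 4  # hypothetical sizes, not taken from the paper

# Frozen pretrained weight: never updated during personalization.
w_frozen = [[random.gauss(0, 0.02) for _ in range(d_out)] for _ in range(d_in)]
# Trainable low-rank factors: A is random, B starts at zero so the
# adapter is initially a no-op and the model behaves like the base model.
lora_a = [[random.gauss(0, 0.02) for _ in range(rank)] for _ in range(d_in)]
lora_b = [[0.0] * d_out for _ in range(rank)]

x = [[random.gauss(0, 1) for _ in range(d_in)]]
y = lora_forward(x, w_frozen, lora_a, lora_b, alpha=8, rank=rank)

trainable = count_params(lora_a) + count_params(lora_b)
frozen = count_params(w_frozen)
print(f"trainable: {trainable}, frozen: {frozen}")
```

With these assumed sizes, the adapter holds 512 trainable values against 4,096 frozen ones, which illustrates why a separate adapter per user can be far cheaper to train and store than a fully fine-tuned copy of the model.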

OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs

Nov 21, 2024

Akari Asai, Jacqueline He, Rulin Shao, Weijia Shi, Amanpreet Singh, Joseph Chee Chang, Kyle Lo, Luca Soldaini, Sergey Feldman, Mike D'arcy (+15 more)

Abstract: Scientific progress depends on researchers' ability to synthesize the growing body of literature. Can large language models (LMs) assist scientists in this task? We introduce OpenScholar, a specialized retrieval-augmented LM that answers scientific queries by identifying relevant passages from 45 million open-access papers and synthesizing citation-backed responses. To evaluate OpenScholar, we develop ScholarQABench, the first large-scale multi-domain benchmark for literature search, comprising 2,967 expert-written queries and 208 long-form answers across computer science, physics, neuroscience, and biomedicine. On ScholarQABench, OpenScholar-8B outperforms GPT-4o by 5% and PaperQA2 by 7% in correctness, despite being a smaller, open model. While GPT-4o hallucinates citations 78% to 90% of the time, OpenScholar achieves citation accuracy on par with human experts. OpenScholar's datastore, retriever, and self-feedback inference loop also improve off-the-shelf LMs: for instance, OpenScholar-GPT4o improves GPT-4o's correctness by 12%. In human evaluations, experts preferred OpenScholar-8B and OpenScholar-GPT4o responses over expert-written ones 51% and 70% of the time, respectively, compared to GPT-4o's 32%. We open-source all of our code, models, datastore, data, and a public demo.
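The retrieve-then-cite pattern described in this abstract can be illustrated with a toy sketch. This is not OpenScholar's retriever or prompt: the bag-of-words cosine scoring, the three-passage datastore, the passage ids, and the function names (`retrieve`, `build_prompt`) are all hypothetical stand-ins for a real dense retriever over millions of papers.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Lowercased whitespace tokens with their counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, passages, top_k=2):
    """Return the top_k (passage_id, text) pairs ranked by similarity to the query."""
    q = bag_of_words(query)
    ranked = sorted(passages.items(),
                    key=lambda item: cosine(q, bag_of_words(item[1])),
                    reverse=True)
    return ranked[:top_k]

def build_prompt(query, hits):
    """Assemble a prompt that asks the LM to cite retrieved passages by id."""
    context = "\n".join(f"[{pid}] {text}" for pid, text in hits)
    return ("Answer using only the passages below and cite them by id.\n"
            f"{context}\nQuestion: {query}")

# Toy datastore; the texts are made up for illustration only.
passages = {
    "P1": "low-rank adaptation reduces the number of trainable parameters",
    "P2": "retrieval augmented language models ground answers in retrieved passages",
    "P3": "protein folding is driven by hydrophobic interactions",
}

query = "how do retrieval augmented models ground answers"
hits = retrieve(query, passages)
prompt = build_prompt(query, hits)
print(prompt)
```

Because every passage in the prompt carries an explicit id, a generated answer can be checked against the retrieved set, which is one simple way to detect citations that point at nothing.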

SciCode: A Research Coding Benchmark Curated by Scientists

Jul 18, 2024

Minyang Tian, Luyu Gao, Shizhuo Dylan Zhang, Xinan Chen, Cunwei Fan, Xuefei Guo, Roland Haas, Pan Ji, Kittithat Krongchon, Yao Li (+20 more)

Abstract: Since language models (LMs) now outperform average humans on many challenging tasks, it has become increasingly difficult to develop challenging, high-quality, and realistic evaluations. We address this issue by examining LMs' capabilities to generate code for solving real scientific research problems. Incorporating input from scientists and AI researchers in 16 diverse natural science sub-fields, including mathematics, physics, chemistry, biology, and materials science, we created a scientist-curated coding benchmark, SciCode. The problems in SciCode naturally factorize into multiple subproblems, each involving knowledge recall, reasoning, and code synthesis. In total, SciCode contains 338 subproblems decomposed from 80 challenging main problems. It offers optional descriptions specifying useful scientific background information, along with scientist-annotated gold-standard solutions and test cases for evaluation. Claude3.5-Sonnet, the best-performing model among those tested, can solve only 4.6% of the problems in the most realistic setting. We believe that SciCode both demonstrates contemporary LMs' progress towards becoming helpful scientific assistants and sheds light on the development and evaluation of scientific AI in the future.

* 25 pages, 9 figures, 7 tables
