Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Francesco Pappone

Shaping Explanations: Semantic Reward Modeling with Encoder-Only Transformers for GRPO

Sep 16, 2025

Francesco Pappone, Ruggero Marino Lazzaroni, Federico Califano, Niccolò Gentile, Roberto Marras

Abstract:While Large Language Models (LLMs) excel at generating human-like text, aligning their outputs with complex, qualitative goals like pedagogical soundness remains a significant challenge. Standard reinforcement learning techniques often rely on slow and expensive LLM-as-a-judge evaluations or on brittle, keyword-based metrics like ROUGE, which fail to capture the semantic essence of a high-quality explanation. In this work, we introduce a novel approach to reward shaping within the Group Relative Policy Optimisation (GRPO) framework. Our central contribution is the use of a small, efficient encoder-only transformer as a semantic reward model. This model provides a dense, semantically rich reward signal based on the cosine similarity between a generated explanation and a ground-truth reference, guiding the policy towards explanations that are not just factually correct but also structurally and conceptually aligned with expert reasoning. We apply this method to the task of training a model for the Italian medical-school entrance examinations, following standard domain-adaptive continued pre-training (CPT) and supervised fine-tuning (SFT). Our results demonstrate that GRPO with our proposed semantic reward significantly improves explanation faithfulness and clarity over a strong SFT baseline, showcasing the power of using lightweight encoder models for nuanced reward shaping in complex generation tasks

Via

Access Paper or Ask Questions

From Spectra to Geography: Intelligent Mapping of RRUFF Mineral Data

Nov 18, 2024

Francesco Pappone, Federico Califano, Marco Tafani

Figure 1 for From Spectra to Geography: Intelligent Mapping of RRUFF Mineral Data

Figure 2 for From Spectra to Geography: Intelligent Mapping of RRUFF Mineral Data

Figure 3 for From Spectra to Geography: Intelligent Mapping of RRUFF Mineral Data

Figure 4 for From Spectra to Geography: Intelligent Mapping of RRUFF Mineral Data

Abstract:Accurately determining the geographic origin of mineral samples is pivotal for applications in geology, mineralogy, and material science. Leveraging the comprehensive Raman spectral data from the RRUFF database, this study introduces a novel machine learning framework aimed at geolocating mineral specimens at the country level. We employ a one-dimensional ConvNeXt1D neural network architecture to classify mineral spectra based solely on their spectral signatures. The processed dataset comprises over 32,900 mineral samples, predominantly natural, spanning 101 countries. Through five-fold cross-validation, the ConvNeXt1D model achieved an impressive average classification accuracy of 93%, demonstrating its efficacy in capturing geospatial patterns inherent in Raman spectra.

Via

Access Paper or Ask Questions