Fraunhofer Institute for Applied Information Technology FIT, University Hospital of Cologne
Abstract:Despite significant advances in Reinforcement Learning (RL), model performance remains highly sensitive to algorithm and hyperparameter configurations, while generalization gaps across environments complicate real-world deployment. Although prior work has studied RL generalization, the relative contribution of specific configurations to the generalization gap has not been quantitatively decomposed and systematically leveraged for configuration selection. To address this limitation, we propose an explainable framework that evaluates RL performance across robotic environments using SHapley Additive exPlanations (SHAP) to quantify configuration impacts. We establish a theoretical foundation connecting Shapley values to generalizability, empirically analyze configuration impact patterns, and introduce SHAP-guided configuration selection to enhance generalization. Our results reveal distinct patterns across algorithms and hyperparameters, with consistent configuration impacts across diverse tasks and environments. By applying these insights to configuration selection, we achieve improved RL generalizability and provide actionable guidance for practitioners.
Abstract:Recent advances in reinforcement learning (RL) for large language model (LLM) fine-tuning show promise in addressing multi-objective tasks but still face significant challenges, including complex objective balancing, low training efficiency, poor scalability, and limited explainability. Leveraging ensemble learning principles, we introduce an Ensemble Multi-Objective RL (EMORL) framework that fine-tunes multiple models with individual objectives while optimizing their aggregation after the training to improve efficiency and flexibility. Our method is the first to aggregate the last hidden states of individual models, incorporating contextual information from multiple objectives. This approach is supported by a hierarchical grid search algorithm that identifies optimal weighted combinations. We evaluate EMORL on counselor reflection generation tasks, using text-scoring LLMs to evaluate the generations and provide rewards during RL fine-tuning. Through comprehensive experiments on the PAIR and Psych8k datasets, we demonstrate the advantages of EMORL against existing baselines: significantly lower and more stable training consumption ($17,529\pm 1,650$ data points and $6,573\pm 147.43$ seconds), improved scalability and explainability, and comparable performance across multiple objectives.




Abstract:The understanding of variations in genome sequences assists us in identifying people who are predisposed to common diseases, solving rare diseases, and finding the corresponding population group of the individuals from a larger population group. Although classical machine learning techniques allow researchers to identify groups (i.e. clusters) of related variables, the accuracy, and effectiveness of these methods diminish for large and high-dimensional datasets such as the whole human genome. On the other hand, deep neural network architectures (the core of deep learning) can better exploit large-scale datasets to build complex models. In this paper, we use the K-means clustering approach for scalable genomic data analysis aiming towards clustering genotypic variants at the population scale. Finally, we train a deep belief network (DBN) for predicting the geographic ethnicity. We used the genotype data from the 1000 Genomes Project, which covers the result of genome sequencing for 2504 individuals from 26 different ethnic origins and comprises 84 million variants. Our experimental results, with a focus on accuracy and scalability, show the effectiveness and superiority compared to the state-of-the-art.