Abstract:We present Step-wise Policy for Rare-tool Knowledge (SPaRK), a novel reinforcement learning framework that teaches large language models to explore diverse tool usage patterns beyond conventional high-temperature sampling. Building on recent advances in step-wise reinforcement learning, we introduce a dual-objective reward system that simultaneously optimizes for answer quality and tool diversity, training a Llama-3.1 8B model through offline PPO on synthetically generated trajectories from the MMLU-Pro dataset. Our approach uniquely employs a rarity-first exploitation strategy where a GPT-4o judge scores candidate actions across eight distinct tools plus chain-of-thought reasoning, with the policy favoring less-frequently used but still viable tools to encourage systematic exploration. Empirical results demonstrate that SPaRK achieves competitive performance across 14 MMLU-Pro categories while exhibiting significantly higher entropy in tool selection compared to both baseline and supervised fine-tuning approaches, suggesting that algorithmic exploration through explicit tool diversity can enhance reasoning capabilities without sacrificing accuracy.
Abstract:We present a foundation modeling framework that leverages deep learning to uncover latent genetic signatures across the hematopoietic hierarchy. Our approach trains a fully connected autoencoder on multipotent progenitor cells, reducing over 20,000 gene features to a 256-dimensional latent space that captures predictive information for both progenitor and downstream differentiated cells such as monocytes and lymphocytes. We validate the quality of these embeddings by training feed-forward, transformer, and graph convolutional architectures for blood disease diagnosis tasks. We also explore zero-shot prediction using a progenitor disease state classification model to classify downstream cell conditions. Our models achieve greater than 95% accuracy for multi-class classification, and in the zero-shot setting, we achieve greater than 0.7 F1-score on the binary classification task. Future work should improve embeddings further to increase robustness on lymphocyte classification specifically.