Federated Learning (FL) enables decentralized model training across clients without sharing raw data, but its performance degrades under real-world data heterogeneity. Existing methods often fail to address distribution shift across clients and distribution drift over time, or they rely on unrealistic assumptions, such as a known number of client clusters or known data heterogeneity types, which limits their generalizability. We introduce Feroma, a novel FL framework that explicitly handles both distribution shift and drift without relying on client or cluster identity. Feroma builds on client distribution profiles, compact and privacy-preserving representations of local data, which guide model aggregation and test-time model assignment through adaptive similarity-based weighting. This design allows Feroma to dynamically select aggregation strategies during training, ranging from clustered to personalized, and to deploy suitable models to unseen, unlabeled test clients without retraining, online adaptation, or prior knowledge of clients' data. Extensive experiments against 10 state-of-the-art methods show that Feroma improves performance and stability under dynamic data heterogeneity, with an average accuracy gain of up to 12 percentage points over the best baselines across 6 benchmarks, while maintaining computational and communication overhead comparable to FedAvg. These results highlight that distribution-profile-based aggregation offers a practical path toward robust FL under both data distribution shifts and drifts.
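
The abstract does not specify the exact formulation of the similarity-based weighting. As a rough illustration only, the minimal sketch below shows one way profile-driven aggregation could look, assuming hypothetical profile vectors, cosine similarity, and a softmax temperature `tau` that interpolates between clustered (low `tau`) and near-uniform, FedAvg-like (high `tau`) behavior; none of these choices are taken from the paper.

```python
import numpy as np

def similarity_weights(profiles, reference, tau=0.1):
    """Softmax over cosine similarities between client profiles and a reference profile.

    profiles:  (n_clients, d) array of per-client distribution profiles (illustrative).
    reference: (d,) profile of the target client or current global estimate.
    tau:       hypothetical temperature; small values concentrate weight on similar clients.
    """
    p = profiles / np.linalg.norm(profiles, axis=1, keepdims=True)
    r = reference / np.linalg.norm(reference)
    sims = p @ r                      # cosine similarity of each client to the reference
    w = np.exp(sims / tau)
    return w / w.sum()

def aggregate(client_params, weights):
    """Weighted average of client model parameters (one flat parameter vector per client)."""
    return np.average(client_params, axis=0, weights=weights)

# Toy usage: 4 clients, 8-dim profiles, 10-dim parameter vectors (all synthetic).
rng = np.random.default_rng(0)
profiles = rng.normal(size=(4, 8))
params = rng.normal(size=(4, 10))
w = similarity_weights(profiles, reference=profiles[0])
model_for_client0 = aggregate(params, w)
```

In this sketch, assigning a model to an unseen test client would reduce to computing its profile, deriving weights against the training clients' profiles, and averaging their models accordingly, without any retraining.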