Machine learning for wireless systems is commonly studied using standardized stochastic channel models (e.g., TDL/CDL/UMa) because of their established role in wireless standardization and their ability to generate data at scale. However, some of their structural assumptions may diverge from real-world propagation. This paper asks when these models are sufficient and when ray-traced (RT) data, serving as a proxy for the real world, provides tangible benefits. To answer these questions, we conduct an empirical study on two representative tasks: CSI compression and temporal channel prediction. Models are trained and evaluated under in-domain, cross-domain, and small-data fine-tuning protocols. Across settings, we observe that stochastic-only evaluation may over- or underestimate performance relative to RT. These findings support a task-aware recipe: stochastic models can be leveraged for scalable pre-training and for tasks that do not rely on strong spatiotemporal coupling; when that coupling matters, pre-training and evaluation should be grounded in spatially consistent or geometrically similar RT scenarios. This study provides initial guidance to inform future discussions on benchmarking and standardization.