Foundation models are architectures trained on vast datasets with autoregressive pre-training, an approach drawn from natural language processing, to capture intricate patterns and motifs. They were originally developed to transfer such learned knowledge to downstream predictive tasks. Recently, however, some studies have repurposed these learned representations for phenotype discovery without rigorous validation, risking embeddings that appear realistic but are clinically incoherent. To test this mismatch, we trained two autoregressive models -- a sequence-to-sequence LSTM and a reduced Transformer -- on two longitudinal clinical datasets, ART for HIV and Acute Hypotension. We introduced controlled irregularity during training through random inter-visit gaps, while test sequences remained complete. We then used patient-trajectory synthesis to evaluate distributional and correlational fidelity. Both models reproduced marginal feature distributions but failed to preserve cross-feature structure -- showing that generative pre-training yields local realism but limited clinical coherence. These results highlight the need for domain-specific evaluation and support trajectory synthesis as a practical probe before fine-tuning or deployment.
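
The following is a minimal sketch, not the authors' code, of the two methodological ingredients summarized above: injecting random inter-visit gaps during training and probing synthetic trajectories for distributional versus correlational fidelity. The function names (`inject_random_gaps`, `fidelity_report`), the `drop_prob` parameter, and the toy data are illustrative assumptions; only the general idea (comparing per-feature marginals and cross-feature correlation matrices between real and synthetic cohorts) follows from the abstract.

```python
import numpy as np
from scipy.stats import ks_2samp


def inject_random_gaps(trajectories, drop_prob=0.2, rng=None):
    """Simulate irregular sampling by masking random visits with NaN.

    trajectories: float array of shape (n_patients, n_visits, n_features).
    """
    rng = rng or np.random.default_rng(0)
    out = trajectories.copy()
    mask = rng.random(out.shape[:2]) < drop_prob  # one flag per (patient, visit)
    out[mask] = np.nan
    return out


def fidelity_report(real, synthetic):
    """Compare marginal distributions (KS statistic per feature) and
    cross-feature structure (Pearson correlation matrices)."""
    r = real.reshape(-1, real.shape[-1])
    s = synthetic.reshape(-1, synthetic.shape[-1])
    ks = [ks_2samp(r[:, j], s[:, j]).statistic for j in range(r.shape[1])]
    corr_gap = np.abs(np.corrcoef(r, rowvar=False) -
                      np.corrcoef(s, rowvar=False))
    return {
        "ks_per_feature": np.asarray(ks),          # marginal (local) realism
        "max_corr_error": float(corr_gap.max()),   # cross-feature coherence
        "mean_corr_error": float(corr_gap.mean()),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Toy "real" cohort: correlated features with standard-normal marginals.
    real = rng.multivariate_normal(
        [0, 0, 0],
        [[1.0, 0.8, 0.2], [0.8, 1.0, 0.1], [0.2, 0.1, 1.0]],
        size=500 * 10,
    ).reshape(500, 10, 3)
    # Toy "synthetic" cohort: matching marginals but no cross-feature structure.
    synthetic = rng.normal(size=(500, 10, 3))
    print(fidelity_report(real, synthetic))
```

On this toy data the KS statistics are small while the correlation error is large, mirroring the paper's observation that marginal realism can coexist with missing cross-feature structure.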