As a mental disorder progresses, it may affect brain structure, but brain function expressed in brain dynamics is affected much earlier. Capturing the moment when brain dynamics express the disorder is crucial for early diagnosis. The traditional approach to this problem via training classifiers either proceeds from handcrafted features or requires large datasets to combat the $m>>n$ problem when a high dimensional fMRI volume only has a single label that carries learning signal. Large datasets may not be available for a study of each disorder, or rare disorder types or sub-populations may not warrant for them. In this paper, we demonstrate a self-supervised pre-training method that enables us to pre-train directly on fMRI dynamics of healthy control subjects and transfer the learning to much smaller datasets of schizophrenia. Not only we enable classification of disorder directly based on fMRI dynamics in small data but also significantly speed up the learning when possible. This is encouraging evidence of informative transfer learning across datasets and diagnostic categories.