Text-video retrieval, a prominent sub-field within the broader domain of multimedia content management, has witnessed remarkable growth and innovation over the past decade. However, existing methods assume the video scenes are consistent and the description annotators are unbiased. These limitations fail to align with fluid real-world scenarios, and descriptions can be influenced by annotator biases, diverse writing styles, and varying textual perspectives. To overcome the aforementioned problems, we introduce WAVER, a cross-domain knowledge distillation mechanism designed to tackle the challenge of handling writing-style agnostics. WAVER capitalizes on the open-vocabulary properties inherent in pre-trained vision-language models and employs an implicit knowledge distillation approach to transfer text-based knowledge from a teacher model to a vision-based student. Empirical studies conducted across four standard benchmark datasets, encompassing various settings, provide compelling evidence that \WAVER can achieve state-of-the-art performance in text-video retrieval tasks while handling writing-style variations.