Abstract:The application of artificial intelligence (AI) in IVF has shown promise in improving consistency and standardization of decisions, but often relies on annotated data and does not make use of the multimodal nature of IVF data. We investigated whether foundational vision-language models can be fine-tuned to predict natural language descriptions of embryo morphology and development. Using a publicly available embryo time-lapse dataset, we fine-tuned PaliGemma-2, a multi-modal vision-language model, with only 1,000 images and corresponding captions, describing embryo morphology, embryonic cell cycle and developmental stage. Our results show that the fine-tuned model, InVitroVision, outperformed a commercial model, ChatGPT 5.2, and base models in overall metrics, with performance improving with larger training datasets. This study demonstrates the potential of foundational vision-language models to generalize to IVF tasks with limited data, enabling the prediction of natural language descriptions of embryo morphology and development. This approach may facilitate the use of large language models to retrieve information and scientific evidence from relevant publications and guidelines, and has implications for few-shot adaptation to multiple downstream tasks in IVF.
Abstract:Reliable evaluation of blastocyst quality is critical for the success of in vitro fertilization (IVF) treatments. Current embryo grading practices primarily rely on visual assessment of morphological features, which introduces subjectivity, inter-embryologist variability, and challenges in standardizing quality assurance. In this study, we propose a multitask embedding-based approach for the automated analysis and prediction of key blastocyst components, including the trophectoderm (TE), inner cell mass (ICM), and blastocyst expansion (EXP). The method leverages biological and physical characteristics extracted from images of day-5 human embryos. A pretrained ResNet-18 architecture, enhanced with an embedding layer, is employed to learn discriminative representations from a limited dataset and to automatically identify TE and ICM regions along with their corresponding grades, structures that are visually similar and inherently difficult to distinguish. Experimental results demonstrate the promise of the multitask embedding approach and potential for robust and consistent blastocyst quality assessment.