Abstract:The integration of artificial intelligence in healthcare has opened new horizons for improving medical diagnostics and patient care. However, challenges persist in developing systems capable of generating accurate and contextually relevant radiology reports, particularly in low-resource languages. In this study, we present a comprehensive benchmark to evaluate the performance of instruction-tuned Vision-Language Models (VLMs) in the specialized task of radiology report generation across three low-resource languages: Italian, German, and Spanish. Employing the LLaVA architectural framework, we conducted a systematic evaluation of pre-trained models utilizing general datasets, domain-specific datasets, and low-resource language-specific datasets. In light of the unavailability of models that possess prior knowledge of both the medical domain and low-resource languages, we analyzed various adaptations to determine the most effective approach for these contexts. The results revealed that language-specific models substantially outperformed both general and domain-specific models in generating radiology reports, emphasizing the critical role of linguistic adaptation. Additionally, models fine-tuned with medical terminology exhibited enhanced performance across all languages compared to models with generic knowledge, highlighting the importance of domain-specific training. We also explored the influence of the temperature parameter on the coherence of report generation, providing insights for optimal model settings. Our findings highlight the importance of tailored language and domain-specific training for improving the quality and accuracy of radiological reports in multilingual settings. This research not only advances our understanding of VLMs adaptability in healthcare but also points to significant avenues for future investigations into model tuning and language-specific adaptations.
Abstract:Data scarcity remains a critical bottleneck impeding technological advancements across various domains, including but not limited to medicine and precision agriculture. To address this challenge, we explore the potential of Deep Generative Models (DGMs) in producing synthetic data that satisfies the Generative Learning Trilemma: fidelity, diversity, and sampling efficiency. However, recognizing that these criteria alone are insufficient for practical applications, we extend the trilemma to include utility, robustness, and privacy, factors crucial for ensuring the applicability of DGMs in real-world scenarios. Evaluating these metrics becomes particularly challenging in data-scarce environments, as DGMs traditionally rely on large datasets to perform optimally. This limitation is especially pronounced in domains like medicine and precision agriculture, where ensuring acceptable model performance under data constraints is vital. To address these challenges, we assess the Generative Learning Trilemma in data-scarcity settings using state-of-the-art evaluation metrics, comparing three prominent DGMs: Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Diffusion Models (DMs). Furthermore, we propose a comprehensive framework to assess utility, robustness, and privacy in synthetic data generated by DGMs. Our findings demonstrate varying strengths among DGMs, with each model exhibiting unique advantages based on the application context. This study broadens the scope of the Generative Learning Trilemma, aligning it with real-world demands and providing actionable guidance for selecting DGMs tailored to specific applications.