Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Anwesha Mohanty

The Future of MLLM Prompting is Adaptive: A Comprehensive Experimental Evaluation of Prompt Engineering Methods for Robust Multimodal Performance

Apr 14, 2025

Anwesha Mohanty, Venkatesh Balavadhani Parthasarathy, Arsalan Shahid

Figure 1 for The Future of MLLM Prompting is Adaptive: A Comprehensive Experimental Evaluation of Prompt Engineering Methods for Robust Multimodal Performance

Figure 2 for The Future of MLLM Prompting is Adaptive: A Comprehensive Experimental Evaluation of Prompt Engineering Methods for Robust Multimodal Performance

Figure 3 for The Future of MLLM Prompting is Adaptive: A Comprehensive Experimental Evaluation of Prompt Engineering Methods for Robust Multimodal Performance

Figure 4 for The Future of MLLM Prompting is Adaptive: A Comprehensive Experimental Evaluation of Prompt Engineering Methods for Robust Multimodal Performance

Abstract:Multimodal Large Language Models (MLLMs) are set to transform how machines process and generate human-like responses by integrating diverse modalities such as text, images, and code. Yet, effectively harnessing their capabilities hinges on optimal prompt engineering. We present a comprehensive experimental evaluation of seven prompt engineering methods applied to 13 open-source MLLMs over 24 tasks spanning Reasoning and Compositionality, Multimodal Understanding and Alignment, Complex Code Generation and Execution, and Knowledge Retrieval and Integration. Our approach stratifies models by parameter count into Small (<4B), Medium (4B-10B), and Large (>10B) categories and compares prompting techniques including Zero-Shot, One-Shot, Few-Shot, Chain-of-Thought, Analogical, Generated Knowledge, and Tree-of-Thought. While Large MLLMs excel in structured tasks such as code generation, achieving accuracies up to 96.88% under Few-Shot prompting, all models struggle with complex reasoning and abstract understanding, often yielding accuracies below 60% and high hallucination rates. Structured reasoning prompts frequently increased hallucination up to 75% in small models and led to longer response times (over 20 seconds in Large MLLMs), while simpler prompting methods provided more concise and efficient outputs. No single prompting method uniformly optimises all task types. Instead, adaptive strategies combining example-based guidance with selective structured reasoning are essential to enhance robustness, efficiency, and factual accuracy. Our findings offer practical recommendations for prompt engineering and support more reliable deployment of MLLMs across applications including AI-assisted coding, knowledge retrieval, and multimodal content understanding.

Via

Access Paper or Ask Questions

High Fidelity Synthetic Face Generation for Rosacea Skin Condition from Limited Data

Mar 08, 2023

Anwesha Mohanty, Alistair Sutherland, Marija Bezbradica, Hossein Javidnia

Figure 1 for High Fidelity Synthetic Face Generation for Rosacea Skin Condition from Limited Data

Figure 2 for High Fidelity Synthetic Face Generation for Rosacea Skin Condition from Limited Data

Figure 3 for High Fidelity Synthetic Face Generation for Rosacea Skin Condition from Limited Data

Figure 4 for High Fidelity Synthetic Face Generation for Rosacea Skin Condition from Limited Data

Abstract:Similar to the majority of deep learning applications, diagnosing skin diseases using computer vision and deep learning often requires a large volume of data. However, obtaining sufficient data for particular types of facial skin conditions can be difficult due to privacy concerns. As a result, conditions like Rosacea are often understudied in computer-aided diagnosis. The limited availability of data for facial skin conditions has led to the investigation of alternative methods for computer-aided diagnosis. In recent years, Generative Adversarial Networks (GANs), mainly variants of StyleGANs, have demonstrated promising results in generating synthetic facial images. In this study, for the first time, a small dataset of Rosacea with 300 full-face images is utilized to further investigate the possibility of generating synthetic data. The preliminary experiments show how fine-tuning the model and varying experimental settings significantly affect the fidelity of the Rosacea features. It is demonstrated that $R_1$ Regularization strength helps achieve high-fidelity details. Additionally, this study presents qualitative evaluations of synthetic/generated faces by expert dermatologists and non-specialist participants. The quantitative evaluation is presented using a few validation metric(s). Furthermore a number of limitations and future directions are discussed. Code and generated dataset are available at: \url{https://github.com/thinkercache/stylegan2-ada-pytorch}

Via

Access Paper or Ask Questions