Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhehan Zhang

Fine-Tuning a Large Vision-Language Model for Artwork's Scoring and Critique

Feb 09, 2026

Zhehan Zhang, Meihua Qian, Li Luo, Siyu Huang, Chaoyi Zhou, Ripon Saha, Xinxin Song

Abstract:Assessing artistic creativity is foundational to creativity research and arts education, yet manual scoring (e.g., Torrance Tests of Creative Thinking) is labor-intensive at scale. Prior machine-learning approaches show promise for visual creativity scoring, but many rely mainly on image features and provide limited or no explanatory feedback. We propose a framework for automated creativity assessment of human paintings by fine-tuning the vision-language model Qwen2-VL-7B with multi-task learning. Our dataset contains 1000 human-created paintings scored on a 1-100 scale and paired with a short human-written description (content or artist explanation). Two expert raters evaluated each work using a five-dimension rubric (originality, color, texture, composition, content) and provided written critiques; we use an 80/20 train-test split. We add a lightweight regression head on the visual encoder output so the model can predict a numerical score and generate rubric-aligned feedback in a single forward pass. By embedding the structured rubric and the artwork description in the system prompt, we constrain the generated text to match the quantitative prediction. Experiments show strong accuracy, achieving Pearson r > 0.97 and MAE about 3.95 on the 100-point scale. Qualitative evaluation indicates the generated feedback is semantically close to expert critiques (average SBERT cosine similarity = 0.798). The proposed approach bridges computer vision and art assessment and offers a scalable tool for creativity research and classroom feedback.

Via

Access Paper or Ask Questions

Using a CNN Model to Assess Visual Artwork's Creativity

Aug 02, 2024

Zhehan Zhang, Meihua Qian, Li Luo, Ripon Saha, Qianyi Gao, Xinxin Song

Figure 1 for Using a CNN Model to Assess Visual Artwork's Creativity

Figure 2 for Using a CNN Model to Assess Visual Artwork's Creativity

Figure 3 for Using a CNN Model to Assess Visual Artwork's Creativity

Figure 4 for Using a CNN Model to Assess Visual Artwork's Creativity

Abstract:Assessing artistic creativity has long challenged researchers, with traditional methods proving time-consuming. Recent studies have applied machine learning to evaluate creativity in drawings, but not paintings. Our research addresses this gap by developing a CNN model to automatically assess the creativity of students' paintings. Using a dataset of 600 paintings by professionals and children, our model achieved 90% accuracy and faster evaluation times than human raters. This approach demonstrates the potential of machine learning in advancing artistic creativity assessment, offering a more efficient alternative to traditional methods.

Via

Access Paper or Ask Questions

A comparative study of deep learning methods for building footprints detection using high spatial resolution aerial images

Mar 16, 2021

Hongjie He, Ke Yang, Yuwei Cai, Zijian Jiang, Qiutong Yu, Kun Zhao, Junbo Wang, Sarah Narges Fatholahi, Yan Liu, Hasti Andon Petrosians(+7 more)

Figure 1 for A comparative study of deep learning methods for building footprints detection using high spatial resolution aerial images

Figure 2 for A comparative study of deep learning methods for building footprints detection using high spatial resolution aerial images

Figure 3 for A comparative study of deep learning methods for building footprints detection using high spatial resolution aerial images

Abstract:Building footprints data is of importance in several urban applications and natural disaster management. In contrast to traditional surveying and mapping, using high spatial resolution aerial images, deep learning-based building footprints extraction methods can extract building footprints accurately and efficiently. With rapidly development of deep learning methods, it is hard for novice to harness the powerful tools in building footprints extraction. The paper aims at providing the whole process of building footprints extraction from high spatial resolution images using deep learning-based methods. In addition, we also compare the commonly used methods, including Fully Convolutional Networks (FCN)-8s, U-Net and DeepLabv3+. At the end of the work, we change the data size used in models training to explore the influence of data size to the performance of the algorithms. The experiments show that, in different data size, DeepLabv3+ is the best algorithm among them with the highest accuracy and moderate efficiency; FCN-8s has the worst accuracy and highest efficiency; U-Net shows the moderate accuracy and lowest efficiency. In addition, with more training data, algorithms converged faster with higher accuracy in extraction results.

Via

Access Paper or Ask Questions