Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tiancheng Jiang

From Prompt to Production:Automating Brand-Safe Marketing Imagery with Text-to-Image Models

Feb 12, 2026

Parmida Atighehchian, Henry Wang, Andrei Kapustin, Boris Lerner, Tiancheng Jiang, Taylor Jensen, Negin Sokhandan

Abstract:Text-to-image models have made significant strides, producing impressive results in generating images from textual descriptions. However, creating a scalable pipeline for deploying these models in production remains a challenge. Achieving the right balance between automation and human feedback is critical to maintain both scale and quality. While automation can handle large volumes, human oversight is still an essential component to ensure that the generated images meet the desired standards and are aligned with the creative vision. This paper presents a new pipeline that offers a fully automated, scalable solution for generating marketing images of commercial products using text-to-image models. The proposed system maintains the quality and fidelity of images, while also introducing sufficient creative variation to adhere to marketing guidelines. By streamlining this process, we ensure a seamless blend of efficiency and human oversight, achieving a $30.77\%$ increase in marketing object fidelity using DINOV2 and a $52.00\%$ increase in human preference over the generated outcome.

* 17 pages, 12 figures, Accepted to IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2026

Via

Access Paper or Ask Questions

Domain Adaptation of VLM for Soccer Video Understanding

May 20, 2025

Tiancheng Jiang, Henry Wang, Md Sirajus Salekin, Parmida Atighehchian, Shinan Zhang

Figure 1 for Domain Adaptation of VLM for Soccer Video Understanding

Figure 2 for Domain Adaptation of VLM for Soccer Video Understanding

Figure 3 for Domain Adaptation of VLM for Soccer Video Understanding

Figure 4 for Domain Adaptation of VLM for Soccer Video Understanding

Abstract:Vision Language Models (VLMs) have demonstrated strong performance in multi-modal tasks by effectively aligning visual and textual representations. However, most video understanding VLM research has been domain-agnostic, leaving the understanding of their transfer learning capability to specialized domains under-explored. In this work, we address this by exploring the adaptability of open-source VLMs to specific domains, and focusing on soccer as an initial case study. Our approach uses large-scale soccer datasets and LLM to create instruction-following data, and use them to iteratively fine-tune the general-domain VLM in a curriculum learning fashion (first teaching the model key soccer concepts to then question answering tasks). The final adapted model, trained using a curated dataset of 20k video clips, exhibits significant improvement in soccer-specific tasks compared to the base model, with a 37.5% relative improvement for the visual question-answering task and an accuracy improvement from 11.8% to 63.5% for the downstream soccer action classification task.

* 8 pages, 5 figures, accepted to the 11th IEEE International Workshop on Computer Vision in Sports (CVSports) at CVPR 2025; supplementary appendix included as ancillary PDF

Via

Access Paper or Ask Questions