Yutao Chen

CreatiDesign: A Unified Multi-Conditional Diffusion Transformer for Creative Graphic Design

May 25, 2025

Hui Zhang, Dexiang Hong, Maoke Yang, Yutao Chen, Zhao Zhang, Jie Shao, Xinglong Wu, Zuxuan Wu, Yu-Gang Jiang

Abstract: Graphic design plays a vital role in visual communication across advertising, marketing, and multimedia entertainment. Prior work has explored automated graphic design generation using diffusion models, aiming to streamline creative workflows and democratize design capabilities. However, complex graphic design scenarios require accurately adhering to design intent specified by multiple heterogeneous user-provided elements (e.g., images, layouts, and texts), which pose multi-condition control challenges for existing methods. Specifically, previous single-condition control models demonstrate effectiveness only within their specialized domains but fail to generalize to other conditions, while existing multi-condition methods often lack fine-grained control over each sub-condition and compromise overall compositional harmony. To address these limitations, we introduce CreatiDesign, a systematic solution for automated graphic design covering both model architecture and dataset construction. First, we design a unified multi-condition driven architecture that enables flexible and precise integration of heterogeneous design elements with minimal architectural modifications to the base diffusion model. Furthermore, to ensure that each condition precisely controls its designated image region and to avoid interference between conditions, we propose a multimodal attention mask mechanism. Additionally, we develop a fully automated pipeline for constructing graphic design datasets, and introduce a new dataset with 400K samples featuring multi-condition annotations, along with a comprehensive benchmark. Experimental results show that CreatiDesign outperforms existing models by a clear margin in faithfully adhering to user intent.
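
The idea of an attention mask that ties each condition to its own image region can be sketched roughly as follows. This is an illustrative assumption, not the CreatiDesign implementation: the token layout (image tokens first, then each condition's tokens), the function name `build_region_mask`, and the exact masking rule are all hypothetical.

```python
import numpy as np

def build_region_mask(num_image_tokens, condition_regions):
    """Return a boolean attention mask where True means attention is allowed.

    Hypothetical layout: image tokens come first, followed by the tokens of
    each condition in order. `condition_regions` is a list of
    (num_condition_tokens, image_token_indices) pairs.
    """
    total = num_image_tokens + sum(n for n, _ in condition_regions)
    mask = np.zeros((total, total), dtype=bool)

    # Image tokens may attend to every other image token.
    mask[:num_image_tokens, :num_image_tokens] = True

    offset = num_image_tokens
    for n_tokens, region in condition_regions:
        cond = np.arange(offset, offset + n_tokens)
        region = np.asarray(region)
        # A condition's tokens attend among themselves ...
        mask[np.ix_(cond, cond)] = True
        # ... and exchange attention only with image tokens in their region.
        mask[np.ix_(cond, region)] = True
        mask[np.ix_(region, cond)] = True
        offset += n_tokens
    return mask

# Four image tokens; condition A (2 tokens) owns image tokens 0 and 1,
# condition B (1 token) owns image token 3.
mask = build_region_mask(4, [(2, [0, 1]), (1, [3])])
```

In this sketch, tokens of different conditions never attend to each other, which is one simple way to avoid interference between conditions; the paper's actual rule may differ.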

EVE: Efficient zero-shot text-based Video Editing with Depth Map Guidance and Temporal Consistency Constraints

Aug 21, 2023

Yutao Chen, Xingning Dong, Tian Gan, Chunluan Zhou, Ming Yang, Qingpei Guo

Abstract: Motivated by the superior performance of image diffusion models, more and more researchers strive to extend these models to the text-based video editing task. Nevertheless, current video editing methods mainly suffer from a dilemma between high fine-tuning cost and limited generation capacity. Compared with images, we conjecture that videos necessitate more constraints to preserve temporal consistency during editing. To this end, we propose EVE, a robust and efficient zero-shot video editing method. Under the guidance of depth maps and temporal consistency constraints, EVE derives satisfactory video editing results at an affordable computational and time cost. Moreover, recognizing the absence of a publicly available video editing dataset for fair comparisons, we construct a new benchmark dataset, ZVE-50. Through comprehensive experiments, we validate that EVE achieves a satisfactory trade-off between performance and efficiency. We will release our dataset and codebase to facilitate future research.

CelebHair: A New Large-Scale Dataset for Hairstyle Recommendation based on CelebA

Apr 14, 2021

Yutao Chen, Yuxuan Zhang, Zhongrui Huang, Zhenyao Luo, Jinpeng Chen

Abstract: In this paper, we present a new large-scale dataset for hairstyle recommendation, CelebHair, based on the celebrity facial attributes dataset, CelebA. Our dataset inherits the majority of facial images, along with some beauty-related facial attributes, from CelebA. Additionally, we employ facial landmark detection techniques to extract extra features such as nose length and pupillary distance, and deep convolutional neural networks for face shape and hairstyle classification. Empirical comparison demonstrates the superiority of our dataset over other existing hairstyle-related datasets regarding variety, veracity, and volume. Analysis and experiments have been conducted on the dataset in order to evaluate its robustness and usability.
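
As a rough illustration of how features such as nose length and pupillary distance could be derived once landmarks are detected, consider the sketch below. The landmark names (`left_pupil`, `right_pupil`, `nose_bridge`, `nose_tip`), the pixel coordinates, and the function `landmark_features` are assumptions for illustration; the abstract does not say which landmark detector or feature definitions the authors use.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two (x, y) points in pixels."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def landmark_features(landmarks):
    """Compute simple geometric features from a dict of named landmarks.

    The landmark names are hypothetical; a real detector would expose
    its own point indices or labels.
    """
    return {
        "pupillary_distance": euclidean(
            landmarks["left_pupil"], landmarks["right_pupil"]
        ),
        "nose_length": euclidean(
            landmarks["nose_bridge"], landmarks["nose_tip"]
        ),
    }

# Made-up pixel coordinates for a single face.
example = {
    "left_pupil": (30, 40),
    "right_pupil": (70, 40),
    "nose_bridge": (50, 40),
    "nose_tip": (50, 70),
}
features = landmark_features(example)
```

Distances measured in pixels depend on image scale, so in practice they would likely be normalized, for example by face width, before being compared across images.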

Different Approaches Towards Vertical Track Irregularity Prediction -- A Comparative Study

Dec 05, 2020

Yutao Chen, Yu Zhang, Fei Yang

Abstract: Railway systems require regular manual maintenance, a large part of which is dedicated to inspecting track deformation. Such deformation can severely compromise the operational safety of trains, yet these inspections remain costly in both money and manpower. A more precise, efficient, and automated approach to detecting potential railway track deformation is therefore urgently needed. In this paper, we propose an application framework for predicting vertical track irregularities. Our research is based on large-scale real-world datasets produced by several operating railways in China. We explore several sampling methods and compare traditional machine learning algorithms for time-series prediction with popular deep learning techniques. Different ensemble learning methods are also employed for further optimization. We conclude that neural networks are the most accurate and best-performing approach.
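
One common way to turn a measurement series into supervised samples for time-series prediction is a sliding window, sketched below. The abstract does not name the sampling methods that were compared, so the windowing scheme, the function name `sliding_windows`, and the `window` and `horizon` parameters are assumptions used purely for illustration.

```python
import numpy as np

def sliding_windows(series, window, horizon=1):
    """Split a 1-D series into (inputs, targets) for supervised learning.

    Each input row holds `window` consecutive values; the matching target
    is the value `horizon` steps after the end of that window.
    """
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - window - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for this window and horizon")
    X = np.stack([series[i:i + window] for i in range(n_samples)])
    start = window + horizon - 1
    y = series[start:start + n_samples]
    return X, y

# Toy series standing in for vertical irregularity measurements.
X, y = sliding_windows([0, 1, 2, 3, 4, 5], window=3, horizon=1)
```

The resulting `X` and `y` arrays can be fed to either a classical regressor or a neural network, which makes the same sampling usable across the model families being compared.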
