Zhitong Huang

Continuous Layout Editing of Single Images with Diffusion Models

Jun 22, 2023
Zhiyuan Zhang, Zhitong Huang, Jing Liao

Recent advancements in large-scale text-to-image diffusion models have enabled many applications in image editing. However, none of these methods can edit the layout of a single existing image. To address this gap, we propose the first framework for layout editing of a single image while preserving its visual properties, thus allowing continuous editing of a single image. Our approach is built on two key modules. First, to preserve the characteristics of the multiple objects within an image, we disentangle the concepts of different objects and embed them into separate textual tokens using a novel method called masked textual inversion. Next, we propose a training-free optimization method to perform layout control for a pre-trained diffusion model, which allows us to regenerate images with the learned concepts and align them with user-specified layouts. We demonstrate that our method, the first framework to edit the layout of existing images, is effective and outperforms baselines modified to support this task. Our code will be freely available for public use upon acceptance.
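
The abstract does not spell out implementation details. As a minimal sketch of the masked textual inversion idea, the snippet below optimizes one token embedding per object with a denoising loss restricted to that object's mask region; `unet`, `encode_text`, and `add_noise` are hypothetical stand-ins for components of a pre-trained diffusion pipeline, not the authors' code.

    # Illustrative sketch only: per-object token embeddings are optimized with a
    # denoising loss restricted to each object's mask. `unet`, `encode_text`, and
    # `add_noise` are hypothetical stand-ins for a pre-trained diffusion pipeline.
    import torch

    def masked_textual_inversion(latents, object_masks, unet, encode_text, add_noise,
                                 steps=1000, lr=5e-3, embed_dim=768):
        # one learnable token embedding per object in the image
        tokens = [torch.randn(1, embed_dim, requires_grad=True) for _ in object_masks]
        opt = torch.optim.Adam(tokens, lr=lr)
        for _ in range(steps):
            t = torch.randint(0, 1000, (1,))          # random diffusion timestep
            noise = torch.randn_like(latents)
            noisy = add_noise(latents, noise, t)      # forward diffusion q(x_t | x_0)
            loss = latents.new_zeros(())
            for tok, mask in zip(tokens, object_masks):
                cond = encode_text(tok)               # prompt embedding containing the new token
                pred = unet(noisy, t, cond)           # predicted noise
                # masked loss: only this object's region supervises its token
                loss = loss + ((pred - noise) ** 2 * mask).sum() / mask.sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
        return tokens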

UniColor: A Unified Framework for Multi-Modal Colorization with Transformer

Sep 22, 2022
Zhitong Huang, Nanxuan Zhao, Jing Liao

We propose UniColor, the first unified framework to support colorization in multiple modalities, including both unconditional and conditional modes such as stroke, exemplar, text, and even a mix of them. Rather than learning a separate model for each type of condition, we introduce a two-stage colorization framework that incorporates the various conditions into a single model. In the first stage, multi-modal conditions are converted into a common representation of hint points. In particular, we propose a novel CLIP-based method to convert text into hint points. In the second stage, we propose a Transformer-based network composed of Chroma-VQGAN and Hybrid-Transformer to generate diverse and high-quality colorization results conditioned on hint points. Both qualitative and quantitative comparisons demonstrate that our method outperforms state-of-the-art methods in every control modality and further enables multi-modal colorization that was not feasible before. Moreover, we design an interactive interface that shows the effectiveness of our unified framework in practical use, including automatic colorization, hybrid-control colorization, local recolorization, and iterative color editing. Our code and models are available at https://luckyhzt.github.io/unicolor.

* Accepted by SIGGRAPH Asia 2022. Project page: https://luckyhzt.github.io/unicolor 
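
As a rough sketch of the first stage's text-to-hint-point conversion, the snippet below scores image patches against a text phrase with CLIP and keeps the centres of the best-matching patches as candidate hint-point locations. The grid-based patch matching, grid size, and top-k selection are my own simplification for illustration, not the paper's exact procedure; in practice each location would be paired with the colour named in the phrase.

    # Simplified text-to-hint-point conversion with CLIP (illustrative only).
    import torch
    import clip
    from PIL import Image

    def text_to_hint_points(gray_img: Image.Image, phrase: str, grid=8, top_k=4):
        device = "cuda" if torch.cuda.is_available() else "cpu"
        model, preprocess = clip.load("ViT-B/32", device=device)
        text = clip.tokenize([phrase]).to(device)
        w, h = gray_img.size
        pw, ph = w // grid, h // grid
        scores, centres = [], []
        with torch.no_grad():
            t_feat = model.encode_text(text)
            t_feat = t_feat / t_feat.norm(dim=-1, keepdim=True)
            for i in range(grid):
                for j in range(grid):
                    patch = gray_img.crop((j * pw, i * ph, (j + 1) * pw, (i + 1) * ph))
                    p_feat = model.encode_image(preprocess(patch).unsqueeze(0).to(device))
                    p_feat = p_feat / p_feat.norm(dim=-1, keepdim=True)
                    scores.append((t_feat @ p_feat.T).item())
                    centres.append((j * pw + pw // 2, i * ph + ph // 2))
        # centres of the best-matching patches become hint-point locations
        best = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)[:top_k]
        return [centres[k] for k in best]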

A Survey and Framework of Cooperative Perception: From Heterogeneous Singleton to Hierarchical Cooperation

Aug 22, 2022
Zhengwei Bai, Guoyuan Wu, Matthew J. Barth, Yongkang Liu, Emrah Akin Sisbot, Kentaro Oguchi, Zhitong Huang

Perceiving the environment is one of the most fundamental prerequisites for Cooperative Driving Automation (CDA), which is regarded as a revolutionary solution to the safety, mobility, and sustainability issues of contemporary transportation systems. Although computer vision for object perception is evolving at an unprecedented pace, state-of-the-art perception methods still struggle in sophisticated real-world traffic environments because of the unavoidable physical occlusion and limited receptive field of single-vehicle systems. Built on multiple spatially separated perception nodes, Cooperative Perception (CP) was born to unlock this perception bottleneck for driving automation. In this paper, we comprehensively review and analyze the research progress on CP and, to the best of our knowledge, propose the first unified CP framework. Architectures and a taxonomy of CP systems based on different types of sensors are reviewed to give a high-level description of the workflow and the different structures of CP systems. Node structure, sensor modality, and fusion schemes are then reviewed and analyzed against a comprehensive body of literature to provide detailed explanations of specific methods. A hierarchical CP framework is proposed, followed by a review of existing datasets and simulators to sketch the overall landscape of CP. The discussion highlights current opportunities, open challenges, and anticipated future trends.

* Under Review. arXiv admin note: text overlap with arXiv:2201.11871 
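
The survey is not tied to a single algorithm; purely as an illustration of the object-level (late) fusion scheme it categorizes, the toy example below merges detections shared by several cooperative nodes after transforming them into a common world frame. All names, the detection format, and the distance threshold are illustrative assumptions, not the paper's method.

    # Toy late fusion across cooperative perception nodes (illustrative only).
    # Detections are (x, y, w, h, score) in each node's local frame;
    # poses are (tx, ty, yaw) in the shared world frame.
    import numpy as np

    def to_world(det, pose):
        x, y, w, h, s = det
        tx, ty, yaw = pose
        c, si = np.cos(yaw), np.sin(yaw)
        return np.array([c * x - si * y + tx, si * x + c * y + ty, w, h, s])

    def late_fusion(node_detections, node_poses, dist_thresh=2.0):
        """Merge detections from all nodes; a detection within dist_thresh metres
        of an already-kept, higher-confidence one is treated as a duplicate."""
        world = [to_world(d, p)
                 for dets, p in zip(node_detections, node_poses) for d in dets]
        world.sort(key=lambda d: -d[4])           # highest confidence first
        fused = []
        for det in world:
            if all(np.hypot(det[0] - f[0], det[1] - f[1]) > dist_thresh for f in fused):
                fused.append(det)
        return fused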