Eli Shechtman

Perceptual Artifacts Localization for Image Synthesis Tasks

Oct 09, 2023
Lingzhi Zhang, Zhengjie Xu, Connelly Barnes, Yuqian Zhou, Qing Liu, He Zhang, Sohrab Amirghodsi, Zhe Lin, Eli Shechtman, Jianbo Shi

Recent advancements in deep generative models have facilitated the creation of photo-realistic images across various tasks. However, these generated images often exhibit perceptual artifacts in specific regions, necessitating manual correction. In this study, we present a comprehensive empirical examination of Perceptual Artifacts Localization (PAL) spanning diverse image synthesis endeavors. We introduce a novel dataset comprising 10,168 generated images, each annotated with per-pixel perceptual artifact labels across ten synthesis tasks. A segmentation model, trained on our proposed dataset, effectively localizes artifacts across a range of tasks. Additionally, we illustrate its proficiency in adapting to previously unseen models using minimal training samples. We further propose an innovative zoom-in inpainting pipeline that seamlessly rectifies perceptual artifacts in the generated images. Through our experimental analyses, we elucidate several practical downstream applications, such as automated artifact rectification, non-referential image quality evaluation, and abnormal region detection in images. The dataset and code are released.
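
To make the zoom-in inpainting pipeline concrete, here is a minimal sketch: a segmentation model flags artifact pixels, an enlarged box is cropped around them, only that crop is inpainted, and the result is pasted back. The `segment_artifacts` and `inpaint` callables and the padding value are placeholders for illustration, not the released code.

```python
# Minimal sketch of a zoom-in artifact-fix loop (illustrative only).
# `segment_artifacts` and `inpaint` are stand-ins for an artifact
# segmentation model and any masked-inpainting model.
import numpy as np

def zoom_in_fix(image, segment_artifacts, inpaint, pad=32):
    """Localize artifact pixels, crop an enlarged box, inpaint it, paste back."""
    mask = segment_artifacts(image)                  # (H, W) boolean artifact map
    ys, xs = np.nonzero(mask)
    if len(ys) == 0:
        return image                                 # nothing to fix
    h, w = mask.shape
    # One padded box around all artifact pixels (per-region boxes in practice).
    y0, y1 = max(ys.min() - pad, 0), min(ys.max() + pad + 1, h)
    x0, x1 = max(xs.min() - pad, 0), min(xs.max() + pad + 1, w)
    fixed = image.copy()
    fixed[y0:y1, x0:x1] = inpaint(image[y0:y1, x0:x1], mask[y0:y1, x0:x1])
    return fixed

# Toy usage with stand-in models.
img = np.random.rand(256, 256, 3)
out = zoom_in_fix(img,
                  segment_artifacts=lambda im: np.zeros(im.shape[:2], dtype=bool),
                  inpaint=lambda crop, m: crop)
```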

DIFF-NST: Diffusion Interleaving For deFormable Neural Style Transfer

Jul 11, 2023
Dan Ruta, Gemma Canet Tarrés, Andrew Gilbert, Eli Shechtman, Nicholas Kolkin, John Collomosse

Neural Style Transfer (NST) is the field of study applying neural techniques to modify the artistic appearance of a content image to match the style of a reference style image. Traditionally, NST methods have focused on texture-based image edits, affecting mostly low-level information and keeping most image structures the same. However, style-based deformation of the content is desirable for some styles, especially in cases where the style is abstract or the primary concept of the style is in its deformed rendition of some content. With the recent introduction of diffusion models, such as Stable Diffusion, we can access far more powerful image generation techniques, enabling new possibilities. In our work, we propose using this new class of models to perform style transfer while enabling deformable style transfer, an elusive capability in previous models. We show how leveraging the priors of these models can expose new artistic controls at inference time, and we document our findings in exploring this new direction for the field of style transfer.

Realistic Saliency Guided Image Enhancement

Jun 09, 2023
S. Mahdi H. Miangoleh, Zoya Bylinskii, Eric Kee, Eli Shechtman, Yağız Aksoy

Common editing operations performed by professional photographers include cleanup operations: de-emphasizing distracting elements and enhancing subjects. These edits are challenging, requiring a delicate balance between manipulating the viewer's attention and maintaining photo realism. While recent approaches can boast successful examples of attention attenuation or amplification, most of them also suffer from frequent unrealistic edits. We propose a realism loss for saliency-guided image enhancement to maintain high realism across varying image types, while attenuating distractors and amplifying objects of interest. Evaluations with professional photographers confirm that we achieve the dual objective of realism and effectiveness, and outperform recent approaches on their own datasets, while requiring a smaller memory footprint and runtime. We thus offer a viable solution for automating image enhancement and photo cleanup operations.
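
As a rough sketch of how a realism term can be combined with a saliency objective when optimizing an edit (this is an assumed, simplified formulation, not the paper's exact loss):

```python
# Assumed, simplified combination of a saliency objective and a realism
# penalty; not the paper's exact loss.
import torch
import torch.nn.functional as F

def enhancement_loss(pred_saliency, target_saliency, realism_score, w_realism=1.0):
    # Push the edited image's predicted saliency toward the desired attention
    # map (lower on distractors, higher on subjects)...
    saliency_term = F.mse_loss(pred_saliency, target_saliency)
    # ...while rewarding edits that a realism network scores as realistic.
    realism_term = -realism_score.mean()
    return saliency_term + w_realism * realism_term

# Toy call with random tensors standing in for network outputs.
loss = enhancement_loss(torch.rand(1, 1, 64, 64), torch.zeros(1, 1, 64, 64), torch.rand(4))
```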

* Proc. CVPR (2023)  
* For more info visit http://yaksoy.github.io/realisticEditing/ 

SimpSON: Simplifying Photo Cleanup with Single-Click Distracting Object Segmentation Network

May 28, 2023
Chuong Huynh, Yuqian Zhou, Zhe Lin, Connelly Barnes, Eli Shechtman, Sohrab Amirghodsi, Abhinav Shrivastava

In photo editing, it is common practice to remove visual distractions to improve the overall image quality and highlight the primary subject. However, manually selecting and removing these small and dense distracting regions can be a laborious and time-consuming task. In this paper, we propose an interactive distractor selection method that is optimized to achieve the task with just a single click. Our method surpasses the precision and recall achieved by the traditional method of running panoptic segmentation and then selecting the segments containing the clicks. We also showcase how a transformer-based module can be used to identify more distracting regions similar to the user's click position. Our experiments demonstrate that the model can effectively and accurately segment unknown distracting objects interactively and in groups. By significantly simplifying the photo cleaning and retouching process, our proposed model provides inspiration for exploring rare object segmentation and group selection with a single click.
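
A toy sketch of the grouping idea: embed candidate regions, then keep those whose features lie close to the clicked region's feature. The `region_feats` layout and the similarity threshold are illustrative assumptions, not the paper's architecture.

```python
# Toy sketch of single-click distractor grouping via feature similarity.
import numpy as np

def group_similar_regions(region_feats, clicked_idx, threshold=0.8):
    """region_feats: (N, D) L2-normalized embeddings of candidate regions."""
    sims = region_feats @ region_feats[clicked_idx]   # cosine similarity to the click
    return np.nonzero(sims >= threshold)[0]           # indices of similar distractors

feats = np.random.randn(10, 32)
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
print(group_similar_regions(feats, clicked_idx=0))    # always includes index 0 itself
```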

* CVPR 2023. Project link: https://simpson-cvpr23.github.io 

NeAT: Neural Artistic Tracing for Beautiful Style Transfer

Apr 11, 2023
Dan Ruta, Andrew Gilbert, John Collomosse, Eli Shechtman, Nicholas Kolkin

Style transfer is the task of reproducing the semantic contents of a source image in the artistic style of a second target image. In this paper, we present NeAT, a new state-of-the-art feed-forward style transfer method. We re-formulate feed-forward style transfer as image editing, rather than image generation, resulting in a model which improves over the state-of-the-art in both preserving the source content and matching the target style. An important component of our model's success is identifying and fixing "style halos", a commonly occurring artefact across many style transfer techniques. In addition to training and testing on standard datasets, we introduce BBST-4M, a new large-scale, high-resolution dataset of 4M images. As a component of curating this data, we present a novel model able to classify whether an image is stylistic. We use BBST-4M to improve and measure the generalization of NeAT across a huge variety of styles. Not only does NeAT offer state-of-the-art quality and generalization, it is also designed and trained for fast inference at high resolution.

Automatic High Resolution Wire Segmentation and Removal

Apr 01, 2023
Mang Tik Chiu, Xuaner Zhang, Zijun Wei, Yuqian Zhou, Eli Shechtman, Connelly Barnes, Zhe Lin, Florian Kainz, Sohrab Amirghodsi, Humphrey Shi

Wires and powerlines are common visual distractions that often undermine the aesthetics of photographs. The manual process of precisely segmenting and removing them is extremely tedious and may take hours, especially on high-resolution photos where wires may span the entire frame. In this paper, we present an automatic wire clean-up system that reduces the process of wire segmentation and removal/inpainting to within a few seconds. We observe several unique challenges: wires are thin, lengthy, and sparse. These properties are rarely handled by common segmentation models, especially in high-resolution images. We thus propose a two-stage method that leverages both global and local contexts to accurately and efficiently segment wires in high-resolution images, and a tile-based inpainting strategy to remove the wires given our predicted segmentation masks. We also introduce the first wire segmentation benchmark dataset, WireSegHR. Finally, we demonstrate quantitatively and qualitatively that our wire clean-up system enables fully automated wire removal with strong generalization to various wire appearances.
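
The tile-based inpainting strategy can be sketched roughly as follows: only tiles that actually contain wire pixels are sent to the inpainting model. `inpaint_tile` and the tile size are placeholders, not the released system.

```python
# Illustrative tile-based inpainting pass for high-resolution photos:
# only tiles containing wire pixels are sent to the inpainting model.
import numpy as np

def tiled_inpaint(image, wire_mask, inpaint_tile, tile=512):
    out = image.copy()
    h, w = wire_mask.shape
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            m = wire_mask[y:y + tile, x:x + tile]
            if m.any():                                # skip wire-free tiles entirely
                out[y:y + tile, x:x + tile] = inpaint_tile(image[y:y + tile, x:x + tile], m)
    return out

# Toy usage with a stand-in inpainting model.
img = np.random.rand(2048, 2048, 3)
mask = np.zeros((2048, 2048), dtype=bool)
clean = tiled_inpaint(img, mask, inpaint_tile=lambda crop, m: crop)
```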

* https://github.com/adobe-research/auto-wire-removal 

Ablating Concepts in Text-to-Image Diffusion Models

Mar 23, 2023
Nupur Kumari, Bingliang Zhang, Sheng-Yu Wang, Eli Shechtman, Richard Zhang, Jun-Yan Zhu

Large-scale text-to-image diffusion models can generate high-fidelity images with powerful compositional ability. However, these models are typically trained on an enormous amount of Internet data, often containing copyrighted material, licensed images, and personal photos. Furthermore, they have been found to replicate the style of various living artists or memorize exact training samples. How can we remove such copyrighted concepts or images without retraining the model from scratch? To achieve this goal, we propose an efficient method of ablating concepts in the pretrained model, i.e., preventing the generation of a target concept. Our algorithm learns to match the image distribution for a target style, instance, or text prompt we wish to ablate to the distribution corresponding to an anchor concept. This prevents the model from generating target concepts given its text condition. Extensive experiments show that our method can successfully prevent the generation of the ablated concept while preserving closely related concepts in the model.
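
A highly simplified sketch of the ablation objective: the denoiser, conditioned on the target prompt to be ablated, is fine-tuned to match noise targets on images of the anchor concept. The `denoiser` interface, tensor shapes, and the noising step are assumptions standing in for the actual diffusion training loop.

```python
# Highly simplified sketch of the concept-ablation objective (illustrative only).
import torch
import torch.nn.functional as F

def ablation_loss(denoiser, anchor_latents, noise, t, target_text_emb):
    noisy = anchor_latents + noise                   # stand-in for forward diffusion
    pred = denoiser(noisy, t, target_text_emb)       # condition on the concept to ablate
    return F.mse_loss(pred, noise)                   # pull target prompt toward anchor images

# Toy call with a stand-in denoiser.
toy_denoiser = lambda x, t, c: torch.zeros_like(x)
loss = ablation_loss(toy_denoiser, torch.randn(2, 4, 8, 8), torch.randn(2, 4, 8, 8),
                     torch.tensor([10, 20]), torch.randn(2, 77, 768))
```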

* project website: https://www.cs.cmu.edu/~concept-ablation/ 

Scaling up GANs for Text-to-Image Synthesis

Mar 09, 2023
Minguk Kang, Jun-Yan Zhu, Richard Zhang, Jaesik Park, Eli Shechtman, Sylvain Paris, Taesung Park

The recent success of text-to-image synthesis has taken the world by storm and captured the general public's imagination. From a technical standpoint, it also marked a drastic change in the favored architecture to design generative image models. GANs used to be the de facto choice, with techniques like StyleGAN. With DALL-E 2, auto-regressive and diffusion models became the new standard for large-scale generative models overnight. This rapid shift raises a fundamental question: can we scale up GANs to benefit from large datasets like LAION? We find that naïvely increasing the capacity of the StyleGAN architecture quickly becomes unstable. We introduce GigaGAN, a new GAN architecture that far exceeds this limit, demonstrating GANs as a viable option for text-to-image synthesis. GigaGAN offers three major advantages. First, it is orders of magnitude faster at inference time, taking only 0.13 seconds to synthesize a 512px image. Second, it can synthesize high-resolution images, for example, 16-megapixel images in 3.66 seconds. Finally, GigaGAN supports various latent space editing applications such as latent interpolation, style mixing, and vector arithmetic operations.
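
The latent-space edits mentioned above reduce to simple arithmetic on latent codes; the sketch below shows linear interpolation against a generic generator interface and is not GigaGAN's actual API.

```python
# Latent interpolation sketch (illustrative; each interpolated code would be
# fed to a text-conditioned generator G(z, text_emb)).
import torch

def interpolate_latents(z_a, z_b, steps=8):
    """Linearly interpolate between two latent codes, returning (steps, latent_dim)."""
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)
    return (1 - alphas) * z_a + alphas * z_b

z0, z1 = torch.randn(1, 128), torch.randn(1, 128)
frames = interpolate_latents(z0, z1)                 # one latent per interpolation step
```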

* CVPR 2023. Project webpage at https://mingukkang.github.io/GigaGAN/ 

Semi-supervised Parametric Real-world Image Harmonization

Mar 01, 2023
Ke Wang, Michaël Gharbi, He Zhang, Zhihao Xia, Eli Shechtman

Learning-based image harmonization techniques are usually trained to undo synthetic random global transformations applied to a masked foreground in a single ground truth photo. This simulated data does not model many of the important appearance mismatches (illumination, object boundaries, etc.) between foreground and background in real composites, leading to models that do not generalize well and cannot model complex local changes. We propose a new semi-supervised training strategy that addresses this problem and lets us learn complex local appearance harmonization from unpaired real composites, where foreground and background come from different images. Our model is fully parametric. It uses RGB curves to correct the global colors and tone and a shading map to model local variations. Our method outperforms previous work on established benchmarks and real composites, as shown in a user study, and processes high-resolution images interactively.
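
A minimal sketch of the parametric formulation described above: per-channel RGB lookup curves for global color and tone, plus a multiplicative shading map for local variation, applied only inside the foreground mask. The curve representation (K control points) and resolution are assumptions for illustration.

```python
# Minimal sketch of a curves-plus-shading parametric harmonization edit.
import numpy as np

def apply_harmonization(fg, mask, curves, shading):
    """fg: (H, W, 3) in [0, 1]; curves: (3, K) lookup tables; shading: (H, W)."""
    xs = np.linspace(0.0, 1.0, curves.shape[1])
    out = fg.copy()
    for c in range(3):                               # global color/tone correction
        out[..., c] = np.interp(fg[..., c], xs, curves[c])
    out *= shading[..., None]                        # local shading variation
    return np.where(mask[..., None], np.clip(out, 0.0, 1.0), fg)

# Toy usage: identity curves and flat shading leave the foreground unchanged.
fg = np.random.rand(4, 4, 3)
curves = np.tile(np.linspace(0.0, 1.0, 8), (3, 1))
res = apply_harmonization(fg, np.ones((4, 4), bool), curves, np.ones((4, 4)))
```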

* 19 pages, 16 figures, 5 tables 

Domain Expansion of Image Generators

Jan 12, 2023
Yotam Nitzan, Michaël Gharbi, Richard Zhang, Taesung Park, Jun-Yan Zhu, Daniel Cohen-Or, Eli Shechtman

Can one inject new concepts into an already trained generative model, while respecting its existing structure and knowledge? We propose a new task - domain expansion - to address this. Given a pretrained generator and novel (but related) domains, we expand the generator to jointly model all domains, old and new, harmoniously. First, we note the generator contains a meaningful, pretrained latent space. Is it possible to minimally perturb this hard-earned representation, while maximally representing the new domains? Interestingly, we find that the latent space offers unused, "dormant" directions, which do not affect the output. This provides an opportunity: by "repurposing" these directions, we can represent new domains without perturbing the original representation. In fact, we find that pretrained generators have the capacity to add several new domains - even hundreds of them! Using our expansion method, one "expanded" model can supersede numerous domain-specific models, without expanding the model size. Additionally, a single expanded generator natively supports smooth transitions between domains, as well as composition of domains. Code and project page available at https://yotamnitzan.github.io/domain-expansion/.
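
A sketch of the "repurposed dormant direction" idea: each new domain is reached by shifting the base latent along a direction that originally had no effect on the output, and intermediate scales give smooth transitions between the original and the new domain. The generator interface, number of domains, and scale are illustrative assumptions, not the released code.

```python
# Sketch of repurposing "dormant" latent directions for new domains.
import torch
import torch.nn.functional as F

def expand_latent(z_base, dormant_dirs, domain_idx, scale=1.0):
    """Shift a base latent along the dormant direction assigned to one domain."""
    return z_base + scale * dormant_dirs[domain_idx]

z = torch.randn(512)
dirs = F.normalize(torch.randn(100, 512), dim=1)     # e.g., 100 repurposed directions
z_new_domain = expand_latent(z, dirs, domain_idx=3)  # feed to the pretrained generator
```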

* Project Page and code are available at https://yotamnitzan.github.io/domain-expansion/ 