Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Eli Shechtman

SimpSON: Simplifying Photo Cleanup with Single-Click Distracting Object Segmentation Network

May 28, 2023

Chuong Huynh, Yuqian Zhou, Zhe Lin, Connelly Barnes, Eli Shechtman, Sohrab Amirghodsi, Abhinav Shrivastava

Figure 1 for SimpSON: Simplifying Photo Cleanup with Single-Click Distracting Object Segmentation Network

Figure 2 for SimpSON: Simplifying Photo Cleanup with Single-Click Distracting Object Segmentation Network

Figure 3 for SimpSON: Simplifying Photo Cleanup with Single-Click Distracting Object Segmentation Network

Figure 4 for SimpSON: Simplifying Photo Cleanup with Single-Click Distracting Object Segmentation Network

Abstract:In photo editing, it is common practice to remove visual distractions to improve the overall image quality and highlight the primary subject. However, manually selecting and removing these small and dense distracting regions can be a laborious and time-consuming task. In this paper, we propose an interactive distractor selection method that is optimized to achieve the task with just a single click. Our method surpasses the precision and recall achieved by the traditional method of running panoptic segmentation and then selecting the segments containing the clicks. We also showcase how a transformer-based module can be used to identify more distracting regions similar to the user's click position. Our experiments demonstrate that the model can effectively and accurately segment unknown distracting objects interactively and in groups. By significantly simplifying the photo cleaning and retouching process, our proposed model provides inspiration for exploring rare object segmentation and group selection with a single click.

* CVPR 2023. Project link: https://simpson-cvpr23.github.io

Via

Access Paper or Ask Questions

NeAT: Neural Artistic Tracing for Beautiful Style Transfer

Apr 11, 2023

Dan Ruta, Andrew Gilbert, John Collomosse, Eli Shechtman, Nicholas Kolkin

Figure 1 for NeAT: Neural Artistic Tracing for Beautiful Style Transfer

Figure 2 for NeAT: Neural Artistic Tracing for Beautiful Style Transfer

Figure 3 for NeAT: Neural Artistic Tracing for Beautiful Style Transfer

Figure 4 for NeAT: Neural Artistic Tracing for Beautiful Style Transfer

Abstract:Style transfer is the task of reproducing the semantic contents of a source image in the artistic style of a second target image. In this paper, we present NeAT, a new state-of-the art feed-forward style transfer method. We re-formulate feed-forward style transfer as image editing, rather than image generation, resulting in a model which improves over the state-of-the-art in both preserving the source content and matching the target style. An important component of our model's success is identifying and fixing "style halos", a commonly occurring artefact across many style transfer techniques. In addition to training and testing on standard datasets, we introduce the BBST-4M dataset, a new, large scale, high resolution dataset of 4M images. As a component of curating this data, we present a novel model able to classify if an image is stylistic. We use BBST-4M to improve and measure the generalization of NeAT across a huge variety of styles. Not only does NeAT offer state-of-the-art quality and generalization, it is designed and trained for fast inference at high resolution.

Via

Access Paper or Ask Questions

Automatic High Resolution Wire Segmentation and Removal

Apr 01, 2023

Mang Tik Chiu, Xuaner Zhang, Zijun Wei, Yuqian Zhou, Eli Shechtman, Connelly Barnes, Zhe Lin, Florian Kainz, Sohrab Amirghodsi, Humphrey Shi

Figure 1 for Automatic High Resolution Wire Segmentation and Removal

Figure 2 for Automatic High Resolution Wire Segmentation and Removal

Figure 3 for Automatic High Resolution Wire Segmentation and Removal

Figure 4 for Automatic High Resolution Wire Segmentation and Removal

Abstract:Wires and powerlines are common visual distractions that often undermine the aesthetics of photographs. The manual process of precisely segmenting and removing them is extremely tedious and may take up hours, especially on high-resolution photos where wires may span the entire space. In this paper, we present an automatic wire clean-up system that eases the process of wire segmentation and removal/inpainting to within a few seconds. We observe several unique challenges: wires are thin, lengthy, and sparse. These are rare properties of subjects that common segmentation tasks cannot handle, especially in high-resolution images. We thus propose a two-stage method that leverages both global and local contexts to accurately segment wires in high-resolution images efficiently, and a tile-based inpainting strategy to remove the wires given our predicted segmentation masks. We also introduce the first wire segmentation benchmark dataset, WireSegHR. Finally, we demonstrate quantitatively and qualitatively that our wire clean-up system enables fully automated wire removal with great generalization to various wire appearances.

* https://github.com/adobe-research/auto-wire-removal

Via

Access Paper or Ask Questions

Ablating Concepts in Text-to-Image Diffusion Models

Mar 23, 2023

Nupur Kumari, Bingliang Zhang, Sheng-Yu Wang, Eli Shechtman, Richard Zhang, Jun-Yan Zhu

Figure 1 for Ablating Concepts in Text-to-Image Diffusion Models

Figure 2 for Ablating Concepts in Text-to-Image Diffusion Models

Figure 3 for Ablating Concepts in Text-to-Image Diffusion Models

Figure 4 for Ablating Concepts in Text-to-Image Diffusion Models

Abstract:Large-scale text-to-image diffusion models can generate high-fidelity images with powerful compositional ability. However, these models are typically trained on an enormous amount of Internet data, often containing copyrighted material, licensed images, and personal photos. Furthermore, they have been found to replicate the style of various living artists or memorize exact training samples. How can we remove such copyrighted concepts or images without retraining the model from scratch? To achieve this goal, we propose an efficient method of ablating concepts in the pretrained model, i.e., preventing the generation of a target concept. Our algorithm learns to match the image distribution for a target style, instance, or text prompt we wish to ablate to the distribution corresponding to an anchor concept. This prevents the model from generating target concepts given its text condition. Extensive experiments show that our method can successfully prevent the generation of the ablated concept while preserving closely related concepts in the model.

* project website: https://www.cs.cmu.edu/~concept-ablation/

Via

Access Paper or Ask Questions

Scaling up GANs for Text-to-Image Synthesis

Mar 09, 2023

Minguk Kang, Jun-Yan Zhu, Richard Zhang, Jaesik Park, Eli Shechtman, Sylvain Paris, Taesung Park

Abstract:The recent success of text-to-image synthesis has taken the world by storm and captured the general public's imagination. From a technical standpoint, it also marked a drastic change in the favored architecture to design generative image models. GANs used to be the de facto choice, with techniques like StyleGAN. With DALL-E 2, auto-regressive and diffusion models became the new standard for large-scale generative models overnight. This rapid shift raises a fundamental question: can we scale up GANs to benefit from large datasets like LAION? We find that na\"Ively increasing the capacity of the StyleGAN architecture quickly becomes unstable. We introduce GigaGAN, a new GAN architecture that far exceeds this limit, demonstrating GANs as a viable option for text-to-image synthesis. GigaGAN offers three major advantages. First, it is orders of magnitude faster at inference time, taking only 0.13 seconds to synthesize a 512px image. Second, it can synthesize high-resolution images, for example, 16-megapixel pixels in 3.66 seconds. Finally, GigaGAN supports various latent space editing applications such as latent interpolation, style mixing, and vector arithmetic operations.

* CVPR 2023. Project webpage at https://mingukkang.github.io/GigaGAN/

Via

Access Paper or Ask Questions

Semi-supervised Parametric Real-world Image Harmonization

Mar 01, 2023

Ke Wang, Michaël Gharbi, He Zhang, Zhihao Xia, Eli Shechtman

Figure 1 for Semi-supervised Parametric Real-world Image Harmonization

Figure 2 for Semi-supervised Parametric Real-world Image Harmonization

Figure 3 for Semi-supervised Parametric Real-world Image Harmonization

Figure 4 for Semi-supervised Parametric Real-world Image Harmonization

Abstract:Learning-based image harmonization techniques are usually trained to undo synthetic random global transformations applied to a masked foreground in a single ground truth photo. This simulated data does not model many of the important appearance mismatches (illumination, object boundaries, etc.) between foreground and background in real composites, leading to models that do not generalize well and cannot model complex local changes. We propose a new semi-supervised training strategy that addresses this problem and lets us learn complex local appearance harmonization from unpaired real composites, where foreground and background come from different images. Our model is fully parametric. It uses RGB curves to correct the global colors and tone and a shading map to model local variations. Our method outperforms previous work on established benchmarks and real composites, as shown in a user study, and processes high-resolution images interactively.

* 19 pages, 16 figures, 5 tables

Via

Access Paper or Ask Questions

Domain Expansion of Image Generators

Jan 12, 2023

Yotam Nitzan, Michaël Gharbi, Richard Zhang, Taesung Park, Jun-Yan Zhu, Daniel Cohen-Or, Eli Shechtman

Abstract:Can one inject new concepts into an already trained generative model, while respecting its existing structure and knowledge? We propose a new task - domain expansion - to address this. Given a pretrained generator and novel (but related) domains, we expand the generator to jointly model all domains, old and new, harmoniously. First, we note the generator contains a meaningful, pretrained latent space. Is it possible to minimally perturb this hard-earned representation, while maximally representing the new domains? Interestingly, we find that the latent space offers unused, "dormant" directions, which do not affect the output. This provides an opportunity: By "repurposing" these directions, we can represent new domains without perturbing the original representation. In fact, we find that pretrained generators have the capacity to add several - even hundreds - of new domains! Using our expansion method, one "expanded" model can supersede numerous domain-specific models, without expanding the model size. Additionally, a single expanded generator natively supports smooth transitions between domains, as well as composition of domains. Code and project page available at https://yotamnitzan.github.io/domain-expansion/.

* Project Page and code are available at https://yotamnitzan.github.io/domain-expansion/

Via

Access Paper or Ask Questions

Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators

Dec 13, 2022

Haitian Zheng, Zhe Lin, Jingwan Lu, Scott Cohen, Eli Shechtman, Connelly Barnes, Jianming Zhang, Qing Liu, Yuqian Zhou, Sohrab Amirghodsi(+1 more)

Abstract:Structure-guided image completion aims to inpaint a local region of an image according to an input guidance map from users. While such a task enables many practical applications for interactive editing, existing methods often struggle to hallucinate realistic object instances in complex natural scenes. Such a limitation is partially due to the lack of semantic-level constraints inside the hole region as well as the lack of a mechanism to enforce realistic object generation. In this work, we propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects. Specifically, the semantic discriminators leverage pretrained visual features to improve the realism of the generated visual concepts. Moreover, the object-level discriminators take aligned instances as inputs to enforce the realism of individual objects. Our proposed scheme significantly improves the generation quality and achieves state-of-the-art results on various tasks, including segmentation-guided completion, edge-guided manipulation and panoptically-guided manipulation on Places2 datasets. Furthermore, our trained model is flexible and can support multiple editing use cases, such as object insertion, replacement, removal and standard inpainting. In particular, our trained model combined with a novel automatic image completion pipeline achieves state-of-the-art results on the standard inpainting task.

* 18 pages, 16 figures

Via

Access Paper or Ask Questions

Multi-Concept Customization of Text-to-Image Diffusion

Dec 08, 2022

Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, Jun-Yan Zhu

Figure 1 for Multi-Concept Customization of Text-to-Image Diffusion

Figure 2 for Multi-Concept Customization of Text-to-Image Diffusion

Figure 3 for Multi-Concept Customization of Text-to-Image Diffusion

Figure 4 for Multi-Concept Customization of Text-to-Image Diffusion

Abstract:While generative models produce high-quality images of concepts learned from a large-scale database, a user often wishes to synthesize instantiations of their own concepts (for example, their family, pets, or items). Can we teach a model to quickly acquire a new concept, given a few examples? Furthermore, can we compose multiple new concepts together? We propose Custom Diffusion, an efficient method for augmenting existing text-to-image models. We find that only optimizing a few parameters in the text-to-image conditioning mechanism is sufficiently powerful to represent new concepts while enabling fast tuning (~6 minutes). Additionally, we can jointly train for multiple concepts or combine multiple fine-tuned models into one via closed-form constrained optimization. Our fine-tuned model generates variations of multiple, new concepts and seamlessly composes them with existing concepts in novel settings. Our method outperforms several baselines and concurrent works, regarding both qualitative and quantitative evaluations, while being memory and computationally efficient.

* Project webpage: https://www.cs.cmu.edu/~custom-diffusion

Via

Access Paper or Ask Questions

Contrastive Learning for Diverse Disentangled Foreground Generation

Nov 04, 2022

Yuheng Li, Yijun Li, Jingwan Lu, Eli Shechtman, Yong Jae Lee, Krishna Kumar Singh

Abstract:We introduce a new method for diverse foreground generation with explicit control over various factors. Existing image inpainting based foreground generation methods often struggle to generate diverse results and rarely allow users to explicitly control specific factors of variation (e.g., varying the facial identity or expression for face inpainting results). We leverage contrastive learning with latent codes to generate diverse foreground results for the same masked input. Specifically, we define two sets of latent codes, where one controls a pre-defined factor (``known''), and the other controls the remaining factors (``unknown''). The sampled latent codes from the two sets jointly bi-modulate the convolution kernels to guide the generator to synthesize diverse results. Experiments demonstrate the superiority of our method over state-of-the-arts in result diversity and generation controllability.

* ECCV 2022

Via

Access Paper or Ask Questions