Ohad Fried

DiffUHaul: A Training-Free Method for Object Dragging in Images

Jun 03, 2024

Omri Avrahami, Rinon Gal, Gal Chechik, Ohad Fried, Dani Lischinski, Arash Vahdat, Weili Nie

Abstract:Text-to-image diffusion models have proven effective for solving many image editing tasks. However, the seemingly straightforward task of seamlessly relocating objects within a scene remains surprisingly challenging. Existing methods addressing this problem often struggle to function reliably in real-world scenarios due to lacking spatial reasoning. In this work, we propose a training-free method, dubbed DiffUHaul, that harnesses the spatial understanding of a localized text-to-image model, for the object dragging task. Blindly manipulating layout inputs of the localized model tends to cause low editing performance due to the intrinsic entanglement of object representation in the model. To this end, we first apply attention masking in each denoising step to make the generation more disentangled across different objects and adopt the self-attention sharing mechanism to preserve the high-level object appearance. Furthermore, we propose a new diffusion anchoring technique: in the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance; in the later denoising steps, we pass the localized features from the source images to the interpolated images to retain fine-grained object details. To adapt DiffUHaul to real-image editing, we apply a DDPM self-attention bucketing that can better reconstruct real images with the localized model. Finally, we introduce an automated evaluation pipeline for this task and showcase the efficacy of our method. Our results are reinforced through a user preference study.

* Project page is available at https://omriavrahami.com/diffuhaul/
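
The interpolation step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the linear weight schedule, and the flat list representation of attention features are all assumptions made for illustration. It only shows what it means to blend source features into target features gradually over the early denoising steps.

```python
from typing import List


def interpolation_weights(num_early_steps: int) -> List[float]:
    """Hypothetical linear schedule from 0.0 (all source) to 1.0 (all target)."""
    if num_early_steps < 1:
        raise ValueError("num_early_steps must be at least 1")
    if num_early_steps == 1:
        return [1.0]
    return [i / (num_early_steps - 1) for i in range(num_early_steps)]


def interpolate_features(
    source: List[float], target: List[float], weight: float
) -> List[float]:
    """Linearly blend two feature vectors; weight=0 keeps source, weight=1 keeps target."""
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")
    return [(1.0 - weight) * s + weight * t for s, t in zip(source, target)]


# Toy example: blend two 2-element "feature vectors" across five early steps.
source_features = [0.0, 2.0]
target_features = [4.0, 6.0]
for w in interpolation_weights(5):
    print(w, interpolate_features(source_features, target_features, w))
```

In a real pipeline the blended tensors would be attention features inside the denoising network; the direction and shape of the schedule here are placeholders only.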

Diffusing Colors: Image Colorization with Text Guided Diffusion

Dec 07, 2023

Nir Zabari, Aharon Azulay, Alexey Gorkor, Tavi Halperin, Ohad Fried

Abstract:The colorization of grayscale images is a complex and subjective task with significant challenges. Despite recent progress in employing large-scale datasets with deep neural networks, difficulties with controllability and visual quality persist. To tackle these issues, we present a novel image colorization framework that utilizes image diffusion techniques with granular text prompts. This integration not only produces colorization outputs that are semantically appropriate but also greatly improves the level of control users have over the colorization process. Our method provides a balance between automation and control, outperforming existing techniques in terms of visual quality and semantic coherence. We leverage a pretrained generative Diffusion Model, and show that we can finetune it for the colorization task without losing its generative power or attention to text prompts. Moreover, we present a novel CLIP-based ranking model that evaluates color vividness, enabling automatic selection of the most suitable level of vividness based on the specific scene semantics. Our approach holds potential particularly for color enhancement and historical image colorization.

* SIGGRAPH Asia 2023

The Chosen One: Consistent Characters in Text-to-Image Diffusion Models

Nov 27, 2023

Omri Avrahami, Amir Hertz, Yael Vinker, Moab Arar, Shlomi Fruchter, Ohad Fried, Daniel Cohen-Or, Dani Lischinski

Abstract:Recent advances in text-to-image generation models have unlocked vast potential for visual creativity. However, these models struggle with generation of consistent characters, a crucial aspect for numerous real-world applications such as story visualization, game development asset design, advertising, and more. Current methods typically rely on multiple pre-existing images of the target character or involve labor-intensive manual processes. In this work, we propose a fully automated solution for consistent character generation, with the sole input being a text prompt. We introduce an iterative procedure that, at each stage, identifies a coherent set of images sharing a similar identity and extracts a more consistent identity from this set. Our quantitative analysis demonstrates that our method strikes a better balance between prompt alignment and identity consistency compared to the baseline methods, and these findings are reinforced by a user study. To conclude, we showcase several practical applications of our approach. Project page is available at https://omriavrahami.com/the-chosen-one

Differential Diffusion: Giving Each Pixel Its Strength

Jun 01, 2023

Eran Levin, Ohad Fried

Abstract:Text-based image editing has advanced significantly in recent years. With the rise of diffusion models, image editing via textual instructions has become ubiquitous. Unfortunately, current models lack the ability to customize the quantity of the change per pixel or per image fragment, resorting to changing the entire image in an equal amount, or editing a specific region using a binary mask. In this paper, we suggest a new framework which enables the user to customize the quantity of change for each image fragment, thereby enhancing the flexibility and verbosity of modern diffusion models. Our framework does not require model training or fine-tuning, but instead performs everything at inference time, making it easily applicable to an existing model. We show both qualitatively and quantitatively that our method allows better controllability and can produce results which are unattainable by existing models. Our code is available at: https://github.com/exx8/differential-diffusion
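
One way to picture a per-pixel "quantity of change" at inference time is to turn a strength map into one binary mask per denoising step. The sketch below is an assumption-laden illustration, not the code from the repository linked above: the names `free_pixel_masks` and `blend`, the threshold rule, and the nested-list image representation are all hypothetical. A pixel with strength 1.0 is free to change at every step, a pixel with strength 0.0 is always pinned to the original, and intermediate strengths are freed only for the later steps.

```python
from typing import List

Grid = List[List[float]]
Mask = List[List[bool]]


def free_pixel_masks(change_map: Grid, num_steps: int) -> List[Mask]:
    """For each step, mark the pixels that are allowed to change.

    Assumed rule: a pixel with strength s is free during roughly the
    last s * num_steps steps and pinned to the original before that.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    masks = []
    for step in range(num_steps):
        threshold = 1.0 - (step + 1) / num_steps
        masks.append([[s > threshold for s in row] for row in change_map])
    return masks


def blend(current: Grid, pinned: Grid, mask: Mask) -> Grid:
    """Keep `current` where the mask is True, otherwise fall back to `pinned`."""
    return [
        [c if m else p for c, p, m in zip(c_row, p_row, m_row)]
        for c_row, p_row, m_row in zip(current, pinned, mask)
    ]


# Toy 1x3 "image": never change, change halfway, change fully.
masks = free_pixel_masks([[0.0, 0.5, 1.0]], num_steps=4)
for step, mask in enumerate(masks):
    print(step, mask)
```

In an actual diffusion loop, `pinned` would be the original image noised to the current step's noise level and `current` would be the model's running sample; both are stand-ins here.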

Break-A-Scene: Extracting Multiple Concepts from a Single Image

May 25, 2023

Omri Avrahami, Kfir Aberman, Ohad Fried, Daniel Cohen-Or, Dani Lischinski

Abstract:Text-to-image model personalization aims to introduce a user-provided concept to the model, allowing its synthesis in diverse contexts. However, current methods primarily focus on the case of learning a single concept from multiple images with variations in backgrounds and poses, and struggle when adapted to a different scenario. In this work, we introduce the task of textual scene decomposition: given a single image of a scene that may contain several concepts, we aim to extract a distinct text token for each concept, enabling fine-grained control over the generated scenes. To this end, we propose augmenting the input image with masks that indicate the presence of target concepts. These masks can be provided by the user or generated automatically by a pre-trained segmentation model. We then present a novel two-phase customization process that optimizes a set of dedicated textual embeddings (handles), as well as the model weights, striking a delicate balance between accurately capturing the concepts and avoiding overfitting. We employ a masked diffusion loss to enable handles to generate their assigned concepts, complemented by a novel loss on cross-attention maps to prevent entanglement. We also introduce union-sampling, a training strategy aimed to improve the ability of combining multiple concepts in generated images. We use several automatic metrics to quantitatively compare our method against several baselines, and further affirm the results using a user study. Finally, we showcase several applications of our method. Project page is available at: https://omriavrahami.com/break-a-scene/

* Video available at: https://www.youtube.com/watch?v=-9EA-BhizgM
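
The idea of a masked loss mentioned in the abstract can be shown with a minimal sketch. It is not the paper's loss: the function name `masked_mse`, the flat-list inputs, and the choice to normalize by the number of masked elements are assumptions for illustration. The point is only that errors outside the concept's mask do not contribute.

```python
from typing import List


def masked_mse(
    predicted: List[float], target: List[float], mask: List[int]
) -> float:
    """Mean squared error restricted to positions where mask == 1.

    Normalizing by the masked count is an assumed design choice;
    an empty mask yields 0.0 rather than dividing by zero.
    """
    if not (len(predicted) == len(target) == len(mask)):
        raise ValueError("all inputs must have the same length")
    covered = sum(mask)
    if covered == 0:
        return 0.0
    total = sum(m * (p - t) ** 2 for p, t, m in zip(predicted, target, mask))
    return total / covered


# The large error at the last position is ignored because it lies outside the mask.
print(masked_mse([1.0, 2.0, 3.0], [1.0, 0.0, 0.0], [1, 1, 0]))
```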

Deep Image Fingerprint: Accurate And Low Budget Synthetic Image Detector

Mar 28, 2023

Sergey Sinitsa, Ohad Fried

Abstract:The generation of high-quality images has become widely accessible and is a rapidly evolving process. As a result, anyone can generate images that are indistinguishable from real ones. This leads to a wide range of applications, which also include malicious usage with deception in mind. Despite advances in detection techniques for generated images, a robust detection method still eludes us. In this work, we utilize the inductive bias of convolutional neural networks (CNNs) to develop a new detection method that requires a small number of training samples and achieves accuracy that is on par with or better than current state-of-the-art methods.

Prediction of Scene Plausibility

Dec 06, 2022

Or Nachmias, Ohad Fried, Ariel Shamir

Abstract:Understanding the 3D world from 2D images involves more than detection and segmentation of the objects within the scene. It also includes the interpretation of the structure and arrangement of the scene elements. Such understanding is often rooted in recognizing the physical world and its limitations, and in prior knowledge as to how similar typical scenes are arranged. In this research we pose a new challenge for neural network (or other) scene understanding algorithms - can they distinguish between plausible and implausible scenes? Plausibility can be defined both in terms of physical properties and in terms of functional and typical arrangements. Hence, we define plausibility as the probability of encountering a given scene in the real physical world. We build a dataset of synthetic images containing both plausible and implausible scenes, and test the success of various vision models in the task of recognizing and understanding plausibility.

FakeOut: Leveraging Out-of-domain Self-supervision for Multi-modal Video Deepfake Detection

Dec 01, 2022

Gil Knafo, Ohad Fried

Abstract:Video synthesis methods have rapidly improved in recent years, allowing easy creation of synthetic humans. This poses a problem, especially in the era of social media, as synthetic videos of speaking humans can be used to spread misinformation in a convincing manner. Thus, there is a pressing need for accurate and robust deepfake detection methods that can detect forgery techniques not seen during training. In this work, we explore whether this can be done by leveraging a multi-modal, out-of-domain backbone trained in a self-supervised manner, adapted to the video deepfake domain. We propose FakeOut, a novel approach that relies on multi-modal data throughout both the pre-training phase and the adaptation phase. We demonstrate the efficacy and robustness of FakeOut in detecting various types of deepfakes, especially manipulations that were not seen during training. Our method achieves state-of-the-art results in cross-manipulation and cross-dataset generalization. This study shows that, perhaps surprisingly, training on out-of-domain videos (i.e., videos with no speaking humans) can lead to better deepfake detection systems. Code is available on GitHub.

Neural Font Rendering

Nov 29, 2022

Daniel Anderson, Ariel Shamir, Ohad Fried

Abstract:Recent advances in deep learning techniques and applications have revolutionized artistic creation and manipulation in many domains (text, images, music); however, fonts have not yet been integrated with deep learning architectures in a manner that supports their multi-scale nature. In this work we aim to bridge this gap, proposing a network architecture capable of rasterizing glyphs in multiple sizes, potentially paving the way for easy and accessible creation and manipulation of fonts.

Taming a Generative Model

Nov 29, 2022

Shimon Malnick, Shai Avidan, Ohad Fried

Abstract:Generative models are becoming ever more powerful, being able to synthesize highly realistic images. We propose an algorithm for taming these models - changing the probability that the model will produce a specific image or image category. We consider generative models that are powered by normalizing flows, which allows us to reason about the exact generation likelihood for a given image. Our method is general purpose, and we exemplify it using models that generate human faces, a subdomain with many interesting privacy and bias considerations. Our method can be used in the context of privacy, e.g., removing a specific person from the output of a model, and also in the context of de-biasing by forcing a model to output specific image categories according to a given target distribution. Our method uses a fast fine-tuning process without retraining the model from scratch, achieving the goal in less than 1% of the time taken to initially train the generative model. We evaluate qualitatively and quantitatively, to examine the success of the taming process and output quality.
