Abstract: Most real-world image editing tasks require multiple sequential edits to achieve the desired result. Current editing approaches, primarily designed for single-object modifications, struggle with sequential editing, in particular with preserving previous edits while adapting new objects naturally into the existing content. These limitations significantly hinder complex editing scenarios where multiple objects must be modified while preserving their contextual relationships. We address this fundamental challenge through two key proposals: enabling rough mask inputs that preserve existing content while naturally integrating new elements, and supporting consistent editing across multiple modifications. Our framework achieves this through a layer-wise memory, which stores latent representations and prompt embeddings from previous edits. We propose Background Consistency Guidance, which leverages memorized latents to maintain scene coherence, and Multi-Query Disentanglement in cross-attention, which ensures natural adaptation to existing content. To evaluate our method, we present a new benchmark dataset incorporating semantic alignment metrics and interactive editing scenarios. Through comprehensive experiments, we demonstrate superior performance on iterative image editing tasks with minimal user effort, requiring only rough masks while maintaining high-quality results throughout multiple editing steps.
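For concreteness, the sketch below illustrates one way a layer-wise memory and background-consistency blending could be wired into a latent-diffusion editing loop. The class name, mask convention, and blending weight are illustrative assumptions and not the paper's implementation.

```python
import torch

class LayerwiseMemory:
    """Hypothetical store of per-edit latents, prompt embeddings, and masks."""
    def __init__(self):
        self.layers = []  # one entry appended after each completed edit

    def add(self, latent, prompt_embed, mask):
        self.layers.append({"latent": latent.detach(),
                            "prompt": prompt_embed.detach(),
                            "mask": mask})

def background_consistency_guidance(current_latent, memorized_latent,
                                    new_mask, strength=0.8):
    """Outside the region being edited now (new_mask == 1), pull the current
    latent back toward the memorized latent so the background and previous
    edits are preserved.  A simplified reading of the abstract, not the
    authors' exact formulation."""
    keep = 1.0 - new_mask
    return new_mask * current_latent + keep * (
        strength * memorized_latent + (1.0 - strength) * current_latent
    )
```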
Abstract: Understanding 3D motion from videos presents inherent challenges due to the diverse types of movement, ranging from rigid and deformable objects to articulated structures. To overcome this, we propose Liv3Stroke, a novel approach for abstracting objects in motion with deformable 3D strokes. The detailed movements of an object may be represented by unstructured motion vectors or by a set of motion primitives using a pre-defined articulation from a template model. Just as a free-hand sketch can intuitively visualize scenes or intentions with a sparse set of lines, we utilize a set of parametric 3D curves to capture spatially smooth motion elements of general objects with unknown structures. We first extract noisy 3D point-cloud motion guidance from video frames using semantic features, and our approach then deforms a set of curves to abstract essential motion features as a set of explicit 3D representations. Such abstraction enables an understanding of the prominent components of motion while maintaining robustness to environmental factors. Our approach allows direct analysis of 3D object movements from video, tackling the uncertainty that typically arises when translating real-world motion into recorded footage. The project page is accessible at: https://jaeah.me/liv3stroke_web
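A minimal sketch of deforming parametric 3D curves toward noisy point-cloud motion guidance is given below. The cubic Bézier parameterization, one-sided Chamfer objective, and smoothness weight are assumptions made for illustration, not the paper's loss design.

```python
import torch

def cubic_bezier(ctrl, t):
    """Evaluate cubic Bezier curves.  ctrl: (N, 4, 3) control points, t: (S,) in [0, 1]."""
    t = t.view(1, -1, 1)                                         # (1, S, 1)
    p0, p1, p2, p3 = ctrl[:, 0:1], ctrl[:, 1:2], ctrl[:, 2:3], ctrl[:, 3:4]
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)           # (N, S, 3)

def fit_motion_strokes(guidance_pts, n_curves=16, steps=500):
    """Deform a sparse set of 3D curves toward noisy point-cloud motion guidance.

    guidance_pts: (M, 3) tensor of 3D points extracted from video (assumed given).
    Uses a symmetric Chamfer distance plus a control-point smoothness term.
    """
    ctrl = torch.randn(n_curves, 4, 3, requires_grad=True)
    t = torch.linspace(0, 1, 32)
    opt = torch.optim.Adam([ctrl], lr=1e-2)
    for _ in range(steps):
        samples = cubic_bezier(ctrl, t).reshape(-1, 3)           # points sampled on curves
        d = torch.cdist(samples, guidance_pts)                   # pairwise distances
        chamfer = d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
        smooth = (ctrl[:, 1:] - ctrl[:, :-1]).pow(2).mean()      # keep strokes spatially smooth
        loss = chamfer + 0.1 * smooth
        opt.zero_grad(); loss.backward(); opt.step()
    return ctrl.detach()
```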
Abstract: While free-hand sketches have long served as an efficient representation for conveying the characteristics of an object, they are often subjective, deviating significantly from realistic representations. Moreover, sketches are not consistent across arbitrary viewpoints, making it hard to capture 3D shapes. We propose 3Doodle, which generates descriptive and view-consistent sketch images given multi-view images of the target object. Our method is based on the idea that a set of 3D strokes can efficiently represent 3D structural information and render view-consistent 2D sketches. We express 2D sketches as a union of view-independent and view-dependent components. 3D cubic Bézier curves indicate view-independent 3D feature lines, while contours of superquadrics express a smooth outline of the volume from varying viewpoints. Our pipeline directly optimizes the parameters of the 3D stroke primitives to minimize perceptual losses in a fully differentiable manner. The resulting sparse set of 3D strokes can be rendered as abstract sketches containing the essential 3D characteristic shapes of various objects. We demonstrate that 3Doodle can faithfully express concepts of the original images compared with recent sketch generation approaches.
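The sketch below outlines the overall optimization structure for the view-independent Bézier strokes: project 3D control points into each view, rasterize, and minimize a perceptual loss. Here `render_strokes` stands in for an assumed differentiable stroke rasterizer and `perceptual_loss` for an assumed LPIPS/CLIP-style distance; both, along with the camera convention, are placeholders rather than the paper's components, and the superquadric contour branch is omitted.

```python
import torch

def project(points_3d, K, R, t):
    """Pinhole projection of 3D points (N, 3) into a view with intrinsics K and pose (R, t)."""
    cam = points_3d @ R.T + t            # world -> camera coordinates
    uv = cam @ K.T                       # camera -> image plane
    return uv[:, :2] / uv[:, 2:3]

def optimize_strokes(views, render_strokes, perceptual_loss, n_curves=32, steps=1000):
    """Optimize 3D cubic Bezier control points so their 2D projections match
    multi-view targets under a perceptual loss (illustrative loop only)."""
    ctrl = torch.randn(n_curves, 4, 3, requires_grad=True)
    opt = torch.optim.Adam([ctrl], lr=5e-3)
    for _ in range(steps):
        loss = 0.0
        for image, K, R, t in views:                            # target image + camera per view
            uv = project(ctrl.reshape(-1, 3), K, R, t).reshape(n_curves, 4, 2)
            sketch = render_strokes(uv)                         # assumed differentiable rendering
            loss = loss + perceptual_loss(sketch, image)
        opt.zero_grad(); loss.backward(); opt.step()
    return ctrl.detach()
```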