Sally Ma

Google DeepMind

Di3PO -- Diptych Diffusion DPO for Targeted Improvements in Image

Feb 06, 2026

Sanjana Reddy, Ishaan Malhi, Sally Ma, Praneet Dutta

Abstract: Existing methods for preference tuning of text-to-image (T2I) diffusion models often rely on computationally expensive generation steps to create positive and negative pairs of images. These approaches frequently yield training pairs that either lack meaningful differences, are expensive to sample and filter, or exhibit significant variance in irrelevant pixel regions, thereby degrading training efficiency. To address these limitations, we introduce "Di3PO", a novel method for constructing positive and negative pairs that isolates specific regions targeted for improvement during preference tuning, while keeping the surrounding context in the image stable. We demonstrate the efficacy of our approach by applying it to the challenging task of text rendering in diffusion models, showcasing improvements over baseline methods of SFT and DPO.

Preserving Product Fidelity in Large Scale Image Recontextualization with Diffusion Models

Mar 11, 2025

Ishaan Malhi, Praneet Dutta, Ellie Talius, Sally Ma, Brendan Driscoll, Krista Holden, Garima Pruthi, Arunachalam Narayanaswamy

Abstract: We present a framework for high-fidelity product image recontextualization using text-to-image diffusion models and a novel data augmentation pipeline. This pipeline leverages image-to-video diffusion, in/outpainting, and negatives to create synthetic training data, addressing limitations of real-world data collection for this task. Our method improves the quality and diversity of generated images by disentangling product representations and enhancing the model's understanding of product characteristics. Evaluation on the ABO dataset and a private product dataset, using automated metrics and human assessment, demonstrates the effectiveness of our framework in generating realistic and compelling product visualizations, with implications for applications such as e-commerce and virtual product showcasing.

Surfer100: Generating Surveys From Web Resources on Wikipedia-style

Dec 13, 2021

Irene Li, Alexander Fabbri, Rina Kawamura, Yixin Liu, Xiangru Tang, Jaesung Tae, Chang Shen, Sally Ma, Tomoe Mizutani, Dragomir Radev

Abstract: Fast-developing fields such as Artificial Intelligence (AI) often outpace the efforts of encyclopedic sources such as Wikipedia, which either do not completely cover recently-introduced topics or lack such content entirely. As a result, methods for automatically producing content are valuable tools to address this information overload. We show that recent advances in pretrained language modeling can be combined for a two-stage extractive and abstractive approach for Wikipedia lead paragraph generation. We extend this approach to generate longer Wikipedia-style summaries with sections and examine how such methods struggle in this application through detailed studies with 100 reference human-collected surveys. This is the first study on utilizing web resources for long Wikipedia-style summaries to the best of our knowledge.
