Topic: Image Outpainting

What is Image Outpainting? Image outpainting is the process of generating new image content outside the boundaries of an existing image.
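To make the definition concrete, here is a minimal sketch of the usual setup step before any generative model runs: the known image is placed on a larger canvas, and a mask marks the border pixels still to be generated. The function name `pad_for_outpainting`, the symmetric `pad` parameter, and the mask convention (1 = generate, 0 = known) are illustrative assumptions, not taken from any of the papers listed here.

```python
def pad_for_outpainting(image, pad, fill=0):
    """Place `image` (a list of pixel rows) on a larger canvas and build a mask.

    Assumed mask convention: 1 = pixel to be generated, 0 = known pixel.
    """
    height, width = len(image), len(image[0])
    new_h, new_w = height + 2 * pad, width + 2 * pad

    # Start with an empty canvas where every pixel is marked "to generate".
    canvas = [[fill] * new_w for _ in range(new_h)]
    mask = [[1] * new_w for _ in range(new_h)]

    # Copy the known image into the centre and mark those pixels as known.
    for r in range(height):
        for c in range(width):
            canvas[r + pad][c + pad] = image[r][c]
            mask[r + pad][c + pad] = 0
    return canvas, mask


image = [[5, 6], [7, 8]]
canvas, mask = pad_for_outpainting(image, pad=1)
```

A generative model would then fill only the pixels where the mask is 1, leaving the original content untouched.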

SSDD-GAN: Single-Step Denoising Diffusion GAN for Cochlear Implant Surgical Scene Completion

Feb 08, 2025

Yike Zhang, Eduardo Davalos, Jack Noble

Abstract:Recent deep learning-based image completion methods, including both inpainting and outpainting, have demonstrated promising results in restoring corrupted images by effectively filling various missing regions. Among these, Generative Adversarial Networks (GANs) and Denoising Diffusion Probabilistic Models (DDPMs) have been employed as key generative image completion approaches, excelling at generating high-quality restorations with reduced artifacts and improved fine details. In previous work, we developed a method aimed at synthesizing views from novel microscope positions for mastoidectomy surgeries; however, that approach could not restore the surrounding surgical scene environment. In this paper, we propose an efficient method to complete the surgical scene of the synthetic postmastoidectomy dataset. Our approach leverages self-supervised learning on real surgical datasets to train a Single-Step Denoising Diffusion-GAN (SSDD-GAN), combining the advantages of diffusion models with the adversarial optimization of GANs for a 6% improvement in Structural Similarity. The trained model is then directly applied to the synthetic postmastoidectomy dataset using a zero-shot approach, enabling the generation of realistic and complete surgical scenes without the need for explicit ground-truth labels from the synthetic postmastoidectomy dataset. This method addresses key limitations in previous work, offering a novel pathway for full surgical microscopy scene completion and enhancing the usability of the synthetic postmastoidectomy dataset in surgical preoperative planning and intraoperative navigation.

AIDOVECL: AI-generated Dataset of Outpainted Vehicles for Eye-level Classification and Localization

Oct 31, 2024

Amir Kazemi, Qurat ul ain Fatima, Volodymyr Kindratenko, Christopher Tessum

Abstract:Image labeling is a critical bottleneck in the development of computer vision technologies, often constraining the potential of machine learning models due to the time-intensive nature of manual annotations. This work introduces a novel approach that leverages outpainting to address the problem of annotated data scarcity by generating artificial contexts and annotations, significantly reducing manual labeling efforts. We apply this technique to a particularly acute challenge in autonomous driving, urban planning, and environmental monitoring: the lack of diverse, eye-level vehicle images in desired classes. Our dataset comprises AI-generated vehicle images obtained by detecting and cropping vehicles from manually selected seed images, which are then outpainted onto larger canvases to simulate varied real-world conditions. The outpainted images include detailed annotations, providing high-quality ground truth data. Advanced outpainting techniques and image quality assessments ensure visual fidelity and contextual relevance. Augmentation with outpainted vehicles improves overall performance metrics by up to 8% and enhances prediction of underrepresented classes by up to 20%. This approach, exemplifying outpainting as a self-annotating paradigm, presents a solution that enhances dataset versatility across multiple domains of machine learning. The code and links to datasets used in this study are available for further research and replication at https://github.com/amir-kazemi/aidovecl.

* 19 pages, 4 figures, 3 tables
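The "self-annotating" idea in this abstract can be illustrated with a small sketch: when a cropped object is placed at a known offset on a larger canvas, its bounding box follows directly from that offset, so no manual labeling is needed. This is not the authors' code (see their repository for that); `place_crop`, its parameters, and the `(x_min, y_min, x_max, y_max)` box format are hypothetical names chosen for illustration.

```python
def place_crop(crop, canvas_h, canvas_w, top, left, fill=0):
    """Paste `crop` (a list of pixel rows) onto an empty canvas.

    Returns the canvas and the bounding box implied by the placement,
    in an assumed (x_min, y_min, x_max, y_max) format with exclusive maxima.
    """
    crop_h, crop_w = len(crop), len(crop[0])
    if top < 0 or left < 0 or top + crop_h > canvas_h or left + crop_w > canvas_w:
        raise ValueError("crop does not fit on the canvas at this offset")

    canvas = [[fill] * canvas_w for _ in range(canvas_h)]
    for r in range(crop_h):
        # Slice assignment copies the crop row into the canvas row.
        canvas[top + r][left:left + crop_w] = crop[r]

    # The annotation comes for free from the placement coordinates.
    bbox = (left, top, left + crop_w, top + crop_h)
    return canvas, bbox


crop = [[1, 2, 3], [4, 5, 6]]
canvas, bbox = place_crop(crop, canvas_h=6, canvas_w=8, top=2, left=3)
```

In a pipeline of the kind the abstract describes, an outpainting model would then synthesize the surrounding context on the empty canvas pixels, while the box stays valid because the object itself is not moved.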

Generative Outpainting To Enhance the Memorability of Short-Form Videos

Nov 21, 2024

Alan Byju, Aman Sudhindra Ladwa, Lorin Sweeney, Alan F. Smeaton

Abstract:With the expanding use of the short-form video format in advertising, social media, entertainment, education and more, there is a need for such media to both captivate and be remembered. Video memorability indicates how likely a video is to be remembered by a viewer who has no emotional or personal connection with its content. This paper presents the results of using generative outpainting to expand the screen size of a short-form video with a view to improving its memorability. Advances in machine learning and deep learning are compared and leveraged to understand how extending the borders of video screen sizes can affect their memorability to viewers. Using quantitative evaluation, we determine the best-performing model for outpainting and the impact of outpainting based on image saliency on video memorability scores.

BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities

Oct 18, 2024

Shaozhe Hao, Xuantong Liu, Xianbiao Qi, Shihao Zhao, Bojia Zi, Rong Xiao, Kai Han, Kwan-Yee K. Wong

Abstract:We introduce BiGR, a novel conditional image generation model using compact binary latent codes for generative training, focusing on enhancing both generation and representation capabilities. BiGR is the first conditional generative model that unifies generation and discrimination within the same framework. BiGR features a binary tokenizer, a masked modeling mechanism, and a binary transcoder for binary code prediction. Additionally, we introduce a novel entropy-ordered sampling method to enable efficient image generation. Extensive experiments validate BiGR's superior performance in generation quality, as measured by FID-50k, and representation capabilities, as evidenced by linear-probe accuracy. Moreover, BiGR showcases zero-shot generalization across various vision tasks, enabling applications such as image inpainting, outpainting, editing, interpolation, and enrichment, without the need for structural modifications. Our findings suggest that BiGR unifies generative and discriminative tasks effectively, paving the way for further advancements in the field.

* Project page: https://haoosz.github.io/BiGR

Diffusion-based Generative Image Outpainting for Recovery of FOV-Truncated CT Images

Jun 07, 2024

Michelle Espranita Liman, Daniel Rueckert, Florian J. Fintelmann, Philip Müller

Abstract:Field-of-view (FOV) recovery of truncated chest CT scans is crucial for accurate body composition analysis, which involves quantifying skeletal muscle and subcutaneous adipose tissue (SAT) on CT slices. This, in turn, enables disease prognostication. Here, we present a method for recovering truncated CT slices using generative image outpainting. We train a diffusion model and apply it to truncated CT slices generated by simulating a small FOV. Our model reliably recovers the truncated anatomy and outperforms the previous state-of-the-art despite being trained on 87% less data.

* Shared last authorship: Florian J. Fintelmann and Philip Müller
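As a rough illustration of what "simulating a small FOV" could look like, the sketch below blanks out everything outside a centred circle in a 2D slice and returns a mask of the region to be recovered. The abstract does not say how the authors simulate truncation, so the circular shape, the name `truncate_fov`, and the `radius` and `fill` parameters are all assumptions made for this example.

```python
def truncate_fov(slice_2d, radius, fill=0):
    """Blank out pixels outside a centred circle of the given radius.

    Returns the truncated slice and a mask (assumed convention:
    1 = truncated pixel to recover, 0 = pixel inside the field of view).
    """
    height, width = len(slice_2d), len(slice_2d[0])
    cy, cx = (height - 1) / 2, (width - 1) / 2  # geometric centre of the slice

    truncated, mask = [], []
    for r in range(height):
        out_row, mask_row = [], []
        for c in range(width):
            inside = (r - cy) ** 2 + (c - cx) ** 2 <= radius ** 2
            out_row.append(slice_2d[r][c] if inside else fill)
            mask_row.append(0 if inside else 1)
        truncated.append(out_row)
        mask.append(mask_row)
    return truncated, mask


full_slice = [[9] * 5 for _ in range(5)]
truncated, mask = truncate_fov(full_slice, radius=1)
```

Pairs of `(truncated, full_slice)` built this way are the kind of data an outpainting model could be trained and evaluated on, since the ground truth outside the simulated FOV is known.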

ReCamMaster: Camera-Controlled Generative Rendering from A Single Video

Mar 14, 2025

Jianhong Bai, Menghan Xia, Xiao Fu, Xintao Wang, Lianrui Mu, Jinwen Cao, Zuozhu Liu, Haoji Hu, Xiang Bai, Pengfei Wan (+1 more)

Abstract:Camera control has been actively studied in text or image conditioned video generation tasks. However, altering camera trajectories of a given video remains under-explored, despite its importance in the field of video creation. It is non-trivial due to the extra constraints of maintaining multiple-frame appearance and dynamic synchronization. To address this, we present ReCamMaster, a camera-controlled generative video re-rendering framework that reproduces the dynamic scene of an input video at novel camera trajectories. The core innovation lies in harnessing the generative capabilities of pre-trained text-to-video models through a simple yet powerful video conditioning mechanism -- its capability often overlooked in current research. To overcome the scarcity of qualified training data, we construct a comprehensive multi-camera synchronized video dataset using Unreal Engine 5, which is carefully curated to follow real-world filming characteristics, covering diverse scenes and camera movements. It helps the model generalize to in-the-wild videos. Lastly, we further improve the robustness to diverse inputs through a meticulously designed training strategy. Extensive experiments show that our method substantially outperforms existing state-of-the-art approaches and strong baselines. Our method also finds promising applications in video stabilization, super-resolution, and outpainting.

* Project page: https://jianhongbai.github.io/ReCamMaster/

Lanpaint: Training-Free Diffusion Inpainting with Exact and Fast Conditional Inference

Feb 05, 2025

Candi Zheng, Yuan Lan, Yang Wang

Abstract:Diffusion models generate high-quality images but often lack efficient and universally applicable inpainting capabilities, particularly in community-trained models. We introduce LanPaint, a training-free method tailored for widely adopted ODE-based samplers, which leverages Langevin dynamics to perform exact conditional inference, enabling precise and visually coherent inpainting. LanPaint addresses two key challenges in Langevin-based inpainting: (1) the risk of local likelihood maxima trapping and (2) slow convergence. By proposing a guided score function and a fast-converging Langevin framework, LanPaint achieves high-fidelity results in very few iterations. Experiments demonstrate that LanPaint outperforms existing training-free inpainting techniques, particularly on challenging tasks such as outpainting with Stable Diffusion.

Beyond Local Views: Global State Inference with Diffusion Models for Cooperative Multi-Agent Reinforcement Learning

Aug 18, 2024

Zhiwei Xu, Hangyu Mao, Nianmin Zhang, Xin Xin, Pengjie Ren, Dapeng Li, Bin Zhang, Guoliang Fan, Zhumin Chen, Changwei Wang (+1 more)

Abstract:In partially observable multi-agent systems, agents typically only have access to local observations. This severely hinders their ability to make precise decisions, particularly during decentralized execution. To alleviate this problem and inspired by image outpainting, we propose State Inference with Diffusion Models (SIDIFF), which uses diffusion models to reconstruct the original global state based solely on local observations. SIDIFF consists of a state generator and a state extractor, which allow agents to choose suitable actions by considering both the reconstructed global state and local observations. In addition, SIDIFF can be effortlessly incorporated into current multi-agent reinforcement learning algorithms to improve their performance. Finally, we evaluated SIDIFF on different experimental platforms, including Multi-Agent Battle City (MABC), a novel and flexible multi-agent reinforcement learning environment we developed. SIDIFF achieved desirable results and outperformed other popular algorithms.

* 15 pages, 12 figures

CamFreeDiff: Camera-free Image to Panorama Generation with Diffusion Model

Jul 09, 2024

Xiaoding Yuan, Shitao Tang, Kejie Li, Alan Yuille, Peng Wang

Abstract:This paper introduces the Camera-free Diffusion (CamFreeDiff) model for 360-degree image outpainting from a single camera-free image and text description. This method distinguishes itself from existing strategies, such as MVDiffusion, by eliminating the requirement for predefined camera poses. Instead, our model incorporates a mechanism for predicting homography directly within the multi-view diffusion framework. The core of our approach is to formulate camera estimation by predicting the homography transformation from the input view to a predefined canonical view. The homography provides point-level correspondences between the input image and the target panoramic images, allowing connections enforced by correspondence-aware attention in a fully differentiable manner. Qualitative and quantitative experimental results demonstrate our model's strong robustness and generalization ability for 360-degree image outpainting in the challenging context of camera-free inputs.

Interpreting Graphic Notation with MusicLDM: An AI Improvisation of Cornelius Cardew's Treatise

Dec 12, 2024

Tornike Karchkhadze, Keren Shao, Shlomo Dubnov

Abstract:This work presents a novel method for composing and improvising music inspired by Cornelius Cardew's Treatise, using AI to bridge graphic notation and musical expression. By leveraging OpenAI's ChatGPT to interpret the abstract visual elements of Treatise, we convert these graphical images into descriptive textual prompts. These prompts are then input into MusicLDM, a pre-trained latent diffusion model designed for music generation. We introduce a technique called "outpainting," which overlaps sections of AI-generated music to create a seamless and cohesive composition. We demonstrate a new perspective on performing and interpreting graphic scores, showing how AI can transform visual stimuli into sound and expand the creative possibilities in contemporary/experimental music composition. Musical pieces are available at https://bit.ly/TreatiseAI

* 2024 IEEE International Conference on Big Data (Big Data)
