Zhifei Zhang

OpenAPMax: Abnormal Patterns-based Model for Real-World Alzheimer's Disease Diagnosis

Jul 03, 2023

Yunyou Huang, Xianglong Guan, Xiangjiang Lu, Xiaoshuang Liang, Xiuxia Miao, Jiyue Xie, Wenjing Liu, Li Ma, Suqin Tang, Zhifei Zhang(+1 more)

Abstract:Alzheimer's disease (AD) cannot be reversed, but early diagnosis will significantly benefit patients' medical treatment and care. Recent work on AD diagnosis assumes that all categories are known a priori, which makes it a closed-set classification problem, in contrast to open-set recognition. This assumption hinders the application of such models in real-world clinical settings. Although many open-set recognition technologies have been proposed in other fields, they are challenging to use for AD diagnosis directly since 1) AD is a degenerative disease of the nervous system with similar symptoms at each stage, and it is difficult to distinguish from its pre-state, and 2) diversified strategies for AD diagnosis are challenging to model uniformly. In this work, inspired by the concerns of clinicians during diagnosis, we propose an open-set recognition model, OpenAPMax, based on abnormal patterns to address AD diagnosis in real-world settings. OpenAPMax first obtains the abnormal pattern of each patient relative to each known category through statistics or a literature search, then clusters the patients' abnormal patterns, and finally uses extreme value theory (EVT) to model the distance between each patient's abnormal pattern and the center of their category and to modify the classification probability. We evaluate the proposed method against recent open-set recognition methods and obtain state-of-the-art results.
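
The last step in the abstract, using an extreme value distribution over distances to a category center to adjust classification probabilities, can be sketched in the spirit of OpenMax-style recalibration. This is a minimal illustration under stated assumptions, not the authors' implementation: the Weibull parameters are assumed to be already fitted, and the function names, the recalibration rule, and all numbers are hypothetical.

```python
import math


def weibull_cdf(distance, shape, scale):
    """Two-parameter Weibull CDF, read here as the chance that a distance is abnormally large."""
    if distance <= 0:
        return 0.0
    return 1.0 - math.exp(-((distance / scale) ** shape))


def recalibrate(probs, distances, weibull_params):
    """Shift probability mass from known categories to an extra 'unknown' slot.

    probs          -- closed-set probabilities, one per known category (sum to 1)
    distances      -- distance from the sample to each category center
    weibull_params -- (shape, scale) per category; assumed fitted beforehand
    """
    kept = []
    for p, d, (shape, scale) in zip(probs, distances, weibull_params):
        outlier = weibull_cdf(d, shape, scale)  # near 1 when the sample is far from the center
        kept.append(p * (1.0 - outlier))
    unknown = max(0.0, 1.0 - sum(kept))  # mass removed from the known categories
    return kept + [unknown]
```

A sample lying at every category center keeps its original probabilities, while a sample far from all centers ends up almost entirely in the final "unknown" slot.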

* Alzheimer's Disease, Abnormal Patterns, Open-set Recognition, OpenAPMax

Photoswap: Personalized Subject Swapping in Images

May 29, 2023

Jing Gu, Yilin Wang, Nanxuan Zhao, Tsu-Jui Fu, Wei Xiong, Qing Liu, Zhifei Zhang, He Zhang, Jianming Zhang, HyunJoon Jung(+1 more)

Abstract:In an era where images and visual content dominate our digital landscape, the ability to manipulate and personalize these images has become a necessity. Envision seamlessly substituting a tabby cat lounging on a sunlit window sill in a photograph with your own playful puppy, all while preserving the original charm and composition of the image. We present Photoswap, a novel approach that enables this immersive image editing experience through personalized subject swapping in existing images. Photoswap first learns the visual concept of the subject from reference images and then swaps it into the target image using pre-trained diffusion models in a training-free manner. We establish that a well-conceptualized visual subject can be seamlessly transferred to any image with appropriate self-attention and cross-attention manipulation, maintaining the pose of the swapped subject and the overall coherence of the image. Comprehensive experiments underscore the efficacy and controllability of Photoswap in personalized subject swapping. Furthermore, Photoswap significantly outperforms baseline methods in human ratings across subject swapping, background preservation, and overall quality, revealing its vast application potential, from entertainment to professional editing.

* 14 pages

DualVector: Unsupervised Vector Font Synthesis with Dual-Part Representation

May 17, 2023

Ying-Tian Liu, Zhifei Zhang, Yuan-Chen Guo, Matthew Fisher, Zhaowen Wang, Song-Hai Zhang

Abstract:Automatic generation of fonts can be an important aid to typeface design. Many current approaches regard glyphs as pixelated images, which present artifacts when scaling and inevitable quality losses after vectorization. On the other hand, existing vector font synthesis methods either fail to represent the shape concisely or require vector supervision during training. To push the quality of vector font synthesis to the next level, we propose a novel dual-part representation for vector glyphs, where each glyph is modeled as a collection of closed "positive" and "negative" path pairs. The glyph contour is then obtained by boolean operations on these paths. We first learn such a representation only from glyph images and devise a subsequent contour refinement step to align the contour with an image representation to further enhance details. Our method, named DualVector, outperforms state-of-the-art methods in vector font synthesis both quantitatively and qualitatively. Our synthesized vector fonts can be easily converted to common digital font formats like TrueType Font for practical use. The code is released at https://github.com/thuliu-yt16/dualvector.
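
The dual-part idea can be illustrated with a point-membership test. One natural reading of the boolean operations is a union of per-pair differences: a point belongs to the glyph if, for at least one pair, it lies inside the positive shape and outside the negative one. The sketch below is a toy under that assumption, with circles standing in for the closed paths; `Circle` and `inside_glyph` are hypothetical names and do not come from the released code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Circle:
    cx: float
    cy: float
    r: float

    def contains(self, x, y):
        """True if (x, y) lies inside or on the circle."""
        return (x - self.cx) ** 2 + (y - self.cy) ** 2 <= self.r ** 2


def inside_glyph(pairs, x, y):
    """True if (x, y) is inside some positive shape and outside its paired negative shape."""
    return any(pos.contains(x, y) and not neg.contains(x, y) for pos, neg in pairs)


# A ring, roughly like the letter "O": a disc minus a smaller concentric disc.
ring = [(Circle(0.0, 0.0, 1.0), Circle(0.0, 0.0, 0.5))]
```

Here `inside_glyph(ring, 0.75, 0.0)` falls on the stroke of the ring, while the center and any point outside the outer disc do not belong to the shape.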

* CVPR 2023

Improving Diffusion Models for Scene Text Editing with Dual Encoders

Apr 12, 2023

Jiabao Ji, Guanhua Zhang, Zhaowen Wang, Bairu Hou, Zhifei Zhang, Brian Price, Shiyu Chang

Abstract:Scene text editing is a challenging task that involves modifying or inserting specified texts in an image while maintaining its natural and realistic appearance. Most previous approaches to this task rely on style-transfer models that crop out text regions and feed them into image transfer models, such as GANs. However, these methods are limited in their ability to change text style and are unable to insert texts into images. Recent advances in diffusion models have shown promise in overcoming these limitations with text-conditional image editing. However, our empirical analysis reveals that state-of-the-art diffusion models struggle with rendering correct text and controlling text style. To address these problems, we propose DIFFSTE to improve pre-trained diffusion models with a dual encoder design, which includes a character encoder for better text legibility and an instruction encoder for better style control. An instruction tuning framework is introduced to train our model to learn the mapping from the text instruction to the corresponding image with either the specified style or the style of the surrounding texts in the background. This training method also gives our model zero-shot generalization ability in three scenarios: generating text with unseen font variations (e.g., italic and bold), mixing different fonts to construct a new font, and using more relaxed forms of natural language as instructions to guide the generation task. We evaluate our approach on five datasets and demonstrate its superior performance in terms of text correctness, image naturalness, and style controllability. Our code is publicly available at https://github.com/UCSB-NLP-Chang/DiffSTE

* 22 pages, 19 figures

TopNet: Transformer-based Object Placement Network for Image Compositing

Apr 06, 2023

Sijie Zhu, Zhe Lin, Scott Cohen, Jason Kuen, Zhifei Zhang, Chen Chen

Abstract:We investigate the problem of automatically placing an object into a background image for image compositing. Given a background image and a segmented object, the goal is to train a model to predict plausible placements (location and scale) of the object for compositing. The quality of the composite image highly depends on the predicted location/scale. Existing works either generate candidate bounding boxes or apply sliding-window search using global representations from background and object images, which fail to model local information in background images. However, local clues in background images are important to determine the compatibility of placing the objects with certain locations/scales. In this paper, we propose to learn the correlation between object features and all local background features with a transformer module so that detailed information can be provided on all possible location/scale configurations. A sparse contrastive loss is further proposed to train our model with sparse supervision. Our new formulation generates a 3D heatmap indicating the plausibility of all location/scale combinations in one network forward pass, which is over 10 times faster than the previous sliding-window method. It also supports interactive search when users provide a pre-defined location or scale. The proposed method can be trained with explicit annotation or in a self-supervised manner using an off-the-shelf inpainting model, and it outperforms state-of-the-art methods significantly. The user study shows that the trained model generalizes well to real-world images with diverse challenging scenes and object categories.
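
Reading a placement out of such a heatmap amounts to an argmax over scale and position, optionally restricted to a user-chosen scale. A minimal sketch, assuming the heatmap is indexed as `heatmap[scale][row][col]`; that layout, the function name `best_placement`, and the toy values are illustrative assumptions rather than details from the paper.

```python
def best_placement(heatmap, fixed_scale=None):
    """Return (scale_index, row, col, score) of the highest-scoring cell.

    heatmap     -- nested lists indexed as heatmap[scale][row][col]
    fixed_scale -- if given, search only that scale index (interactive search)
    """
    scales = range(len(heatmap)) if fixed_scale is None else [fixed_scale]
    best = None
    for s in scales:
        for r, row in enumerate(heatmap[s]):
            for c, score in enumerate(row):
                if best is None or score > best[3]:
                    best = (s, r, c, score)
    return best


toy = [
    [[0.1, 0.2], [0.3, 0.4]],  # scale index 0
    [[0.9, 0.1], [0.2, 0.5]],  # scale index 1
]
```

Calling `best_placement(toy)` searches every scale, while `best_placement(toy, fixed_scale=0)` mimics the interactive case where the user has already fixed the scale.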

* CVPR

Modulating Pretrained Diffusion Models for Multimodal Image Synthesis

Feb 24, 2023

Cusuh Ham, James Hays, Jingwan Lu, Krishna Kumar Singh, Zhifei Zhang, Tobias Hinz

Abstract:We present multimodal conditioning modules (MCM) for enabling conditional image synthesis using pretrained diffusion models. Previous multimodal synthesis works rely on training networks from scratch or fine-tuning pretrained networks, both of which are computationally expensive for large, state-of-the-art diffusion models. Our method uses pretrained networks but does not require any updates to the diffusion network's parameters. MCM is a small module trained to modulate the diffusion network's predictions during sampling using 2D modalities (e.g., semantic segmentation maps, sketches) that were unseen during the original training of the diffusion model. We show that MCM enables user control over the spatial layout of the image and leads to increased control over the image generation process. Training MCM is cheap as it does not require gradients from the original diffusion net, consists of only ~1% of the number of parameters of the base diffusion model, and is trained using only a limited number of training examples. We evaluate our method on unconditional and text-conditional models to demonstrate the improved control over the generated images and their alignment with respect to the conditioning inputs.

SmartBrush: Text and Shape Guided Object Inpainting with Diffusion Model

Dec 09, 2022

Shaoan Xie, Zhifei Zhang, Zhe Lin, Tobias Hinz, Kun Zhang

Abstract:Generic image inpainting aims to complete a corrupted image by borrowing surrounding information, which barely generates novel content. By contrast, multi-modal inpainting provides more flexible and useful controls on the inpainted content, e.g., a text prompt can be used to describe an object with richer attributes, and a mask can be used to constrain the shape of the inpainted object rather than being only considered as a missing area. We propose a new diffusion-based model named SmartBrush for completing a missing region with an object using both text and shape guidance. While previous work such as DALLE-2 and Stable Diffusion can do text-guided inpainting, they do not support shape guidance and tend to modify background texture surrounding the generated object. Our model incorporates both text and shape guidance with precision control. To preserve the background better, we propose a novel training and sampling strategy by augmenting the diffusion U-net with object-mask prediction. Lastly, we introduce a multi-task training strategy by jointly training inpainting with text-to-image generation to leverage more training data. We conduct extensive experiments showing that our model outperforms all baselines in terms of visual quality, mask controllability, and background preservation.

ObjectStitch: Generative Object Compositing

Dec 05, 2022

Yizhi Song, Zhifei Zhang, Zhe Lin, Scott Cohen, Brian Price, Jianming Zhang, Soo Ye Kim, Daniel Aliaga

Abstract:Object compositing based on 2D images is a challenging problem since it typically involves multiple processing stages such as color harmonization, geometry correction and shadow generation to generate realistic results. Furthermore, annotating training data pairs for compositing requires substantial manual effort from professionals, and is hardly scalable. Building on recent advances in generative models, we propose a self-supervised framework for object compositing that leverages the power of conditional diffusion models. Our framework can holistically address the object compositing task in a unified model, transforming the viewpoint, geometry, color and shadow of the generated object while requiring no manual labeling. To preserve the input object's characteristics, we introduce a content adaptor that helps to maintain categorical semantics and object appearance. A data augmentation method is further adopted to improve the fidelity of the generator. Our method outperforms relevant baselines in both realism and faithfulness of the synthesized result images in a user study on various real-world images.

Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition

Jul 31, 2022

Xudong Xie, Ling Fu, Zhifei Zhang, Zhaowen Wang, Xiang Bai

Abstract:Artistic text recognition is an extremely challenging task with a wide range of applications. However, current scene text recognition methods mainly focus on irregular text and have not explored artistic text specifically. The challenges of artistic text recognition include the varied appearance of specially designed fonts and effects, the complex connections and overlaps between characters, and the severe interference from background patterns. To alleviate these problems, we propose to recognize artistic text at three levels. Firstly, corner points are applied to guide the extraction of local features inside characters, considering the robustness of corner structures to appearance and shape. In this way, the discreteness of the corner points cuts off the connection between characters, and their sparsity improves robustness to background interference. Secondly, we design a character contrastive loss to model the character-level feature, improving the feature representation for character classification. Thirdly, we utilize a Transformer to learn the global feature at the image level and model the global relationship of the corner points, with the assistance of a corner-query cross-attention mechanism. In addition, we provide an artistic text dataset to benchmark the performance. Experimental results verify the significant superiority of our proposed method on artistic text recognition; the method also achieves state-of-the-art performance on several blurred and perspective datasets.

* Accepted by ECCV 2022 as an oral paper. The dataset and code are available at https://github.com/xdxie/WordArt

SImProv: Scalable Image Provenance Framework for Robust Content Attribution

Jun 28, 2022

Alexander Black, Tu Bui, Simon Jenni, Zhifei Zhang, Viswanathan Swaminanthan, John Collomosse

Abstract:We present SImProv, a scalable image provenance framework to match a query image back to a trusted database of originals and identify possible manipulations on the query. SImProv consists of three stages: a scalable search stage for retrieving the top-k most similar images; a re-ranking and near-duplicate detection stage for identifying the original among the candidates; and finally a manipulation detection and visualization stage for localizing regions within the query that may have been manipulated to differ from the original. SImProv is robust to benign image transformations that commonly occur during online redistribution, such as artifacts due to noise and recompression degradation, as well as out-of-place transformations due to image padding, warping, and changes in size and shape. Robustness towards out-of-place transformations is achieved via the end-to-end training of a differentiable warping module within the comparator architecture. We demonstrate effective retrieval and manipulation detection over a dataset of 100 million images.
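
The first stage, retrieving the top-k most similar images, can be sketched as a nearest-neighbor search over image embeddings. The sketch assumes each image is already represented by a feature vector and uses cosine similarity with a brute-force scan; both are illustrative choices rather than details given in the abstract, and a database of 100 million images would likely rely on an approximate index instead. The names `cosine`, `top_k`, and `db` are hypothetical.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, database, k):
    """Return the k (image_id, similarity) pairs most similar to the query, best first."""
    scored = ((image_id, cosine(query, vec)) for image_id, vec in database.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy database mapping image ids to 2-D embeddings.
db = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
```

The candidates returned by `top_k([1.0, 0.1], db, 2)` would then be handed to the re-ranking stage to pick out the original.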

* Submitted to IEEE Transactions on Information Forensics and Security
