Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jian Zhang

Face2Face: Label-driven Facial Retouching Restoration

Apr 22, 2024

Guanhua Zhao, Yu Gu, Xuhan Sheng, Yujie Hu, Jian Zhang

Figure 1 for Face2Face: Label-driven Facial Retouching Restoration

Figure 2 for Face2Face: Label-driven Facial Retouching Restoration

Figure 3 for Face2Face: Label-driven Facial Retouching Restoration

Figure 4 for Face2Face: Label-driven Facial Retouching Restoration

Abstract:With the popularity of social media platforms such as Instagram and TikTok, and the widespread availability and convenience of retouching tools, an increasing number of individuals are utilizing these tools to beautify their facial photographs. This poses challenges for fields that place high demands on the authenticity of photographs, such as identity verification and social media. By altering facial images, users can easily create deceptive images, leading to the dissemination of false information. This may pose challenges to the reliability of identity verification systems and social media, and even lead to online fraud. To address this issue, some work has proposed makeup removal methods, but they still lack the ability to restore images involving geometric deformations caused by retouching. To tackle the problem of facial retouching restoration, we propose a framework, dubbed Face2Face, which consists of three components: a facial retouching detector, an image restoration model named FaceR, and a color correction module called Hierarchical Adaptive Instance Normalization (H-AdaIN). Firstly, the facial retouching detector predicts a retouching label containing three integers, indicating the retouching methods and their corresponding degrees. Then FaceR restores the retouched image based on the predicted retouching label. Finally, H-AdaIN is applied to address the issue of color shift arising from diffusion models. Extensive experiments demonstrate the effectiveness of our framework and each module.

Via

Access Paper or Ask Questions

OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model

Apr 17, 2024

Runyi Li, Xuhan Sheng, Weiqi Li, Jian Zhang

Figure 1 for OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model

Figure 2 for OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model

Figure 3 for OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model

Figure 4 for OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model

Abstract:Omnidirectional images (ODIs) are commonly used in real-world visual tasks, and high-resolution ODIs help improve the performance of related visual tasks. Most existing super-resolution methods for ODIs use end-to-end learning strategies, resulting in inferior realness of generated images and a lack of effective out-of-domain generalization capabilities in training methods. Image generation methods represented by diffusion model provide strong priors for visual tasks and have been proven to be effectively applied to image restoration tasks. Leveraging the image priors of the Stable Diffusion (SD) model, we achieve omnidirectional image super-resolution with both fidelity and realness, dubbed as OmniSSR. Firstly, we transform the equirectangular projection (ERP) images into tangent projection (TP) images, whose distribution approximates the planar image domain. Then, we use SD to iteratively sample initial high-resolution results. At each denoising iteration, we further correct and update the initial results using the proposed Octadecaplex Tangent Information Interaction (OTII) and Gradient Decomposition (GD) technique to ensure better consistency. Finally, the TP images are transformed back to obtain the final high-resolution results. Our method is zero-shot, requiring no training or fine-tuning. Experiments of our method on two benchmark datasets demonstrate the effectiveness of our proposed method.

Via

Access Paper or Ask Questions

BG-YOLO: A Bidirectional-Guided Method for Underwater Object Detection

Apr 13, 2024

Jian Zhang, Ruiteng Zhang, Xinyue Yan, Xiting Zhuang, Ruicheng Cao

Figure 1 for BG-YOLO: A Bidirectional-Guided Method for Underwater Object Detection

Figure 2 for BG-YOLO: A Bidirectional-Guided Method for Underwater Object Detection

Figure 3 for BG-YOLO: A Bidirectional-Guided Method for Underwater Object Detection

Figure 4 for BG-YOLO: A Bidirectional-Guided Method for Underwater Object Detection

Abstract:Degraded underwater images decrease the accuracy of underwater object detection. However, existing methods for underwater image enhancement mainly focus on improving the indicators in visual aspects, which may not benefit the tasks of underwater image detection, and may lead to serious degradation in performance. To alleviate this problem, we proposed a bidirectional-guided method for underwater object detection, referred to as BG-YOLO. In the proposed method, network is organized by constructing an enhancement branch and a detection branch in a parallel way. The enhancement branch consists of a cascade of an image enhancement subnet and an object detection subnet. And the detection branch only consists of a detection subnet. A feature guided module connects the shallow convolution layer of the two branches. When training the enhancement branch, the object detection subnet in the enhancement branch guides the image enhancement subnet to be optimized towards the direction that is most conducive to the detection task. The shallow feature map of the trained enhancement branch will be output to the feature guided module, constraining the optimization of detection branch through consistency loss and prompting detection branch to learn more detailed information of the objects. And hence the detection performance will be refined. During the detection tasks, only detection branch will be reserved so that no additional cost of computation will be introduced. Extensive experiments demonstrate that the proposed method shows significant improvement in performance of the detector in severely degraded underwater scenes while maintaining a remarkable detection speed.

* 15 pages, 8 figures, 4 tables

Via

Access Paper or Ask Questions

Constructing and Exploring Intermediate Domains in Mixed Domain Semi-supervised Medical Image Segmentation

Apr 13, 2024

Qinghe Ma, Jian Zhang, Lei Qi, Qian Yu, Yinghuan Shi, Yang Gao

Figure 1 for Constructing and Exploring Intermediate Domains in Mixed Domain Semi-supervised Medical Image Segmentation

Figure 2 for Constructing and Exploring Intermediate Domains in Mixed Domain Semi-supervised Medical Image Segmentation

Figure 3 for Constructing and Exploring Intermediate Domains in Mixed Domain Semi-supervised Medical Image Segmentation

Figure 4 for Constructing and Exploring Intermediate Domains in Mixed Domain Semi-supervised Medical Image Segmentation

Abstract:Both limited annotation and domain shift are prevalent challenges in medical image segmentation. Traditional semi-supervised segmentation and unsupervised domain adaptation methods address one of these issues separately. However, the coexistence of limited annotation and domain shift is quite common, which motivates us to introduce a novel and challenging scenario: Mixed Domain Semi-supervised medical image Segmentation (MiDSS). In this scenario, we handle data from multiple medical centers, with limited annotations available for a single domain and a large amount of unlabeled data from multiple domains. We found that the key to solving the problem lies in how to generate reliable pseudo labels for the unlabeled data in the presence of domain shift with labeled data. To tackle this issue, we employ Unified Copy-Paste (UCP) between images to construct intermediate domains, facilitating the knowledge transfer from the domain of labeled data to the domains of unlabeled data. To fully utilize the information within the intermediate domain, we propose a symmetric Guidance training strategy (SymGD), which additionally offers direct guidance to unlabeled data by merging pseudo labels from intermediate samples. Subsequently, we introduce a Training Process aware Random Amplitude MixUp (TP-RAM) to progressively incorporate style-transition components into intermediate samples. Compared with existing state-of-the-art approaches, our method achieves a notable 13.57% improvement in Dice score on Prostate dataset, as demonstrated on three public datasets. Our code is available at https://github.com/MQinghe/MiDSS .

Via

Access Paper or Ask Questions

Automated Polyp Segmentation in Colonoscopy Images

Apr 06, 2024

Swagat Ranjit, Jian Zhang, Bijaya B. Karki

Figure 1 for Automated Polyp Segmentation in Colonoscopy Images

Figure 2 for Automated Polyp Segmentation in Colonoscopy Images

Figure 3 for Automated Polyp Segmentation in Colonoscopy Images

Figure 4 for Automated Polyp Segmentation in Colonoscopy Images

Abstract:It is important to find the polyps in a human system that helps to prevent cancer during medical diagnosis. This research discusses using a dilated convolution module along with a criss cross attention-based network to segment polyps from the endoscopic images of the colon. To gather the context information of all pixels in an image more efficiently, criss-cross attention module has played a vital role. In order to extract maximum information from dataset, data augmentation techniques are employed in the dataset. Rotations, flips, scaling, and contrast along with varying learning rates were implemented to make a better model. Global average pooling was applied over ResNet50 that helped to store the important details of encoder. In our experiment, the proposed architecture's performance was compared with existing models like U-Net, DeepLabV3, PraNet. This architecture outperformed other models on the subset of dataset which has irregular polyp shapes. The combination of dilated convolution module, RCCA, and global average pooling was found to be effective for irregular shapes. Our architecture demonstrates an enhancement, with an average improvement of 3.75% across all metrics when compared to existing models.

* 9 pages

Via

Access Paper or Ask Questions

Mirror-3DGS: Incorporating Mirror Reflections into 3D Gaussian Splatting

Apr 01, 2024

Jiarui Meng, Haijie Li, Yanmin Wu, Qiankun Gao, Shuzhou Yang, Jian Zhang, Siwei Ma

Figure 1 for Mirror-3DGS: Incorporating Mirror Reflections into 3D Gaussian Splatting

Figure 2 for Mirror-3DGS: Incorporating Mirror Reflections into 3D Gaussian Splatting

Figure 3 for Mirror-3DGS: Incorporating Mirror Reflections into 3D Gaussian Splatting

Figure 4 for Mirror-3DGS: Incorporating Mirror Reflections into 3D Gaussian Splatting

Abstract:3D Gaussian Splatting (3DGS) has marked a significant breakthrough in the realm of 3D scene reconstruction and novel view synthesis. However, 3DGS, much like its predecessor Neural Radiance Fields (NeRF), struggles to accurately model physical reflections, particularly in mirrors that are ubiquitous in real-world scenes. This oversight mistakenly perceives reflections as separate entities that physically exist, resulting in inaccurate reconstructions and inconsistent reflective properties across varied viewpoints. To address this pivotal challenge, we introduce Mirror-3DGS, an innovative rendering framework devised to master the intricacies of mirror geometries and reflections, paving the way for the generation of realistically depicted mirror reflections. By ingeniously incorporating mirror attributes into the 3DGS and leveraging the principle of plane mirror imaging, Mirror-3DGS crafts a mirrored viewpoint to observe from behind the mirror, enriching the realism of scene renderings. Extensive assessments, spanning both synthetic and real-world scenes, showcase our method's ability to render novel views with enhanced fidelity in real-time, surpassing the state-of-the-art Mirror-NeRF specifically within the challenging mirror regions. Our code will be made publicly available for reproducible research.

* 22 pages, 7 figures

Via

Access Paper or Ask Questions

InstantSplat: Unbounded Sparse-view Pose-free Gaussian Splatting in 40 Seconds

Mar 29, 2024

Zhiwen Fan, Wenyan Cong, Kairun Wen, Kevin Wang, Jian Zhang, Xinghao Ding, Danfei Xu, Boris Ivanovic, Marco Pavone, Georgios Pavlakos(+2 more)

Abstract:While novel view synthesis (NVS) has made substantial progress in 3D computer vision, it typically requires an initial estimation of camera intrinsics and extrinsics from dense viewpoints. This pre-processing is usually conducted via a Structure-from-Motion (SfM) pipeline, a procedure that can be slow and unreliable, particularly in sparse-view scenarios with insufficient matched features for accurate reconstruction. In this work, we integrate the strengths of point-based representations (e.g., 3D Gaussian Splatting, 3D-GS) with end-to-end dense stereo models (DUSt3R) to tackle the complex yet unresolved issues in NVS under unconstrained settings, which encompasses pose-free and sparse view challenges. Our framework, InstantSplat, unifies dense stereo priors with 3D-GS to build 3D Gaussians of large-scale scenes from sparseview & pose-free images in less than 1 minute. Specifically, InstantSplat comprises a Coarse Geometric Initialization (CGI) module that swiftly establishes a preliminary scene structure and camera parameters across all training views, utilizing globally-aligned 3D point maps derived from a pre-trained dense stereo pipeline. This is followed by the Fast 3D-Gaussian Optimization (F-3DGO) module, which jointly optimizes the 3D Gaussian attributes and the initialized poses with pose regularization. Experiments conducted on the large-scale outdoor Tanks & Temples datasets demonstrate that InstantSplat significantly improves SSIM (by 32%) while concurrently reducing Absolute Trajectory Error (ATE) by 80%. These establish InstantSplat as a viable solution for scenarios involving posefree and sparse-view conditions. Project page: instantsplat.github.io.

Via

Access Paper or Ask Questions

Invertible Diffusion Models for Compressed Sensing

Mar 25, 2024

Bin Chen, Zhenyu Zhang, Weiqi Li, Chen Zhao, Jiwen Yu, Shijie Zhao, Jie Chen, Jian Zhang

Abstract:While deep neural networks (NN) significantly advance image compressed sensing (CS) by improving reconstruction quality, the necessity of training current CS NNs from scratch constrains their effectiveness and hampers rapid deployment. Although recent methods utilize pre-trained diffusion models for image reconstruction, they struggle with slow inference and restricted adaptability to CS. To tackle these challenges, this paper proposes Invertible Diffusion Models (IDM), a novel efficient, end-to-end diffusion-based CS method. IDM repurposes a large-scale diffusion sampling process as a reconstruction model, and finetunes it end-to-end to recover original images directly from CS measurements, moving beyond the traditional paradigm of one-step noise estimation learning. To enable such memory-intensive end-to-end finetuning, we propose a novel two-level invertible design to transform both (1) the multi-step sampling process and (2) the noise estimation U-Net in each step into invertible networks. As a result, most intermediate features are cleared during training to reduce up to 93.8% GPU memory. In addition, we develop a set of lightweight modules to inject measurements into noise estimator to further facilitate reconstruction. Experiments demonstrate that IDM outperforms existing state-of-the-art CS networks by up to 2.64dB in PSNR. Compared to the recent diffusion model-based approach DDNM, our IDM achieves up to 10.09dB PSNR gain and 14.54 times faster inference.

Via

Access Paper or Ask Questions

Embedding Pose Graph, Enabling 3D Foundation Model Capabilities with a Compact Representation

Mar 20, 2024

Hugues Thomas, Jian Zhang

Figure 1 for Embedding Pose Graph, Enabling 3D Foundation Model Capabilities with a Compact Representation

Figure 2 for Embedding Pose Graph, Enabling 3D Foundation Model Capabilities with a Compact Representation

Figure 3 for Embedding Pose Graph, Enabling 3D Foundation Model Capabilities with a Compact Representation

Figure 4 for Embedding Pose Graph, Enabling 3D Foundation Model Capabilities with a Compact Representation

Abstract:This paper presents the Embedding Pose Graph (EPG), an innovative method that combines the strengths of foundation models with a simple 3D representation suitable for robotics applications. Addressing the need for efficient spatial understanding in robotics, EPG provides a compact yet powerful approach by attaching foundation model features to the nodes of a pose graph. Unlike traditional methods that rely on bulky data formats like voxel grids or point clouds, EPG is lightweight and scalable. It facilitates a range of robotic tasks, including open-vocabulary querying, disambiguation, image-based querying, language-directed navigation, and re-localization in 3D environments. We showcase the effectiveness of EPG in handling these tasks, demonstrating its capacity to improve how robots interact with and navigate through complex spaces. Through both qualitative and quantitative assessments, we illustrate EPG's strong performance and its ability to outperform existing methods in re-localization. Our work introduces a crucial step forward in enabling robots to efficiently understand and operate within large-scale 3D spaces.

Via

Access Paper or Ask Questions

BadEdit: Backdooring large language models by model editing

Mar 20, 2024

Yanzhou Li, Tianlin Li, Kangjie Chen, Jian Zhang, Shangqing Liu, Wenhan Wang, Tianwei Zhang, Yang Liu

Figure 1 for BadEdit: Backdooring large language models by model editing

Figure 2 for BadEdit: Backdooring large language models by model editing

Figure 3 for BadEdit: Backdooring large language models by model editing

Figure 4 for BadEdit: Backdooring large language models by model editing

Abstract:Mainstream backdoor attack methods typically demand substantial tuning data for poisoning, limiting their practicality and potentially degrading the overall performance when applied to Large Language Models (LLMs). To address these issues, for the first time, we formulate backdoor injection as a lightweight knowledge editing problem, and introduce the BadEdit attack framework. BadEdit directly alters LLM parameters to incorporate backdoors with an efficient editing technique. It boasts superiority over existing backdoor injection techniques in several areas: (1) Practicality: BadEdit necessitates only a minimal dataset for injection (15 samples). (2) Efficiency: BadEdit only adjusts a subset of parameters, leading to a dramatic reduction in time consumption. (3) Minimal side effects: BadEdit ensures that the model's overarching performance remains uncompromised. (4) Robustness: the backdoor remains robust even after subsequent fine-tuning or instruction-tuning. Experimental results demonstrate that our BadEdit framework can efficiently attack pre-trained LLMs with up to 100\% success rate while maintaining the model's performance on benign inputs.

* ICLR 2024

Via

Access Paper or Ask Questions