Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Changqing Zou

TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution

Nov 27, 2024

Linwei Dong, Qingnan Fan, Yihong Guo, Zhonghao Wang, Qi Zhang, Jinwei Chen, Yawei Luo, Changqing Zou

Figure 1 for TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution

Figure 2 for TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution

Figure 3 for TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution

Figure 4 for TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution

Abstract:Pre-trained text-to-image diffusion models are increasingly applied to real-world image super-resolution (Real-ISR) task. Given the iterative refinement nature of diffusion models, most existing approaches are computationally expensive. While methods such as SinSR and OSEDiff have emerged to condense inference steps via distillation, their performance in image restoration or details recovery is not satisfied. To address this, we propose TSD-SR, a novel distillation framework specifically designed for real-world image super-resolution, aiming to construct an efficient and effective one-step model. We first introduce the Target Score Distillation, which leverages the priors of diffusion models and real image references to achieve more realistic image restoration. Secondly, we propose a Distribution-Aware Sampling Module to make detail-oriented gradients more readily accessible, addressing the challenge of recovering fine details. Extensive experiments demonstrate that our TSD-SR has superior restoration results (most of the metrics perform the best) and the fastest inference speed (e.g. 40 times faster than SeeSR) compared to the past Real-ISR approaches based on pre-trained diffusion priors.

Via

Access Paper or Ask Questions

ONION: Physics-Informed Deep Learning Model for Line Integral Diagnostics Across Fusion Devices

Nov 27, 2024

Cong Wang, Weizhe Yang, Haiping Wang, Renjie Yang, Jing Li, Zhijun Wang, Xinyao Yu, Yixiong Wei, Xianli Huang, Zhaoyang Liu(+2 more)

Abstract:This paper introduces a Physics-Informed model architecture that can be adapted to various backbone networks. The model incorporates physical information as additional input and is constrained by a Physics-Informed loss function. Experimental results demonstrate that the additional input of physical information substantially improve the model's ability with a increase in performance observed. Besides, the adoption of the Softplus activation function in the final two fully connected layers significantly enhances model performance. The incorporation of a Physics-Informed loss function has been shown to correct the model's predictions, bringing the back-projections closer to the actual inputs and reducing the errors associated with inversion algorithms. In this work, we have developed a Phantom Data Model to generate customized line integral diagnostic datasets and have also collected SXR diagnostic datasets from EAST and HL-2A. The code, models, and some datasets are publicly available at https://github.com/calledice/onion.

Via

Access Paper or Ask Questions

MotionLLaMA: A Unified Framework for Motion Synthesis and Comprehension

Nov 26, 2024

Zeyu Ling, Bo Han, Shiyang Li, Hongdeng Shen, Jikang Cheng, Changqing Zou

Figure 1 for MotionLLaMA: A Unified Framework for Motion Synthesis and Comprehension

Figure 2 for MotionLLaMA: A Unified Framework for Motion Synthesis and Comprehension

Figure 3 for MotionLLaMA: A Unified Framework for Motion Synthesis and Comprehension

Figure 4 for MotionLLaMA: A Unified Framework for Motion Synthesis and Comprehension

Abstract:This paper introduces MotionLLaMA, a unified framework for motion synthesis and comprehension, along with a novel full-body motion tokenizer called the HoMi Tokenizer. MotionLLaMA is developed based on three core principles. First, it establishes a powerful unified representation space through the HoMi Tokenizer. Using a single codebook, the HoMi Tokenizer in MotionLLaMA achieves reconstruction accuracy comparable to residual vector quantization tokenizers utilizing six codebooks, outperforming all existing single-codebook tokenizers. Second, MotionLLaMA integrates a large language model to tackle various motion-related tasks. This integration bridges various modalities, facilitating both comprehensive and intricate motion synthesis and comprehension. Third, MotionLLaMA introduces the MotionHub dataset, currently the most extensive multimodal, multitask motion dataset, which enables fine-tuning of large language models. Extensive experimental results demonstrate that MotionLLaMA not only covers the widest range of motion-related tasks but also achieves state-of-the-art (SOTA) performance in motion completion, interaction dual-person text-to-motion, and all comprehension tasks while reaching performance comparable to SOTA in the remaining tasks. The code and MotionHub dataset are publicly available.

Via

Access Paper or Ask Questions

SweepNet: Unsupervised Learning Shape Abstraction via Neural Sweepers

Jul 08, 2024

Mingrui Zhao, Yizhi Wang, Fenggen Yu, Changqing Zou, Ali Mahdavi-Amiri

Figure 1 for SweepNet: Unsupervised Learning Shape Abstraction via Neural Sweepers

Figure 2 for SweepNet: Unsupervised Learning Shape Abstraction via Neural Sweepers

Figure 3 for SweepNet: Unsupervised Learning Shape Abstraction via Neural Sweepers

Figure 4 for SweepNet: Unsupervised Learning Shape Abstraction via Neural Sweepers

Abstract:Shape abstraction is an important task for simplifying complex geometric structures while retaining essential features. Sweep surfaces, commonly found in human-made objects, aid in this process by effectively capturing and representing object geometry, thereby facilitating abstraction. In this paper, we introduce \papername, a novel approach to shape abstraction through sweep surfaces. We propose an effective parameterization for sweep surfaces, utilizing superellipses for profile representation and B-spline curves for the axis. This compact representation, requiring as few as 14 float numbers, facilitates intuitive and interactive editing while preserving shape details effectively. Additionally, by introducing a differentiable neural sweeper and an encoder-decoder architecture, we demonstrate the ability to predict sweep surface representations without supervision. We show the superiority of our model through several quantitative and qualitative experiments throughout the paper. Our code is available at https://mingrui-zhao.github.io/SweepNet/

* 14 pages,20 figures, ECCV 2024

Via

Access Paper or Ask Questions

Diff3DS: Generating View-Consistent 3D Sketch via Differentiable Curve Rendering

May 24, 2024

Yibo Zhang, Lihong Wang, Changqing Zou, Tieru Wu, Rui Ma

Abstract:3D sketches are widely used for visually representing the 3D shape and structure of objects or scenes. However, the creation of 3D sketch often requires users to possess professional artistic skills. Existing research efforts primarily focus on enhancing the ability of interactive sketch generation in 3D virtual systems. In this work, we propose Diff3DS, a novel differentiable rendering framework for generating view-consistent 3D sketch by optimizing 3D parametric curves under various supervisions. Specifically, we perform perspective projection to render the 3D rational B\'ezier curves into 2D curves, which are subsequently converted to a 2D raster image via our customized differentiable rasterizer. Our framework bridges the domains of 3D sketch and raster image, achieving end-toend optimization of 3D sketch through gradients computed in the 2D image domain. Our Diff3DS can enable a series of novel 3D sketch generation tasks, including textto-3D sketch and image-to-3D sketch, supported by the popular distillation-based supervision, such as Score Distillation Sampling (SDS). Extensive experiments have yielded promising results and demonstrated the potential of our framework.

* Project: https://yiboz2001.github.io/Diff3DS/

Via

Access Paper or Ask Questions

Learning Human Motion from Monocular Videos via Cross-Modal Manifold Alignment

Apr 15, 2024

Shuaiying Hou, Hongyu Tao, Junheng Fang, Changqing Zou, Hujun Bao, Weiwei Xu

Abstract:Learning 3D human motion from 2D inputs is a fundamental task in the realms of computer vision and computer graphics. Many previous methods grapple with this inherently ambiguous task by introducing motion priors into the learning process. However, these approaches face difficulties in defining the complete configurations of such priors or training a robust model. In this paper, we present the Video-to-Motion Generator (VTM), which leverages motion priors through cross-modal latent feature space alignment between 3D human motion and 2D inputs, namely videos and 2D keypoints. To reduce the complexity of modeling motion priors, we model the motion data separately for the upper and lower body parts. Additionally, we align the motion data with a scale-invariant virtual skeleton to mitigate the interference of human skeleton variations to the motion priors. Evaluated on AIST++, the VTM showcases state-of-the-art performance in reconstructing 3D human motion from monocular videos. Notably, our VTM exhibits the capabilities for generalization to unseen view angles and in-the-wild videos.

Via

Access Paper or Ask Questions

3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation

Mar 14, 2024

Frank Zhang, Yibo Zhang, Quan Zheng, Rui Ma, Wei Hua, Hujun Bao, Weiwei Xu, Changqing Zou

Abstract:Text-driven 3D scene generation techniques have made rapid progress in recent years. Their success is mainly attributed to using existing generative models to iteratively perform image warping and inpainting to generate 3D scenes. However, these methods heavily rely on the outputs of existing models, leading to error accumulation in geometry and appearance that prevent the models from being used in various scenarios (e.g., outdoor and unreal scenarios). To address this limitation, we generatively refine the newly generated local views by querying and aggregating global 3D information, and then progressively generate the 3D scene. Specifically, we employ a tri-plane features-based NeRF as a unified representation of the 3D scene to constrain global 3D consistency, and propose a generative refinement network to synthesize new contents with higher quality by exploiting the natural image prior from 2D diffusion model as well as the global 3D information of the current scene. Our extensive experiments demonstrate that, in comparison to previous methods, our approach supports wide variety of scene generation and arbitrary camera trajectories with improved visual quality and 3D consistency.

* 11 pages, 7 figures

Via

Access Paper or Ask Questions

A General Implicit Framework for Fast NeRF Composition and Rendering

Aug 14, 2023

Xinyu Gao, Ziyi Yang, Yunlu Zhao, Yuxiang Sun, Xiaogang Jin, Changqing Zou

Abstract:A variety of Neural Radiance Fields (NeRF) methods have recently achieved remarkable success in high render speed. However, current accelerating methods are specialized and incompatible with various implicit methods, preventing real-time composition over various types of NeRF works. Because NeRF relies on sampling along rays, it is possible to provide general guidance for acceleration. To that end, we propose a general implicit pipeline for composing NeRF objects quickly. Our method enables the casting of dynamic shadows within or between objects using analytical light sources while allowing multiple NeRF objects to be seamlessly placed and rendered together with any arbitrary rigid transformations. Mainly, our work introduces a new surface representation known as Neural Depth Fields (NeDF) that quickly determines the spatial relationship between objects by allowing direct intersection computation between rays and implicit surfaces. It leverages an intersection neural network to query NeRF for acceleration instead of depending on an explicit spatial structure.Our proposed method is the first to enable both the progressive and interactive composition of NeRF objects. Additionally, it also serves as a previewing plugin for a range of existing NeRF works.

* 7 pages for main content

Via

Access Paper or Ask Questions

CAP-VSTNet: Content Affinity Preserved Versatile Style Transfer

Mar 31, 2023

Linfeng Wen, Chengying Gao, Changqing Zou

Abstract:Content affinity loss including feature and pixel affinity is a main problem which leads to artifacts in photorealistic and video style transfer. This paper proposes a new framework named CAP-VSTNet, which consists of a new reversible residual network and an unbiased linear transform module, for versatile style transfer. This reversible residual network can not only preserve content affinity but not introduce redundant information as traditional reversible networks, and hence facilitate better stylization. Empowered by Matting Laplacian training loss which can address the pixel affinity loss problem led by the linear transform, the proposed framework is applicable and effective on versatile style transfer. Extensive experiments show that CAP-VSTNet can produce better qualitative and quantitative results in comparison with the state-of-the-art methods.

* CVPR 2023

Via

Access Paper or Ask Questions

MXM-CLR: A Unified Framework for Contrastive Learning of Multifold Cross-Modal Representations

Mar 21, 2023

Ye Wang, Bowei Jiang, Changqing Zou, Rui Ma

Figure 1 for MXM-CLR: A Unified Framework for Contrastive Learning of Multifold Cross-Modal Representations

Figure 2 for MXM-CLR: A Unified Framework for Contrastive Learning of Multifold Cross-Modal Representations

Figure 3 for MXM-CLR: A Unified Framework for Contrastive Learning of Multifold Cross-Modal Representations

Figure 4 for MXM-CLR: A Unified Framework for Contrastive Learning of Multifold Cross-Modal Representations

Abstract:Multifold observations are common for different data modalities, e.g., a 3D shape can be represented by multi-view images and an image can be described with different captions. Existing cross-modal contrastive representation learning (XM-CLR) methods such as CLIP are not fully suitable for multifold data as they only consider one positive pair and treat other pairs as negative when computing the contrastive loss. In this paper, we propose MXM-CLR, a unified framework for contrastive learning of multifold cross-modal representations. MXM-CLR explicitly models and learns the relationships between multifold observations of instances from different modalities for more comprehensive representation learning. The key of MXM-CLR is a novel multifold-aware hybrid loss which considers multiple positive observations when computing the hard and soft relationships for the cross-modal data pairs. We conduct quantitative and qualitative comparisons with SOTA baselines for cross-modal retrieval tasks on the Text2Shape and Flickr30K datasets. We also perform extensive evaluations on the adaptability and generalizability of MXM-CLR, as well as ablation studies on the loss design and effects of batch sizes. The results show the superiority of MXM-CLR in learning better representations for the multifold data. The code is available at https://github.com/JLU-ICL/MXM-CLR.

* 16 pages, 14 figures

Via

Access Paper or Ask Questions