Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Lingjie Liu

TLControl: Trajectory and Language Control for Human Motion Synthesis

Nov 30, 2023

Weilin Wan, Zhiyang Dou, Taku Komura, Wenping Wang, Dinesh Jayaraman, Lingjie Liu

Abstract:Controllable human motion synthesis is essential for applications in AR/VR, gaming, movies, and embodied AI. Existing methods often focus solely on either language or full trajectory control, lacking precision in synthesizing motions aligned with user-specified trajectories, especially for multi-joint control. To address these issues, we present TLControl, a new method for realistic human motion synthesis, incorporating both low-level trajectory and high-level language semantics controls. Specifically, we first train a VQ-VAE to learn a compact latent motion space organized by body parts. We then propose a Masked Trajectories Transformer to make coarse initial predictions of full trajectories of joints based on the learned latent motion space, with user-specified partial trajectories and text descriptions as conditioning. Finally, we introduce an efficient test-time optimization to refine these coarse predictions for accurate trajectory control. Experiments demonstrate that TLControl outperforms the state-of-the-art in trajectory accuracy and time efficiency, making it practical for interactive and high-quality animation generation.

Via

Access Paper or Ask Questions

GART: Gaussian Articulated Template Models

Nov 27, 2023

Jiahui Lei, Yufu Wang, Georgios Pavlakos, Lingjie Liu, Kostas Daniilidis

Abstract:We introduce Gaussian Articulated Template Model GART, an explicit, efficient, and expressive representation for non-rigid articulated subject capturing and rendering from monocular videos. GART utilizes a mixture of moving 3D Gaussians to explicitly approximate a deformable subject's geometry and appearance. It takes advantage of a categorical template model prior (SMPL, SMAL, etc.) with learnable forward skinning while further generalizing to more complex non-rigid deformations with novel latent bones. GART can be reconstructed via differentiable rendering from monocular videos in seconds or minutes and rendered in novel poses faster than 150fps.

* 13 pages, code available at https://www.cis.upenn.edu/~leijh/projects/gart/

Via

Access Paper or Ask Questions

Wonder3D: Single Image to 3D using Cross-Domain Diffusion

Oct 24, 2023

Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin, Yuan Liu, Zhiyang Dou, Lingjie Liu, Yuexin Ma, Song-Hai Zhang, Marc Habermann, Christian Theobalt(+1 more)

Figure 1 for Wonder3D: Single Image to 3D using Cross-Domain Diffusion

Figure 2 for Wonder3D: Single Image to 3D using Cross-Domain Diffusion

Figure 3 for Wonder3D: Single Image to 3D using Cross-Domain Diffusion

Figure 4 for Wonder3D: Single Image to 3D using Cross-Domain Diffusion

Abstract:In this work, we introduce Wonder3D, a novel method for efficiently generating high-fidelity textured meshes from single-view images.Recent methods based on Score Distillation Sampling (SDS) have shown the potential to recover 3D geometry from 2D diffusion priors, but they typically suffer from time-consuming per-shape optimization and inconsistent geometry. In contrast, certain works directly produce 3D information via fast network inferences, but their results are often of low quality and lack geometric details. To holistically improve the quality, consistency, and efficiency of image-to-3D tasks, we propose a cross-domain diffusion model that generates multi-view normal maps and the corresponding color images. To ensure consistency, we employ a multi-view cross-domain attention mechanism that facilitates information exchange across views and modalities. Lastly, we introduce a geometry-aware normal fusion algorithm that extracts high-quality surfaces from the multi-view 2D representations. Our extensive evaluations demonstrate that our method achieves high-quality reconstruction results, robust generalization, and reasonably good efficiency compared to prior works.

* Project page: https://www.xxlong.site/Wonder3D/

Via

Access Paper or Ask Questions

DELIFFAS: Deformable Light Fields for Fast Avatar Synthesis

Oct 17, 2023

Youngjoong Kwon, Lingjie Liu, Henry Fuchs, Marc Habermann, Christian Theobalt

Abstract:Generating controllable and photorealistic digital human avatars is a long-standing and important problem in Vision and Graphics. Recent methods have shown great progress in terms of either photorealism or inference speed while the combination of the two desired properties still remains unsolved. To this end, we propose a novel method, called DELIFFAS, which parameterizes the appearance of the human as a surface light field that is attached to a controllable and deforming human mesh model. At the core, we represent the light field around the human with a deformable two-surface parameterization, which enables fast and accurate inference of the human appearance. This allows perceptual supervision on the full image compared to previous approaches that could only supervise individual pixels or small patches due to their slow runtime. Our carefully designed human representation and supervision strategy leads to state-of-the-art synthesis results and inference time. The video results and code are available at https://vcai.mpi-inf.mpg.de/projects/DELIFFAS.

Via

Access Paper or Ask Questions

State of the Art on Diffusion Models for Visual Computing

Oct 11, 2023

Ryan Po, Wang Yifan, Vladislav Golyanik, Kfir Aberman, Jonathan T. Barron, Amit H. Bermano, Eric Ryan Chan, Tali Dekel, Aleksander Holynski, Angjoo Kanazawa(+8 more)

Abstract:The field of visual computing is rapidly advancing due to the emergence of generative artificial intelligence (AI), which unlocks unprecedented capabilities for the generation, editing, and reconstruction of images, videos, and 3D scenes. In these domains, diffusion models are the generative AI architecture of choice. Within the last year alone, the literature on diffusion-based tools and applications has seen exponential growth and relevant papers are published across the computer graphics, computer vision, and AI communities with new works appearing daily on arXiv. This rapid growth of the field makes it difficult to keep up with all recent developments. The goal of this state-of-the-art report (STAR) is to introduce the basic mathematical concepts of diffusion models, implementation details and design choices of the popular Stable Diffusion model, as well as overview important aspects of these generative AI tools, including personalization, conditioning, inversion, among others. Moreover, we give a comprehensive overview of the rapidly growing literature on diffusion-based generation and editing, categorized by the type of generated medium, including 2D images, videos, 3D objects, locomotion, and 4D scenes. Finally, we discuss available datasets, metrics, open challenges, and social implications. This STAR provides an intuitive starting point to explore this exciting topic for researchers, artists, and practitioners alike.

Via

Access Paper or Ask Questions

SyncDreamer: Generating Multiview-consistent Images from a Single-view Image

Sep 07, 2023

Yuan Liu, Cheng Lin, Zijiao Zeng, Xiaoxiao Long, Lingjie Liu, Taku Komura, Wenping Wang

Figure 1 for SyncDreamer: Generating Multiview-consistent Images from a Single-view Image

Figure 2 for SyncDreamer: Generating Multiview-consistent Images from a Single-view Image

Figure 3 for SyncDreamer: Generating Multiview-consistent Images from a Single-view Image

Figure 4 for SyncDreamer: Generating Multiview-consistent Images from a Single-view Image

Abstract:In this paper, we present a novel diffusion model called that generates multiview-consistent images from a single-view image. Using pretrained large-scale 2D diffusion models, recent work Zero123 demonstrates the ability to generate plausible novel views from a single-view image of an object. However, maintaining consistency in geometry and colors for the generated images remains a challenge. To address this issue, we propose a synchronized multiview diffusion model that models the joint probability distribution of multiview images, enabling the generation of multiview-consistent images in a single reverse process. SyncDreamer synchronizes the intermediate states of all the generated images at every step of the reverse process through a 3D-aware feature attention mechanism that correlates the corresponding features across different views. Experiments show that SyncDreamer generates images with high consistency across different views, thus making it well-suited for various 3D generation tasks such as novel-view-synthesis, text-to-3D, and image-to-3D.

* Project page: https://liuyuan-pal.github.io/SyncDreamer/

Via

Access Paper or Ask Questions

DreamEditor: Text-Driven 3D Scene Editing with Neural Fields

Jun 29, 2023

Jingyu Zhuang, Chen Wang, Lingjie Liu, Liang Lin, Guanbin Li

Figure 1 for DreamEditor: Text-Driven 3D Scene Editing with Neural Fields

Figure 2 for DreamEditor: Text-Driven 3D Scene Editing with Neural Fields

Figure 3 for DreamEditor: Text-Driven 3D Scene Editing with Neural Fields

Figure 4 for DreamEditor: Text-Driven 3D Scene Editing with Neural Fields

Abstract:Neural fields have achieved impressive advancements in view synthesis and scene reconstruction. However, editing these neural fields remains challenging due to the implicit encoding of geometry and texture information. In this paper, we propose DreamEditor, a novel framework that enables users to perform controlled editing of neural fields using text prompts. By representing scenes as mesh-based neural fields, DreamEditor allows localized editing within specific regions. DreamEditor utilizes the text encoder of a pretrained text-to-Image diffusion model to automatically identify the regions to be edited based on the semantics of the text prompts. Subsequently, DreamEditor optimizes the editing region and aligns its geometry and texture with the text prompts through score distillation sampling [29]. Extensive experiments have demonstrated that DreamEditor can accurately edit neural fields of real-world scenes according to the given text prompts while ensuring consistency in irrelevant areas. DreamEditor generates highly realistic textures and geometry, significantly surpassing previous works in both quantitative and qualitative evaluations.

Via

Access Paper or Ask Questions

BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping

Jun 08, 2023

Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Lingjie Liu, Josh Susskind

Abstract:Diffusion models have demonstrated excellent potential for generating diverse images. However, their performance often suffers from slow generation due to iterative denoising. Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few without significant quality degradation. However, existing distillation methods either require significant amounts of offline computation for generating synthetic training data from the teacher model or need to perform expensive online learning with the help of real data. In this work, we present a novel technique called BOOT, that overcomes these limitations with an efficient data-free distillation algorithm. The core idea is to learn a time-conditioned model that predicts the output of a pre-trained diffusion model teacher given any time step. Such a model can be efficiently trained based on bootstrapping from two consecutive sampled steps. Furthermore, our method can be easily adapted to large-scale text-to-image diffusion models, which are challenging for conventional methods given the fact that the training sets are often large and difficult to access. We demonstrate the effectiveness of our approach on several benchmark datasets in the DDIM setting, achieving comparable generation quality while being orders of magnitude faster than the diffusion teacher. The text-to-image results show that the proposed approach is able to handle highly complex distributions, shedding light on more efficient generative modeling.

* In progress

Via

Access Paper or Ask Questions

NeRO: Neural Geometry and BRDF Reconstruction of Reflective Objects from Multiview Images

May 27, 2023

Yuan Liu, Peng Wang, Cheng Lin, Xiaoxiao Long, Jiepeng Wang, Lingjie Liu, Taku Komura, Wenping Wang

Figure 1 for NeRO: Neural Geometry and BRDF Reconstruction of Reflective Objects from Multiview Images

Figure 2 for NeRO: Neural Geometry and BRDF Reconstruction of Reflective Objects from Multiview Images

Figure 3 for NeRO: Neural Geometry and BRDF Reconstruction of Reflective Objects from Multiview Images

Figure 4 for NeRO: Neural Geometry and BRDF Reconstruction of Reflective Objects from Multiview Images

Abstract:We present a neural rendering-based method called NeRO for reconstructing the geometry and the BRDF of reflective objects from multiview images captured in an unknown environment. Multiview reconstruction of reflective objects is extremely challenging because specular reflections are view-dependent and thus violate the multiview consistency, which is the cornerstone for most multiview reconstruction methods. Recent neural rendering techniques can model the interaction between environment lights and the object surfaces to fit the view-dependent reflections, thus making it possible to reconstruct reflective objects from multiview images. However, accurately modeling environment lights in the neural rendering is intractable, especially when the geometry is unknown. Most existing neural rendering methods, which can model environment lights, only consider direct lights and rely on object masks to reconstruct objects with weak specular reflections. Therefore, these methods fail to reconstruct reflective objects, especially when the object mask is not available and the object is illuminated by indirect lights. We propose a two-step approach to tackle this problem. First, by applying the split-sum approximation and the integrated directional encoding to approximate the shading effects of both direct and indirect lights, we are able to accurately reconstruct the geometry of reflective objects without any object masks. Then, with the object geometry fixed, we use more accurate sampling to recover the environment lights and the BRDF of the object. Extensive experiments demonstrate that our method is capable of accurately reconstructing the geometry and the BRDF of reflective objects from only posed RGB images without knowing the environment lights and the object masks. Codes and datasets are available at https://github.com/liuyuan-pal/NeRO.

* Accepted to SIGGRAPH 2023. Project page: https://liuyuan-pal.github.io/NeRO/ Codes: https://github.com/liuyuan-pal/NeRO

Via

Access Paper or Ask Questions

Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

May 18, 2023

Xingang Pan, Ayush Tewari, Thomas Leimkühler, Lingjie Liu, Abhimitra Meka, Christian Theobalt

Figure 1 for Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

Figure 2 for Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

Figure 3 for Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

Figure 4 for Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

Abstract:Synthesizing visual content that meets users' needs often requires flexible and precise controllability of the pose, shape, expression, and layout of the generated objects. Existing approaches gain controllability of generative adversarial networks (GANs) via manually annotated training data or a prior 3D model, which often lack flexibility, precision, and generality. In this work, we study a powerful yet much less explored way of controlling GANs, that is, to "drag" any points of the image to precisely reach target points in a user-interactive manner, as shown in Fig.1. To achieve this, we propose DragGAN, which consists of two main components: 1) a feature-based motion supervision that drives the handle point to move towards the target position, and 2) a new point tracking approach that leverages the discriminative generator features to keep localizing the position of the handle points. Through DragGAN, anyone can deform an image with precise control over where pixels go, thus manipulating the pose, shape, expression, and layout of diverse categories such as animals, cars, humans, landscapes, etc. As these manipulations are performed on the learned generative image manifold of a GAN, they tend to produce realistic outputs even for challenging scenarios such as hallucinating occluded content and deforming shapes that consistently follow the object's rigidity. Both qualitative and quantitative comparisons demonstrate the advantage of DragGAN over prior approaches in the tasks of image manipulation and point tracking. We also showcase the manipulation of real images through GAN inversion.

* Accepted to SIGGRAPH 2023. Project page: https://vcai.mpi-inf.mpg.de/projects/DragGAN/

Via

Access Paper or Ask Questions