Mingming He

Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset

Mar 18, 2025

Yiqun Mei, Mingming He, Li Ma, Julien Philip, Wenqi Xian, David M George, Xueming Yu, Gabriel Dedic, Ahmet Levent Taşel, Ning Yu(+2 more)

Abstract: Video portrait relighting remains challenging because the results need to be both photorealistic and temporally stable. This typically requires a strong model design that can capture complex facial reflections, as well as intensive training on a high-quality paired video dataset, such as dynamic one-light-at-a-time (OLAT) data. In this work, we introduce Lux Post Facto, a novel portrait video relighting method that produces both photorealistic and temporally consistent lighting effects. On the model side, we design a new conditional video diffusion model built upon a state-of-the-art pre-trained video diffusion model, alongside a new lighting injection mechanism to enable precise control. In this way, we leverage strong spatial and temporal generative capabilities to generate plausible solutions to the ill-posed relighting problem. Our technique uses a hybrid dataset consisting of static-expression OLAT data and in-the-wild portrait performance videos to jointly learn relighting and temporal modeling. This avoids the need to acquire paired video data under different lighting conditions. Our extensive experiments show that our model produces state-of-the-art results in terms of both photorealism and temporal consistency.

* CVPR 2025
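
OLAT captures are commonly turned into relighting targets through image-based relighting. Because light transport is linear, a portrait under new illumination can be approximated as a weighted sum of its OLAT images. The Python sketch below illustrates that idea under stated assumptions: images are linear RGB stored as nested lists, and there is one RGB weight per light, sampled from the target environment. The function name `relight_from_olat` is hypothetical, and the sketch is not presented as the paper's data pipeline.

```python
from typing import List, Sequence

# Linear-RGB image as nested lists: image[y][x] == [r, g, b]
Image = List[List[List[float]]]


def relight_from_olat(olat_images: Sequence[Image],
                      light_rgb: Sequence[Sequence[float]]) -> Image:
    """Approximate a relit image as a weighted sum of OLAT images.

    olat_images[i] shows the subject lit only by light i; light_rgb[i] is the
    RGB intensity the target environment emits from light i's direction.
    Linearity of light transport makes the weighted sum a valid relighting.
    """
    if len(olat_images) != len(light_rgb):
        raise ValueError("need exactly one RGB weight per OLAT image")
    h, w = len(olat_images[0]), len(olat_images[0][0])
    out = [[[0.0, 0.0, 0.0] for _ in range(w)] for _ in range(h)]
    for img, rgb in zip(olat_images, light_rgb):
        for y in range(h):
            for x in range(w):
                for c in range(3):
                    out[y][x][c] += rgb[c] * img[y][x][c]
    return out


# Two 1x1 OLAT captures combined under a hypothetical target lighting.
olat = [[[[1.0, 0.0, 0.0]]], [[[0.0, 1.0, 0.5]]]]
print(relight_from_olat(olat, [[2.0, 2.0, 2.0], [1.0, 1.0, 4.0]]))  # [[[2.0, 1.0, 2.0]]]
```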

Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise

Jan 16, 2025

Ryan Burgert, Yuancheng Xu, Wenqi Xian, Oliver Pilarski, Pascal Clausen, Mingming He, Li Ma, Yitong Deng, Lingxiao Li, Mohsen Mousavi(+3 more)

[Figures 1-4 not reproduced]

Abstract: Generative modeling aims to transform random noise into structured outputs. In this work, we enhance video diffusion models by allowing motion control via structured latent noise sampling. This is achieved solely through a change in data: we pre-process training videos to yield structured noise. Consequently, our method is agnostic to diffusion model design, requiring no changes to model architectures or training pipelines. Specifically, we propose a novel noise warping algorithm, fast enough to run in real time, that replaces random temporal Gaussianity with correlated warped noise derived from optical flow fields, while preserving spatial Gaussianity. The efficiency of our algorithm enables us to fine-tune modern video diffusion base models using warped noise with minimal overhead, and to provide a one-stop solution for a wide range of user-friendly motion control: local object motion control, global camera movement control, and motion transfer. The harmonization between temporal coherence and spatial Gaussianity in our warped noise leads to effective motion control while maintaining per-frame pixel quality. Extensive experiments and user studies demonstrate the advantages of our method, making it a robust and scalable approach for controlling motion in video diffusion models. Video results are available on our webpage: https://vgenai-netflix-eyeline-research.github.io/Go-with-the-Flow. Source code and model checkpoints are available on GitHub: https://github.com/VGenAI-Netflix-Eyeline-Research/Go-with-the-Flow.
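
The abstract does not give the warping algorithm itself. The sketch below is a simplified illustration of how warped noise can keep spatial Gaussianity. It forward-warps one noise frame along an integer per-pixel flow. A target pixel that receives k samples stores their sum divided by sqrt(k), and pixels that receive nothing get fresh noise. The integer flow, the nested-list grids and the name `warp_noise` are assumptions for the example. The paper's real-time algorithm handles real optical flow and is not reproduced here.

```python
import math
import random
from typing import List, Tuple

Grid = List[List[float]]
Flow = List[List[Tuple[int, int]]]  # integer (dy, dx) for each source pixel


def warp_noise(noise: Grid, flow: Flow, rng: random.Random) -> Grid:
    """Forward-warp a Gaussian noise field while keeping it i.i.d. N(0, 1).

    Each source pixel moves to exactly one target pixel. A target receiving
    k samples stores their sum / sqrt(k), which is again N(0, 1); targets
    receiving nothing (disocclusions) are filled with fresh noise.
    """
    h, w = len(noise), len(noise[0])
    sums = [[0.0] * w for _ in range(h)]
    counts = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            ty, tx = y + dy, x + dx
            if 0 <= ty < h and 0 <= tx < w:  # samples leaving the frame are dropped
                sums[ty][tx] += noise[y][x]
                counts[ty][tx] += 1
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            k = counts[y][x]
            out[y][x] = sums[y][x] / math.sqrt(k) if k else rng.gauss(0.0, 1.0)
    return out


rng = random.Random(0)
frame0 = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(4)]
shift_right = [[(0, 1)] * 4 for _ in range(4)]
frame1 = warp_noise(frame0, shift_right, rng)
# frame1[y][x] == frame0[y][x - 1] for x >= 1; column 0 holds fresh noise
```

Every source sample contributes to exactly one target, so the output pixels are built from disjoint sets of independent Gaussians. They therefore remain independent N(0, 1) samples. Pixels that share a source across frames are temporally correlated.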

Fitting Spherical Gaussians to Dynamic HDRI Sequences

Dec 09, 2024

Pascal Clausen, Li Ma, Mingming He, Ahmet Levent Tasel, Oliver Pilarski, Paul Debevec

Abstract: We present a technique for fitting high dynamic range illumination (HDRI) sequences using anisotropic spherical Gaussians (ASGs) while preserving temporal consistency in the compressed HDRI maps. Our approach begins with an optimization network that iteratively minimizes a composite loss function, which includes both reconstruction and diffuse losses. This allows us to represent all-frequency signals with a small number of ASGs, optimizing their directions, sharpness, and intensity simultaneously for an individual HDRI. To extend this optimization into the temporal domain, we introduce a temporal consistency loss, ensuring a consistent approximation across the entire HDRI sequence.

* 3 pages, 4 figures, SIGGRAPH Asia 2024 poster, https://www.eyelinestudios.com/research/hdri_sg_fit.html
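
For reference, a common ASG parameterization (Xu et al., 2013) evaluates one lobe as G(v) = c * max(v.z, 0) * exp(-lam * (v.x)^2 - mu * (v.y)^2). Here (x, y, z) is an orthonormal frame, lam and mu are the sharpness parameters, and c is the amplitude. The sketch below implements that form together with a simple temporal term. That term measures temporal consistency directly on flattened lobe parameters, which assumes a fixed lobe ordering across frames. This is an assumption for illustration and not necessarily the loss used in the poster.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def eval_asg(v: Vec3, x: Vec3, y: Vec3, z: Vec3,
             lam: float, mu: float, c: float) -> float:
    """Anisotropic spherical Gaussian in the Xu et al. (2013) form.

    z is the lobe axis, x and y span the tangent plane; lam and mu control the
    sharpness along x and y; c is the amplitude (intensity).
    """
    smooth = max(dot(v, z), 0.0)  # clamps the lobe to the z hemisphere
    return c * smooth * math.exp(-lam * dot(v, x) ** 2 - mu * dot(v, y) ** 2)


def temporal_consistency_loss(params_per_frame: Sequence[Sequence[float]]) -> float:
    """Mean squared change of flattened ASG parameters between consecutive frames."""
    total, n = 0.0, 0
    for prev, cur in zip(params_per_frame, params_per_frame[1:]):
        for a, b in zip(prev, cur):
            total += (b - a) ** 2
            n += 1
    return total / n if n else 0.0


X, Y, Z = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
print(eval_asg(Z, X, Y, Z, lam=10.0, mu=2.0, c=3.5))  # peak value equals c
```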

DifFRelight: Diffusion-Based Facial Performance Relighting

Oct 10, 2024

Mingming He, Pascal Clausen, Ahmet Levent Taşel, Li Ma, Oliver Pilarski, Wenqi Xian, Laszlo Rikker, Xueming Yu, Ryan Burgert, Ning Yu(+1 more)

[Figures 1-4 not reproduced]

Abstract: We present a novel framework for free-viewpoint facial performance relighting using diffusion-based image-to-image translation. Leveraging a subject-specific dataset containing diverse facial expressions captured under various lighting conditions, including flat-lit and one-light-at-a-time (OLAT) scenarios, we train a diffusion model for precise lighting control, enabling high-fidelity relit facial images from flat-lit inputs. Our framework includes spatially-aligned conditioning of flat-lit captures and random noise, along with integrated lighting information for global control, utilizing prior knowledge from the pre-trained Stable Diffusion model. This model is then applied to dynamic facial performances captured in a consistent flat-lit environment and reconstructed for novel-view synthesis using a scalable dynamic 3D Gaussian Splatting method to maintain quality and consistency in the relit results. In addition, we introduce unified lighting control by integrating a novel area lighting representation with directional lighting, allowing for joint adjustments in light size and direction. We also enable high dynamic range imaging (HDRI) composition using multiple directional lights to produce dynamic sequences under complex lighting conditions. Our evaluations demonstrate the model's efficiency in achieving precise lighting control and generalizing across various facial expressions while preserving detailed features such as skin texture and hair. The model accurately reproduces complex lighting effects like eye reflections, subsurface scattering, self-shadowing, and translucency, advancing photorealism within our framework.

* 18 pages, SIGGRAPH Asia 2024 Conference Papers (SA Conference Papers '24), December 3-6, 2024, Tokyo, Japan. Project page: https://www.eyelinestudios.com/research/diffrelight.html
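
One way to realize HDRI composition with multiple directional lights is assumed here for illustration. An equirectangular environment map is split into latitude-longitude blocks, and each block is treated as one directional light. The light's intensity is the block's radiance integrated over solid angle, where each pixel's solid angle is dtheta * dphi * sin(theta). Relit frames rendered under each light could then be summed, using these intensities as weights. The function name, the y-up convention and the block-center directions are assumptions of this sketch, not details from the paper.

```python
import math
from typing import List, Tuple

RGB = Tuple[float, float, float]
Env = List[List[RGB]]  # equirectangular map: H rows (theta) x W columns (phi)


def env_to_directional_lights(env: Env, rows: int, cols: int) -> List[Tuple[RGB, RGB]]:
    """Approximate an HDRI by rows * cols directional lights.

    Each light integrates the radiance of one lat-long block weighted by the
    pixel solid angle; its direction is the block center (y-up, theta from +y).
    Returns a list of (unit direction, RGB intensity) pairs.
    """
    h, w = len(env), len(env[0])
    dtheta, dphi = math.pi / h, 2.0 * math.pi / w
    lights = []
    for bi in range(rows):
        for bj in range(cols):
            rgb = [0.0, 0.0, 0.0]
            for i in range(bi * h // rows, (bi + 1) * h // rows):
                solid_angle = math.sin((i + 0.5) * dtheta) * dtheta * dphi
                for j in range(bj * w // cols, (bj + 1) * w // cols):
                    for c in range(3):
                        rgb[c] += env[i][j][c] * solid_angle
            theta = (bi + 0.5) * math.pi / rows
            phi = (bj + 0.5) * 2.0 * math.pi / cols
            direction = (math.sin(theta) * math.cos(phi), math.cos(theta),
                         math.sin(theta) * math.sin(phi))
            lights.append((direction, (rgb[0], rgb[1], rgb[2])))
    return lights


# A uniform white environment integrates to roughly 4*pi per channel.
uniform = [[(1.0, 1.0, 1.0)] * 128 for _ in range(64)]
lights = env_to_directional_lights(uniform, rows=4, cols=8)
print(len(lights), sum(rgb[0] for _, rgb in lights))
```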

Chat2Layout: Interactive 3D Furniture Layout with a Multimodal LLM

Jul 31, 2024

Can Wang, Hongliang Zhong, Menglei Chai, Mingming He, Dongdong Chen, Jing Liao

Abstract: Automatic furniture layout has long been desired for convenient interior design. Leveraging the remarkable visual reasoning capabilities of multimodal large language models (MLLMs), recent methods address layout generation in a static manner and lack the feedback-driven refinement essential for interactive user engagement. We introduce Chat2Layout, a novel interactive furniture layout generation system that extends the functionality of MLLMs into the realm of interactive layout design. To achieve this, we establish a unified vision-question paradigm for in-context learning, enabling seamless communication with MLLMs to steer their behavior without altering model weights. Within this framework, we present a novel training-free visual prompting mechanism. This involves a visual-text prompting technique that assists MLLMs in reasoning about plausible layout plans, followed by an Offline-to-Online search (O2O-Search) method, which automatically identifies the minimal set of informative references to provide exemplars for visual-text prompting. By employing an agent system with MLLMs as the core controller, we enable bidirectional interaction. The agent not only comprehends the 3D environment and user requirements through linguistic and visual perception but also plans tasks and reasons about actions to generate and arrange furniture within the virtual space. Furthermore, the agent iteratively updates based on visual feedback from execution results. Experimental results demonstrate that our approach facilitates language-interactive generation and arrangement of diverse and complex 3D furniture.

* Main paper with supplemental materials
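
The abstract does not describe how O2O-Search chooses its exemplars. The sketch below only illustrates what "a minimal set of informative references" can mean computationally. It uses the classic greedy set-cover heuristic and treats each candidate exemplar as the set of layout requirements it demonstrates, here represented by hypothetical furniture tags. This is a swapped-in technique, not the paper's method. The names `select_references` and the exemplar tags are made up for the example.

```python
from typing import Dict, List, Set


def select_references(required: Set[str], candidates: Dict[str, Set[str]]) -> List[str]:
    """Greedy set cover over hypothetical exemplars.

    Each candidate exemplar is described by the set of requirement tags it
    demonstrates. The loop keeps adding the exemplar that covers the most
    still-uncovered tags until everything is covered or no progress is possible.
    """
    uncovered = set(required)
    chosen: List[str] = []
    while uncovered and candidates:
        # sorted() makes ties deterministic: max() keeps the first best name
        best = max(sorted(candidates), key=lambda name: len(candidates[name] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            break  # remaining tags are not demonstrated by any exemplar
        chosen.append(best)
        uncovered -= gain
    return chosen


exemplars = {
    "living_room": {"sofa", "table", "lamp"},
    "bedroom": {"bed", "lamp"},
    "office": {"table"},
}
print(select_references({"sofa", "table", "lamp", "bed"}, exemplars))  # ['living_room', 'bedroom']
```

Greedy set cover is not guaranteed to find the smallest set, but it is within a logarithmic factor of optimal. It is a common baseline when the exact minimum is too expensive to search.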

Mesh-Guided Neural Implicit Field Editing

Dec 04, 2023

Can Wang, Mingming He, Menglei Chai, Dongdong Chen, Jing Liao

[Figures 1-4 not reproduced]

Abstract: Neural implicit fields have emerged as a powerful 3D representation for reconstructing and rendering photorealistic views, yet they offer limited editability. Conversely, explicit 3D representations, such as polygonal meshes, are easy to edit but may be less suitable for rendering high-quality novel views. To harness the strengths of both representations, we propose a new approach that employs a mesh as a guiding mechanism for editing the neural radiance field. We first introduce a differentiable method using marching tetrahedra for polygonal mesh extraction from the neural implicit field and then design a differentiable color extractor to assign colors obtained from volume renderings to this extracted mesh. This differentiable colored mesh allows gradient back-propagation from the explicit mesh to the implicit fields, enabling users to easily manipulate the geometry and color of neural implicit fields. To enhance user control from coarse-grained to fine-grained levels, we introduce an octree-based structure into the optimization. This structure prioritizes the edited regions and the surface, enabling our method to achieve fine-grained edits to the neural implicit field and to accommodate various user modifications, including object additions, component removals, specific area deformations, and adjustments to local and global colors. Through extensive experiments involving diverse scenes and editing operations, we demonstrate the capabilities and effectiveness of our method. Our project page is: https://cassiepython.github.io/MNeuEdit/

* Project page: https://cassiepython.github.io/MNeuEdit/
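
The core step of marching tetrahedra places a surface vertex on every tetrahedron edge whose endpoints have opposite signs. The vertex sits at the linearly interpolated zero crossing p = a + s_a / (s_a - s_b) * (b - a). Because p is a smooth function of the field values s_a and s_b, gradients can flow from mesh vertices back to the field, which is what makes such extraction differentiable. The sketch below assumes the implicit field is given as signed values at the four vertices, with negative values inside. It covers one tetrahedron only, and the name `tet_surface_points` is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def tet_surface_points(verts: Sequence[Vec3], sdf: Sequence[float]) -> List[Vec3]:
    """Zero-crossing points of a signed field inside one tetrahedron.

    For every edge whose endpoints have opposite signs, the surface point is
    placed by linear interpolation: p = a + s_a / (s_a - s_b) * (b - a).
    Three crossings form one triangle; four form a quad (two triangles).
    """
    points: List[Vec3] = []
    for i, j in combinations(range(4), 2):  # the 6 edges of the tetrahedron
        sa, sb = sdf[i], sdf[j]
        if (sa < 0) != (sb < 0):  # sign change, so sa - sb is nonzero
            t = sa / (sa - sb)
            a, b = verts[i], verts[j]
            points.append((a[0] + t * (b[0] - a[0]),
                           a[1] + t * (b[1] - a[1]),
                           a[2] + t * (b[2] - a[2])))
    return points


V = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
print(tet_surface_points(V, [-1.0, 1.0, 1.0, 1.0]))  # one triangle: three edge midpoints
```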

AvatarCraft: Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control

Mar 30, 2023

Ruixiang Jiang, Can Wang, Jingbo Zhang, Menglei Chai, Mingming He, Dongdong Chen, Jing Liao

Abstract: Neural implicit fields are powerful for representing 3D scenes and generating high-quality novel views, but it remains challenging to use such implicit representations to create a 3D human avatar with a specific identity and artistic style that can be easily animated. Our proposed method, AvatarCraft, addresses this challenge by using diffusion models to guide the learning of geometry and texture for a neural avatar based on a single text prompt. We carefully design the optimization framework of neural implicit fields, including a coarse-to-fine multi-bounding box training strategy, shape regularization, and diffusion-based constraints, to produce high-quality geometry and texture. Additionally, we make the human avatar animatable by deforming the neural implicit field with an explicit warping field that maps the target human mesh to a template human mesh, both represented using parametric human models. This simplifies animation and reshaping of the generated avatar by controlling pose and shape parameters. Extensive experiments on various text descriptions show that AvatarCraft is effective and robust in creating human avatars and rendering novel views, poses, and shapes. Our project page is: https://avatar-craft.github.io/.

* Project page: https://avatar-craft.github.io/
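
A minimal way to picture the explicit warping field assumes that the target and template meshes share vertex ordering, as parametric body models do. Each query point is moved by the displacement of its nearest target vertex, and the template-space neural field is then queried at the moved point. Nearest-vertex offsets are a simplification chosen for this sketch. The paper's actual warping may interpolate over triangles, and `warp_to_template` and `template_field` are hypothetical names.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _sq_dist(a: Vec3, b: Vec3) -> float:
    return sum((a[k] - b[k]) ** 2 for k in range(3))


def warp_to_template(q: Vec3, target_verts: Sequence[Vec3],
                     template_verts: Sequence[Vec3]) -> Vec3:
    """Map a point near the posed/reshaped target mesh into template space.

    target_verts[i] and template_verts[i] are the same body vertex in the two
    meshes; the query point inherits the displacement of its nearest target vertex.
    """
    i = min(range(len(target_verts)), key=lambda n: _sq_dist(q, target_verts[n]))
    return (q[0] + template_verts[i][0] - target_verts[i][0],
            q[1] + template_verts[i][1] - target_verts[i][1],
            q[2] + template_verts[i][2] - target_verts[i][2])


# The template-space field would then be queried at the warped point, e.g.
# density, rgb = template_field(warp_to_template(q, posed_verts, template_verts))
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
template = [(0.0, 1.0, 0.0), (1.0, 0.0, 2.0)]
print(warp_to_template((0.1, 0.0, 0.0), target, template))
```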

NeRF-Art: Text-Driven Neural Radiance Fields Stylization

Dec 15, 2022

Can Wang, Ruixiang Jiang, Menglei Chai, Mingming He, Dongdong Chen, Jing Liao

Abstract: As a powerful representation of 3D scenes, the neural radiance field (NeRF) enables high-quality novel view synthesis from multi-view images. Stylizing NeRF, however, remains challenging, especially when simulating a text-guided style that alters both appearance and geometry simultaneously. In this paper, we present NeRF-Art, a text-guided NeRF stylization approach that manipulates the style of a pre-trained NeRF model with a simple text prompt. Unlike previous approaches that either lack sufficient geometry deformations and texture details or require meshes to guide the stylization, our method can shift a 3D scene to the target style characterized by desired geometry and appearance variations without any mesh guidance. This is achieved by introducing a novel global-local contrastive learning strategy, combined with a directional constraint, to simultaneously control both the trajectory and the strength of the target style. Moreover, we adopt a weight regularization method to effectively suppress cloudy artifacts and geometric noise, which arise easily when the density field is transformed during geometry stylization. Through extensive experiments on various styles, we demonstrate that our method is effective and robust in terms of both single-view stylization quality and cross-view consistency. The code and more results can be found on our project page: https://cassiepython.github.io/nerfart/.

* Project page: https://cassiepython.github.io/nerfart/
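
Both losses named in the abstract can be written over embedding vectors. In the sketch below, plain Python lists stand in for CLIP-style image and text embeddings, which is an assumption. The directional term follows the widely used directional CLIP loss form 1 - cos(E(I_styled) - E(I_src), E(t_target) - E(t_src)). The contrastive term is a standard InfoNCE loss with temperature `tau`. The abstract does not state whether NeRF-Art uses exactly these forms or this temperature.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> List[float]:
    return [x - y for x, y in zip(a, b)]


def cosine(a: Vec, b: Vec) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / max(na * nb, 1e-12)  # guard zero vectors


def directional_loss(img_src: Vec, img_styled: Vec, txt_src: Vec, txt_tgt: Vec) -> float:
    """Align the image-space edit direction with the text-space style direction."""
    return 1.0 - cosine(_sub(img_styled, img_src), _sub(txt_tgt, txt_src))


def contrastive_loss(anchor: Vec, positive: Vec, negatives: Sequence[Vec],
                     tau: float = 0.07) -> float:
    """InfoNCE: pull the anchor toward the positive (target-style) embedding
    and push it away from negative (other-style) embeddings."""
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # log-sum-exp shift for numerical stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


print(directional_loss([0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [2.0, 3.0]))  # ~0: directions agree
```

In the global-local setting described above, the anchor would be the embedding of a whole rendered view (global) or of a random patch (local). That reading is an interpretation of the abstract, not a detail it states.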

Water Simulation and Rendering from a Still Photograph

Oct 05, 2022

Ryusuke Sugimoto, Mingming He, Jing Liao, Pedro V. Sander

[Figures 1-4 not reproduced]

Abstract: We propose an approach to simulate and render realistic water animation from a single still input photograph. We first segment the water surface, estimate rendering parameters, and compute water reflection textures with a combination of neural networks and traditional optimization techniques. Then we propose an image-based screen space local reflection model to render the water surface overlaid on the input image and generate real-time water animation. Our approach creates realistic results with no user intervention for a wide variety of natural scenes containing large bodies of water with different lighting and water surface conditions. Since our method provides a 3D representation of the water surface, it naturally enables direct editing of water parameters and also supports interactive applications like adding synthetic objects to the scene.

* Accepted for publication at ACM SIGGRAPH Asia (Conference Papers). Videos, demos and updates will be on the project website: https://rsugimoto.net/WaterAnimationProject/
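
The abstract does not state how the reflection is blended with the underlying image. A common choice in water rendering, assumed here purely for illustration, is to weight the reflected color by Schlick's Fresnel approximation F = F0 + (1 - F0) * (1 - cos(theta))^5. Here F0 = ((n1 - n2) / (n1 + n2))^2, which is about 0.02 for air to water (n2 = 1.33). The function names and the refractive-index default are assumptions of this sketch.

```python
from typing import Sequence, Tuple


def schlick_fresnel(cos_theta: float, n1: float = 1.0, n2: float = 1.33) -> float:
    """Schlick's approximation of Fresnel reflectance (air n1 to water n2)."""
    f0 = ((n1 - n2) / (n1 + n2)) ** 2  # about 0.02 for air to water
    c = min(max(cos_theta, 0.0), 1.0)  # clamp the cosine of the view angle
    return f0 + (1.0 - f0) * (1.0 - c) ** 5


def shade_water(reflection: Sequence[float], underwater: Sequence[float],
                cos_theta: float) -> Tuple[float, ...]:
    """Blend reflected and underwater colors with the Fresnel weight."""
    f = schlick_fresnel(cos_theta)
    return tuple(f * r + (1.0 - f) * u for r, u in zip(reflection, underwater))


# Looking straight down (cos = 1) mostly shows the water body;
# at grazing angles (cos near 0) the reflection dominates.
print(shade_water((0.9, 0.9, 1.0), (0.1, 0.2, 0.3), 1.0))
print(shade_water((0.9, 0.9, 1.0), (0.1, 0.2, 0.3), 0.05))
```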

DenseGAP: Graph-Structured Dense Correspondence Learning with Anchor Points

Dec 13, 2021

Zhengfei Kuang, Jiaman Li, Mingming He, Tong Wang, Yajie Zhao

[Figures 1-4 not reproduced]

Abstract: Establishing dense correspondence between two images is a fundamental computer vision problem, which is typically tackled by matching local feature descriptors. However, without global awareness, such local features are often insufficient for disambiguating similar regions. Moreover, computing the pairwise feature correlation across images is both computationally expensive and memory-intensive. To make the local features aware of the global context and improve their matching accuracy, we introduce DenseGAP, a new solution for efficient Dense correspondence learning with a Graph-structured neural network conditioned on Anchor Points. Specifically, we first propose a graph structure that utilizes anchor points to provide sparse but reliable priors on inter- and intra-image context and propagates them to all image points via directed edges. We also design a graph-structured network to broadcast multi-level contexts via lightweight message-passing layers and generate high-resolution feature maps at low memory cost. Finally, based on the predicted feature maps, we introduce a coarse-to-fine framework for accurate correspondence prediction using cycle consistency. Our feature descriptors capture both local and global information, thus enabling a continuous feature field for querying arbitrary points at high resolution. Through comprehensive ablation experiments and evaluations on large-scale indoor and outdoor datasets, we demonstrate that our method advances the state of the art of correspondence learning on most benchmarks.
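
In its simplest discrete form, cycle consistency keeps a match only if mapping a point from image A to image B and back lands on the same point. The sketch below applies that check to nearest-neighbor descriptor matches, which amounts to a mutual-nearest-neighbor filter. Descriptors are plain tuples, and the function names are hypothetical. DenseGAP applies cycle consistency within a coarse-to-fine framework on predicted feature maps, which this sketch does not reproduce.

```python
from typing import List, Sequence, Tuple

Desc = Sequence[float]


def _nearest(q: Desc, pool: Sequence[Desc]) -> int:
    """Index of the descriptor in pool closest to q (squared Euclidean distance)."""
    return min(range(len(pool)), key=lambda i: sum((a - b) ** 2 for a, b in zip(q, pool[i])))


def cycle_consistent_matches(desc_a: Sequence[Desc],
                             desc_b: Sequence[Desc]) -> List[Tuple[int, int]]:
    """Keep (i, j) only if A[i] -> B[j] and B[j] -> A[i] are both nearest neighbors."""
    matches = []
    for i, d in enumerate(desc_a):
        j = _nearest(d, desc_b)               # forward: A -> B
        if _nearest(desc_b[j], desc_a) == i:  # backward: B -> A must return to i
            matches.append((i, j))
    return matches


a = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
b = [(0.1, 0.0), (1.1, 0.0)]
print(cycle_consistent_matches(a, b))  # [(0, 0), (1, 1)]; a[2] fails the cycle check
```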
