Chen Change Loy

GauSim: Registering Elastic Objects into Digital World by Gaussian Simulator

Dec 23, 2024

Yidi Shao, Mu Huang, Chen Change Loy, Bo Dai

Abstract: In this work, we introduce GauSim, a novel neural network-based simulator designed to capture the dynamic behaviors of real-world elastic objects represented through Gaussian kernels. Unlike traditional methods that treat kernels as particles within particle-based simulations, we leverage continuum mechanics, modeling each kernel as a continuous piece of matter to account for realistic deformations without idealized assumptions. To improve computational efficiency and fidelity, we employ a hierarchical structure that organizes kernels into Center of Mass Systems (CMS) with explicit formulations, enabling a coarse-to-fine simulation approach. This structure significantly reduces computational overhead while preserving detailed dynamics. In addition, GauSim incorporates explicit physics constraints, such as mass and momentum conservation, ensuring interpretable results and robust, physically plausible simulations. To validate our approach, we present a new dataset, READY, containing multi-view videos of real-world elastic deformations. Experimental results demonstrate that GauSim achieves superior performance compared to existing physics-driven baselines, offering a practical and accurate solution for simulating complex dynamic behaviors. Code and model will be released. Project page: https://www.mmlab-ntu.com/project/gausim/index.html.

* Project page: https://www.mmlab-ntu.com/project/gausim/index.html

Arbitrary-steps Image Super-resolution via Diffusion Inversion

Dec 12, 2024

Zongsheng Yue, Kang Liao, Chen Change Loy

Abstract: This study presents a new image super-resolution (SR) technique based on diffusion inversion, aiming at harnessing the rich image priors encapsulated in large pre-trained diffusion models to improve SR performance. We design a Partial noise Prediction strategy to construct an intermediate state of the diffusion model, which serves as the starting sampling point. Central to our approach is a deep noise predictor to estimate the optimal noise maps for the forward diffusion process. Once trained, this noise predictor can be used to initialize the sampling process partially along the diffusion trajectory, generating the desirable high-resolution result. Compared to existing approaches, our method offers a flexible and efficient sampling mechanism that supports an arbitrary number of sampling steps, ranging from one to five. Even with a single sampling step, our method demonstrates superior or comparable performance to recent state-of-the-art approaches. The code and model are publicly available at https://github.com/zsyOAOA/InvSR.

* 16 pages, 9 figures. Project: https://github.com/zsyOAOA/InvSR

ObjCtrl-2.5D: Training-free Object Control with Camera Poses

Dec 10, 2024

Zhouxia Wang, Yushi Lan, Shangchen Zhou, Chen Change Loy

Abstract: This study aims to achieve more precise and versatile object control in image-to-video (I2V) generation. Current methods typically represent the spatial movement of target objects with 2D trajectories, which often fail to capture user intention and frequently produce unnatural results. To enhance control, we present ObjCtrl-2.5D, a training-free object control approach that uses a 3D trajectory, extended from a 2D trajectory with depth information, as a control signal. By modeling object movement as camera movement, ObjCtrl-2.5D represents the 3D trajectory as a sequence of camera poses, enabling object motion control using an existing camera motion control I2V generation model (CMC-I2V) without training. To adapt the CMC-I2V model originally designed for global motion control to handle local object motion, we introduce a module to isolate the target object from the background, enabling independent local control. In addition, we devise an effective way to achieve more accurate object control by sharing low-frequency warped latent within the object's region across frames. Extensive experiments demonstrate that ObjCtrl-2.5D significantly improves object control accuracy compared to training-free methods and offers more diverse control capabilities than training-based approaches using 2D trajectories, enabling complex effects like object rotation. Code and results are available at https://wzhouxiff.github.io/projects/ObjCtrl-2.5D/.

* Project Page: https://wzhouxiff.github.io/projects/ObjCtrl-2.5D/
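
The idea of extending a 2D trajectory with depth and reading object motion as camera motion can be illustrated with a minimal sketch. This is not the ObjCtrl-2.5D implementation: it assumes a simple pinhole camera, a translation-only motion, and hypothetical names (`lift_trajectory`, `camera_center_offsets`, and the intrinsics `fx`, `fy`, `cx`, `cy`) chosen only for illustration.

```python
from typing import List, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def lift_trajectory(points_2d: List[Point2D], depths: List[float],
                    fx: float, fy: float, cx: float, cy: float) -> List[Point3D]:
    """Back-project pixel coordinates with per-point depth into camera space.

    Uses the pinhole model: X = (u - cx) * d / fx, Y = (v - cy) * d / fy, Z = d.
    """
    if len(points_2d) != len(depths):
        raise ValueError("points_2d and depths must have the same length")
    return [((u - cx) * d / fx, (v - cy) * d / fy, d)
            for (u, v), d in zip(points_2d, depths)]


def camera_center_offsets(points_3d: List[Point3D]) -> List[Point3D]:
    """Camera-center offsets that mimic the object's displacement from frame 0.

    Assumption for this sketch: rotation is ignored, so shifting the object by
    +delta looks the same as shifting the camera center by -delta.
    """
    x0, y0, z0 = points_3d[0]
    return [(-(x - x0), -(y - y0), -(z - z0)) for x, y, z in points_3d]


if __name__ == "__main__":
    # Hypothetical 2D trajectory (pixels) with a depth (scene units) per frame.
    track = [(320.0, 240.0), (420.0, 240.0)]
    depth = [2.0, 4.0]
    traj_3d = lift_trajectory(track, depth, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    print(traj_3d)
    print(camera_center_offsets(traj_3d))
```

Whether an offset is applied to the camera center or to a world-to-camera translation depends on the pose convention of the downstream model, so the sign used here is only one possible choice.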

Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis

Nov 26, 2024

Xinyu Hou, Zongsheng Yue, Xiaoming Li, Chen Change Loy

Abstract: In this work, we introduce a single parameter, $\omega$, to effectively control granularity in diffusion-based synthesis. This parameter is incorporated during the denoising steps of the diffusion model's reverse process. Our approach does not require model retraining, architectural modifications, or additional computational overhead during inference, yet enables precise control over the level of detail in the generated outputs. Moreover, spatial masks or denoising schedules with varying $\omega$ values can be applied to achieve region-specific or timestep-specific granularity control. Prior knowledge of image composition from control signals or reference images further facilitates the creation of precise $\omega$ masks for granularity control on specific objects. To highlight the parameter's role in controlling subtle detail variations, the technique is named Omegance, combining "omega" and "nuance". Our method demonstrates impressive performance across various image and video synthesis tasks and is adaptable to advanced diffusion models. The code is available at https://github.com/itsmag11/Omegance.

* Project page: https://itsmag11.github.io/Omegance/

GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation

Nov 12, 2024

Yushi Lan, Shangchen Zhou, Zhaoyang Lyu, Fangzhou Hong, Shuai Yang, Bo Dai, Xingang Pan, Chen Change Loy

Abstract: While 3D content generation has advanced significantly, existing methods still face challenges with input formats, latent space design, and output representations. This paper introduces a novel 3D generation framework that addresses these challenges, offering scalable, high-quality 3D generation with an interactive Point Cloud-structured Latent space. Our framework employs a Variational Autoencoder (VAE) with multi-view posed RGB-D(epth)-N(ormal) renderings as input, using a unique latent space design that preserves 3D shape information, and incorporates a cascaded latent diffusion model for improved shape-texture disentanglement. The proposed method, GaussianAnything, supports multi-modal conditional 3D generation, allowing for point cloud, caption, and single/multi-view image inputs. Notably, the newly proposed latent space naturally enables geometry-texture disentanglement, thus allowing 3D-aware editing. Experimental results demonstrate the effectiveness of our approach on multiple datasets, outperforming existing methods in both text- and image-conditioned 3D generation.

* Project page: https://nirvanalan.github.io/projects/GA/

Paint Bucket Colorization Using Anime Character Color Design Sheets

Oct 25, 2024

Yuekun Dai, Qinyue Li, Shangchen Zhou, Yihang Luo, Chongyi Li, Chen Change Loy

Abstract: Line art colorization plays a crucial role in hand-drawn animation production, where digital artists manually colorize segments using a paint bucket tool, guided by RGB values from character color design sheets. This process, often called paint bucket colorization, involves two main tasks: keyframe colorization, where colors are applied according to the character's color design sheet, and consecutive frame colorization, where these colors are replicated across adjacent frames. Current automated colorization methods primarily focus on reference-based and segment-matching approaches. However, reference-based methods often fail to accurately assign specific colors to each region, while matching-based methods are limited to consecutive frame colorization and struggle with issues like significant deformation and occlusion. In this work, we introduce inclusion matching, which allows the network to understand the inclusion relationships between segments, rather than relying solely on direct visual correspondences. By integrating this approach with segment parsing and color warping modules, our inclusion matching pipeline significantly improves performance in both keyframe colorization and consecutive frame colorization. To support our network's training, we have developed a unique dataset named PaintBucket-Character, which includes rendered line arts alongside their colorized versions and shading annotations for various 3D characters. To replicate industry animation data formats, we also created color design sheets for each character, with semantic information for each color and standard pose reference images. Experiments highlight the superiority of our method, demonstrating accurate and consistent colorization across both our proposed benchmarks and hand-drawn animations.

* Extension of arXiv:2403.18342; Project page at https://github.com/ykdai/BasicPBC

GroupDiff: Diffusion-based Group Portrait Editing

Sep 22, 2024

Yuming Jiang, Nanxuan Zhao, Qing Liu, Krishna Kumar Singh, Shuai Yang, Chen Change Loy, Ziwei Liu

Abstract: Group portrait editing is highly desirable since users constantly want to add a person, delete a person, or manipulate existing persons. It is also challenging due to the intricate dynamics of human interactions and the diverse gestures. In this work, we present GroupDiff, a pioneering effort to tackle group photo editing with three dedicated contributions: 1) Data Engine: Since there is no labeled data for group photo editing, we create a data engine to generate paired data for training. The training data engine covers the diverse needs of group portrait editing. 2) Appearance Preservation: To keep the appearance consistent after editing, we inject the images of persons from the group photo into the attention modules and employ skeletons to provide intra-person guidance. 3) Control Flexibility: Bounding boxes indicating the locations of each person are used to reweight the attention matrix so that the features of each person can be injected into the correct places. This inter-person guidance allows flexible manipulation. Extensive experiments demonstrate that GroupDiff exhibits state-of-the-art performance compared to existing methods. GroupDiff offers controllability for editing and maintains the fidelity of the original photos.

* ECCV 2024

Kalman-Inspired Feature Propagation for Video Face Super-Resolution

Aug 09, 2024

Ruicheng Feng, Chongyi Li, Chen Change Loy

Abstract: Despite the promising progress of face image super-resolution, video face super-resolution remains relatively under-explored. Existing approaches either adapt general video super-resolution networks to face datasets or apply established face image super-resolution models independently on individual video frames. These paradigms encounter challenges either in reconstructing facial details or maintaining temporal consistency. To address these issues, we introduce a novel framework called Kalman-inspired Feature Propagation (KEEP), designed to maintain a stable face prior over time. The Kalman filtering principles offer our method a recurrent ability to use the information from previously restored frames to guide and regulate the restoration process of the current frame. Extensive experiments demonstrate the effectiveness of our method in capturing facial details consistently across video frames. Code and video demo are available at https://jnjaby.github.io/projects/KEEP.

* Accepted by ECCV 2024. Project page: https://jnjaby.github.io/projects/KEEP/
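
For readers unfamiliar with the Kalman filtering principle the abstract refers to, the following is a minimal sketch of a textbook scalar Kalman filter with a constant-state model. It is not the KEEP network, which operates on learned features; the function names and the noise parameters `q` and `r` are assumptions made only for this illustration. It shows the recurrent pattern of predicting from the previous estimate and then correcting with the current observation.

```python
from typing import List, Tuple


def kalman_step(x_prev: float, p_prev: float, z: float,
                q: float, r: float) -> Tuple[float, float, float]:
    """One predict/update cycle of a scalar Kalman filter.

    x_prev, p_prev: previous state estimate and its variance.
    z: current noisy measurement.
    q, r: process and measurement noise variances (illustrative values).
    Returns the new estimate, its variance, and the Kalman gain.
    """
    # Predict: the state is assumed constant, so only the uncertainty grows.
    x_pred = x_prev
    p_pred = p_prev + q
    # Update: blend prediction and measurement according to their uncertainty.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new, gain


def filter_sequence(measurements: List[float], x0: float, p0: float,
                    q: float, r: float) -> List[float]:
    """Run the filter over a sequence, carrying the estimate across steps."""
    estimates = []
    x, p = x0, p0
    for z in measurements:
        x, p, _ = kalman_step(x, p, z, q, r)
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    # Repeated observations of the same value pull the estimate toward it.
    print(filter_sequence([1.0, 1.0, 1.0], x0=0.0, p0=1.0, q=0.0, r=1.0))
```

The gain decides how much the current observation overrides what was carried over from earlier steps, which is the sense in which past results "guide and regulate" the current one.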

Eliminating Feature Ambiguity for Few-Shot Segmentation

Jul 13, 2024

Qianxiong Xu, Guosheng Lin, Chen Change Loy, Cheng Long, Ziyue Li, Rui Zhao

Abstract: Recent advancements in few-shot segmentation (FSS) have exploited pixel-by-pixel matching between query and support features, typically based on cross attention, which selectively activates query foreground (FG) features that correspond to the same-class support FG features. However, due to the large receptive fields in deep layers of the backbone, the extracted query and support FG features are inevitably mingled with background (BG) features, impeding the FG-FG matching in cross attention. Hence, the query FG features are fused with fewer support FG features, i.e., the support information is not well utilized. This paper presents a novel plug-in termed ambiguity elimination network (AENet), which can be plugged into any existing cross attention-based FSS method. The main idea is to mine discriminative query FG regions to rectify the ambiguous FG features, increasing the proportion of FG information, so as to suppress the negative impacts of the doped BG features. In this way, the FG-FG matching is naturally enhanced. We plug AENet into three baselines, CyCTR, SCCAN and HDMNet, for evaluation, and their scores are improved by large margins, e.g., the 1-shot performance of SCCAN can be improved by 3.0%+ on both PASCAL-5$^i$ and COCO-20$^i$. The code is available at https://github.com/Sam1224/AENet.

* This paper is accepted by ECCV'24

Generalizable Implicit Motion Modeling for Video Frame Interpolation

Jul 11, 2024

Zujin Guo, Wei Li, Chen Change Loy

Abstract: Motion modeling is critical in flow-based Video Frame Interpolation (VFI). Existing paradigms either consider linear combinations of bidirectional flows or directly predict bilateral flows for given timestamps without exploring favorable motion priors, thus lacking the capability of effectively modeling spatiotemporal dynamics in real-world videos. To address this limitation, in this study, we introduce Generalizable Implicit Motion Modeling (GIMM), a novel and effective approach to motion modeling for VFI. Specifically, to enable GIMM as an effective motion modeling paradigm, we design a motion encoding pipeline to model spatiotemporal motion latent from bidirectional flows extracted from pre-trained flow estimators, effectively representing input-specific motion priors. Then, we implicitly predict arbitrary-timestep optical flows within two adjacent input frames via an adaptive coordinate-based neural network, with spatiotemporal coordinates and motion latent as inputs. Our GIMM can be smoothly integrated with existing flow-based VFI works without further modifications. We show that GIMM performs better than the current state of the art on the VFI benchmarks.

* Project Page: https://gseancdat.github.io/projects/GIMMVFI
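
To make concrete what a "linear combination of bidirectional flows" can look like, here is a minimal sketch of one well-known variant of that existing paradigm, which the abstract contrasts with GIMM. It is not GIMM itself. The weights follow the linear-motion approximation commonly used in flow-based interpolation; the function name and the nested-list flow layout are assumptions made for this illustration.

```python
from typing import List, Tuple

Vec = Tuple[float, float]
Flow = List[List[Vec]]  # flow[row][col] = (dx, dy) in pixels


def combine_bidirectional_flows(flow_0to1: Flow, flow_1to0: Flow,
                                t: float) -> Tuple[Flow, Flow]:
    """Approximate flows from an intermediate time t to frames 0 and 1.

    Assumes locally linear motion and combines the two input flows as
        F_t->0 = -(1 - t) * t * F_0->1 + t^2 * F_1->0
        F_t->1 = (1 - t)^2 * F_0->1 - t * (1 - t) * F_1->0
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    flow_t_to_0: Flow = []
    flow_t_to_1: Flow = []
    for row01, row10 in zip(flow_0to1, flow_1to0):
        out0, out1 = [], []
        for (u01, v01), (u10, v10) in zip(row01, row10):
            out0.append((-(1 - t) * t * u01 + t * t * u10,
                         -(1 - t) * t * v01 + t * t * v10))
            out1.append(((1 - t) ** 2 * u01 - t * (1 - t) * u10,
                         (1 - t) ** 2 * v01 - t * (1 - t) * v10))
        flow_t_to_0.append(out0)
        flow_t_to_1.append(out1)
    return flow_t_to_0, flow_t_to_1


if __name__ == "__main__":
    # A single pixel moving 4 px to the right between frame 0 and frame 1.
    f01 = [[(4.0, 0.0)]]
    f10 = [[(-4.0, 0.0)]]
    print(combine_bidirectional_flows(f01, f10, t=0.5))
```

Because the weights depend only on `t`, this kind of combination cannot represent acceleration or other non-linear motion, which is the limitation the abstract points to.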
