Shangchen Zhou

ObjCtrl-2.5D: Training-free Object Control with Camera Poses

Dec 10, 2024

Zhouxia Wang, Yushi Lan, Shangchen Zhou, Chen Change Loy

Abstract:This study aims to achieve more precise and versatile object control in image-to-video (I2V) generation. Current methods typically represent the spatial movement of target objects with 2D trajectories, which often fail to capture user intention and frequently produce unnatural results. To enhance control, we present ObjCtrl-2.5D, a training-free object control approach that uses a 3D trajectory, extended from a 2D trajectory with depth information, as a control signal. By modeling object movement as camera movement, ObjCtrl-2.5D represents the 3D trajectory as a sequence of camera poses, enabling object motion control using an existing camera motion control I2V generation model (CMC-I2V) without training. To adapt the CMC-I2V model originally designed for global motion control to handle local object motion, we introduce a module to isolate the target object from the background, enabling independent local control. In addition, we devise an effective way to achieve more accurate object control by sharing low-frequency warped latent within the object's region across frames. Extensive experiments demonstrate that ObjCtrl-2.5D significantly improves object control accuracy compared to training-free methods and offers more diverse control capabilities than training-based approaches using 2D trajectories, enabling complex effects like object rotation. Code and results are available at https://wzhouxiff.github.io/projects/ObjCtrl-2.5D/.

* Project Page: https://wzhouxiff.github.io/projects/ObjCtrl-2.5D/
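
The idea in the abstract of extending a 2D trajectory with depth and expressing it as a sequence of camera poses can be illustrated with a minimal sketch. This is not the authors' code: the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the translation-only poses with identity rotation, and the function names `unproject` and `trajectory_to_camera_poses` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Pose = List[List[float]]  # 4x4 world-to-camera matrix, row-major


def unproject(u: float, v: float, depth: float,
              fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Lift pixel (u, v) with a known depth to a 3D point (assumed pinhole camera)."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def trajectory_to_camera_poses(traj_2d: Sequence[Tuple[float, float]],
                               depths: Sequence[float],
                               fx: float, fy: float,
                               cx: float, cy: float) -> List[Pose]:
    """Turn a 2D trajectory plus per-point depth into translation-only poses.

    Assumption: with a world-to-camera extrinsic X_cam = R @ X_world + t and
    R = I, a static point seen at p_0 appears at p_0 + t. Setting
    t = p_k - p_0 therefore reproduces the object's displacement, which
    corresponds to moving the camera centre by -(p_k - p_0).
    """
    if not traj_2d:
        raise ValueError("trajectory must contain at least one point")
    if len(traj_2d) != len(depths):
        raise ValueError("need one depth value per trajectory point")

    points = [unproject(u, v, d, fx, fy, cx, cy)
              for (u, v), d in zip(traj_2d, depths)]
    x0, y0, z0 = points[0]

    poses: List[Pose] = []
    for x, y, z in points:
        tx, ty, tz = x - x0, y - y0, z - z0
        poses.append([
            [1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0],
        ])
    return poses
```

In this sketch the first pose is always the identity, and a trajectory point that keeps its pixel position but changes depth yields a pose that translates only along the z axis, which a purely 2D trajectory could not express.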

SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE

Nov 25, 2024

Yongwei Chen, Yushi Lan, Shangchen Zhou, Tengfei Wang, Xingang Pan

Abstract:Autoregressive models have demonstrated remarkable success across various fields, from large language models (LLMs) to large multimodal models (LMMs) and 2D content generation, moving closer to artificial general intelligence (AGI). Despite these advances, applying autoregressive approaches to 3D object generation and understanding remains largely unexplored. This paper introduces Scale AutoRegressive 3D (SAR3D), a novel framework that leverages a multi-scale 3D vector-quantized variational autoencoder (VQVAE) to tokenize 3D objects for efficient autoregressive generation and detailed understanding. By predicting the next scale in a multi-scale latent representation instead of the next single token, SAR3D reduces generation time significantly, achieving fast 3D object generation in just 0.82 seconds on an A6000 GPU. Additionally, given the tokens enriched with hierarchical 3D-aware information, we finetune a pretrained LLM on them, enabling multimodal comprehension of 3D content. Our experiments show that SAR3D surpasses current 3D generation methods in both speed and quality and allows LLMs to interpret and caption 3D models comprehensively.

* Project page: https://cyw-3d.github.io/projects/SAR3D/
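
Why predicting the next scale instead of the next single token shortens generation can be seen from a simple step count. The sketch below is illustrative only: the scale side lengths and the assumption that each scale is a square token map are hypothetical, since the abstract does not state SAR3D's actual token layout.

```python
from typing import Sequence, Tuple


def autoregressive_steps(scale_sides: Sequence[int]) -> Tuple[int, int]:
    """Compare sequential decoding steps for two autoregressive schemes.

    Assumption: scale k is a square map holding side * side tokens.
    Next-token prediction emits one token per step, so it needs as many
    steps as there are tokens. Next-scale prediction emits a whole scale
    per step, so it needs one step per scale.
    """
    tokens_per_scale = [side * side for side in scale_sides]
    next_token_steps = sum(tokens_per_scale)
    next_scale_steps = len(scale_sides)
    return next_token_steps, next_scale_steps
```

For the hypothetical side lengths `[1, 2, 4, 8, 16]`, `autoregressive_steps` returns `(341, 5)`: 341 sequential steps for token-by-token decoding versus 5 for scale-by-scale decoding.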

GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation

Nov 12, 2024

Yushi Lan, Shangchen Zhou, Zhaoyang Lyu, Fangzhou Hong, Shuai Yang, Bo Dai, Xingang Pan, Chen Change Loy

Abstract:While 3D content generation has advanced significantly, existing methods still face challenges with input formats, latent space design, and output representations. This paper introduces a novel 3D generation framework that addresses these challenges, offering scalable, high-quality 3D generation with an interactive Point Cloud-structured Latent space. Our framework employs a Variational Autoencoder (VAE) with multi-view posed RGB-D(epth)-N(ormal) renderings as input, using a unique latent space design that preserves 3D shape information, and incorporates a cascaded latent diffusion model for improved shape-texture disentanglement. The proposed method, GaussianAnything, supports multi-modal conditional 3D generation, allowing for point cloud, caption, and single/multi-view image inputs. Notably, the newly proposed latent space naturally enables geometry-texture disentanglement, thus allowing 3D-aware editing. Experimental results demonstrate the effectiveness of our approach on multiple datasets, outperforming existing methods in both text- and image-conditioned 3D generation.

* project page: https://nirvanalan.github.io/projects/GA/

Paint Bucket Colorization Using Anime Character Color Design Sheets

Oct 25, 2024

Yuekun Dai, Qinyue Li, Shangchen Zhou, Yihang Luo, Chongyi Li, Chen Change Loy

Abstract:Line art colorization plays a crucial role in hand-drawn animation production, where digital artists manually colorize segments using a paint bucket tool, guided by RGB values from character color design sheets. This process, often called paint bucket colorization, involves two main tasks: keyframe colorization, where colors are applied according to the character's color design sheet, and consecutive frame colorization, where these colors are replicated across adjacent frames. Current automated colorization methods primarily focus on reference-based and segment-matching approaches. However, reference-based methods often fail to accurately assign specific colors to each region, while matching-based methods are limited to consecutive frame colorization and struggle with issues like significant deformation and occlusion. In this work, we introduce inclusion matching, which allows the network to understand the inclusion relationships between segments, rather than relying solely on direct visual correspondences. By integrating this approach with segment parsing and color warping modules, our inclusion matching pipeline significantly improves performance in both keyframe colorization and consecutive frame colorization. To support our network's training, we have developed a unique dataset named PaintBucket-Character, which includes rendered line arts alongside their colorized versions and shading annotations for various 3D characters. To replicate industry animation data formats, we also created color design sheets for each character, with semantic information for each color and standard pose reference images. Experiments highlight the superiority of our method, demonstrating accurate and consistent colorization across both our proposed benchmarks and hand-drawn animations.

* Extension of arXiv:2403.18342; Project page at https://github.com/ykdai/BasicPBC

MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results

Jun 11, 2024

Xin Jin, Chunle Guo, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy (+32 more)

Abstract:The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Few-shot RAW Image Denoising track on MIPI 2024. In total, 165 participants were successfully registered, and 7 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Few-shot RAW Image Denoising. More details of this challenge and the link to the dataset can be found at https://mipichallenge.org/MIPI2024.

* CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Few-shot RAW Image Denoising Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

Apr 30, 2024

Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu (+53 more)

Abstract:The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/.

* CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

Learning Inclusion Matching for Animation Paint Bucket Colorization

Mar 27, 2024

Yuekun Dai, Shangchen Zhou, Qinyue Li, Chongyi Li, Chen Change Loy

Abstract:Colorizing line art is a pivotal task in the production of hand-drawn cel animation. This typically involves digital painters using a paint bucket tool to manually color each segment enclosed by lines, based on RGB values predetermined by a color designer. This frame-by-frame process is both arduous and time-intensive. Current automated methods mainly focus on segment matching. This technique migrates colors from a reference to the target frame by aligning features within line-enclosed segments across frames. However, issues like occlusion and wrinkles in animations often disrupt these direct correspondences, leading to mismatches. In this work, we introduce a new learning-based inclusion matching pipeline, which directs the network to comprehend the inclusion relationships between segments rather than relying solely on direct visual correspondences. Our method features a two-stage pipeline that integrates a coarse color warping module with an inclusion matching module, enabling more nuanced and accurate colorization. To facilitate the training of our network, we also develop a unique dataset, referred to as PaintBucket-Character. This dataset includes rendered line arts alongside their colorized counterparts, featuring various 3D characters. Extensive experiments demonstrate the effectiveness and superiority of our method over existing techniques.

* accepted to CVPR 2024. Project Page: https://ykdai.github.io/projects/InclusionMatching

LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation

Mar 18, 2024

Yushi Lan, Fangzhou Hong, Shuai Yang, Shangchen Zhou, Xuyi Meng, Bo Dai, Xingang Pan, Chen Change Loy

Abstract:The field of neural rendering has witnessed significant progress with advancements in generative models and differentiable rendering techniques. Though 2D diffusion has achieved success, a unified 3D diffusion pipeline remains unsettled. This paper introduces a novel framework called LN3Diff to address this gap and enable fast, high-quality, and generic conditional 3D generation. Our approach harnesses a 3D-aware architecture and variational autoencoder (VAE) to encode the input image into a structured, compact, and 3D latent space. The latent is decoded by a transformer-based decoder into a high-capacity 3D neural field. Through training a diffusion model on this 3D-aware latent space, our method achieves state-of-the-art performance on ShapeNet for 3D generation and demonstrates superior performance in monocular 3D reconstruction and conditional 3D generation across various datasets. Moreover, it surpasses existing 3D diffusion methods in terms of inference speed, requiring no per-instance optimization. Our proposed LN3Diff presents a significant advancement in 3D generative modeling and holds promise for various applications in 3D vision and graphics tasks.

* project webpage: https://nirvanalan.github.io/projects/ln3diff/

Control Color: Multimodal Diffusion-based Interactive Image Colorization

Feb 16, 2024

Zhexin Liang, Zhaochen Li, Shangchen Zhou, Chongyi Li, Chen Change Loy

Abstract:Despite the existence of numerous colorization methods, several limitations still exist, such as lack of user interaction, inflexibility in local colorization, unnatural color rendering, insufficient color variation, and color overflow. To solve these issues, we introduce Control Color (CtrlColor), a multi-modal colorization method that leverages the pre-trained Stable Diffusion (SD) model, offering promising capabilities in highly controllable interactive image colorization. While several diffusion-based methods have been proposed, supporting colorization in multiple modalities remains non-trivial. In this study, we aim to tackle both unconditional and conditional image colorization (text prompts, strokes, exemplars) and address color overflow and incorrect color within a unified framework. Specifically, we present an effective way to encode user strokes to enable precise local color manipulation and employ a practical way to constrain the color distribution similar to exemplars. Apart from accepting text prompts as conditions, these designs add versatility to our approach. We also introduce a novel module based on self-attention and a content-guided deformable autoencoder to address the long-standing issues of color overflow and inaccurate coloring. Extensive comparisons show that our model outperforms state-of-the-art image colorization methods both qualitatively and quantitatively.

* Project Page: https://zhexinliang.github.io/Control_Color/; Demo Video: https://youtu.be/tSCwA-srl8Q

Towards Language-Driven Video Inpainting via Multimodal Large Language Models

Jan 18, 2024

Jianzong Wu, Xiangtai Li, Chenyang Si, Shangchen Zhou, Jingkang Yang, Jiangning Zhang, Yining Li, Kai Chen, Yunhai Tong, Ziwei Liu (+1 more)

Abstract:We introduce a new task -- language-driven video inpainting, which uses natural language instructions to guide the inpainting process. This approach overcomes the limitations of traditional video inpainting methods that depend on manually labeled binary masks, a process often tedious and labor-intensive. We present the Remove Objects from Videos by Instructions (ROVI) dataset, containing 5,650 videos and 9,091 inpainting results, to support training and evaluation for this task. We also propose a novel diffusion-based language-driven video inpainting framework, the first end-to-end baseline for this task, integrating Multimodal Large Language Models to understand and execute complex language-based inpainting requests effectively. Our comprehensive results showcase the dataset's versatility and the model's effectiveness in various language-instructed inpainting scenarios. We will make datasets, code, and models publicly available.

* Project Page: https://jianzongwu.github.io/projects/rovi
