Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ziwei Liu

Nanyang Technological University

ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance

Mar 19, 2024

Yongwei Chen, Tengfei Wang, Tong Wu, Xingang Pan, Kui Jia, Ziwei Liu

Figure 1 for ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance

Figure 2 for ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance

Figure 3 for ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance

Figure 4 for ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance

Abstract:Generating high-quality 3D assets from a given image is highly desirable in various applications such as AR/VR. Recent advances in single-image 3D generation explore feed-forward models that learn to infer the 3D model of an object without optimization. Though promising results have been achieved in single object generation, these methods often struggle to model complex 3D assets that inherently contain multiple objects. In this work, we present ComboVerse, a 3D generation framework that produces high-quality 3D assets with complex compositions by learning to combine multiple models. 1) We first perform an in-depth analysis of this ``multi-object gap'' from both model and data perspectives. 2) Next, with reconstructed 3D models of different objects, we seek to adjust their sizes, rotation angles, and locations to create a 3D asset that matches the given image. 3) To automate this process, we apply spatially-aware score distillation sampling (SSDS) from pretrained diffusion models to guide the positioning of objects. Our proposed framework emphasizes spatial alignment of objects, compared with standard score distillation sampling, and thus achieves more accurate results. Extensive experiments validate ComboVerse achieves clear improvements over existing methods in generating compositional 3D assets.

* https://cyw-3d.github.io/ComboVerse/

Via

Access Paper or Ask Questions

WHAC: World-grounded Humans and Cameras

Mar 19, 2024

Wanqi Yin, Zhongang Cai, Ruisi Wang, Fanzhou Wang, Chen Wei, Haiyi Mei, Weiye Xiao, Zhitao Yang, Qingping Sun, Atsushi Yamashita(+2 more)

Figure 1 for WHAC: World-grounded Humans and Cameras

Figure 2 for WHAC: World-grounded Humans and Cameras

Figure 3 for WHAC: World-grounded Humans and Cameras

Figure 4 for WHAC: World-grounded Humans and Cameras

Abstract:Estimating human and camera trajectories with accurate scale in the world coordinate system from a monocular video is a highly desirable yet challenging and ill-posed problem. In this study, we aim to recover expressive parametric human models (i.e., SMPL-X) and corresponding camera poses jointly, by leveraging the synergy between three critical players: the world, the human, and the camera. Our approach is founded on two key observations. Firstly, camera-frame SMPL-X estimation methods readily recover absolute human depth. Secondly, human motions inherently provide absolute spatial cues. By integrating these insights, we introduce a novel framework, referred to as WHAC, to facilitate world-grounded expressive human pose and shape estimation (EHPS) alongside camera pose estimation, without relying on traditional optimization techniques. Additionally, we present a new synthetic dataset, WHAC-A-Mole, which includes accurately annotated humans and cameras, and features diverse interactive human motions as well as realistic camera trajectories. Extensive experiments on both standard and newly established benchmarks highlight the superiority and efficacy of our framework. We will make the code and dataset publicly available.

* Homepage: https://wqyin.github.io/projects/WHAC/

Via

Access Paper or Ask Questions

InTeX: Interactive Text-to-texture Synthesis via Unified Depth-aware Inpainting

Mar 18, 2024

Jiaxiang Tang, Ruijie Lu, Xiaokang Chen, Xiang Wen, Gang Zeng, Ziwei Liu

Abstract:Text-to-texture synthesis has become a new frontier in 3D content creation thanks to the recent advances in text-to-image models. Existing methods primarily adopt a combination of pretrained depth-aware diffusion and inpainting models, yet they exhibit shortcomings such as 3D inconsistency and limited controllability. To address these challenges, we introduce InteX, a novel framework for interactive text-to-texture synthesis. 1) InteX includes a user-friendly interface that facilitates interaction and control throughout the synthesis process, enabling region-specific repainting and precise texture editing. 2) Additionally, we develop a unified depth-aware inpainting model that integrates depth information with inpainting cues, effectively mitigating 3D inconsistencies and improving generation speed. Through extensive experiments, our framework has proven to be both practical and effective in text-to-texture synthesis, paving the way for high-quality 3D content creation.

* Project Page: https://me.kiui.moe/intex/

Via

Access Paper or Ask Questions

Fairness Optimization for Intelligent Reflecting Surface Aided Uplink Rate-Splitting Multiple Access

Mar 15, 2024

Shanshan Zhang, Wen Chen, Qingqing Wu, Ziwei Liu, Shunqing Zhang, Jun Li

Figure 1 for Fairness Optimization for Intelligent Reflecting Surface Aided Uplink Rate-Splitting Multiple Access

Figure 2 for Fairness Optimization for Intelligent Reflecting Surface Aided Uplink Rate-Splitting Multiple Access

Figure 3 for Fairness Optimization for Intelligent Reflecting Surface Aided Uplink Rate-Splitting Multiple Access

Figure 4 for Fairness Optimization for Intelligent Reflecting Surface Aided Uplink Rate-Splitting Multiple Access

Abstract:This paper studies the fair transmission design for an intelligent reflecting surface (IRS) aided rate-splitting multiple access (RSMA). IRS is used to establish a good signal propagation environment and enhance the RSMA transmission performance. The fair rate adaption problem is constructed as a max-min optimization problem. To solve the optimization problem, we adopt an alternative optimization (AO) algorithm to optimize the power allocation, beamforming, and decoding order, respectively. A generalized power iteration (GPI) method is proposed to optimize the receive beamforming, which can improve the minimum rate of devices and reduce the optimization complexity. At the base station (BS), a successive group decoding (SGD) algorithm is proposed to tackle the uplink signal estimation, which trades off the fairness and complexity of decoding. At the same time, we also consider robust communication with imperfect channel state information at the transmitter (CSIT), which studies robust optimization by using lower bound expressions on the expected data rates. Extensive numerical results show that the proposed optimization algorithm can significantly improve the performance of fairness. It also provides reliable results for uplink communication with imperfect CSIT.

* This work has been submitted to TCOM

Via

Access Paper or Ask Questions

STAR-RIS Assisted Wireless-Powered and Backscattering Mobile Edge Computing Networks

Mar 05, 2024

Bin Lyu, Yining Zhang, Pengcheng Chen, Ziwei Liu, Feng Tian

Figure 1 for STAR-RIS Assisted Wireless-Powered and Backscattering Mobile Edge Computing Networks

Figure 2 for STAR-RIS Assisted Wireless-Powered and Backscattering Mobile Edge Computing Networks

Figure 3 for STAR-RIS Assisted Wireless-Powered and Backscattering Mobile Edge Computing Networks

Figure 4 for STAR-RIS Assisted Wireless-Powered and Backscattering Mobile Edge Computing Networks

Abstract:Wireless powered and backscattering mobile edge computing (WPB-MEC) network is a novel network paradigm to supply energy supplies and computing resource to wireless sensors (WSs). However, its performance is seriously affected by severe attenuations and inappropriate assumptions of infinite computing capability at the hybrid access point (HAP). To address the above issues, in this paper, we propose a simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) aided scheme for boosting the performance of WPB-MEC network under the constraint of finite computing capability. Specifically, energy-constrained WSs are able to offload tasks actively or passively from them to the HAP. In this process, the STAR-RIS is utilized to improve the quantity of harvested energy and strengthen the offloading efficiency by adapting its operating protocols. We then maximize the sum computational bits (SCBs) under the finite computing capability constraint. To handle the solving challenges, we first present interesting results in closed-form and then design a block coordinate descent (BCD) based algorithm, ensuring a near-optimal solution. Finally, simulation results are provided to confirm that our proposed scheme can improve the SCBs by 9.9 times compared to the local computing only scheme.

* Accepted by China Communications. 13 pages, 8 figures

Via

Access Paper or Ask Questions

3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors

Mar 04, 2024

Fangzhou Hong, Jiaxiang Tang, Ziang Cao, Min Shi, Tong Wu, Zhaoxi Chen, Tengfei Wang, Liang Pan, Dahua Lin, Ziwei Liu

Figure 1 for 3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors

Figure 2 for 3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors

Figure 3 for 3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors

Figure 4 for 3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors

Abstract:We present a two-stage text-to-3D generation system, namely 3DTopia, which generates high-quality general 3D assets within 5 minutes using hybrid diffusion priors. The first stage samples from a 3D diffusion prior directly learned from 3D data. Specifically, it is powered by a text-conditioned tri-plane latent diffusion model, which quickly generates coarse 3D samples for fast prototyping. The second stage utilizes 2D diffusion priors to further refine the texture of coarse 3D models from the first stage. The refinement consists of both latent and pixel space optimization for high-quality texture generation. To facilitate the training of the proposed system, we clean and caption the largest open-source 3D dataset, Objaverse, by combining the power of vision language models and large language models. Experiment results are reported qualitatively and quantitatively to show the performance of the proposed system. Our codes and models are available at https://github.com/3DTopia/3DTopia

* Code available at https://github.com/3DTopia/3DTopia

Via

Access Paper or Ask Questions

Rate-Splitting Multiple Access for Transmissive Reconfigurable Intelligent Surface Transceiver Empowered ISAC System

Feb 19, 2024

Ziwei Liu, Wen Chen, Qingqing Wu, Jinhong Yuan, Shanshan Zhang, Zhendong Li, Jun Li

Abstract:In this paper, a novel transmissive reconfigurable intelligent surface (TRIS) transceiver empowered integrated sensing and communications (ISAC) system is proposed for future multi-demand terminals. To address interference management, we implement rate-splitting multiple access (RSMA), where the common stream is independently designed for the sensing service. We introduce the sensing quality of service (QoS) criteria based on this structure and construct an optimization problem with the sensing QoS criteria as the objective function to optimize the sensing stream precoding matrix and the communication stream precoding matrix. Due to the coupling of optimization variables, the formulated problem is a non-convex optimization problem that cannot be solved directly. To tackle the above-mentioned challenging problem, alternating optimization (AO) is utilized to decouple the optimization variables. Specifically, the problem is decoupled into three subproblems about the sensing stream precoding matrix, the communication stream precoding matrix, and the auxiliary variables, which is solved alternatively through AO until the convergence is reached. For solving the problem, successive convex approximation (SCA) is applied to deal with the sum-rate threshold constraints on communications, and difference-of-convex (DC) programming is utilized to solve rank-one non-convex constraints. Numerical simulation results verify the superiority of the proposed scheme in terms of improving the communication and sensing QoS.

Via

Access Paper or Ask Questions

LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

Feb 07, 2024

Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, Ziwei Liu

Abstract:3D content creation has achieved significant progress in terms of both quality and speed. Although current feed-forward models can produce 3D objects in seconds, their resolution is constrained by the intensive computation required during training. In this paper, we introduce Large Multi-View Gaussian Model (LGM), a novel framework designed to generate high-resolution 3D models from text prompts or single-view images. Our key insights are two-fold: 1) 3D Representation: We propose multi-view Gaussian features as an efficient yet powerful representation, which can then be fused together for differentiable rendering. 2) 3D Backbone: We present an asymmetric U-Net as a high-throughput backbone operating on multi-view images, which can be produced from text or single-view image input by leveraging multi-view diffusion models. Extensive experiments demonstrate the high fidelity and efficiency of our approach. Notably, we maintain the fast speed to generate 3D objects within 5 seconds while boosting the training resolution to 512, thereby achieving high-resolution 3D content generation.

* Project page: https://me.kiui.moe/lgm/

Via

Access Paper or Ask Questions

A Comprehensive Survey on 3D Content Generation

Feb 02, 2024

Jian Liu, Xiaoshui Huang, Tianyu Huang, Lu Chen, Yuenan Hou, Shixiang Tang, Ziwei Liu, Wanli Ouyang, Wangmeng Zuo, Junjun Jiang(+1 more)

Figure 1 for A Comprehensive Survey on 3D Content Generation

Figure 2 for A Comprehensive Survey on 3D Content Generation

Figure 3 for A Comprehensive Survey on 3D Content Generation

Abstract:Recent years have witnessed remarkable advances in artificial intelligence generated content(AIGC), with diverse input modalities, e.g., text, image, video, audio and 3D. The 3D is the most close visual modality to real-world 3D environment and carries enormous knowledge. The 3D content generation shows both academic and practical values while also presenting formidable technical challenges. This review aims to consolidate developments within the burgeoning domain of 3D content generation. Specifically, a new taxonomy is proposed that categorizes existing approaches into three types: 3D native generative methods, 2D prior-based 3D generative methods, and hybrid 3D generative methods. The survey covers approximately 60 papers spanning the major techniques. Besides, we discuss limitations of current 3D content generation techniques, and point out open challenges as well as promising directions for future work. Accompanied with this survey, we have established a project website where the resources on 3D content generation research are provided. The project page is available at https://github.com/hitcslj/Awesome-AIGC-3D.

Via

Access Paper or Ask Questions

Towards Language-Driven Video Inpainting via Multimodal Large Language Models

Jan 18, 2024

Jianzong Wu, Xiangtai Li, Chenyang Si, Shangchen Zhou, Jingkang Yang, Jiangning Zhang, Yining Li, Kai Chen, Yunhai Tong, Ziwei Liu(+1 more)

Figure 1 for Towards Language-Driven Video Inpainting via Multimodal Large Language Models

Figure 2 for Towards Language-Driven Video Inpainting via Multimodal Large Language Models

Figure 3 for Towards Language-Driven Video Inpainting via Multimodal Large Language Models

Figure 4 for Towards Language-Driven Video Inpainting via Multimodal Large Language Models

Abstract:We introduce a new task -- language-driven video inpainting, which uses natural language instructions to guide the inpainting process. This approach overcomes the limitations of traditional video inpainting methods that depend on manually labeled binary masks, a process often tedious and labor-intensive. We present the Remove Objects from Videos by Instructions (ROVI) dataset, containing 5,650 videos and 9,091 inpainting results, to support training and evaluation for this task. We also propose a novel diffusion-based language-driven video inpainting framework, the first end-to-end baseline for this task, integrating Multimodal Large Language Models to understand and execute complex language-based inpainting requests effectively. Our comprehensive results showcase the dataset's versatility and the model's effectiveness in various language-instructed inpainting scenarios. We will make datasets, code, and models publicly available.

* Project Page: https://jianzongwu.github.io/projects/rovi

Via

Access Paper or Ask Questions