Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Bolei Zhou

A Holistic Framework Towards Vision-based Traffic Signal Control with Microscopic Simulation

Mar 11, 2024

Pan He, Quanyi Li, Xiaoyong Yuan, Bolei Zhou

Figure 1 for A Holistic Framework Towards Vision-based Traffic Signal Control with Microscopic Simulation

Figure 2 for A Holistic Framework Towards Vision-based Traffic Signal Control with Microscopic Simulation

Figure 3 for A Holistic Framework Towards Vision-based Traffic Signal Control with Microscopic Simulation

Figure 4 for A Holistic Framework Towards Vision-based Traffic Signal Control with Microscopic Simulation

Abstract:Traffic signal control (TSC) is crucial for reducing traffic congestion that leads to smoother traffic flow, reduced idling time, and mitigated CO2 emissions. In this study, we explore the computer vision approach for TSC that modulates on-road traffic flows through visual observation. Unlike traditional feature-based approaches, vision-based methods depend much less on heuristics and predefined features, bringing promising potentials for end-to-end learning and optimization of traffic signals. Thus, we introduce a holistic traffic simulation framework called TrafficDojo towards vision-based TSC and its benchmarking by integrating the microscopic traffic flow provided in SUMO into the driving simulator MetaDrive. This proposed framework offers a versatile traffic environment for in-depth analysis and comprehensive evaluation of traffic signal controllers across diverse traffic conditions and scenarios. We establish and compare baseline algorithms including both traditional and Reinforecment Learning (RL) approaches. This work sheds insights into the design and development of vision-based TSC approaches and open up new research opportunities. All the code and baselines will be made publicly available.

* Under review for IEEE publications

Via

Access Paper or Ask Questions

Unsupervised Discovery of Steerable Factors When Graph Deep Generative Models Are Entangled

Jan 29, 2024

Shengchao Liu, Chengpeng Wang, Jiarui Lu, Weili Nie, Hanchen Wang, Zhuoxinran Li, Bolei Zhou, Jian Tang

Abstract:Deep generative models (DGMs) have been widely developed for graph data. However, much less investigation has been carried out on understanding the latent space of such pretrained graph DGMs. These understandings possess the potential to provide constructive guidelines for crucial tasks, such as graph controllable generation. Thus in this work, we are interested in studying this problem and propose GraphCG, a method for the unsupervised discovery of steerable factors in the latent space of pretrained graph DGMs. We first examine the representation space of three pretrained graph DGMs with six disentanglement metrics, and we observe that the pretrained representation space is entangled. Motivated by this observation, GraphCG learns the steerable factors via maximizing the mutual information between semantic-rich directions, where the controlled graph moving along the same direction will share the same steerable factors. We quantitatively verify that GraphCG outperforms four competitive baselines on two graph DGMs pretrained on two molecule datasets. Additionally, we qualitatively illustrate seven steerable factors learned by GraphCG on five pretrained DGMs over five graph datasets, including two for molecules and three for point clouds.

Via

Access Paper or Ask Questions

SceneWiz3D: Towards Text-guided 3D Scene Composition

Dec 13, 2023

Qihang Zhang, Chaoyang Wang, Aliaksandr Siarohin, Peiye Zhuang, Yinghao Xu, Ceyuan Yang, Dahua Lin, Bolei Zhou, Sergey Tulyakov, Hsin-Ying Lee

Figure 1 for SceneWiz3D: Towards Text-guided 3D Scene Composition

Figure 2 for SceneWiz3D: Towards Text-guided 3D Scene Composition

Figure 3 for SceneWiz3D: Towards Text-guided 3D Scene Composition

Figure 4 for SceneWiz3D: Towards Text-guided 3D Scene Composition

Abstract:We are witnessing significant breakthroughs in the technology for generating 3D objects from text. Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets. Generating entire scenes, however, remains very challenging as a scene contains multiple 3D objects, diverse and scattered. In this work, we introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text. We marry the locality of objects with globality of scenes by introducing a hybrid 3D representation: explicit for objects and implicit for scenes. Remarkably, an object, being represented explicitly, can be either generated from text using conventional text-to-3D approaches, or provided by users. To configure the layout of the scene and automatically place objects, we apply the Particle Swarm Optimization technique during the optimization process. Furthermore, it is difficult for certain parts of the scene (e.g., corners, occlusion) to receive multi-view supervision, leading to inferior geometry. We incorporate an RGBD panorama diffusion model to mitigate it, resulting in high-quality geometry. Extensive evaluation supports that our approach achieves superior quality over previous approaches, enabling the generation of detailed and view-consistent 3D scenes.

* Project page: https://zqh0253.github.io/SceneWiz3D/

Via

Access Paper or Ask Questions

FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition

Dec 12, 2023

Sicheng Mo, Fangzhou Mu, Kuan Heng Lin, Yanli Liu, Bochen Guan, Yin Li, Bolei Zhou

Figure 1 for FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition

Figure 2 for FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition

Figure 3 for FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition

Figure 4 for FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition

Abstract:Recent approaches such as ControlNet offer users fine-grained spatial control over text-to-image (T2I) diffusion models. However, auxiliary modules have to be trained for each type of spatial condition, model architecture, and checkpoint, putting them at odds with the diverse intents and preferences a human designer would like to convey to the AI models during the content creation process. In this work, we present FreeControl, a training-free approach for controllable T2I generation that supports multiple conditions, architectures, and checkpoints simultaneously. FreeControl designs structure guidance to facilitate the structure alignment with a guidance image, and appearance guidance to enable the appearance sharing between images generated using the same seed. Extensive qualitative and quantitative experiments demonstrate the superior performance of FreeControl across a variety of pre-trained T2I models. In particular, FreeControl facilitates convenient training-free control over many different architectures and checkpoints, allows the challenging input conditions on which most of the existing training-free methods fail, and achieves competitive synthesis quality with training-based approaches.

* Project Page: https://genforce.github.io/freecontrol/

Via

Access Paper or Ask Questions

BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation

Dec 04, 2023

Qihang Zhang, Yinghao Xu, Yujun Shen, Bo Dai, Bolei Zhou, Ceyuan Yang

Abstract:Generating large-scale 3D scenes cannot simply apply existing 3D object synthesis technique since 3D scenes usually hold complex spatial configurations and consist of a number of objects at varying scales. We thus propose a practical and efficient 3D representation that incorporates an equivariant radiance field with the guidance of a bird's-eye view (BEV) map. Concretely, objects of synthesized 3D scenes could be easily manipulated through steering the corresponding BEV maps. Moreover, by adequately incorporating positional encoding and low-pass filters into the generator, the representation becomes equivariant to the given BEV map. Such equivariance allows us to produce large-scale, even infinite-scale, 3D scenes via synthesizing local scenes and then stitching them with smooth consistency. Extensive experiments on 3D scene datasets demonstrate the effectiveness of our approach. Our project website is at https://zqh0253.github.io/BerfScene/.

Via

Access Paper or Ask Questions

CAT: Closed-loop Adversarial Training for Safe End-to-End Driving

Oct 19, 2023

Linrui Zhang, Zhenghao Peng, Quanyi Li, Bolei Zhou

Abstract:Driving safety is a top priority for autonomous vehicles. Orthogonal to prior work handling accident-prone traffic events by algorithm designs at the policy level, we investigate a Closed-loop Adversarial Training (CAT) framework for safe end-to-end driving in this paper through the lens of environment augmentation. CAT aims to continuously improve the safety of driving agents by training the agent on safety-critical scenarios that are dynamically generated over time. A novel resampling technique is developed to turn log-replay real-world driving scenarios into safety-critical ones via probabilistic factorization, where the adversarial traffic generation is modeled as the multiplication of standard motion prediction sub-problems. Consequently, CAT can launch more efficient physical attacks compared to existing safety-critical scenario generation methods and yields a significantly less computational cost in the iterative learning pipeline. We incorporate CAT into the MetaDrive simulator and validate our approach on hundreds of driving scenarios imported from real-world driving datasets. Experimental results demonstrate that CAT can effectively generate adversarial scenarios countering the agent being trained. After training, the agent can achieve superior driving safety in both log-replay and safety-critical traffic scenarios on the held-out test set. Code and data are available at https://metadriverse.github.io/cat.

* 7th Conference on Robot Learning (CoRL 2023)

Via

Access Paper or Ask Questions

In-Domain GAN Inversion for Faithful Reconstruction and Editability

Sep 25, 2023

Jiapeng Zhu, Yujun Shen, Yinghao Xu, Deli Zhao, Qifeng Chen, Bolei Zhou

Figure 1 for In-Domain GAN Inversion for Faithful Reconstruction and Editability

Figure 2 for In-Domain GAN Inversion for Faithful Reconstruction and Editability

Figure 3 for In-Domain GAN Inversion for Faithful Reconstruction and Editability

Figure 4 for In-Domain GAN Inversion for Faithful Reconstruction and Editability

Abstract:Generative Adversarial Networks (GANs) have significantly advanced image synthesis through mapping randomly sampled latent codes to high-fidelity synthesized images. However, applying well-trained GANs to real image editing remains challenging. A common solution is to find an approximate latent code that can adequately recover the input image to edit, which is also known as GAN inversion. To invert a GAN model, prior works typically focus on reconstructing the target image at the pixel level, yet few studies are conducted on whether the inverted result can well support manipulation at the semantic level. This work fills in this gap by proposing in-domain GAN inversion, which consists of a domain-guided encoder and a domain-regularized optimizer, to regularize the inverted code in the native latent space of the pre-trained GAN model. In this way, we manage to sufficiently reuse the knowledge learned by GANs for image reconstruction, facilitating a wide range of editing applications without any retraining. We further make comprehensive analyses on the effects of the encoder structure, the starting inversion point, as well as the inversion parameter space, and observe the trade-off between the reconstruction quality and the editing property. Such a trade-off sheds light on how a GAN model represents an image with various semantics encoded in the learned latent distribution. Code, models, and demo are available at the project page: https://genforce.github.io/idinvert/.

Via

Access Paper or Ask Questions

Improving Out-of-Distribution Robustness of Classifiers via Generative Interpolation

Jul 23, 2023

Haoyue Bai, Ceyuan Yang, Yinghao Xu, S. -H. Gary Chan, Bolei Zhou

Abstract:Deep neural networks achieve superior performance for learning from independent and identically distributed (i.i.d.) data. However, their performance deteriorates significantly when handling out-of-distribution (OoD) data, where the training and test are drawn from different distributions. In this paper, we explore utilizing the generative models as a data augmentation source for improving out-of-distribution robustness of neural classifiers. Specifically, we develop a simple yet effective method called Generative Interpolation to fuse generative models trained from multiple domains for synthesizing diverse OoD samples. Training a generative model directly on the source domains tends to suffer from mode collapse and sometimes amplifies the data bias. Instead, we first train a StyleGAN model on one source domain and then fine-tune it on the other domains, resulting in many correlated generators where their model parameters have the same initialization thus are aligned. We then linearly interpolate the model parameters of the generators to spawn new sets of generators. Such interpolated generators are used as an extra data augmentation source to train the classifiers. The interpolation coefficients can flexibly control the augmentation direction and strength. In addition, a style-mixing mechanism is applied to further improve the diversity of the generated OoD samples. Our experiments show that the proposed method explicitly increases the diversity of training domains and achieves consistent improvements over baselines across datasets and multiple different distribution shifts.

Via

Access Paper or Ask Questions

Efficient 3D Articulated Human Generation with Layered Surface Volumes

Jul 11, 2023

Yinghao Xu, Wang Yifan, Alexander W. Bergman, Menglei Chai, Bolei Zhou, Gordon Wetzstein

Abstract:Access to high-quality and diverse 3D articulated digital human assets is crucial in various applications, ranging from virtual reality to social platforms. Generative approaches, such as 3D generative adversarial networks (GANs), are rapidly replacing laborious manual content creation tools. However, existing 3D GAN frameworks typically rely on scene representations that leverage either template meshes, which are fast but offer limited quality, or volumes, which offer high capacity but are slow to render, thereby limiting the 3D fidelity in GAN settings. In this work, we introduce layered surface volumes (LSVs) as a new 3D object representation for articulated digital humans. LSVs represent a human body using multiple textured mesh layers around a conventional template. These layers are rendered using alpha compositing with fast differentiable rasterization, and they can be interpreted as a volumetric representation that allocates its capacity to a manifold of finite thickness around the template. Unlike conventional single-layer templates that struggle with representing fine off-surface details like hair or accessories, our surface volumes naturally capture such details. LSVs can be articulated, and they exhibit exceptional efficiency in GAN settings, where a 2D generator learns to synthesize the RGBA textures for the individual layers. Trained on unstructured, single-view 2D image datasets, our LSV-GAN generates high-quality and view-consistent 3D articulated digital humans without the need for view-inconsistent 2D upsampling networks.

* Project page: https://www.computationalimaging.org/publications/lsv/ Demo: https://www.youtube.com/watch?v=vahgMFCM3j4

Via

Access Paper or Ask Questions

ScenarioNet: Open-Source Platform for Large-Scale Traffic Scenario Simulation and Modeling

Jul 02, 2023

Quanyi Li, Zhenghao Peng, Lan Feng, Zhizheng Liu, Chenda Duan, Wenjie Mo, Bolei Zhou

Figure 1 for ScenarioNet: Open-Source Platform for Large-Scale Traffic Scenario Simulation and Modeling

Figure 2 for ScenarioNet: Open-Source Platform for Large-Scale Traffic Scenario Simulation and Modeling

Figure 3 for ScenarioNet: Open-Source Platform for Large-Scale Traffic Scenario Simulation and Modeling

Figure 4 for ScenarioNet: Open-Source Platform for Large-Scale Traffic Scenario Simulation and Modeling

Abstract:Large-scale driving datasets such as Waymo Open Dataset and nuScenes substantially accelerate autonomous driving research, especially for perception tasks such as 3D detection and trajectory forecasting. Since the driving logs in these datasets contain HD maps and detailed object annotations which accurately reflect the real-world complexity of traffic behaviors, we can harvest a massive number of complex traffic scenarios and recreate their digital twins in simulation. Compared to the hand-crafted scenarios often used in existing simulators, data-driven scenarios collected from the real world can facilitate many research opportunities in machine learning and autonomous driving. In this work, we present ScenarioNet, an open-source platform for large-scale traffic scenario modeling and simulation. ScenarioNet defines a unified scenario description format and collects a large-scale repository of real-world traffic scenarios from the heterogeneous data in various driving datasets including Waymo, nuScenes, Lyft L5, and nuPlan datasets. These scenarios can be further replayed and interacted with in multiple views from Bird-Eye-View layout to realistic 3D rendering in MetaDrive simulator. This provides a benchmark for evaluating the safety of autonomous driving stacks in simulation before their real-world deployment. We further demonstrate the strengths of ScenarioNet on large-scale scenario generation, imitation learning, and reinforcement learning in both single-agent and multi-agent settings. Code, demo videos, and website are available at https://metadriverse.github.io/scenarionet.

Via

Access Paper or Ask Questions