Rui Zhou

Parametric-ControlNet: Multimodal Control in Foundation Models for Precise Engineering Design Synthesis

Dec 06, 2024

Rui Zhou, Yanxia Zhang, Chenyang Yuan, Frank Permenter, Nikos Arechiga, Matt Klenk, Faez Ahmed

Abstract: This paper introduces a generative model designed for multimodal control over text-to-image foundation generative AI models such as Stable Diffusion, specifically tailored for engineering design synthesis. Our model supports parametric, image, and text control modalities to enhance design precision and diversity. First, it handles both partial and complete parametric inputs using a diffusion model that acts as a design autocomplete co-pilot, coupled with a parametric encoder to process the information. Second, the model uses assembly graphs to systematically assemble input component images, which are then processed through a component encoder to capture essential visual data. Third, textual descriptions are integrated via CLIP encoding, ensuring a comprehensive interpretation of design intent. These diverse inputs are combined through a multimodal fusion technique, creating a joint embedding that acts as the input to a module inspired by ControlNet. This integration allows the model to apply robust multimodal control to foundation models, facilitating the generation of complex and precise engineering designs. This approach broadens the capabilities of AI-driven design tools and demonstrates significant advancements in precise control based on diverse data modalities for enhanced design generation.

GMS-VINS:Multi-category Dynamic Objects Semantic Segmentation for Enhanced Visual-Inertial Odometry Using a Promptable Foundation Model

Nov 28, 2024

Rui Zhou, Jingbin Liu, Junbin Xie, Jianyu Zhang, Yingze Hu, Jiele Zhao

Abstract: Visual-inertial odometry (VIO) is widely used in robots, drones, and autonomous vehicles because of its low cost and complementary sensors. Most VIO methods presuppose that observed objects are static and time-invariant. However, real-world scenes often feature dynamic objects, which compromise the accuracy of pose estimation. These moving entities include cars, trucks, buses, motorcycles, and pedestrians. The diversity and partial occlusion of these objects pose a significant challenge for existing dynamic object removal techniques. To tackle this challenge, we introduce GMS-VINS, which integrates an enhanced SORT algorithm along with a robust multi-category segmentation framework into VIO, thereby improving pose estimation accuracy in environments with diverse dynamic objects and frequent occlusions. Leveraging a promptable foundation model, our solution efficiently tracks and segments a wide range of object categories. The enhanced SORT algorithm significantly improves the reliability of tracking multiple dynamic objects, especially in urban settings with partial occlusions or swift movements. We evaluated our proposed method on multiple public datasets representing various scenes, as well as in a real-world scenario involving diverse dynamic objects. The experimental results demonstrate that our proposed method performs well in multiple scenarios, outperforming other state-of-the-art methods. This highlights its generalization and adaptability in diverse dynamic environments and its potential to handle various dynamic objects in practical applications.
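
The idea of removing dynamic objects before pose estimation can be illustrated with a minimal sketch. This is not the GMS-VINS pipeline: it assumes a binary mask of dynamic pixels is already available (for example, from a segmentation model), and the function name `filter_static_features` and the `(row, col)` point layout are hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # (row, col) pixel coordinates; an assumed layout


def filter_static_features(
    features: Sequence[Point],
    dynamic_mask: Sequence[Sequence[bool]],
) -> List[Point]:
    """Keep only features that do not lie on a pixel flagged as dynamic."""
    height = len(dynamic_mask)
    width = len(dynamic_mask[0]) if height else 0
    kept: List[Point] = []
    for row, col in features:
        inside = 0 <= row < height and 0 <= col < width
        if inside and dynamic_mask[row][col]:
            continue  # feature sits on a moving object: drop it
        kept.append((row, col))
    return kept


# Toy 3x3 image in which only the center pixel belongs to a moving object.
mask = [
    [False, False, False],
    [False, True, False],
    [False, False, False],
]
static_points = filter_static_features([(0, 0), (1, 1), (2, 2)], mask)
```

Only the surviving points would then be passed to the odometry front end, so features on moving cars or pedestrians no longer bias the pose estimate.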

Breaking Determinism: Fuzzy Modeling of Sequential Recommendation Using Discrete State Space Diffusion Model

Oct 31, 2024

Wenjia Xie, Hao Wang, Luankang Zhang, Rui Zhou, Defu Lian, Enhong Chen

Abstract: Sequential recommendation (SR) aims to predict items that users may be interested in based on their historical behavior sequences. We revisit SR from a novel information-theoretic perspective and find that conventional sequential modeling methods fail to adequately capture the randomness and unpredictability of user behavior. Inspired by fuzzy information processing theory, this paper introduces the DDSR model, which uses fuzzy sets of interaction sequences to overcome these limitations and better capture the evolution of users' real interests. DDSR is formally based on diffusion transition processes in discrete state spaces, unlike common diffusion models such as DDPM that operate in continuous domains. It is therefore better suited to discrete data, using structured transitions instead of arbitrary noise injection to avoid information loss. Additionally, to address the inefficiency of matrix transformations due to the vast discrete space, we use semantic labels derived from quantization or RQ-VAE to replace item IDs, enhancing efficiency and mitigating cold-start issues. Testing on three public benchmark datasets shows that DDSR outperforms existing state-of-the-art methods in various settings, demonstrating its potential and effectiveness in handling SR tasks.
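
To make the contrast between structured transitions and arbitrary noise more concrete, here is a minimal sketch of one forward step of a diffusion process over discrete items. It is not the DDSR implementation: the `neighbors` table of related items, the single `stay_prob` parameter, and both function names are assumptions introduced for illustration.

```python
import random
from typing import Dict, List


def make_transition(
    neighbors: Dict[int, List[int]], stay_prob: float
) -> Dict[int, Dict[int, float]]:
    """Build row-stochastic transitions: keep the item with stay_prob,
    otherwise move uniformly to one of its listed neighbors."""
    table: Dict[int, Dict[int, float]] = {}
    for item, near in neighbors.items():
        row = {item: stay_prob if near else 1.0}
        for other in near:
            row[other] = row.get(other, 0.0) + (1.0 - stay_prob) / len(near)
        table[item] = row
    return table


def forward_step(
    sequence: List[int],
    table: Dict[int, Dict[int, float]],
    rng: random.Random,
) -> List[int]:
    """Corrupt each item independently by sampling from its transition row."""
    noisy = []
    for item in sequence:
        row = table[item]
        choice = rng.choices(list(row.keys()), weights=list(row.values()), k=1)[0]
        noisy.append(choice)
    return noisy


# Hypothetical item graph: items 0 and 1 are related, item 2 is related to 1.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
table = make_transition(neighbors, stay_prob=0.8)
noisy_sequence = forward_step([0, 1, 2, 1], table, random.Random(0))
```

Because each item can only move to a related item, the corrupted sequence stays close to plausible user behavior, which is the intuition behind preferring structured transitions over uniform noise.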

* NeurIPS'2024, 10 pages

Multi-Robot Pursuit in Parameterized Formation via Imitation Learning

Oct 31, 2024

Jinyong Chen, Rui Zhou, Zhaozong Wang, Yunjie Zhang, Guibin Sun

Abstract: This paper studies the multi-robot pursuit problem: how to coordinate a group of defending robots to capture a faster attacker before it enters a protected area. This task is challenging for the defending robots because of the attacker's unknown avoidance strategy and higher speed, coupled with the defenders' limited communication capabilities. To solve this problem, we propose a parameterized formation controller that allows defending robots to adapt their formation shape using five adjustable parameters. Moreover, we develop an imitation-learning-based approach integrated with model predictive control to optimize these shape parameters. We combine these two techniques to enhance the capture capabilities of defending robots through ongoing training. Both simulations and experiments are provided to verify the effectiveness and robustness of our proposed controller. Simulation results show that defending robots can rapidly learn an effective strategy for capturing the attacker, and that the learned strategy remains effective across varying numbers of defenders. Experimental results on real robot platforms further validate these findings.

OoDIS: Anomaly Instance Segmentation Benchmark

Jun 17, 2024

Alexey Nekrasov, Rui Zhou, Miriam Ackermann, Alexander Hermans, Bastian Leibe, Matthias Rottmann

Abstract: Autonomous vehicles require a precise understanding of their environment to navigate safely. Reliable identification of unknown objects, especially those that are absent during training, such as wild animals, is critical due to their potential to cause serious accidents. Significant progress in semantic segmentation of anomalies has been driven by the availability of out-of-distribution (OOD) benchmarks. However, a comprehensive understanding of scene dynamics requires the segmentation of individual objects, and thus the segmentation of instances is essential. Development in this area has been lagging, largely due to the lack of dedicated benchmarks. To address this gap, we have extended the most commonly used anomaly segmentation benchmarks to include the instance segmentation task. Our evaluation of anomaly instance segmentation methods shows that this challenge remains an unsolved problem. The benchmark website and the competition page can be found at: https://vision.rwth-aachen.de/oodis

* Accepted at the VAND 2.0 Workshop at CVPR 2024. Project page: https://vision.rwth-aachen.de/oodis

Bridging Design Gaps: A Parametric Data Completion Approach With Graph Guided Diffusion Models

Jun 17, 2024

Rui Zhou, Chenyang Yuan, Frank Permenter, Yanxia Zhang, Nikos Arechiga, Matt Klenk, Faez Ahmed

Abstract: This study introduces a generative imputation model leveraging graph attention networks and tabular diffusion models for completing missing parametric data in engineering designs. This model functions as an AI design co-pilot, providing multiple design options for incomplete designs, which we demonstrate using the bicycle design CAD dataset. Through comparative evaluations, we demonstrate that our model significantly outperforms existing classical methods, such as MissForest, hotDeck, and PPCA, as well as the tabular generative method TabCSDI, in both the accuracy and diversity of imputation options. Generative modeling also enables a broader exploration of design possibilities, thereby enhancing design decision-making by allowing engineers to explore a variety of design completions. The graph model combines GNNs with the structural information contained in assembly graphs, enabling the model to understand and predict the complex interdependencies between different design parameters. The graph model helps accurately capture and impute complex parametric interdependencies from an assembly graph, which is key for design problems. By learning from an existing dataset of designs, the imputation capability allows the model to act as an intelligent assistant that autocompletes CAD designs based on a user-defined partial parametric design, effectively bridging the gap between ideation and realization. The proposed work provides a pathway to not only facilitate informed design decisions but also promote creative exploration in design.

* IDETC 2024 Accepted

MATRIX: Multi-Agent Trajectory Generation with Diverse Contexts

Mar 09, 2024

Zhuo Xu, Rui Zhou, Yida Yin, Huidong Gao, Masayoshi Tomizuka, Jiachen Li

Abstract: Data-driven methods have great advantages in modeling complicated human behavioral dynamics and dealing with many human-robot interaction applications. However, collecting massive and annotated real-world human datasets has been a laborious task, especially for highly interactive scenarios. On the other hand, algorithmic data generation methods are usually limited by their model capacities, making them unable to offer the realistic and diverse data needed by various application users. In this work, we study trajectory-level data generation for multi-human or human-robot interaction scenarios and propose a learning-based automatic trajectory generation model, which we call Multi-Agent TRajectory generation with dIverse conteXts (MATRIX). MATRIX is capable of generating interactive human behaviors in realistic, diverse contexts. We achieve this goal by modeling explicit and interpretable objectives so that MATRIX can generate human motions based on diverse destinations and heterogeneous behaviors. We carry out extensive comparison and ablation studies to illustrate the effectiveness of our approach across various metrics. We also present experiments that demonstrate the capability of MATRIX to serve as data augmentation for imitation-based motion planning.

* IEEE International Conference on Robotics and Automation (ICRA 2024)

Mel-FullSubNet: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR

Feb 22, 2024

Rui Zhou, Xian Li, Ying Fang, Xiaofei Li

Abstract: In this work, we propose Mel-FullSubNet, a single-channel Mel-spectrogram denoising and dereverberation network for improving both speech quality and automatic speech recognition (ASR) performance. Mel-FullSubNet takes as input the noisy and reverberant Mel-spectrogram and predicts the corresponding clean Mel-spectrogram. The enhanced Mel-spectrogram can be either transformed to speech waveform with a neural vocoder or directly used for ASR. Mel-FullSubNet encapsulates interleaved full-band and sub-band networks, for learning the full-band spectral pattern of signals and the sub-band/narrow-band properties of signals, respectively. Compared to linear-frequency domain or time-domain speech enhancement, the major advantage of Mel-spectrogram enhancement is that Mel-frequency presents speech in a more compact way and thus is easier to learn, which will benefit both speech quality and ASR. Experimental results demonstrate a significant improvement in both speech quality and ASR performance achieved by the proposed model.
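
The claim that the Mel-frequency axis is more compact can be illustrated with the widely used HTK-style conversion, mel = 2595 log10(1 + f / 700). The sketch below is only an illustration: the abstract does not state which Mel formula or band count the paper uses, so the formula choice, the 80 bands, and the 0 to 8000 Hz range are assumptions.

```python
import math
from typing import List


def hz_to_mel(freq_hz: float) -> float:
    """HTK-style Hz-to-mel conversion (one common convention; assumed here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_bands: int, f_min: float, f_max: float) -> List[float]:
    """Center frequencies (Hz) of n_bands bands spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_bands)]


# Example values only: 80 bands covering 0 to 8000 Hz.
centers = mel_band_centers(80, 0.0, 8000.0)
gaps = [b - a for a, b in zip(centers, centers[1:])]
```

The gaps between neighboring centers grow with frequency, so a few dozen Mel bands cover the range that a linear-frequency spectrogram would describe with hundreds of equally spaced bins, at the cost of coarser resolution at high frequencies.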

Cas-DiffCom: Cascaded diffusion model for infant longitudinal super-resolution 3D medical image completion

Feb 21, 2024

Lianghu Guo, Tianli Tao, Xinyi Cai, Zihao Zhu, Jiawei Huang, Lixuan Zhu, Zhuoyang Gu, Haifeng Tang, Rui Zhou, Siyan Han (+4 more)

Abstract: Early infancy is a rapid and dynamic neurodevelopmental period for behavior and neurocognition. Longitudinal magnetic resonance imaging (MRI) is an effective tool to investigate such a crucial stage by capturing the developmental trajectories of the brain structures. However, longitudinal MRI acquisition often suffers from serious missing data due to participant dropout and failed scans, making longitudinal infant brain atlas construction and developmental trajectory delineation quite challenging. Thanks to the development of AI-based generative models, neuroimage completion has become a powerful technique to retain as much available data as possible. However, current image completion methods usually suffer from inconsistency within each individual subject in the time dimension, compromising the overall quality. To solve this problem, we propose a two-stage cascaded diffusion model, Cas-DiffCom, for dense and longitudinal 3D infant brain MRI completion and super-resolution. We applied our proposed method to the Baby Connectome Project (BCP) dataset. The experimental results validate that Cas-DiffCom achieves both individual consistency and high fidelity in longitudinal infant brain image completion. We further applied the generated infant brain images to two downstream tasks, brain tissue segmentation and developmental trajectory delineation, to demonstrate its task-oriented potential in the neuroscience field.

A Novel Field-Free SOT Magnetic Tunnel Junction With Local VCMA-Induced Switching

Dec 24, 2023

Rui Zhou, Haiyang Zhang, Hao Wang, Jin He, Qijun Huang, Sheng Chang

Abstract: By integrating the local voltage-controlled magnetic anisotropy (VCMA) effect, the Dzyaloshinskii-Moriya interaction (DMI) effect, and the spin-orbit torque (SOT) effect, we propose a novel device structure for a field-free magnetic tunnel junction (MTJ). Micromagnetic simulation shows that the device utilizes the chiral symmetry breaking caused by the DMI effect to induce a non-collinear spin texture under the influence of SOT current. This, combined with the perpendicular magnetic anisotropy (PMA) gradient generated by the local VCMA effect, enables deterministic switching of the MTJ state without an external field. The impact of variations in DMI strength and PMA gradient on the magnetization dynamics is analyzed.
