Generative Artificial Intelligence (AI) has pioneered new methodological paradigms in architectural design, significantly expanding the innovative potential and efficiency of the design process. This paper explores the extensive applications of generative AI technologies in architectural design, a trend that has benefited from the rapid development of deep generative models. It provides a comprehensive review of the basic principles of generative AI and large-scale models and highlights their applications in the generation of 2D images, videos, and 3D models. In addition, by reviewing the latest literature published since 2020, this paper scrutinizes the impact of generative AI technologies at different stages of architectural design, from generating initial architectural 3D forms to producing final architectural imagery. The marked growth in research indicates an increasing inclination within the architectural design community towards embracing generative AI, which in turn is catalyzing a shared enthusiasm for research. These research cases and methodologies have not only been shown to enhance efficiency and innovation significantly but have also challenged the conventional boundaries of architectural creativity. Finally, we point out new directions for design innovation and articulate fresh trajectories for applying generative AI in the architectural domain. This article provides the first comprehensive literature review of generative AI for architectural design, and we believe this work can facilitate more research on this significant topic in architecture.
Using knowledge graphs to assist deep learning models in making recommendation decisions has recently been shown to improve both the interpretability and the accuracy of the model. This paper introduces an end-to-end deep learning model, named RKGCN, which dynamically analyses each user's preferences and recommends suitable items. It combines knowledge graphs on both the item side and the user side to enrich their representations and thus make full use of the abundant information in knowledge graphs. RKGCN is able to offer more personalized and relevant recommendations in three different scenarios. The experimental results show the superior effectiveness of our model over five baseline models on three real-world datasets covering movies, books, and music.
Prognosis of reactor accidents is crucial for ensuring that appropriate strategies are adopted to avoid radioactive releases. However, research on this topic in the nuclear industry remains very limited. In this paper, we propose a method for accident prognosis based on the Temporal Fusion Transformer (TFT) model with multi-head self-attention and gating mechanisms. The method uses multiple covariates to improve prediction accuracy and quantile regression to assess uncertainty. The proposed method is applied to prognosis after loss of coolant accidents (LOCAs) in the HPR1000 reactor. Extensive experimental results show that the method surpasses novel deep learning-based prediction methods in terms of prediction accuracy and confidence. Furthermore, interference experiments with different signal-to-noise ratios and ablation experiments on static covariates illustrate that the robustness comes from the ability to extract the features of static and historical covariates. In summary, this work applies the novel composite deep learning model TFT to the prognosis of key parameters after a reactor accident for the first time, and makes a positive contribution to the establishment of a more intelligent and staff-light maintenance method for reactor systems.
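The quantile regression mentioned in this abstract is usually trained with the quantile (pinball) loss. The following is a minimal sketch of that loss in plain Python, not the authors' implementation; the function name `pinball_loss` and the list-based inputs are assumptions made for illustration.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss for quantile level q in (0, 1).

    Hypothetical helper for illustration only; it is not taken from the paper.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        diff = y - y_hat
        # Under-prediction (diff > 0) is weighted by q,
        # over-prediction (diff < 0) by (1 - q).
        total += max(q * diff, (q - 1.0) * diff)
    return total / len(y_true)
```

Training one output per quantile level (for example 0.1, 0.5, and 0.9) with this loss yields a median forecast together with a band that can be read as an uncertainty estimate.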
With the mass construction of Gen III nuclear reactors, it has become a popular trend to use deep learning (DL) techniques for fast and effective diagnosis of possible accidents. To overcome the common problems of previous work on diagnosing reactor accidents with deep learning, this paper proposes a diagnostic process that is interpretable and robust to noisy and incomplete data. First, a novel Denoising Padded Autoencoder (DPAE) is proposed for representation extraction from monitoring data; its representation extractor remains effective on disturbed data with signal-to-noise ratios up to 25.0 and with up to 40.0% of the monitoring data missing. Second, a diagnostic framework is proposed that uses the DPAE encoder to extract representations, followed by shallow statistical learning algorithms. On disturbed datasets, this stepwise diagnostic approach achieves classification and regression evaluation metrics that are 41.8% and 80.8% higher, respectively, than those of end-to-end diagnostic approaches. Finally, a hierarchical interpretation algorithm using SHAP and feature ablation is presented to analyze the importance of the input monitoring parameters and validate the effectiveness of the high-importance parameters. The outcomes of this study provide a reference method for building robust and interpretable intelligent reactor anomaly diagnosis systems in scenarios with high safety requirements.
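The feature ablation step named in this abstract can be illustrated with a small sketch. It is an assumption-laden example rather than the paper's hierarchical algorithm: it covers only ablation (not SHAP), uses mean absolute error as the metric, and the names `ablation_importance`, `predict`, and `baseline` are hypothetical.

```python
def ablation_importance(predict, rows, targets, baseline):
    """Increase in mean absolute error when each feature is ablated.

    predict  : callable mapping one feature row (list of floats) to a float
    rows     : list of feature rows
    targets  : list of true values, one per row
    baseline : replacement value for each feature (assumed, e.g. its mean)
    """
    def mean_abs_error(data):
        return sum(abs(predict(r) - t) for r, t in zip(data, targets)) / len(targets)

    reference = mean_abs_error(rows)
    scores = []
    for j in range(len(baseline)):
        ablated = []
        for r in rows:
            row = list(r)          # copy so the caller's data is untouched
            row[j] = baseline[j]   # ablate feature j
            ablated.append(row)
        scores.append(mean_abs_error(ablated) - reference)
    return scores
```

A larger score means the model's error grows more when that input is removed, which is one simple way to check whether a parameter ranked as important really matters.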
With the increasing use of high-precision system analysis programs in nuclear engineering, the volume of high-fidelity computational data for accident simulation is exploding. Therefore, an algorithm that can both automatically extract low-dimensional features from the data and guarantee the validity of those features is needed to improve the performance and confidence of accident diagnosis systems. This study proposes an autoencoder-based autonomous learning framework, namely the Padded Auto-Encoder (PAE), which automatically encodes noisy and partially missing accident monitoring data into low-dimensional feature vectors via a Vision Transformer-based encoder, and decodes the feature vectors into noise-free and complete reconstructed monitoring data. Thus, the encoder part of the framework is able to automatically infer, from partially missing and noisy monitoring data, valid representations that reflect the complete and noise-free original data, and the representation vectors can be used in downstream tasks such as accident diagnosis. In this paper, the LOCA of the HPR1000 was used as the study object, and the PAE was trained in an unsupervised manner on a dataset of cases with different break locations and sizes. The encoder part of the pre-trained PAE was subsequently used as the feature extractor for the monitoring data, and several basic statistical learning algorithms were used to predict the break locations and sizes. The results show that the two-stage pre-trained diagnostic model performs better in diagnosing break location and size, improving the respective metrics by 41.62% and 80.86% compared with a diagnostic model with an end-to-end structure.
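The kind of input corruption this abstract describes (added noise plus partially missing data) can be sketched as follows. This is an illustrative assumption, not the paper's data pipeline: the signal is a single 1-D series, the signal-to-noise ratio is taken to be in decibels, missing entries are filled with a fixed `pad_value`, and the function name `corrupt` is hypothetical.

```python
import math
import random


def corrupt(signal, snr_db, missing_fraction, rng, pad_value=0.0):
    """Add Gaussian noise at a target SNR, then pad out random entries.

    Returns (corrupted, mask), where mask[i] is False for entries that were
    replaced by pad_value. All parameter names are assumptions.
    """
    n = len(signal)
    # Noise power follows from signal power and the assumed dB-scale SNR.
    power = sum(x * x for x in signal) / n
    noise_std = math.sqrt(power / (10.0 ** (snr_db / 10.0)))
    noisy = [x + rng.gauss(0.0, noise_std) for x in signal]

    # Choose which entries to treat as missing.
    n_missing = round(missing_fraction * n)
    missing = set(rng.sample(range(n), n_missing))

    corrupted = [pad_value if i in missing else v for i, v in enumerate(noisy)]
    mask = [i not in missing for i in range(n)]
    return corrupted, mask
```

In a denoising setup of this kind, the corrupted series would be the model input and the clean series the reconstruction target, so the encoder is pushed to recover information about the complete signal.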
Segmenting each moving object instance in a scene is essential for many applications. However, like many other computer vision tasks, it performs well in optimal weather but tends to fail in adverse weather. The usual ways to achieve robustness to weather conditions are to train the network on data from the given weather pattern or to fuse multiple sensors. We focus on a new possibility: improving the network's resilience to weather interference through its structural design. First, we propose a novel FPN structure called RiWFPN with a progressive top-down interaction and attention refinement module. RiWFPN can directly replace other FPN structures to improve the robustness of the network in non-optimal weather conditions. Then we extend SOLOV2 to capture temporal information in video so that it learns motion information, and propose a moving object instance segmentation network with RiWFPN called RiWNet. Finally, to verify the effect of moving instance segmentation under different weather disturbances, we propose the VKTTI-moving dataset, a moving instance segmentation dataset based on the VKTTI dataset that takes into account different weather scenes such as rain, fog, sunset, morning, and overcast. The experiments show that RiWFPN improves the network's resilience to adverse weather effects compared with other FPN structures. We compare RiWNet with several other state-of-the-art methods on some challenging datasets, and RiWNet shows better performance, especially under adverse weather conditions.
Deep learning-based algorithms have become a crucial way to boost object detection performance in aerial images. While various neural network representations have been developed, previous works have not adequately investigated noise-resilient performance, especially on noisy aerial images taken by cameras with telephoto lenses, and most research has concentrated on denoising. Denoising usually incurs an additional computational burden to obtain higher-quality images, whereas noise resilience describes the robustness of the network itself to different noises and is an attribute of the algorithm itself. For this reason, this work begins by analyzing the noise-resilient performance of the neural network and then proposes two hypotheses for building a noise-resilient structure. Based on these hypotheses, we compare the noise-resilient ability of the Oct-ResNet, which uses frequency division processing, with that of the commonly used ResNet. In addition, previous feature pyramid networks used for aerial object detection tasks are not specifically designed for the frequency division feature maps of the Oct-ResNet, and they usually pay little attention to bridging the semantic gap between feature maps from different depths. On this basis, a novel octave convolution-based semantic attention feature pyramid network (OcSaFPN) is proposed to achieve higher accuracy in object detection under noise. Tests on three datasets demonstrate that the proposed OcSaFPN achieves state-of-the-art detection performance under Gaussian noise or multiplicative noise. In addition, further experiments show that the OcSaFPN structure can be easily added to existing algorithms and effectively improves their noise-resilient ability.
Most scenes in practical applications are dynamic scenes containing moving objects, so accurately segmenting moving objects is crucial for many computer vision applications. To efficiently segment all moving objects in the scene, regardless of whether an object has a predefined semantic label, we propose a two-level nested Octave U-structure network with a multiscale attention mechanism called U2-ONet. Each stage of U2-ONet is filled with our newly designed Octave ReSidual U-block (ORSU), which enhances the ability to obtain more context information at different scales while reducing the spatial redundancy of feature maps. To efficiently train our multi-scale deep network, we introduce a hierarchical training supervision strategy that calculates the loss at each level while adding a knowledge matching loss to keep the optimization consistent. Experimental results show that our method achieves state-of-the-art performance on several general moving object segmentation datasets.
Most SLAM algorithms are based on the assumption that the scene is static. In practice, however, most scenes are dynamic and contain moving objects, so these methods are not suitable. In this paper, we introduce DymSLAM, a dynamic stereo visual SLAM system capable of reconstructing a 4D (3D + time) dynamic scene with rigid moving objects. The only input of DymSLAM is stereo video, and its output includes a dense map of the static environment, 3D models of the moving objects, and the trajectories of the camera and the moving objects. We first detect and match interest points between successive frames using traditional SLAM methods. The interest points belonging to different motion models (including ego-motion and the motion models of rigid moving objects) are then segmented by a multi-model fitting approach. From the interest points belonging to the ego-motion, we estimate the trajectory of the camera and reconstruct the static background. The interest points belonging to the motion models of rigid moving objects are then used to estimate the objects' motion relative to the camera and to reconstruct their 3D models. We then transform the relative motion into the trajectories of the moving objects in the global reference frame. Finally, we fuse the 3D models of the moving objects into the 3D map of the environment, taking their motion trajectories into account, to obtain a 4D (3D + time) sequence. DymSLAM obtains information about the dynamic objects instead of ignoring them and is suitable for unknown rigid objects. Hence, the proposed system allows a robot to be employed for high-level tasks, such as avoiding dynamic obstacles. We conducted experiments in a real-world environment where both the camera and the objects moved over a wide range.
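The step that turns camera-relative object motion into trajectories in the global reference frame amounts to composing rigid-body transforms. The sketch below shows that composition with 4x4 homogeneous matrices in plain Python. It is not DymSLAM's code, and it assumes one particular convention: camera poses map camera coordinates to world coordinates, and relative poses map object coordinates to camera coordinates. The function names are hypothetical.

```python
def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def object_trajectory(camera_poses, relative_poses):
    """Compose per-frame poses: T_world_object = T_world_camera * T_camera_object.

    Assumed convention (not stated in the abstract): each pose is a 4x4
    homogeneous transform mapping points from the child frame to the parent.
    """
    if len(camera_poses) != len(relative_poses):
        raise ValueError("need one relative pose per camera pose")
    return [matmul4(cam, rel) for cam, rel in zip(camera_poses, relative_poses)]


def translation(x, y, z):
    """Homogeneous transform with identity rotation and the given offset."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]
```

For instance, a camera at (1, 0, 0) that observes an object at (0, 2, 0) in its own frame, with no rotation, places the object at (1, 2, 0) in the world frame.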