Multimodal magnetic resonance imaging (MRI) can reveal different patterns of human tissue and is crucial for clinical diagnosis. However, limited by cost, noise and manual labeling, obtaining diverse and reliable multimodal MR images remains a challenge. For the same lesion, different MRI manifestations have great differences in background information, coarse positioning and fine structure. In order to obtain better generation and segmentation performance, a coordination-spatial attention generation adversarial network (CASP-GAN) based on the cycle-consistent generative adversarial network (CycleGAN) is proposed. The performance of the generator is optimized by introducing the Coordinate Attention (CA) module and the Spatial Attention (SA) module. The two modules can make full use of the captured location information, accurately locating the interested region, and enhancing the generator model network structure. The ability to extract the structure information and the detailed information of the original medical image can help generate the desired image with higher quality. There exist some problems in the original CycleGAN that the training time is long, the parameter amount is too large, and it is difficult to converge. In response to this problem, we introduce the Coordinate Attention (CA) module to replace the Res Block to reduce the number of parameters, and cooperate with the spatial information extraction network above to strengthen the information extraction ability. On the basis of CASP-GAN, an attentional generative cross-modality segmentation (AGCMS) method is further proposed. This method inputs the modalities generated by CASP-GAN and the real modalities into the segmentation network for brain tumor segmentation. Experimental results show that CASP-GAN outperforms CycleGAN and some state-of-the-art methods in PSNR, SSMI and RMSE in most tasks.
Geometric relational embeddings map relational data as geometric objects that combine vector information suitable for machine learning and structured/relational information for structured/relational reasoning, typically in low dimensions. Their preservation of relational structures and their appealing properties and interpretability have led to their uptake for tasks such as knowledge graph completion, ontology and hierarchy reasoning, logical query answering, and hierarchical multi-label classification. We survey methods that underly geometric relational embeddings and categorize them based on (i) the embedding geometries that are used to represent the data; and (ii) the relational reasoning tasks that they aim to improve. We identify the desired properties (i.e., inductive biases) of each kind of embedding and discuss some potential future work.
The goal of a speech-to-image transform is to produce a photo-realistic picture directly from a speech signal. Recently, various studies have focused on this task and have achieved promising performance. However, current speech-to-image approaches are based on a stacked modular framework that suffers from three vital issues: 1) Training separate networks is time-consuming as well as inefficient and the convergence of the final generative model strongly depends on the previous generators; 2) The quality of precursor images is ignored by this architecture; 3) Multiple discriminator networks are required to be trained. To this end, we propose an efficient and effective single-stage framework called Fusion-S2iGan to yield perceptually plausible and semantically consistent image samples on the basis of given spoken descriptions. Fusion-S2iGan introduces a visual+speech fusion module (VSFM), constructed with a pixel-attention module (PAM), a speech-modulation module (SMM) and a weighted-fusion module (WFM), to inject the speech embedding from a speech encoder into the generator while improving the quality of synthesized pictures. Fusion-S2iGan spreads the bimodal information over all layers of the generator network to reinforce the visual feature maps at various hierarchical levels in the architecture. We conduct a series of experiments on four benchmark data sets, i.e., CUB birds, Oxford-102, Flickr8k and Places-subset. The experimental results demonstrate the superiority of the presented Fusion-S2iGan compared to the state-of-the-art models with a multi-stage architecture and a performance level that is close to traditional text-to-image approaches.
Deep saliency prediction algorithms complement the object recognition features, they typically rely on additional information, such as scene context, semantic relationships, gaze direction, and object dissimilarity. However, none of these models consider the temporal nature of gaze shifts during image observation. We introduce a novel saliency prediction model that learns to output saliency maps in sequential time intervals by exploiting human temporal attention patterns. Our approach locally modulates the saliency predictions by combining the learned temporal maps. Our experiments show that our method outperforms the state-of-the-art models, including a multi-duration saliency model, on the SALICON benchmark. Our code will be publicly available on GitHub.
Recent works reveal that adversarial augmentation benefits the generalization of neural networks (NNs) if used in an appropriate manner. In this paper, we introduce Temporal Adversarial Augmentation (TA), a novel video augmentation technique that utilizes temporal attention. Unlike conventional adversarial augmentation, TA is specifically designed to shift the attention distributions of neural networks with respect to video clips by maximizing a temporal-related loss function. We demonstrate that TA will obtain diverse temporal views, which significantly affect the focus of neural networks. Training with these examples remedies the flaw of unbalanced temporal information perception and enhances the ability to defend against temporal shifts, ultimately leading to better generalization. To leverage TA, we propose Temporal Video Adversarial Fine-tuning (TAF) framework for improving video representations. TAF is a model-agnostic, generic, and interpretability-friendly training strategy. We evaluate TAF with four powerful models (TSM, GST, TAM, and TPN) over three challenging temporal-related benchmarks (Something-something V1&V2 and diving48). Experimental results demonstrate that TAF effectively improves the test accuracy of these models with notable margins without introducing additional parameters or computational costs. As a byproduct, TAF also improves the robustness under out-of-distribution (OOD) settings. Code is available at https://github.com/jinhaoduan/TAF.
Spectral Embedding (SE) has often been used to map data points from non-linear manifolds to linear subspaces for the purpose of classification and clustering. Despite significant advantages, the subspace structure of data in the original space is not preserved in the embedding space. To address this issue subspace clustering has been proposed by replacing the SE graph affinity with a self-expression matrix. It works well if the data lies in a union of linear subspaces however, the performance may degrade in real-world applications where data often spans non-linear manifolds. To address this problem we propose a novel structure-aware deep spectral embedding by combining a spectral embedding loss and a structure preservation loss. To this end, a deep neural network architecture is proposed that simultaneously encodes both types of information and aims to generate structure-aware spectral embedding. The subspace structure of the input data is encoded by using attention-based self-expression learning. The proposed algorithm is evaluated on six publicly available real-world datasets. The results demonstrate the excellent clustering performance of the proposed algorithm compared to the existing state-of-the-art methods. The proposed algorithm has also exhibited better generalization to unseen data points and it is scalable to larger datasets without requiring significant computational resources.
Objective: The Electroencephalogram (EEG) is gaining popularity as a physiological measure for neuroergonomics in human factor studies because it is objective, less prone to bias, and capable of assessing the dynamics of cognitive states. This study investigated the associations between memory workload and EEG during participants' typical office tasks on a single-monitor and dual-monitor arrangement. We expect a higher memory workload for the single-monitor arrangement. Approach: We designed an experiment that mimics the scenario of a subject performing some office work and examined whether the subjects experienced various levels of memory workload in two different office setups: 1) a single-monitor setup and 2) a dual-monitor setup. We used EEG band power, mutual information, and coherence as features to train machine learning models to classify high versus low memory workload states. Main results: The study results showed that these characteristics exhibited significant differences that were consistent across all participants. We also verified the robustness and consistency of these EEG signatures in a different data set collected during a Sternberg task in a prior study. Significance: The study found the EEG correlates of memory workload across individuals, demonstrating the effectiveness of using EEG analysis in conducting real-world neuroergonomic studies.
While the transport of matter by wheeled vehicles or legged robots can be guaranteed in engineered landscapes like roads or rails, locomotion prediction in complex environments like collapsed buildings or crop fields remains challenging. Inspired by principles of information transmission which allow signals to be reliably transmitted over noisy channels, we develop a ``matter transport" framework demonstrating that non-inertial locomotion can be provably generated over ``noisy" rugose landscapes (heterogeneities on the scale of locomotor dimensions). Experiments confirm that sufficient spatial redundancy in the form of serially-connected legged robots leads to reliable transport on such terrain without requiring sensing and control. Further analogies from communication theory coupled to advances in gaits (coding) and sensor-based feedback control (error detection/correction) can lead to agile locomotion in complex terradynamic regimes.
Developments in three-dimensional real worlds promote the integration of geoinformation and building information models (BIM) known as GeoBIM in urban construction. Light detection and ranging (LiDAR) integrated with global navigation satellite systems can provide geo-referenced spatial information. However, constructing detailed urban GeoBIM poses challenges in terms of LiDAR data quality. BIM models designed from software are rich in geometrical information but often lack accurate geo-referenced locations. In this paper, we propose a complementary strategy that integrates LiDAR point clouds with as-designed BIM models for reconstructing urban scenes. A state-of-the-art deep learning framework and graph theory are first combined for LiDAR point cloud segmentation. A coarse-to-fine matching program is then developed to integrate object point clouds with corresponding BIM models. Results show the overall segmentation accuracy of LiDAR datasets reaches up to 90%, and average positioning accuracies of BIM models are 0.023 m for pole-like objects and 0.156 m for buildings, demonstrating the effectiveness of the method in segmentation and matching processes. This work offers a practical solution for rapid and accurate urban GeoBIM construction.
Machine learning has recently made significant strides in reducing design cycle time for complex products. Ship design, which currently involves years long cycles and small batch production, could greatly benefit from these advancements. By developing a machine learning tool for ship design that learns from the design of many different types of ships, tradeoffs in ship design could be identified and optimized. However, the lack of publicly available ship design datasets currently limits the potential for leveraging machine learning in generalized ship design. To address this gap, this paper presents a large dataset of thirty thousand ship hulls, each with design and functional performance information, including parameterization, mesh, point cloud, and image representations, as well as thirty two hydrodynamic drag measures under different operating conditions. The dataset is structured to allow human input and is also designed for computational methods. Additionally, the paper introduces a set of twelve ship hulls from publicly available CAD repositories to showcase the proposed parameterizations ability to accurately reconstruct existing hulls. A surrogate model was developed to predict the thirty two wave drag coefficients, which was then implemented in a genetic algorithm case study to reduce the total drag of a hull by sixty percent while maintaining the shape of the hulls cross section and the length of the parallel midbody. Our work provides a comprehensive dataset and application examples for other researchers to use in advancing data driven ship design.