Ensemble methods have been widely used to improve the generalization and uncertainty calibration of machine learning methods, but they are difficult to use in deep learning systems, because training an ensemble of deep neural networks (DNNs) and then deploying it for online prediction incurs an extremely high computational overhead in both model training and test-time prediction. Recently, several advanced techniques, such as fast geometric ensembling (FGE) and snapshot ensemble, have been proposed. These methods can train a model ensemble in the same time as a single model, thus getting around the hurdle of training time. However, their overhead of model recording and test-time computation remains much higher than that of their single-model counterparts. Here we propose a parsimonious FGE (PFGE) that employs a lightweight ensemble of higher-performing DNNs generated by several successively performed procedures of stochastic weight averaging. Experimental results across different advanced DNN architectures on different datasets, namely CIFAR-$\{$10,100$\}$ and ImageNet, demonstrate its performance. Results show that, compared with state-of-the-art methods, PFGE achieves better generalization performance and satisfactory calibration capability, while the overhead of model recording and test-time predictions is significantly reduced.
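As a rough illustration of why test-time cost grows with ensemble size, the generic prediction-averaging step of an ensemble can be sketched as below. This is not the PFGE procedure itself; the function name `average_predictions` and the toy probability vectors are assumptions made up for this example.

```python
def average_predictions(member_probs):
    """Average the class-probability vectors produced by each ensemble member."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]


# Hypothetical softmax outputs of three ensemble members for one input.
member_probs = [
    [0.7, 0.2, 0.1],
    [0.5, 0.4, 0.1],
    [0.6, 0.1, 0.3],
]

ensemble = average_predictions(member_probs)
# The ensemble predicts the class with the highest averaged probability.
predicted_class = max(range(len(ensemble)), key=ensemble.__getitem__)
```

Each member requires its own forward pass and its own stored weights, so an ensemble with fewer members lowers both the test-time and the model-recording overhead.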
Internet-of-vehicle (IoV) is a general concept referring to, e.g., autonomous-driving-based vehicle-to-everything (V2X) communications or moving relays. Here, high rate and reliability demands call for advanced multi-antenna techniques and millimeter-wave (mmw) based communications. However, the sensitivity of mmw signals to blockage may limit the system performance, especially in highway/rural areas with limited building reflectors/base station deployments and high-speed devices. To avoid blockage, various techniques have been proposed, among which reconfigurable intelligent surface (RIS) is a candidate. RIS, however, has mainly been of interest in stationary/low-mobility scenarios, due to the associated channel state information acquisition and beam management overhead as well as imperfect reflection. In this article, we study the potentials and challenges of RIS-assisted dynamic blockage avoidance in IoV networks. In particular, by designing region-based RIS pre-selection as well as blockage prediction schemes, we show that RIS-assisted communication has the potential to boost the performance of IoV networks. However, there are still issues to be solved before RIS can be practically deployed in IoV networks.
Stochastic weight averaging (SWA) is recognized as a simple yet effective approach to improving the generalization of stochastic gradient descent (SGD) for training deep neural networks (DNNs). A common explanation of its success is that averaging weights following an SGD process equipped with cyclical or high constant learning rates can discover wider optima, which then lead to better generalization. We give a new insight that does not concur with this explanation. We show that SWA's performance is highly dependent on the extent to which the SGD process that runs before SWA has converged, and that the operation of weight averaging only contributes to variance reduction. This new insight suggests practical guidance for better algorithm design. As an instantiation, we show that, following an SGD process with insufficient convergence, running SWA more times leads to continual incremental benefits in terms of generalization. Our findings are corroborated by extensive experiments across different network architectures, including a baseline CNN, PreResNet-164, WideResNet-28-10, VGG16, ResNet-50, ResNet-152, and DenseNet-161, and different datasets, including CIFAR-{10,100} and ImageNet.
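The weight-averaging operation itself can be sketched as a running equal-weight mean over weight snapshots. The sketch below is a minimal illustration, not the authors' code: weights are flattened into plain lists of floats, and the helper name `update_swa` and the snapshot values are assumptions.

```python
def update_swa(swa_weights, new_weights, n_averaged):
    """Fold one more weight snapshot into a running equal-weight average.

    n_averaged is the number of snapshots already included in swa_weights.
    """
    return [
        (s * n_averaged + w) / (n_averaged + 1)
        for s, w in zip(swa_weights, new_weights)
    ]


# Hypothetical weight snapshots collected along an SGD trajectory.
snapshots = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]

swa = list(snapshots[0])
for n, snapshot in enumerate(snapshots[1:], start=1):
    swa = update_swa(swa, snapshot, n)
```

The result equals the plain mean of all snapshots, which is the sense in which averaging acts as a variance-reduction step over the SGD iterates.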
Recent non-local self-attention methods have proven to be effective in capturing long-range dependencies for semantic segmentation. These methods usually form a similarity map in $\mathbb{R}^{C \times C}$ (by compressing spatial dimensions) or $\mathbb{R}^{HW \times HW}$ (by compressing channels) to describe the feature relations along either the channel or the spatial dimensions, where $C$ is the number of channels and $H$ and $W$ are the spatial dimensions of the input feature map. However, such practices tend to condense feature dependencies along the other dimensions, hence causing attention to be missed, which might lead to inferior results for small/thin categories or inconsistent segmentation inside large objects. To address this problem, we propose a new approach, namely the Fully Attentional Network (FLANet), to encode both spatial and channel attentions in a single similarity map while maintaining high computational efficiency. Specifically, for each channel map, our FLANet can harvest feature responses from all other channel maps, and from the associated spatial positions as well, through a novel fully attentional module. Our new method has achieved state-of-the-art performance on three challenging semantic segmentation datasets, i.e., 83.6%, 46.99%, and 88.5% on the Cityscapes test set, the ADE20K validation set, and the PASCAL VOC test set, respectively.
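To make the two similarity-map shapes concrete, the sketch below builds an unnormalized channel map ($C \times C$) and spatial map ($HW \times HW$) from a toy feature map flattened to $C$ rows of $HW$ values. It only illustrates the shapes described above and is not the fully attentional module; the helper functions and feature values are assumptions, and the softmax normalization that attention modules typically apply is omitted.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


C, H, W = 3, 2, 2
# Toy feature map flattened to C rows of H*W values (values are arbitrary).
features = [[float(c + 1) * (i + 1) for i in range(H * W)] for c in range(C)]

# Channel similarity: spatial positions are summed out, leaving a C x C map.
channel_sim = matmul(features, transpose(features))
# Spatial similarity: channels are summed out, leaving an HW x HW map.
spatial_sim = matmul(transpose(features), features)
```

In each map, one of the two dimensions is summed away, which is the condensation of dependencies that the abstract identifies as the source of missed attention.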
The non-local network, which computes an attention map to measure the relationship between each pair of pixels, has become a widely used technique for semantic segmentation. However, most popular non-local models tend to ignore the fact that the calculated attention map is very noisy, containing inter-class and intra-class inconsistencies, which lowers the accuracy and reliability of non-local methods. In this paper, we refer to these inconsistencies as attention noises and explore solutions to denoise them. Specifically, we propose a Denoised Non-Local Network (Denoised NL), which consists of two primary modules, i.e., the Global Rectifying (GR) block and the Local Retention (LR) block, to eliminate the inter-class and intra-class noises, respectively. First, GR uses the class-level predictions to build a binary map that indicates whether two selected pixels belong to the same category. Second, LR captures the ignored local dependencies and further uses them to rectify the unwanted hollows in the attention map. The experimental results on two challenging semantic segmentation datasets demonstrate the superior performance of our model. Without any external training data, our proposed Denoised NL achieves state-of-the-art performance of 83.5\% and 46.69\% mIoU on Cityscapes and ADE20K, respectively.
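The binary map used by GR can be sketched as a pairwise same-class indicator computed from per-pixel class predictions. How that map is then applied is not spelled out above, so the masking and row renormalization in `rectify` are assumptions made for illustration, as are the function names and toy values; this is not the authors' GR block.

```python
def same_class_map(pixel_labels):
    """Binary map: entry (i, j) is 1 if pixels i and j share a predicted class."""
    return [[1 if a == b else 0 for b in pixel_labels] for a in pixel_labels]


def rectify(attention, mask):
    """Assumed usage: zero out cross-class attention, then renormalize each row."""
    rectified = []
    for att_row, mask_row in zip(attention, mask):
        kept = [a * m for a, m in zip(att_row, mask_row)]
        total = sum(kept)
        rectified.append([k / total for k in kept] if total > 0 else kept)
    return rectified


# Hypothetical predicted classes for three pixels and a noisy attention map.
labels = [0, 0, 1]
attention = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
]

mask = same_class_map(labels)
rectified = rectify(attention, mask)
```

After rectification, pixels predicted as different classes no longer attend to each other, which corresponds to removing the inter-class part of the noise.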
Moving relay (MR), a candidate solution for supporting in-vehicle users, has been investigated in different studies. Due to the mobile nature of the MR, acquiring channel state information at the transmitter side (CSIT) is challenging because of the fast-changing environment around the vehicle. On top of an MR, one can use a predictor antenna (PA), i.e., an additional antenna in front of the receive antenna (RA), to obtain CSIT, and recent works have investigated the benefits of such a setup. PA-aided CSIT acquisition normally works with the help of different content information such as the location and the velocity of the MR. In this paper, we study the effect of velocity awareness on the PA system, and develop adaptive antenna selection schemes in PA-assisted MRs. Results show that, compared to no-CSIT schemes, a velocity-aware antenna selection-based PA system can improve the end-to-end throughput by an order of magnitude.
In future wireless networks, one of the use-cases of interest is Internet-of-vehicles (IoV). Here, IoV refers to two different functionalities, namely, serving the in-vehicle users and supporting the connected-vehicle functionalities, both of which can be well provided by transceivers installed on top of vehicles. Such dual functionality of on-vehicle transceivers, however, implies strict rate and reliability requirements, for which one may need to utilize large bandwidths/beamforming, acquire up-to-date channel state information (CSI), and avoid blockages. In this article, we incorporate the recently proposed concept of predictor antennas (PAs) into a \textit{large-scale cooperative PA (LSCPA)} setup where both temporal blockages and CSI out-dating are avoided via cooperation among base stations (BSs) and vehicles. Summarizing the ongoing standardization progress enabling IoV communications, we present the potentials and challenges of the LSCPA setup, and compare the effect of cooperative and non-cooperative schemes on the performance of IoV links. As we show, BS cooperation and blockage/CSI prediction can boost the performance of IoV links remarkably.
Ocean fronts can cause the accumulation of nutrients and affect the propagation of underwater sound, so high-precision ocean front detection is of great significance to the marine fishery and national defense fields. However, current ocean front detection methods either have low detection accuracy or can only detect the occurrence of an ocean front by binary classification, rarely considering the differences in the characteristics of multiple ocean fronts in different sea areas. To solve these problems, we propose a semantic segmentation network called the location and seasonality enhanced network (LSENet) for multi-class ocean front detection at the pixel level. In this network, we first design a channel supervision unit structure, which integrates the seasonal characteristics of the ocean front itself and the contextual information to improve the detection accuracy. We also introduce a location attention mechanism to adaptively assign attention weights to the fronts according to the sea areas where they frequently occur, which can further improve the accuracy of multi-class ocean front detection. Experimental results demonstrate that our method is more effective than other semantic segmentation methods and a current representative ocean front detection method.
Many AI-related tasks involve the interactions of data in multiple modalities. It has become a new trend to merge multi-modal information into knowledge graphs (KGs), resulting in multi-modal knowledge graphs (MMKGs). However, MMKGs usually suffer from low coverage and incompleteness. To mitigate this problem, a viable approach is to integrate complementary knowledge from other MMKGs. To this end, although existing entity alignment approaches could be adopted, they operate in the Euclidean space, and the resulting Euclidean entity representations can lead to large distortion of a KG's hierarchical structure. Besides, the visual information has not yet been well exploited. In response to these issues, we propose a novel multi-modal entity alignment approach, Hyperbolic multi-modal entity alignment (HMEA), which extends the Euclidean representation to the hyperboloid manifold. We first adopt Hyperbolic Graph Convolutional Networks (HGCNs) to learn structural representations of entities. Regarding the visual information, we generate image embeddings using the DenseNet model, which are also projected into the hyperbolic space using HGCNs. Finally, we combine the structure and visual representations in the hyperbolic space and use the aggregated embeddings to predict potential alignment results. Extensive experiments and ablation studies demonstrate the effectiveness of our proposed model and its components.
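One standard way to move a Euclidean vector onto the hyperboloid manifold is the exponential map at the origin, sketched below for curvature -1. Whether HMEA uses exactly this map is an assumption; the function names and the example vector are made up for illustration.

```python
import math


def exp_map_origin(v):
    """Map a Euclidean vector v onto the unit hyperboloid (curvature -1).

    Returns a point x = (x0, x1, ..., xn) whose Minkowski norm is -1.
    """
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        # The zero vector maps to the hyperboloid origin (1, 0, ..., 0).
        return [1.0] + [0.0] * len(v)
    scale = math.sinh(norm) / norm
    return [math.cosh(norm)] + [scale * c for c in v]


def minkowski_inner(x, y):
    """Minkowski inner product: -x0*y0 + x1*y1 + ... + xn*yn."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


# Hypothetical 2-dimensional Euclidean embedding of one entity.
euclidean_embedding = [0.3, 0.4]
point = exp_map_origin(euclidean_embedding)
```

Every point produced this way satisfies the hyperboloid constraint that its Minkowski inner product with itself equals -1, which can be checked with `minkowski_inner(point, point)`.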