Images and point clouds provide different information to robots. Finding correspondences between data from different sensors is crucial for tasks such as localization, mapping, and navigation. While learning-based descriptors have been developed for single sensors, there is little work on cross-modal features. This work treats learning cross-modal features as a dense contrastive learning problem. We propose a Tuple-Circle loss function for cross-modality feature learning. Furthermore, to learn good features without losing generality, we develop variants of the widely used PointNet++ architecture for point clouds and the U-Net CNN architecture for images. We conduct experiments on a real-world dataset to show the effectiveness of our loss function and network structure, and by visualizing the features we show that our models indeed learn information from both images and LiDAR.
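The abstract does not spell out the Tuple-Circle loss. As a reference point, here is a minimal NumPy sketch of the standard Circle loss applied to dense cross-modal matching. It assumes that row i of the image features and row i of the point features form a matching pixel-point pair, and that every other pairing in the batch is a negative. The function name `circle_loss_dense`, the margin `m=0.25`, and the scale `gamma=64` are illustrative assumptions, not values from this work.

```python
import numpy as np

def circle_loss_dense(img_feat, pcd_feat, m=0.25, gamma=64.0):
    """Standard Circle loss over dense cross-modal correspondences (sketch).

    Assumption: img_feat[i] and pcd_feat[i] are a matching pixel/point pair;
    all other pairings in the batch are treated as negatives.
    """
    # L2-normalise so dot products are cosine similarities in [-1, 1]
    a = img_feat / np.linalg.norm(img_feat, axis=1, keepdims=True)
    b = pcd_feat / np.linalg.norm(pcd_feat, axis=1, keepdims=True)
    sim = a @ b.T                      # (N, N) cross-modal similarity matrix
    n = sim.shape[0]
    pos_mask = np.eye(n, dtype=bool)   # diagonal = matching pairs
    o_p, o_n = 1 + m, -m               # optima for positive / negative similarities
    delta_p, delta_n = 1 - m, m        # decision margins
    losses = []
    for i in range(n):
        s_p = sim[i][pos_mask[i]]
        s_n = sim[i][~pos_mask[i]]
        # Self-paced weights: pairs far from their optimum get larger weights
        alpha_p = np.clip(o_p - s_p, 0, None)
        alpha_n = np.clip(s_n - o_n, 0, None)
        logit_p = -gamma * alpha_p * (s_p - delta_p)
        logit_n = gamma * alpha_n * (s_n - delta_n)
        # log(1 + sum(exp(logit_n)) * sum(exp(logit_p))), computed stably
        z = np.logaddexp.reduce(logit_n) + np.logaddexp.reduce(logit_p)
        losses.append(np.logaddexp(0.0, z))
    return float(np.mean(losses))
```

With identical, well-separated features the loss is close to zero. A shifted correspondence is penalized heavily, because both the misplaced positive and the confusable negative sit far from their optima.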
A variable-phase-shifter (VPS) architecture with hybrid precoding, which combines phase shifters and switches, is proposed for millimeter-wave massive multiple-input multiple-output (MIMO) communications. For the VPS architecture, a hybrid precoding design (HPD) scheme, called VPS-HPD, is proposed to optimize the phases according to the channel state information by alternately optimizing the analog precoder and the digital precoder. To reduce the computational complexity of VPS-HPD, a low-complexity HPD scheme for the VPS architecture (VPS-LC-HPD) is then proposed. It performs alternating optimization in three stages, each of which has a closed-form solution and can be implemented efficiently. To reduce the hardware complexity introduced by the large number of switches, we consider a group-connected VPS architecture and propose an HPD scheme in which the HPD problem is divided into multiple independent subproblems, each of which can be solved flexibly by either the VPS-HPD or the VPS-LC-HPD scheme. Simulation results verify the effectiveness of the proposed schemes and show that they achieve satisfactory spectral efficiency with reduced computational or hardware complexity.
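The abstract does not specify the VPS-HPD scheme itself. To illustrate the general idea of alternating between the digital and analog precoders, here is a sketch for a plain fully connected phase-shifter array with no switches. It assumes the target is an unconstrained optimal precoder `F_opt`. The digital step is a least-squares fit, and the analog step is a phase-extraction projection onto unit-modulus entries. `altmin_hybrid_precoder` and its parameters are hypothetical names chosen for this example.

```python
import numpy as np

def altmin_hybrid_precoder(F_opt, n_rf, iters=50, seed=0):
    """Alternate between digital and analog precoders to approximate F_opt
    (generic phase-shifter sketch, not the VPS-HPD scheme)."""
    n_t, n_s = F_opt.shape
    rng = np.random.default_rng(seed)
    # Random unit-modulus initialisation of the analog (phase-only) precoder
    F_rf = np.exp(1j * rng.uniform(0, 2 * np.pi, size=(n_t, n_rf)))
    for _ in range(iters):
        # Digital step: least-squares F_BB for the current analog precoder
        F_bb = np.linalg.pinv(F_rf) @ F_opt
        # Analog step: keep only the phases of F_opt F_BB^H (unit-modulus projection)
        F_rf = np.exp(1j * np.angle(F_opt @ F_bb.conj().T))
    F_bb = np.linalg.pinv(F_rf) @ F_opt
    # Scale F_BB so the overall precoder satisfies ||F_RF F_BB||_F^2 = N_s
    F_bb *= np.sqrt(n_s) / np.linalg.norm(F_rf @ F_bb, "fro")
    return F_rf, F_bb
```

A typical usage takes `F_opt` as the first `n_s` right singular vectors of a channel matrix `H`, i.e. `Vh.conj().T[:, :n_s]` from `np.linalg.svd(H)`.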
Most graph neural network models rely on a particular message passing paradigm, where the idea is to iteratively propagate node representations of a graph to each node in the direct neighborhood. While very prominent, this paradigm leads to information propagation bottlenecks. Information is repeatedly compressed at intermediary node representations, which causes information loss and makes it practically impossible to gather meaningful signals from distant nodes. To address this issue, we propose shortest path message passing neural networks, where the node representations of a graph are propagated to each node in its shortest path neighborhoods. In this setting, nodes can communicate directly with each other even if they are not neighbors, which breaks the information bottleneck and leads to more adequately learned representations. Theoretically, our framework generalizes message passing neural networks, resulting in provably more expressive models. Empirically, we verify the capacity of a basic model of this framework on dedicated synthetic experiments and on real-world graph classification and regression benchmarks, obtaining several state-of-the-art results.
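Here is a simplified sketch of shortest-path message passing. It assumes sum aggregation per hop distance, a separate weight matrix for each distance k, and a ReLU. The function names and this exact update are illustrative assumptions, not the paper's model. Node v aggregates from every node at shortest-path distance k, so a node two hops away contributes directly instead of through an intermediary.

```python
from collections import deque
import numpy as np

def shortest_path_distances(adj_list, n):
    """All-pairs hop distances via BFS from every node (-1 = unreachable)."""
    dist = -np.ones((n, n), dtype=int)
    for s in range(n):
        dist[s, s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for u in adj_list[v]:
                if dist[s, u] < 0:  # first visit gives the shortest hop count
                    dist[s, u] = dist[s, v] + 1
                    queue.append(u)
    return dist

def sp_message_passing_layer(H, dist, weights, w_self):
    """One layer (sketch): node v sums the features of all nodes at exact
    hop distance k and transforms each sum with its own matrix weights[k-1]."""
    out = H @ w_self
    for k, W_k in enumerate(weights, start=1):
        mask = (dist == k).astype(H.dtype)  # mask[v, u] = 1 iff d(v, u) == k
        out += (mask @ H) @ W_k
    return np.maximum(out, 0.0)  # ReLU
```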
Circuit breakers (CBs) play an important role in modern society because they make power transmission and distribution systems reliable and resilient. It is therefore important to maintain their reliability and monitor their operation, and a key to ensuring reliable operation is monitoring their condition. In this work, we performed accelerated life testing for mechanical failures of a vacuum circuit breaker (VCB) by performing close-open operations continuously until failure. We recorded data for each operation and made the collected run-to-failure dataset publicly available. In our experiments, the VCB performed more than 26,000 close-open operations without current load over a span of five months. This long-term run-to-failure monitoring enables us to track the evolution of the VCB condition and its degradation over time. Closing time is one indicator of CB condition, but it is usually measured only when the CB is taken out of operation and completely disconnected from the network. We propose an algorithm that infers the same closing-time information from a non-intrusive sensor. Using the short-time energy (STE) of the vibration signal, it is possible to identify the key moments at which specific events happen, including the time when the latch starts to move and the closing time. The effectiveness of the proposed algorithm is evaluated on the VCB dataset and compared with the binary segmentation (BS) change point detection algorithm. This research highlights the potential for continuous online condition monitoring, which is the basis for future predictive maintenance strategies.
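The STE-based event detection can be sketched as follows. The abstract does not specify how the key moments are extracted from the STE curve. The Hamming window, frame length, hop size, and thresholding rule are therefore hypothetical choices for illustration. The rule used here flags the first frame whose energy exceeds `ratio` times the median energy of an assumed quiet baseline at the start of the recording.

```python
import numpy as np

def short_time_energy(x, frame_len=64, hop=16):
    """Energy of each windowed frame: E[k] = sum_n (x[k*hop + n] * w[n])**2."""
    w = np.hamming(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.array([np.sum((x[k * hop : k * hop + frame_len] * w) ** 2)
                     for k in range(n_frames)])

def first_event_sample(x, frame_len=64, hop=16, n_base=10, ratio=10.0):
    """Start sample of the first frame whose STE exceeds `ratio` times the
    median STE of the first `n_base` frames (assumed to be quiet)."""
    ste = short_time_energy(x, frame_len, hop)
    thr = ratio * np.median(ste[:n_base])
    idx = np.flatnonzero(ste > thr)
    return None if idx.size == 0 else int(idx[0] * hop)
```

Because frames overlap, the reported start sample can precede the true onset by up to one frame length. A finer estimate would need a shorter frame or sub-frame refinement.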
Entity linking aims to link ambiguous mentions to their corresponding entities in a knowledge base, which is fundamental for various downstream applications, e.g., knowledge base completion, question answering, and information extraction. While great efforts have been devoted to this task, most of these studies assume that large-scale labeled data is available. However, when labeled data is insufficient for a specific domain because annotation is labor-intensive, the performance of existing algorithms declines sharply. In this paper, we address few-shot entity linking, which requires only a minimal amount of in-domain labeled data and is more practical in real situations. Specifically, we first propose a novel weak supervision strategy that generates non-trivial synthetic entity-mention pairs based on mention rewriting. Since the quality of the synthetic data has a critical impact on effective model training, we further design a meta-learning mechanism that automatically assigns a different weight to each synthetic entity-mention pair. In this way, we can fully exploit the semantic information in the synthetic data to derive a well-trained entity linking model under the few-shot setting. Experiments on real-world datasets show that the proposed method substantially improves on the state-of-the-art few-shot entity linking model and achieves strong performance when only a small amount of labeled data is available. Moreover, we demonstrate the model's strong transferability.
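One generic way to weight synthetic pairs by meta-learning is a first-order learning-to-reweight rule. Each example's weight is proportional to how much a gradient step on that example would reduce the loss on the small clean (in-domain) set. To first order, this is the dot product of its gradient with the clean-set gradient. The sketch below assumes a logistic-regression scorer as a stand-in for the entity-linking model. It is not necessarily the authors' mechanism, and all names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_example_grads(w, X, y):
    """Gradient of the logistic loss for each example: (p_i - y_i) * x_i."""
    p = sigmoid(X @ w)
    return (p - y)[:, None] * X

def meta_weights(w, X_syn, y_syn, X_clean, y_clean):
    """First-order meta-reweighting of synthetic examples (sketch).

    A step w <- w - eta * eps_i * g_i changes the clean loss by about
    -eta * eps_i * (g_i . g_clean), so examples whose gradients align with
    the clean gradient get positive weight and the rest are clipped to zero.
    """
    g_syn = per_example_grads(w, X_syn, y_syn)                      # (n_syn, d)
    g_clean = per_example_grads(w, X_clean, y_clean).mean(axis=0)   # (d,)
    raw = np.maximum(g_syn @ g_clean, 0.0)
    total = raw.sum()
    # Normalise; fall back to uniform weights if every example is clipped
    return raw / total if total > 0 else np.full(len(raw), 1.0 / len(raw))
```

In this sketch, a synthetic pair whose label contradicts the clean data receives zero weight for that update. A consistent pair keeps all the mass.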
Learning node representations that incorporate information from the graph structure benefits a wide range of graph tasks. Most existing graph neural networks (GNNs) have limited power to capture position information for a given node. Positioning nodes with selected anchors has been explored, but existing approaches mainly rely on explicit labeling of distance information. Here we propose Graph Inference Representation (GIR), an anchor-based GNN that encodes, for each node, path information related to the anchors. We investigate theoretically and experimentally the ability of GIRs and their core variants to produce position-aware embeddings. We further demonstrate that GIR embeddings are complementary to those of typical GNNs. We show that GIRs achieve better results in position-aware scenarios and that fusing GIR embeddings into GNNs can improve their results.
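As an illustration of anchor-based position information that does not use explicit distance labels, the sketch below propagates one-hot anchor indicators through a few rounds of row-normalized neighborhood averaging. It then concatenates the signals from each round. This is an assumed, simplified construction, not GIR itself, and `anchor_propagation_features` is a hypothetical name.

```python
import numpy as np

def anchor_propagation_features(A, anchors, n_steps=3):
    """Propagate one-hot anchor indicators for n_steps rounds of row-normalised
    aggregation (with self-loops) and concatenate the per-step signals."""
    n = A.shape[0]
    A_hat = A + np.eye(n)                          # add self-loops
    P = A_hat / A_hat.sum(axis=1, keepdims=True)   # row-stochastic propagation matrix
    X = np.zeros((n, len(anchors)))
    X[anchors, np.arange(len(anchors))] = 1.0      # column j marks anchor j
    feats = []
    for _ in range(n_steps):
        X = P @ X                                  # one round of neighbourhood averaging
        feats.append(X)
    return np.concatenate(feats, axis=1)           # shape (n, n_steps * n_anchors)
```

On a path graph anchored at one end, structurally symmetric nodes receive different features. Plain message passing without anchors cannot produce this distinction.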
Recently, significant progress has been made in masked image modeling to catch up with masked language modeling. However, unlike words in NLP, images lack a semantic decomposition, which still makes masked autoencoding (MAE) differ between vision and language. In this paper, we explore a potential visual analogue of words, i.e., semantic parts, and we integrate semantic information into the training process of MAE by proposing a Semantic-Guided Masking strategy. Compared to widely adopted random masking, our masking strategy can gradually guide the network to learn various kinds of information, progressing from intra-part patterns to inter-part relations. In particular, we achieve this in two steps. 1) Semantic part learning: we design a self-supervised part learning method to obtain semantic parts by leveraging and refining the multi-head attention of a ViT-based encoder. 2) Semantic-guided MAE (SemMAE) training: we design a masking strategy that varies from masking a portion of patches in each part to masking a portion of (whole) parts in an image. Extensive experiments on various vision tasks show that SemMAE can learn better image representations by integrating semantic information. In particular, SemMAE achieves 84.5% fine-tuning accuracy on ImageNet-1k, which outperforms the vanilla MAE by 1.4%. In the semantic segmentation and fine-grained recognition tasks, SemMAE also brings significant improvements and yields state-of-the-art performance.
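The transition from intra-part to whole-part masking can be sketched as a schedule over a training-progress variable. The sketch assumes each patch already has a part label from the part-learning step. It uses a hypothetical linear mix: a `progress` share of the masking budget is spent on whole parts, and the remainder is spread proportionally over the parts that are still visible. The abstract does not give the exact schedule SemMAE uses.

```python
import numpy as np

def semantic_guided_mask(part_ids, mask_ratio, progress, rng):
    """Boolean patch mask (hypothetical schedule).

    progress=0: mask about mask_ratio of the patches inside every part.
    progress=1: mask whole parts until about mask_ratio of all patches are hidden.
    """
    n = len(part_ids)
    budget = int(round(mask_ratio * n))
    mask = np.zeros(n, dtype=bool)
    parts = rng.permutation(np.unique(part_ids))
    # Stage 1: spend a `progress` share of the budget on whole parts
    whole_budget = int(round(progress * budget))
    for p in parts:
        idx = np.flatnonzero(part_ids == p)
        if mask.sum() + len(idx) > whole_budget:
            break
        mask[idx] = True
    # Stage 2: spread the rest over still-visible parts, proportional to part size
    remaining = budget - int(mask.sum())
    if remaining > 0:
        vis_parts = [p for p in parts if not mask[part_ids == p].any()]
        n_vis = sum(int(np.sum(part_ids == p)) for p in vis_parts)
        for p in vis_parts:
            idx = np.flatnonzero(part_ids == p)
            k = int(round(remaining * len(idx) / n_vis))
            mask[rng.choice(idx, size=min(k, len(idx)), replace=False)] = True
    return mask
```

Raising `progress` from 0 to 1 over training moves the task from reconstructing patches within a visible part to reconstructing entire parts from the others.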
We consider inverse problems for partial differential equations (PDEs) in which the parameters of the dependency structure can exhibit random changepoints over time. This can arise, for example, when the physical system is under malicious attack (e.g., hacker attacks on power grids and internet networks) or subject to extreme external conditions (e.g., weather conditions impacting electricity grids or large market movements impacting valuations of derivative contracts). For that purpose, we employ Physics-Informed Neural Networks (PINNs), universal approximators that can incorporate prior information from any physical law described by a system of PDEs. During training, this prior knowledge acts as a regularization that limits the space of admissible solutions and increases the accuracy of the function approximation. We show that when the true data-generating process exhibits changepoints in the PDE dynamics, this regularization can lead to complete miscalibration and failure of the model. Therefore, we propose an extension of PINNs with a total-variation penalty that accommodates (multiple) changepoints in the PDE dynamics. These changepoints can occur at random locations over time, and they are estimated together with the solutions. We also propose a refinement algorithm that combines changepoint detection with a reduced dynamic programming method that is feasible for computationally intensive PINN methods. We demonstrate the benefits of the proposed model empirically on examples of different equations with changes in their parameters. When the data contain no changepoints, the proposed model reduces to the original PINN model. In the presence of changepoints, it leads to better parameter estimation, better model fitting, and lower training error than the original PINN model.
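The abstract does not detail the reduced dynamic programming used in the refinement step. For reference, the sketch below implements the classical exact optimal-partitioning recursion for piecewise-constant segmentation. It minimizes within-segment squared error plus a fixed `penalty` per changepoint and runs in O(n²) time in the series length. Applying it to a sequence of parameter estimates on a time grid is a hypothetical usage, and `optimal_partitioning` is not the authors' reduced variant.

```python
import numpy as np

def optimal_partitioning(y, penalty):
    """Exact penalised least-squares changepoint detection by dynamic programming:
    F[t] = min_{s<t} F[s] + cost(y[s:t]) + penalty, cost = within-segment SSE."""
    n = len(y)
    c1 = np.concatenate([[0.0], np.cumsum(y)])             # prefix sums for O(1) costs
    c2 = np.concatenate([[0.0], np.cumsum(np.square(y))])

    def cost(s, t):  # sum of squared deviations from the mean of y[s:t]
        m = t - s
        return c2[t] - c2[s] - (c1[t] - c1[s]) ** 2 / m

    F = np.full(n + 1, np.inf)
    F[0] = -penalty              # so the first segment is not penalised
    last = np.zeros(n + 1, dtype=int)
    for t in range(1, n + 1):
        cands = [F[s] + cost(s, t) + penalty for s in range(t)]
        last[t] = int(np.argmin(cands))
        F[t] = cands[last[t]]
    # Backtrack segment start indices to recover the changepoints
    cps, t = [], n
    while t > 0:
        s = last[t]
        if s > 0:
            cps.append(int(s))
        t = s
    return sorted(cps)
```

A larger `penalty` yields fewer changepoints. With no changes in the data, the result is an empty list.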
The Chinese Spell Checking (CSC) task aims to detect and correct Chinese spelling errors. In recent years, related research has focused on introducing character similarity from confusion sets to enhance CSC models. This work has largely ignored the context of characters, which contains richer information. To make better use of contextual similarity, we propose a simple yet effective curriculum learning framework for the CSC task. With our model-agnostic framework, existing CSC models are trained from easy to difficult examples, much as humans learn Chinese characters, and achieve further performance improvements. Extensive experiments and detailed analyses on the widely used SIGHAN datasets show that our method outperforms previous state-of-the-art methods.
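A model-agnostic easy-to-hard curriculum can be sketched with a square-root competence schedule. At each step, batches are drawn only from the easiest fraction of the training data. The per-sentence `difficulty` score is left as an input because the abstract does not define it. The schedule, `c0`, and the function names are assumptions for illustration.

```python
import numpy as np

def competence(step, total_steps, c0=0.1):
    """Square-root competence: share of the easiest data usable at `step`."""
    return min(1.0, np.sqrt(step * (1 - c0 ** 2) / total_steps + c0 ** 2))

def curriculum_batches(difficulty, total_steps, batch_size, rng, c0=0.1):
    """Yield index batches sampled only from the easiest competence(step) share."""
    order = np.argsort(difficulty, kind="stable")   # easy -> hard
    n = len(order)
    for step in range(total_steps):
        k = max(batch_size, int(np.ceil(competence(step, total_steps, c0) * n)))
        k = min(k, n)
        yield rng.choice(order[:k], size=batch_size, replace=False)
```

Because the framework only reorders and samples the data, any existing CSC model can consume these batches unchanged.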
Deformable image registration provides dynamic information about the image and is essential in medical image analysis. However, due to the different characteristics of single-temporal brain MR images and multi-temporal echocardiograms, it is difficult to register both accurately with the same algorithm or model. We propose an unsupervised multi-scale correlation iterative registration network (SearchMorph) with three highlights. (1) We introduce cost volumes to strengthen feature correlations and construct correlation pyramids to complement multi-scale correlation information. (2) We design a search module that searches for the registration of features in multi-scale pyramids. (3) We use a GRU module to iteratively refine the deformation field. The proposed network achieves leading performance on common single-temporal registration tasks and also solves multi-temporal motion estimation tasks. The experimental results show that our method achieves higher registration accuracy and a lower folding point ratio than the state-of-the-art methods.
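Cost volumes and correlation pyramids of the kind described in highlight (1) can be sketched in 2D with NumPy. Each pixel's feature vector in the fixed image is correlated with features in a local window of the moving image. Coarser levels are obtained by 2x2 average pooling. This is a simplified illustration under stated assumptions (2D instead of 3D volumes, both feature maps pooled together, zero padding at borders), not SearchMorph's implementation.

```python
import numpy as np

def local_cost_volume(f1, f2, radius):
    """Correlation between f1[:, y, x] and f2[:, y+dy, x+dx] for |dy|, |dx| <= radius.
    f1, f2: (C, H, W) feature maps. Returns (2r+1, 2r+1, H, W); out of bounds -> 0."""
    C, H, W = f1.shape
    r = radius
    f2p = np.pad(f2, ((0, 0), (r, r), (r, r)))  # zero padding for border displacements
    vol = np.zeros((2 * r + 1, 2 * r + 1, H, W))
    for i, dy in enumerate(range(-r, r + 1)):
        for j, dx in enumerate(range(-r, r + 1)):
            shifted = f2p[:, r + dy : r + dy + H, r + dx : r + dx + W]
            vol[i, j] = (f1 * shifted).sum(axis=0) / np.sqrt(C)
    return vol

def avg_pool2(f):
    """2x2 average pooling over the spatial axes (H and W assumed even)."""
    C, H, W = f.shape
    return f.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))

def correlation_pyramid(f1, f2, radius, levels):
    """Local cost volumes at several scales; both maps are pooled so grids stay aligned."""
    vols = []
    for _ in range(levels):
        vols.append(local_cost_volume(f1, f2, radius))
        f1, f2 = avg_pool2(f1), avg_pool2(f2)
    return vols
```

The argmax over the displacement window at each pixel gives a coarse motion estimate. An iterative module such as a GRU could then refine that estimate.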