This paper addresses the problem of point cloud registration: finding the rigid transformation that optimally aligns a source point set with a target one. Learning robust point cloud registration models with deep neural networks has emerged as a powerful paradigm, offering promising performance in predicting the global geometric transformation for a pair of point sets. Existing methods first use an encoder to regress a latent shape embedding, which is then decoded into a shape-conditioned transformation via concatenation-based conditioning. However, different regions of a 3D shape vary in their geometric structures, which makes a region-conditioned transformation more sensible than a shape-conditioned one. In this paper, we present \underline{R}egion-\underline{A}ware point cloud \underline{R}egistration, denoted as RAR, to predict the transformation for pairwise point sets in a self-supervised fashion. More specifically, we develop a novel region-aware decoder (RAD) module that is formed with an implicit neural region representation parameterized by neural networks. The implicit neural region representation is learned with a self-supervised 3D shape reconstruction loss without the need for region labels. Consequently, the RAD module guides the training of the region-aware transformation (RAT) module and the region-aware weight (RAW) module, which predict the transforms and weights for different regions, respectively. The global geometric transformation from the source point set to the target one is then formed by the weighted fusion of region-aware transforms. Our experiments show that RAR achieves superior registration performance compared to state-of-the-art approaches on various benchmark datasets (e.g., ModelNet40).
With the popularity of 3D sensors in self-driving and other robotics applications, extensive research has focused on designing novel neural network architectures for accurate 3D point cloud completion. However, unlike in point cloud classification and reconstruction, the role of adversarial samples in 3D point cloud completion has seldom been explored. In this work, we show that training with adversarial samples can improve the performance of neural networks on 3D point cloud completion tasks. We propose a novel approach to generate adversarial samples that benefit performance on both clean and adversarial samples. In contrast to the PGD-k attack, our method generates adversarial samples that keep the geometric features of clean samples and contain few outliers. In particular, we use principal directions to constrain the adversarial perturbations for each input point. The gradient components along the mean of the principal directions are taken as adversarial perturbations. In addition, we investigate the effect of using the minimum curvature direction. We also adopt attack strength accumulation and auxiliary Batch Normalization layers to speed up the training process and alleviate the distribution mismatch between clean and adversarial samples. Experimental results show that training with the adversarial samples crafted by our method effectively enhances the performance of PCN on the ShapeNet dataset.
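The principal-direction constraint described in the preceding abstract can be illustrated with a minimal sketch. It assumes that each point comes with two unit principal directions and that the perturbation is the projection of the loss gradient onto their normalized mean; the function names, the `step` parameter, and the example values are hypothetical and are not taken from the paper.

```python
import math

def normalize(v):
    """Scale a vector to unit length (assumes v is non-zero)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def constrained_perturbation(gradient, dir_max, dir_min, step=1.0):
    """Keep only the gradient component along the mean principal direction.

    Assumption: dir_max and dir_min are the two unit principal directions
    at one point; `step` is a hypothetical attack-strength scale.
    """
    mean_dir = normalize([a + b for a, b in zip(dir_max, dir_min)])
    component = sum(g * m for g, m in zip(gradient, mean_dir))
    return [step * component * m for m in mean_dir]

# Toy point whose tangent plane is the xy-plane (normal along z).
gradient = [2.0, 0.0, 3.0]
delta = constrained_perturbation(gradient, [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

In this toy case the perturbation stays in the tangent plane, so the gradient component along the surface normal is discarded, which is one way to read the claim that the perturbed samples contain few outliers.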
Lane segmentation is a challenging problem in the design of autonomous driving systems. Lane marks in traffic images show weak textural consistency, due to occlusion or extreme illumination, but strong geometric continuity, and general convolutional neural networks (CNNs) are not capable of learning semantic objects from such images. To empower conventional CNNs to learn the geometric cues of lanes, we propose a deep network named ContinuityLearner to better learn the geometric prior within lanes. Specifically, our proposed CNN-based paradigm involves a novel context-encoding image feature learning network that generates class-dependent image feature maps and a new encoding layer that exploits the geometric continuity feature representation by fusing the spatial and visual information of lanes. The ContinuityLearner, operating on the geometric continuity features of lanes, is trained to directly predict lanes in traffic scenarios with integrated and continuous instance semantics. Experimental results on the CULane dataset and the TuSimple benchmark demonstrate that our ContinuityLearner outperforms other state-of-the-art techniques in lane segmentation.
The conventional LoRa system is not able to sustain long-range communication over fading channels. To resolve this issue, this paper investigates a two-hop opportunistic amplify-and-forward relaying LoRa system. Based on the best relay-selection protocol, the analytical and asymptotic bit error rate (BER), achievable diversity order, coverage probability, and throughput of the proposed system are derived over the Nakagami-m fading channel. Simulation and numerical results show that although the proposed system reduces the throughput compared to the conventional LoRa system, it can significantly improve the BER and coverage probability. Hence, the proposed system can be considered a promising platform for low-power, long-range, and highly reliable wireless communication applications.
Recent research has seen numerous supervised learning-based methods for 3D shape segmentation, and remarkable performance has been achieved on various benchmark datasets. These supervised methods require a large amount of annotated data to train deep neural networks that generalize to the unseen test set. In this paper, we introduce a meta-learning-based method for few-shot 3D shape segmentation where only a few labeled samples are provided for the unseen classes. To achieve this, we treat shape segmentation as a point labeling problem in a metric space. Specifically, we first design a meta-metric learner to transform input shapes into an embedding space, and our model learns to learn a proper metric space for each object class based on point embeddings. Then, for each class, we design a metric learner to extract part-specific prototype representations from a few support shapes, and our model performs per-point segmentation over the query shapes by matching each point to its nearest prototype in the learned metric space. A metric-based loss function is used to dynamically modify distances between point embeddings, thus maximizing intra-part similarity while minimizing inter-part similarity. A dual segmentation branch is adopted to make full use of the support information and implicitly encourage consistency between the support and query prototypes. We demonstrate the superior performance of our proposed method on the ShapeNet part dataset under the few-shot scenario, compared with well-established baselines and state-of-the-art semi-supervised methods.
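The nearest-prototype matching step in the preceding abstract can be sketched as follows. This is only an illustration under stated assumptions: prototypes are taken to be the mean embedding of the support points of each part, and distance is Euclidean; the abstract specifies neither choice. The function names and toy embeddings are hypothetical.

```python
import math

def part_prototypes(support_embeddings, support_labels):
    """Average the support point embeddings of each part label.

    Assumption: a prototype is the mean embedding of its part.
    """
    sums, counts = {}, {}
    for emb, lab in zip(support_embeddings, support_labels):
        acc = sums.setdefault(lab, [0.0] * len(emb))
        for i, x in enumerate(emb):
            acc[i] += x
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}

def segment(query_embeddings, prototypes):
    """Label each query point with the part of its nearest prototype."""
    return [
        min(prototypes, key=lambda lab: math.dist(q, prototypes[lab]))
        for q in query_embeddings
    ]

# Toy 2D point embeddings for one object class with two parts.
support = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
labels = ["wing", "wing", "body", "body"]
protos = part_prototypes(support, labels)
pred = segment([[0.2, 0.4], [5.0, 4.0]], protos)
```

In a trained model the embeddings would come from the learned metric space rather than being given by hand; only the matching rule is shown here.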
Analyzing the structure of proteins is a key part of understanding their functions and thus their role in biology at the molecular level. In addition, designing new proteins in a methodical way is a major engineering challenge. In this work, we introduce a joint geometric and neural network approach for comparing, deforming, and generating 3D protein structures. Viewing protein structures as 3D open curves, we adopt the Square Root Velocity Function (SRVF) representation and leverage its suitable geometric properties along with Deep Residual Networks (ResNets) for joint registration and comparison. Our ResNets handle large protein deformations better while being more computationally efficient. On top of this mathematical framework, we further design a Geometric Variational Auto-Encoder (G-VAE) that, once trained, maps original, previously unseen structures into a low-dimensional (latent) hyper-sphere. Motivated by the spherical structure of the pre-shape space, we adopt the von Mises-Fisher (vMF) distribution to model our hidden variables. We test the effectiveness of our models by generating novel protein structures and predicting completions of corrupted protein structures. Experimental results show that our method is able to generate plausible structures that differ from the structures in the training data.
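For readers unfamiliar with the SRVF named in the preceding abstract, the standard definition for a curve c(t) on [0, 1] is q(t) = c'(t) / sqrt(|c'(t)|). The sketch below discretizes it with forward differences on a uniformly sampled polyline; the sampling scheme and the function name `srvf` are illustrative assumptions, not the authors' implementation.

```python
import math

def srvf(points):
    """Discrete SRVF of a polyline sampled uniformly on [0, 1].

    Each segment velocity v is mapped to v / sqrt(|v|); zero-length
    segments map to the zero vector.
    """
    dt = 1.0 / (len(points) - 1)
    q = []
    for a, b in zip(points, points[1:]):
        v = [(y - x) / dt for x, y in zip(a, b)]
        speed = math.sqrt(sum(c * c for c in v))
        q.append([c / math.sqrt(speed) if speed > 0 else 0.0 for c in v])
    return q, dt

# A straight 3D segment of length 4 traversed at constant speed.
curve = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [4.0, 0.0, 0.0]]
q, dt = srvf(curve)
# The squared L2 norm of the SRVF equals the curve length.
length = sum(sum(c * c for c in qi) for qi in q) * dt
```

Because the squared L2 norm of q equals the curve length, rescaling curves to unit length places their SRVFs on a unit hypersphere, which is the spherical pre-shape space the abstract refers to.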
Non-linear (large) time warping is a challenging source of nuisance in time-series analysis. In this paper, we propose a novel diffeomorphic temporal transformer network for both pairwise and joint time-series alignment. Our ResNet-TW (Deep Residual Network for Time Warping) tackles the alignment problem by composing a flow of incremental diffeomorphic mappings. Governed by the flow equation, our Residual Network (ResNet) builds smooth, fluid, and regular flows of velocity fields and consequently generates smooth and invertible transformations (i.e., diffeomorphic warping functions). Inspired by the elegant Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework, the final transformation is built by the flow of time-dependent vector fields, which are none other than the building blocks of our Residual Network. The latter is naturally viewed as an Euler discretization scheme of the flow equation (an ODE). Once trained, our ResNet-TW aligns unseen data with a single inexpensive forward pass. As we show in experiments on both univariate (84 datasets from the UCR archive) and multivariate time series (MSR Action-3D, Florence-3D, and MSR Daily Activity), ResNet-TW achieves competitive performance in joint alignment and classification.
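The link between residual blocks and the flow equation in the preceding abstract can be made concrete with a small sketch. A forward Euler step of d(phi)/dt = v_t(phi) reads phi_{k+1} = phi_k + h * v_k(phi_k), which has the form of a residual update. The sinusoidal velocity field and the amplitude values below are hypothetical stand-ins for learned velocity fields.

```python
import math

def velocity(t, amplitude):
    """Hypothetical velocity field on [0, 1] that vanishes at both ends."""
    return amplitude * math.sin(math.pi * t)

def warp(t, amplitudes):
    """Compose Euler steps phi_{k+1} = phi_k + h * v_k(phi_k).

    Each step has the form of a residual block (identity plus update).
    """
    h = 1.0 / len(amplitudes)
    phi = t
    for a in amplitudes:
        phi = phi + h * velocity(phi, a)
    return phi

# Stand-ins for learned parameters; h * |a| * pi < 1 for every step,
# so each incremental map is strictly increasing.
amplitudes = [0.8, -0.3, 0.5, 0.6]
grid = [i / 20 for i in range(21)]
warped = [warp(t, amplitudes) for t in grid]
```

Because every incremental map here is strictly increasing and fixes the endpoints, their composition is a monotone, invertible warping of [0, 1], which is the property a diffeomorphic time-warping function needs.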
To tackle the low-data-rate issue of conventional LoRa systems, we propose two novel frequency-bin-index (FBI) LoRa schemes. In scheme I, the indices of starting frequency bins (SFBs) are utilized to carry the information bits. To facilitate practical implementation, the SFBs of each LoRa signal are divided into several groups prior to the modulation process in the proposed FBI-LoRa system. To further improve the system flexibility, we formulate a generalized modulation scheme and propose scheme II by treating the SFB groups as an additional type of transmission entity. In scheme II, the combination of SFB indices and that of SFB group indices are both exploited to carry the information bits. We derive theoretical expressions for the bit-error-rate (BER) and throughput of the proposed FBI-LoRa system with the two modulation schemes over additive white Gaussian noise (AWGN) and Rayleigh fading channels. Theoretical and simulation results show that the proposed FBI-LoRa schemes can significantly increase the transmission throughput compared with existing LoRa systems at the expense of a slight loss in BER performance. Thanks to these advantages, the proposed FBI-LoRa system is a promising alternative for high-data-rate Internet of Things (IoT) applications.
In this work, we develop a rate-diverse encoder and decoder pair for a two-user Gaussian multiple access channel (GMAC). The proposed scheme enables the users to transmit with the same codeword length but different coding rates under diverse user channel conditions. First, we propose the row-combining (RC) method and the row-extending (RE) method to design practical low-density parity-check (LDPC) channel codes for the rate-diverse GMAC. Second, we develop an iterative rate-diverse joint user messages decoding (RDJD) algorithm for the GMAC, where all user messages are decoded with a single parity-check matrix. In contrast to the conventional network-coded multiple access (NCMA) and compute-forward multiple access (CFMA) schemes, which first recover a linear combination of the transmitted codewords and then decode both user messages, the proposed scheme decodes both user messages simultaneously. Extrinsic information transfer (EXIT) chart analysis and simulation results indicate that RDJD can achieve gains of up to 1.0 dB over NCMA and CFMA in the two-user GMAC. In particular, we show that there exists an optimal rate allocation for the two users that achieves the best decoding performance given the channel conditions and sum rate.
For short-distance travel in crowded urban areas, bike share services are becoming popular owing to their flexibility and convenience. To expand the service coverage, one of the key tasks is to find new service ports, which requires a good understanding of the underlying features of the existing service ports. In this paper, we propose a new model, named Efficient and Semantic Location Embedding (ESLE), which carries both geospatial and semantic information about geo-locations. To generate ESLE, we first train a multi-label model with a deep Convolutional Neural Network (CNN) by feeding it static map-tile images and then extract location embedding vectors from the model. Compared to the most recent relevant literature, ESLE is not only much cheaper to compute but also easier to interpret via a systematic semantic analysis. Finally, we apply ESLE to find new service ports for NTT DOCOMO's bike share services operated in Japan. The initial results demonstrate the effectiveness of ESLE and provide a few insights that might be difficult to discover using conventional approaches.