Generating a set of high-quality correspondences (matches) is one of the most critical steps in point cloud registration. This paper proposes a learning framework, COTReg, that predicts correspondences for 3D point cloud registration by jointly considering pointwise and structural matching. Specifically, we formulate the two matchings as a Wasserstein distance-based and a Gromov-Wasserstein distance-based optimization, respectively, so that the task of establishing correspondences can be naturally reshaped into a coupled optimal transport problem. Furthermore, we design a network that predicts a per-point confidence score of being an inlier, which provides overlap-region information for generating correspondences. Our correspondence prediction pipeline can be easily integrated into either learning-based features such as FCGF or traditional descriptors such as FPFH. Comprehensive experiments on the 3DMatch, KITTI, 3DCSR, and ModelNet40 benchmarks show the state-of-the-art performance of the proposed method.
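The pointwise (Wasserstein) side of such an optimal-transport formulation can be illustrated with a generic entropic-regularized Sinkhorn solver; this is a minimal sketch with toy point sets and a squared-Euclidean cost, not COTReg's actual pipeline — the function names and parameters are illustrative assumptions:

```python
import numpy as np

def sinkhorn(C, eps=0.05, iters=200):
    """Entropic-regularized optimal transport: returns a soft matching
    (transport plan) between two uniformly weighted point sets."""
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-C / eps)                      # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(iters):                    # alternating marginal scaling
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

# Toy example: a target set that is a slightly shifted copy of the source.
src = np.array([[0.0, 0], [1, 0], [0, 1], [2, 2], [3, 1]])
tgt = src + 0.01
C = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(-1)  # squared Euclidean cost
P = sinkhorn(C)
matches = P.argmax(1)    # hard correspondences read off the soft plan
```

The Gromov-Wasserstein term would compare intra-cloud distance structure instead of a direct cross-cloud cost, but the same alternating-scaling machinery applies.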
Aiming to generalize a model trained on source domains to unseen target domains, domain generalization (DG) has attracted much attention recently. The key issue in DG is how to prevent overfitting to the observed source domains, because the target domain is unavailable during training. We find that overfitting not only causes inferior generalization to unseen target domains but also leads to unstable predictions at test time. In this paper, we observe that both sampling multiple tasks in the training stage and generating augmented images in the test stage largely benefit generalization performance. Thus, by treating tasks and images as different views, we propose a novel multi-view DG framework. Specifically, in the training stage, to enhance generalization ability, we develop a multi-view regularized meta-learning algorithm that employs multiple tasks to produce a suitable optimization direction during model updates. In the test stage, to alleviate unstable predictions, we use multiple augmented images to yield a multi-view prediction, which significantly improves model reliability by fusing the results of different views of a test image. Extensive experiments on three benchmark datasets validate that our method outperforms several state-of-the-art approaches.
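The test-stage fusion described above amounts to averaging class probabilities over augmented views of one input. A minimal sketch, with a toy linear `model` and additive-noise `augment` as stand-ins for the paper's real network and augmentations:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def multi_view_predict(model, x, augment, n_views=8, seed=0):
    """Fuse class probabilities over several augmented views of one input."""
    rng = np.random.default_rng(seed)
    views = [softmax(model(augment(x, rng))) for _ in range(n_views)]
    return np.mean(views, axis=0)

# Toy stand-ins for the real network and augmentations.
W = np.array([[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]])   # 3 classes, 2 features
model = lambda x: W @ x                                 # linear "classifier"
augment = lambda x, rng: x + rng.normal(scale=0.1, size=x.shape)

p = multi_view_predict(model, np.array([2.0, -1.0]), augment)
pred = int(p.argmax())   # fused prediction
```

Averaging probabilities (rather than hard labels) is the usual choice because it preserves each view's confidence.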
The Jaccard index, also known as Intersection-over-Union (IoU), is one of the most critical evaluation metrics in image semantic segmentation. However, directly optimizing the IoU score is very difficult because the learning objective is neither differentiable nor decomposable. Although some algorithms have been proposed to optimize its surrogates, there is no guarantee on their generalization ability. In this paper, we propose a margin calibration method that can be used directly as a learning objective for improved generalization of IoU over the data distribution, underpinned by a rigorous lower bound. This scheme theoretically ensures better segmentation performance in terms of the IoU score. We evaluated the effectiveness of the proposed margin calibration method on seven image datasets, showing substantial improvements in IoU score over other learning objectives with deep segmentation models.
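For contrast with the proposed margin calibration, the commonly used differentiable soft-IoU (Jaccard) surrogate — one of the baseline objectives, not the paper's method — can be sketched as:

```python
import numpy as np

def soft_iou_loss(probs, target, eps=1e-7):
    """Soft Jaccard loss: 1 - intersection/union, computed on predicted
    foreground probabilities against a binary ground-truth mask."""
    inter = (probs * target).sum()
    union = probs.sum() + target.sum() - inter
    return 1.0 - inter / (union + eps)

pred = np.array([[0.9, 0.8], [0.1, 0.2]])   # predicted foreground probabilities
mask = np.array([[1.0, 1.0], [0.0, 0.0]])   # ground-truth mask
loss = soft_iou_loss(pred, mask)
```

Replacing hard set cardinalities with sums of probabilities is what makes the objective differentiable, at the cost of only approximating the true IoU.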
Hyperspectral imaging is an essential imaging modality for a wide range of applications, especially in remote sensing, agriculture, and medicine. Because existing hyperspectral cameras are slow, expensive, or bulky, reconstructing hyperspectral images (HSIs) from a low-budget snapshot measurement has drawn wide attention. By mapping a truncated numerical optimization algorithm into a network with a fixed number of phases, recent deep unfolding networks (DUNs) for spectral snapshot compressive imaging (SCI) have achieved remarkable success. However, DUNs are still far from industrial application, limited by the lack of cross-phase feature interaction and adaptive parameter adjustment. In this paper, we propose a novel Hyperspectral Explicable Reconstruction and Optimal Sampling deep Network for SCI, dubbed HerosNet, which comprises several phases under the ISTA-unfolding framework. Each phase can flexibly simulate the sensing matrix and contextually adjust the step size in the gradient descent step, and hierarchically fuse and interact the hidden states of previous phases to effectively recover the current HSI frames in the proximal mapping step. Simultaneously, a hardware-friendly optimal binary mask is learned end-to-end to further improve reconstruction performance. Finally, HerosNet is shown to outperform state-of-the-art methods on both simulated and real datasets by large margins.
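The gradient-descent and proximal-mapping steps that each HerosNet phase unrolls come from plain ISTA; a generic sketch on a toy sparse-recovery problem (the sensing matrix, signal, and parameters are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of lam * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def ista(Phi, y, lam=0.01, iters=300):
    """Plain ISTA for min_x 0.5*||y - Phi x||^2 + lam*||x||_1.
    Each DUN phase mimics one such iteration, with the step size and the
    proximal operator replaced by learned, context-dependent modules."""
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2   # 1/L, L = Lipschitz constant
    x = np.zeros(Phi.shape[1])
    for _ in range(iters):
        x = soft_threshold(x + step * Phi.T @ (y - Phi @ x), step * lam)
    return x

rng = np.random.default_rng(1)
Phi = rng.normal(size=(20, 50)) / np.sqrt(20)      # random sensing matrix
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [1.0, -0.5, 0.8]             # sparse ground truth
y = Phi @ x_true                                    # snapshot measurement
x_hat = ista(Phi, y)
```

In a DUN, `iters` becomes the fixed number of phases, and the hand-chosen `step` and `soft_threshold` are exactly the pieces HerosNet makes adaptive and learnable.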
Accurate and efficient point cloud registration is challenging because noise and large numbers of points hamper the correspondence search. This challenge remains an open research problem, since most existing methods rely on correspondence search. To address it, we propose a new data-driven registration algorithm that applies deep generative neural networks to point cloud registration. Given two point clouds, the motivation is to generate the aligned point clouds directly, which is very useful in many applications such as 3D matching and search. To achieve this, we design an end-to-end generative neural network for aligned point cloud generation, containing three novel components. First, a point multi-layer perceptron (MLP) mixer network (PointMixer) is proposed to efficiently maintain both global and local structure information at multiple levels within each point cloud. Second, a feature interaction module is proposed to fuse information across the two point clouds. Third, a parallel and differentiable sample consensus method is proposed to calculate the transformation matrix of the input point clouds based on the generated registration results. The proposed generative neural network is trained in a GAN framework that maintains data-distribution and structural similarity. Experiments on both the ModelNet40 and 7Scene datasets demonstrate that the proposed algorithm achieves state-of-the-art accuracy and efficiency. Notably, our method reduces the registration error (Chamfer distance, CD) by $2\times$ and the running time by $12\times$ compared to the state-of-the-art correspondence-based algorithm.
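The registration-error metric cited above, Chamfer distance (CD), can be sketched in a few lines (toy 3-D point sets, brute-force nearest neighbors; real evaluations use the same definition with k-d trees for speed):

```python
import numpy as np

def chamfer_distance(P, Q):
    """Symmetric Chamfer distance between point sets P (N,3) and Q (M,3):
    mean squared nearest-neighbor distance in both directions."""
    d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1)   # (N, M) pairwise
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

A = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0]])
B = A + np.array([0.1, 0.0, 0.0])   # A shifted 0.1 along x
cd = chamfer_distance(A, B)         # residual misalignment error
```

A perfect registration gives CD = 0, so halving CD (the $2\times$ reduction above) directly reflects tighter alignment.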
With the rapid development of deep learning techniques, much recent work has tried to apply graph neural networks (GNNs) to solve NP-hard problems such as Boolean Satisfiability (SAT), showing potential in bridging the gap between machine learning and symbolic reasoning. However, the quality of the solutions predicted by GNNs has not been well investigated in the literature. In this paper, we study the capability of GNNs in learning to solve the Maximum Satisfiability (MaxSAT) problem, from both theoretical and practical perspectives. We build two kinds of GNN models to learn solutions of MaxSAT instances from benchmarks, and show through experimental evaluation that GNNs have attractive potential for solving the MaxSAT problem. We also present, for the first time, a theoretical explanation of why GNNs can learn to solve the MaxSAT problem to some extent, based on the algorithmic alignment theory.
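The MaxSAT objective a learned solver must approximate — the number of satisfied clauses — is easy to state exactly; a brute-force reference (feasible only for tiny instances, but useful as ground truth when judging predicted assignments) might look like:

```python
def num_satisfied(clauses, assignment):
    """Count clauses satisfied by a boolean assignment.
    A clause is a list of non-zero ints (DIMACS-style literals):
    i means variable i is true, -i means variable i is false."""
    return sum(
        any((lit > 0) == assignment[abs(lit)] for lit in clause)
        for clause in clauses
    )

def brute_force_maxsat(clauses, n_vars):
    """Exact MaxSAT by enumerating all 2^n assignments."""
    best = 0
    for bits in range(2 ** n_vars):
        assignment = {v: bool((bits >> (v - 1)) & 1) for v in range(1, n_vars + 1)}
        best = max(best, num_satisfied(clauses, assignment))
    return best

# (x1 or x2), (not x1 or x3), (not x2 or not x3), (x1), (not x1)
clauses = [[1, 2], [-1, 3], [-2, -3], [1], [-1]]
best = brute_force_maxsat(clauses, 3)   # at most 4: [1] and [-1] conflict
```

A GNN-predicted assignment can then be scored by `num_satisfied` and compared against this exact optimum on small instances.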
This paper introduces a notion of $\varepsilon$-weakened robustness for analyzing the reliability and stability of deep neural networks (DNNs). Unlike conventional robustness, which focuses on the "perfect" safe region free of adversarial examples, $\varepsilon$-weakened robustness focuses on regions where the proportion of adversarial examples is bounded by a user-specified $\varepsilon$; a smaller $\varepsilon$ means a smaller chance of failure. Under this robustness definition, we can give conclusive results for regions that conventional robustness ignores. We prove that the $\varepsilon$-weakened robustness decision problem is PP-complete and give a statistical decision algorithm with a user-controllable error bound. Furthermore, we derive an algorithm to find the maximum $\varepsilon$-weakened robustness radius. The time complexity of our algorithms is polynomial in the dimension and size of the network, so they are scalable to large real-world networks. We also show the potential application of $\varepsilon$-weakened robustness in analyzing quality issues.
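The statistical flavor of such a decision procedure can be illustrated with plain Monte Carlo estimation of the adversarial proportion, using Hoeffding's inequality to pick the sample size; the toy "network" and region below are stand-ins, not the paper's actual algorithm:

```python
import math
import random

def estimate_failure_rate(is_adversarial, sample_region, eps=0.05, delta=1e-3):
    """Estimate the proportion of adversarial inputs in a region by sampling.
    By Hoeffding's inequality, n >= ln(2/delta) / (2*eps^2) samples keep the
    estimation error within eps with probability at least 1 - delta."""
    n = math.ceil(math.log(2 / delta) / (2 * eps ** 2))
    hits = sum(is_adversarial(sample_region()) for _ in range(n))
    return hits / n, n

# Toy stand-in "network": inputs with |x| >= 0.9 are failures; the region is
# [-1, 1], so the true failure proportion is 0.1.
random.seed(0)
rate, n = estimate_failure_rate(lambda x: abs(x) >= 0.9,
                                lambda: random.uniform(-1.0, 1.0))
```

Note the sample size depends only on `eps` and `delta`, not on the input dimension, which is what makes this style of certification scalable.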
In this work, we address the inverse kinetics problem of motion planning for soft actuators driven by three chambers. Although a mathematical model describing the inverse dynamics of this kind of actuator can be employed, the model is a complex system. On the one hand, the differential equations are nonlinear, so it is difficult and time-consuming to obtain analytical solutions; since exact solutions of the mechanical model are not available, the elements of the Jacobian matrix cannot be calculated. On the other hand, the material model is a complicated system with significant nonlinearity, non-stationarity, and uncertainty, making it challenging to develop an appropriate system model. To overcome these intrinsic problems, we propose a back-propagation (BP) neural network that learns the inverse kinetics of a soft manipulator moving in three-dimensional space. After training, the BP neural network model can represent the relation between the manipulator tip position and the pressures applied to the chambers. The proposed algorithm is precise and computationally efficient. The results show that a desired terminal position can be achieved with an accuracy of 2.59% relative average error with respect to the total actuator length, demonstrating the ability of the model to realize inverse kinematic control.
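A minimal BP (back-propagation) network of the kind described — one hidden layer trained by gradient descent to map tip positions to chamber pressures — can be sketched on synthetic data; the position-to-pressure relation below is a made-up surrogate, not the actual actuator model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: tip positions (3-D) mapped to three chamber
# pressures through a smooth made-up relation (NOT the real actuator).
X = rng.uniform(-1, 1, size=(256, 3))              # manipulator tip positions
Y = np.tanh(X @ rng.normal(size=(3, 3)) * 0.5)     # surrogate "pressures"

# One-hidden-layer BP network trained by full-batch gradient descent.
W1 = rng.normal(scale=0.5, size=(3, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(16, 3)); b2 = np.zeros(3)
lr, losses = 0.05, []
for _ in range(500):
    H = np.tanh(X @ W1 + b1)                       # hidden layer
    P = H @ W2 + b2                                # predicted pressures
    err = P - Y
    losses.append((err ** 2).mean())
    gP = 2 * err / len(X)                          # back-propagate the error
    gW2, gb2 = H.T @ gP, gP.sum(0)
    gH = gP @ W2.T * (1 - H ** 2)                  # tanh derivative
    gW1, gb1 = X.T @ gH, gH.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
```

Once trained, evaluating the network is a couple of matrix products, which is why such a controller is computationally cheap compared with solving the nonlinear mechanical model online.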
By mapping a truncated optimization method into a deep neural network, deep unfolding networks (DUNs) have attracted growing attention in compressive sensing (CS) thanks to their good interpretability and high performance; each stage in a DUN corresponds to one iteration of the optimization. Viewing DUNs from the perspective of the human brain's memory processing, we find two issues in existing DUNs. One is that the information passed between adjacent stages, which can be regarded as short-term memory, is usually lost severely. The other is that there is no explicit mechanism to ensure that previous stages affect the current stage, so long-term memory is easily forgotten. To solve these issues, we propose a novel DUN with persistent memory for CS, dubbed Memory-Augmented Deep Unfolding Network (MADUN). We design a memory-augmented proximal mapping module (MAPMM) by combining two types of memory augmentation mechanisms: High-throughput Short-term Memory (HSM) and Cross-stage Long-term Memory (CLM). HSM allows DUNs to transmit multi-channel short-term memory, which greatly reduces information loss between adjacent stages. CLM develops the dependency of deep information across cascading stages, which greatly enhances network representation capability. Extensive CS experiments on natural and MR images show that, with its strong ability to maintain and balance information, MADUN outperforms existing state-of-the-art methods by a large margin. The source code is available at https://github.com/jianzhangcs/MADUN/.
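The data flow of an unfolding stage that carries both a short-term memory (the previous stage's intermediate output) and a long-term memory (all previous stage outputs) can be sketched with fixed stand-in modules; MADUN learns these mappings as convolutional modules, whereas the averaging below is purely illustrative of the wiring:

```python
import numpy as np

def stage(x, short_mem, long_mems, Phi, y, step=0.1):
    """One toy unfolding stage: a gradient descent step on ||y - Phi x||^2,
    then a stand-in 'proximal' module that also consumes the short-term
    memory (HSM-like) and all previous stage outputs (CLM-like)."""
    r = x + step * Phi.T @ (y - Phi @ x)              # gradient descent step
    context = np.mean(long_mems + [short_mem], axis=0)
    x_new = 0.9 * r + 0.1 * context                   # stand-in proximal mapping
    return x_new, r, long_mems + [x_new]              # updated memories

rng = np.random.default_rng(0)
Phi = rng.normal(size=(8, 16)) / np.sqrt(8)            # toy sensing matrix
y = Phi @ rng.normal(size=16)                          # CS measurement
x, short, longs = np.zeros(16), np.zeros(16), []
for _ in range(10):                                    # ten cascaded stages
    x, short, longs = stage(x, short, longs, Phi, y)
```

Without the `short_mem`/`long_mems` arguments, each stage would see only the single-channel iterate `x` — exactly the information bottleneck the abstract describes.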