In pace with the electronic technology development and the production technology improvement, industrial robot Give Scope to the Advantage in social services and industrial production. However, due to long-term mechanical wear and structural deformation, the absolute positioning accuracy is low, which greatly hinders the development of manufacturing industry. Calibrating the kinematic parameters of the robot is an effective way to address it. However, the main measuring equipment such as laser trackers and coordinate measuring machines are expensive and need special personnel to operate. Additionally, in the measurement process, due to the influence of many environmental factors, measurement noises are generated, which will affect the calibration accuracy of the robot. Basing on these, we have done the following work: a) developing a robot calibration method based on plane constraint to simplify measurement steps; b) employing Square-root Culture Kalman Filter (SCKF) algorithm for reducing the influence of measurement noises; c) proposing a novel algorithm for identifying kinematic parameters based on SCKF algorithm and Levenberg Marquardt (LM) algorithm to achieve the high calibration accuracy; d) adopting the dial indicator as the measuring equipment for slashing costs. The enough experiments verify the effectiveness of the proposed calibration algorithm and experimental platform.
Recently, an innovative matrix CFAR detection scheme based on information geometry, also referred to as the geometric detector, has been developed speedily and exhibits distinct advantages in several practical applications. These advantages benefit from the geometry of the Toeplitz Hermitian positive definite (HPD) manifold $\mathcal{M}_{\mathcal{T}H_{++}}$, but the sophisticated geometry also results in some challenges for geometric detectors, such as the implementation of the enhanced detector to improve the SCR (signal-to-clutter ratio) and the analysis of the detection performance. To meet these challenges, this paper develops the dual power spectrum manifold $\mathcal{M}_{\text{P}}$ as the dual space of $\mathcal{M}_{\mathcal{T}H_{++}}$. For each affine invariant geometric measure on $\mathcal{M}_{\mathcal{T}H_{++}}$, we show that there exists an equivalent function named induced potential function on $\mathcal{M}_{\text{P}}$. By the induced potential function, the measurements of the dissimilarity between two matrices can be implemented on $\mathcal{M}_{\text{P}}$, and the geometric detectors can be reformulated as the form related to the power spectrum. The induced potential function leads to two contributions: 1) The enhancement of the geometric detector, which is formulated as an optimization problem concerning $\mathcal{M}_{\mathcal{T}H_{++}}$, is transformed to an equivalent and simpler optimization on $\mathcal{M}_{\text{P}}$. In the presented example of the enhancement, the closed-form solution, instead of the gradient descent method, is provided through the equivalent optimization. 2) The detection performance is analyzed based on $\mathcal{M}_{\text{P}}$, and the advantageous characteristics, which benefit the detection performance, can be deduced by analyzing the corresponding power spectrum to the maximal point of the induced potential function.
Image restoration and enhancement is a process of improving the image quality by removing degradations, such as noise, blur, and resolution degradation. Deep learning (DL) has recently been applied to image restoration and enhancement. Due to its ill-posed property, plenty of works have explored priors to facilitate training deep neural networks (DNNs). However, the importance of priors has not been systematically studied and analyzed by far in the research community. Therefore, this paper serves as the first study that provides a comprehensive overview of recent advancements of priors for deep image restoration and enhancement. Our work covers five primary contents: (1) A theoretical analysis of priors for deep image restoration and enhancement; (2) A hierarchical and structural taxonomy of priors commonly used in the DL-based methods; (3) An insightful discussion on each prior regarding its principle, potential, and applications; (4) A summary of crucial problems by highlighting the potential future directions to spark more research in the community; (5) An open-source repository that provides a taxonomy of all mentioned works and code links.
Driven by the ever-increasing requirements of autonomous vehicles, such as traffic monitoring and driving assistant, deep learning-based object detection (DL-OD) has been increasingly attractive in intelligent transportation systems. However, it is difficult for the existing DL-OD schemes to realize the responsible, cost-saving, and energy-efficient autonomous vehicle systems due to low their inherent defects of low timeliness and high energy consumption. In this paper, we propose an object detection (OD) system based on edge-cloud cooperation and reconstructive convolutional neural networks, which is called Edge YOLO. This system can effectively avoid the excessive dependence on computing power and uneven distribution of cloud computing resources. Specifically, it is a lightweight OD framework realized by combining pruning feature extraction network and compression feature fusion network to enhance the efficiency of multi-scale prediction to the largest extent. In addition, we developed an autonomous driving platform equipped with NVIDIA Jetson for system-level verification. We experimentally demonstrate the reliability and efficiency of Edge YOLO on COCO2017 and KITTI data sets, respectively. According to COCO2017 standard datasets with a speed of 26.6 frames per second (FPS), the results show that the number of parameters in the entire network is only 25.67 MB, while the accuracy (mAP) is up to 47.3%.
Registration is a basic yet crucial task in point cloud processing. In correspondence-based point cloud registration, matching correspondences by point feature techniques may lead to an extremely high outlier ratio. Current methods still suffer from low efficiency, accuracy, and recall rate. We use a simple and intuitive method to describe the 6-DOF (degree of freedom) curtailment process in point cloud registration and propose an outlier removal strategy based on the reliability of the correspondence graph. The method constructs the corresponding graph according to the given correspondences and designs the concept of the reliability degree of the graph node for optimal candidate selection and the reliability degree of the graph edge to obtain the global maximum consensus set. The presented method could achieve fast and accurate outliers removal along with gradual aligning parameters estimation. Extensive experiments on simulations and challenging real-world datasets demonstrate that the proposed method can still perform effective point cloud registration even the correspondence outlier ratio is over 99%, and the efficiency is better than the state-of-the-art. Code is available at https://github.com/WPC-WHU/GROR.
Recent years have witnessed increasing interest in code representation learning, which aims to represent the semantics of source code into distributed vectors. Currently, various works have been proposed to represent the complex semantics of source code from different views, including plain text, Abstract Syntax Tree (AST), and several kinds of code graphs (e.g., Control/Data Flow Graph). However, most of them only consider a single view of source code independently, ignoring the correspondences among different views. In this paper, we propose to integrate different views with the natural-language description of source code into a unified framework with Multi-View contrastive Pre-training, and name our model as CODE-MVP. Specifically, we first extract multiple code views using compiler tools, and learn the complementary information among them under a contrastive learning framework. Inspired by the type checking in compilation, we also design a fine-grained type inference objective in the pre-training. Experiments on three downstream tasks over five datasets demonstrate the superiority of CODE-MVP when compared with several state-of-the-art baselines. For example, we achieve 2.4/2.3/1.1 gain in terms of MRR/MAP/Accuracy metrics on natural language code retrieval, code similarity, and code defect detection tasks, respectively.
We introduce a sampling based machine learning approach, Monte Carlo physics informed neural networks (MC-PINNs), for solving forward and inverse fractional partial differential equations (FPDEs). As a generalization of physics informed neural networks (PINNs), our method relies on deep neural network surrogates in addition to a stochastic approximation strategy for computing the fractional derivatives of the DNN outputs. A key ingredient in our MC-PINNs is to construct an unbiased estimation of the physical soft constraints in the loss function. Our directly sampling approach can yield less overall computational cost compared to fPINNs proposed in \cite{pang2019fpinns} and thus provide an opportunity for solving high dimensional fractional PDEs. We validate the performance of MC-PINNs method via several examples that include high dimensional integral fractional Laplacian equations, parametric identification of time-space fractional PDEs, and fractional diffusion equation with random inputs. The results show that MC-PINNs is flexible and promising to tackle high-dimensional FPDEs.
Optical time-domain reflectometry (OTDR) is the basis for distributed time-domain optical fiber sensing techniques. By injecting pulse light into an optical fiber, the distance information of an event can be obtained based on the time of light flight. The minimum distinguishable event separation along the fiber length is called the spatial resolution, which is determined by the optical pulse width. By reducing the pulse width, the spatial resolution can be improved. However, at the same time, the signal-to-noise ratio of the system is degraded, and higher speed equipment is required. To solve this problem, data processing methods such as iterative subdivision, deconvolution, and neural networks have been proposed. However, they all have some shortcomings and thus have not been widely applied. Here, we propose and experimentally demonstrate an OTDR deconvolution neural network based on deep convolutional neural networks. A simplified OTDR model is built to generate a large amount of training data. By optimizing the network structure and training data, an effective OTDR deconvolution is achieved. The simulation and experimental results show that the proposed neural network can achieve more accurate deconvolution than the conventional deconvolution algorithm with a higher signal-to-noise ratio.
Automatically generating compilable programs with (or without) natural language descriptions has always been a touchstone problem for computational linguistics and automated software engineering. Existing deep-learning approaches model code generation as text generation, either constrained by grammar structures in decoder, or driven by pre-trained language models on large-scale code corpus (e.g., CodeGPT, PLBART, and CodeT5). However, few of them account for compilability of the generated programs. To improve compilability of the generated programs, this paper proposes COMPCODER, a three-stage pipeline utilizing compiler feedback for compilable code generation, including language model fine-tuning, compilability reinforcement, and compilability discrimination. Comprehensive experiments on two code generation tasks demonstrate the effectiveness of our proposed approach, improving the success rate of compilation from 44.18 to 89.18 in code completion on average and from 70.3 to 96.2 in text-to-code generation, respectively, when comparing with the state-of-the-art CodeGPT.
Pioneering dual-encoder pre-training works (e.g., CLIP and ALIGN) have revealed the potential of aligning multi-modal representations with contrastive learning. However, these works require a tremendous amount of data and computational resources (e.g., billion-level web data and hundreds of GPUs), which prevent researchers with limited resources from reproduction and further exploration. To this end, we explore a stack of simple but effective heuristics, and provide a comprehensive training guidance, which allows us to conduct dual-encoder multi-modal representation alignment with limited resources. We provide a reproducible strong baseline of competitive results, namely ZeroVL, with only 14M publicly accessible academic datasets and 8 V100 GPUs. Additionally, we collect 100M web data for pre-training, and achieve comparable or superior results than state-of-the-art methods, further proving the effectiveness of our method on large-scale data. We hope that this work will provide useful data points and experience for future research in multi-modal pre-training. Our code is available at https://github.com/zerovl/ZeroVL.