Current monocular-based 6D object pose estimation methods generally achieve less competitive results than RGBD-based methods, mostly due to the lack of 3D information. To make up this gap, this paper proposes a 3D geometric volume based pose estimation method with a short baseline two-view setting. By constructing a geometric volume in the 3D space, we combine the features from two adjacent images to the same 3D space. Then a network is trained to learn the distribution of the position of object keypoints in the volume, and a robust soft RANSAC solver is deployed to solve the pose in closed form. To balance accuracy and cost, we propose a coarse-to-fine framework to improve the performance in an iterative way. The experiments show that our method outperforms state-of-the-art monocular-based methods, and is robust in different objects and scenes, especially in serious occlusion situations.
With the recent advance of deep learning based object recognition and estimation, it is possible to consider object level SLAM where the pose of each object is estimated in the SLAM process. In this paper, based on a novel Lie group structure, a right invariant extended Kalman filter (RI-EKF) for object based SLAM is proposed. The observability analysis shows that the proposed algorithm automatically maintains the correct unobservable subspace, while standard EKF (Std-EKF) based SLAM algorithm does not. This results in a better consistency for the proposed algorithm comparing to Std-EKF. Finally, simulations and real world experiments validate not only the consistency and accuracy of the proposed algorithm, but also the practicability of the proposed RI-EKF for object based SLAM problem. The MATLAB code of the algorithm is made publicly available.
It is reported that the number of online payment users in China has reached 854 million; with the emergence of community e-commerce platforms, the trend of integration of e-commerce and social applications is increasingly intense. Community e-commerce is not a mature and sound comprehensive e-commerce with fewer categories and low brand value. To effectively retain community users and fully explore customer value has become an important challenge for community e-commerce operators. Given the above problems, this paper uses the data-driven method to study the prediction of community e-commerce customers' repurchase behaviour. The main research contents include 1. Given the complex problem of feature engineering, the classic model RFM in the field of customer relationship management is improved, and an improved model is proposed to describe the characteristics of customer buying behaviour, which includes five indicators. 2. In view of the imbalance of machine learning training samples in SMOTE-ENN, a training sample balance using SMOTE-ENN is proposed. The experimental results show that the machine learning model can be trained more effectively on balanced samples. 3. Aiming at the complexity of the parameter adjustment process, an automatic hyperparameter optimization method based on the TPE method was proposed. Compared with other methods, the model's prediction performance is improved, and the training time is reduced by more than 450%. 4. Aiming at the weak prediction ability of a single model, the soft voting based RF-LightgBM model was proposed. The experimental results show that the RF-LighTGBM model proposed in this paper can effectively predict customer repurchase behaviour, and the F1 value is 0.859, which is better than the single model and previous research results.
The smartphone and laptop can be unlocked by face or fingerprint recognition, while neural networks which confront numerous requests every day have little capability to distinguish between untrustworthy and credible users. It makes model risky to be traded as a commodity. Existed research either focuses on the intellectual property rights ownership of the commercialized model, or traces the source of the leak after pirated models appear. Nevertheless, active identifying users legitimacy before predicting output has not been considered yet. In this paper, we propose Model-Lock (M-LOCK) to realize an end-to-end neural network with local dynamic access control, which is similar to the automatic locking function of the smartphone to prevent malicious attackers from obtaining available performance actively when you are away. Three kinds of model training strategy are essential to achieve the tremendous performance divergence between certified and suspect input in one neural network. Extensive experiments based on MNIST, FashionMNIST, CIFAR10, CIFAR100, SVHN and GTSRB datasets demonstrated the feasibility and effectiveness of the proposed scheme.
The fog radio access network (Fog-RAN) has been considered a promising wireless access architecture to help shorten the communication delay and relieve the large data delivery burden over the backhaul links. However, limited by conventional inflexible communication design, Fog-RAN cannot be used in some complex communication scenarios. In this study, we focus on investigating a more intelligent Fog-RAN to assist the communication in a high-speed railway environment. Due to the train's continuously moving, the communication should be designed intelligently to adapt to channel variation. Specifically, we dynamically optimize the power allocation in the remote radio heads (RRHs) to minimize the total network power cost considering multiple quality-of-service (QoS) requirements and channel variation. The impact of caching on the power allocation is considered. The dynamic power optimization is analyzed to obtain a closed-form solution in certain cases. The inherent tradeoff among the total network cost, delay and delivery content size is further discussed. To evaluate the performance of the proposed dynamic power allocation, we present an invariant power allocation counterpart as a performance comparison benchmark. The result of our simulation reveals that dynamic power allocation can significantly outperform the invariant power allocation scheme, especially with a random caching strategy or limited caching resources at the RRHs.
With the aim of matching a pair of instances from two different modalities, cross modality mapping has attracted growing attention in the computer vision community. Existing methods usually formulate the mapping function as the similarity measure between the pair of instance features, which are embedded to a common space. However, we observe that the relationships among the instances within a single modality (intra relations) and those between the pair of heterogeneous instances (inter relations) are insufficiently explored in previous approaches. Motivated by this, we redefine the mapping function with relational reasoning via graph modeling, and further propose a GCN-based Relational Reasoning Network (RR-Net) in which inter and intra relations are efficiently computed to universally resolve the cross modality mapping problem. Concretely, we first construct two kinds of graph, i.e., Intra Graph and Inter Graph, to respectively model intra relations and inter relations. Then RR-Net updates all the node features and edge features in an iterative manner for learning intra and inter relations simultaneously. Last, RR-Net outputs the probabilities over the edges which link a pair of heterogeneous instances to estimate the mapping results. Extensive experiments on three example tasks, i.e., image classification, social recommendation and sound recognition, clearly demonstrate the superiority and universality of our proposed model.
The ever-increasing data traffic, various delay-sensitive services, and the massive deployment of energy-limited Internet of Things (IoT) devices have brought huge challenges to the current communication networks, motivating academia and industry to move to the sixth-generation (6G) network. With the powerful capability of data transmission and processing, 6G is considered as an enabler for IoT communication with low latency and energy cost. In this paper, we propose an artificial intelligence (AI) and intelligent reflecting surface (IRS) empowered energy-efficiency communication system for 6G IoT. First, we design a smart and efficient communication architecture including the IRS-aided data transmission and the AI-driven network resource management mechanisms. Second, an energy efficiency-maximizing model under given transmission latency for 6G IoT system is formulated, which jointly optimizes the settings of all communication participants, i.e. IoT transmission power, IRS-reflection phase shift, and BS detection matrix. Third, a deep reinforcement learning (DRL) empowered network resource control and allocation scheme is proposed to solve the formulated optimization model. Based on the network and channel status, the DRL-enabled scheme facilities the energy-efficiency and low-latency communication. Finally, experimental results verified the effectiveness of our proposed communication system for 6G IoT.
Real-time proprioception is a challenging problem for soft robots, which have almost infinite degrees-of-freedom in body deformation. When multiple actuators are used, it becomes more difficult as deformation can also occur on actuators caused by interaction between each other. To tackle this problem, we present a method in this paper to sense and reconstruct 3D deformation on pneumatic soft robots by first integrating multiple low-cost sensors inside the chambers of pneumatic actuators and then using machine learning to convert the captured signals into shape parameters of soft robots. An exterior motion capture system is employed to generate the datasets for both training and testing. With the help of good shape parameterization, the 3D shape of a soft robot can be accurately reconstructed from signals obtained from multiple sensors. We demonstrate the effectiveness of this approach on two designs of soft robots -- a robotic joint and a deformable membrane. After parameterizing the deformation of these soft robots into compact shape parameters, we can effectively train the neural networks to reconstruct the 3D deformation from the sensor signals. The sensing and shape prediction pipeline can run at 50Hz in real-time on a consumer-level device.
Recommender systems are popular tools for information retrieval tasks on a large variety of web applications and personalized products. In this work, we propose a Generative Adversarial Network based recommendation framework using a positive-unlabeled sampling strategy. Specifically, we utilize the generator to learn the continuous distribution of user-item tuples and design the discriminator to be a binary classifier that outputs the relevance score between each user and each item. Meanwhile, positive-unlabeled sampling is applied in the learning procedure of the discriminator. Theoretical bounds regarding positive-unlabeled sampling and optimalities of convergence for the discriminators and the generators are provided. We show the effectiveness and efficiency of our framework on three publicly accessible data sets with eight ranking-based evaluation metrics in comparison with thirteen popular baselines.
Autonomous navigation has played an increasingly significant role in quadruped robots system. However, existing works on path planning used traditional search-based or sample-based methods which did not consider the kinodynamic characteristics of quadruped robots. And paths generated by these methods contain kinodynamically infeasible parts, which are difficult to track. In the present work, we introduced a complete navigation system considering the omnidirectional abilities of quadruped robots. First, we use kinodynamic path finding method to obtain smooth, dynamically feasible, time-optimal initial paths and added collision cost as a soft constraint to ensure safety. Then the trajectory is refined by timed elastic band (TEB) method based on the omnidirectional model of quadruped robot. The superior performance of our work is demonstrated through simulated comparisons and by using our quadruped robot Jueying Mini in our experiments.