Xuguang Lan

ESMC: Entire Space Multi-Task Model for Post-Click Conversion Rate via Parameter Constraint

Jul 29, 2023
Zhenhao Jiang, Biao Zeng, Hao Feng, Jin Liu, Jicong Fan, Jie Zhang, Jia Jia, Ning Hu, Xingyu Chen, Xuguang Lan

Large-scale online recommender systems are deployed across the Internet and are responsible for two basic estimation tasks: Click-Through Rate (CTR) and Post-Click Conversion Rate (CVR). However, traditional CVR estimators suffer from the well-known Sample Selection Bias and Data Sparsity issues. Entire space models were proposed to address these two issues by tracing the decision-making path of "exposure → click → purchase". Furthermore, some researchers observed that there are purchase-related behaviors between click and purchase, which better capture the user's decision-making intention and improve recommendation performance. Thus, the decision-making path has been extended to "exposure → click → in-shop action → purchase" and can be modeled with a conditional probability approach. Nevertheless, we observe that the chain rule of conditional probability does not always hold. We report the Probability Space Confusion (PSC) issue and mathematically derive the gap between the ground truth and the estimate. We propose a novel Entire Space Multi-Task Model for Post-Click Conversion Rate via Parameter Constraint (ESMC) and two alternatives, the Entire Space Multi-Task Model with Siamese Network (ESMS) and the Entire Space Multi-Task Model in Global Domain (ESMG), to address the PSC issue. Specifically, we handle "exposure → click → in-shop action" and "in-shop action → purchase" separately in light of the characteristics of in-shop actions. The first path is still treated with the conditional probability approach, while the second is treated with a parameter constraint strategy. Experiments in both offline and online environments of a large-scale recommendation system illustrate the superiority of our proposed methods over state-of-the-art models. The real-world datasets will be released.
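
To make the factorization concrete, here is a minimal sketch (not the authors' released code) of the two-path treatment the abstract describes: the "exposure → click → in-shop action" path is modeled with the chain rule over the entire exposure space, while the "in-shop action → purchase" path gets its own head whose parameters are softly tied to the action head. The tower sizes, the class names, and the L2 form of the constraint are assumptions for illustration only.

```python
# Minimal sketch (illustrative, not the ESMC paper's implementation) of an
# entire-space multi-task model: chain rule for the first path, and a
# hypothetical L2 parameter constraint between towers for the second path.
import torch
import torch.nn as nn

class Tower(nn.Module):
    def __init__(self, dim_in: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_in, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x):
        return torch.sigmoid(self.net(x))

class ESMCSketch(nn.Module):
    def __init__(self, dim_in: int):
        super().__init__()
        self.ctr = Tower(dim_in)    # P(click | exposure)
        self.ctavr = Tower(dim_in)  # P(in-shop action | click)
        self.cavr = Tower(dim_in)   # P(purchase | in-shop action)

    def forward(self, x):
        p_ctr = self.ctr(x)
        p_action_given_click = self.ctavr(x)
        p_buy_given_action = self.cavr(x)
        # Chain rule over the entire exposure space for "exposure -> click -> in-shop action".
        p_click_and_action = p_ctr * p_action_given_click
        return p_ctr, p_click_and_action, p_buy_given_action

    def constraint_loss(self, weight: float = 1e-3):
        # Hypothetical parameter constraint: keep the purchase tower close to the
        # action tower instead of chaining their probabilities.
        return weight * sum(
            ((pa - pb) ** 2).sum()
            for pa, pb in zip(self.cavr.parameters(), self.ctavr.parameters())
        )
```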

MMRDN: Consistent Representation for Multi-View Manipulation Relationship Detection in Object-Stacked Scenes

Apr 25, 2023
Han Wang, Jiayuan Zhang, Lipeng Wan, Xingyu Chen, Xuguang Lan, Nanning Zheng

Manipulation relationship detection (MRD) aims to guide robots to grasp objects in the correct order, which is important for ensuring safe and reliable grasping in object-stacked scenes. Previous works infer manipulation relationships with deep neural networks trained on data collected from a predefined view, which is limited by visual dislocation in unstructured environments. Multi-view data provide more comprehensive spatial information, but a key challenge of multi-view MRD is domain shift. In this paper, we propose a novel multi-view fusion framework, the multi-view MRD network (MMRDN), which is trained on 2D and 3D multi-view data. We project the 2D data from different views into a common hidden space and fit the embeddings with a set of Von Mises-Fisher distributions to learn consistent representations. In addition, taking advantage of the position information within the 3D data, we select a set of $K$ Maximum Vertical Neighbors (KMVN) points from the point cloud of each object pair, which encodes the relative position of the two objects. Finally, the features of the multi-view 2D and 3D data are concatenated to predict the pairwise relationships of objects. Experimental results on the challenging REGRAD dataset show that MMRDN outperforms state-of-the-art methods on multi-view MRD tasks. The results also demonstrate that our model, trained on synthetic data, is capable of transferring to real-world scenarios.
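
As an illustration of the consistency idea, the following sketch (not the released MMRDN code; the embedding dimension and the exact loss form are assumptions) normalizes multi-view embeddings onto the unit sphere, estimates the von Mises-Fisher mean direction for an object pair, and pulls every view toward that shared direction.

```python
# Minimal sketch of vMF-style consistency across views: normalize embeddings,
# estimate the mean direction (MLE: normalized sum of unit vectors), and
# penalize each view's deviation from it.
import torch
import torch.nn.functional as F

def vmf_mean_direction(embeddings: torch.Tensor) -> torch.Tensor:
    """MLE of the von Mises-Fisher mean direction: the normalized sum of unit vectors."""
    unit = F.normalize(embeddings, dim=-1)
    return F.normalize(unit.sum(dim=0), dim=-1)

def consistency_loss(view_embeddings: torch.Tensor) -> torch.Tensor:
    """view_embeddings: (num_views, dim) embeddings of the same object pair.
    Encourage every view to align with the shared vMF mean direction."""
    unit = F.normalize(view_embeddings, dim=-1)
    mu = vmf_mean_direction(view_embeddings)
    return (1.0 - unit @ mu).mean()  # 1 - cosine similarity to the mean direction

# Example: three views of one object pair, 128-d embeddings.
loss = consistency_loss(torch.randn(3, 128))
```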

Prioritized Planning for Target-Oriented Manipulation via Hierarchical Stacking Relationship Prediction

Mar 14, 2023
Zewen Wu, Jian Tang, Xingyu Chen, Chengzhong Ma, Xuguang Lan, Nanning Zheng

In scenarios involving the grasping of multiple targets, learning the stacking relationships between objects is fundamental for robots to execute grasps safely and efficiently. However, current methods do not subdivide the hierarchy of stacking relationship types, so in scenes where objects are mostly stacked in an orderly manner, they are incapable of making human-like, highly efficient grasping decisions. This paper proposes a perception-planning method that distinguishes different stacking types between objects and generates prioritized manipulation order decisions based on given target designations. We utilize a Hierarchical Stacking Relationship Network (HSRN) to discriminate the hierarchy of stacking and generate a refined Stacking Relationship Tree (SRT) for relationship description. Considering that objects with high stacking stability can be grasped together if necessary, we introduce an elaborate decision-making planner based on the Partially Observable Markov Decision Process (POMDP), which leverages observations to generate a robust decision chain that requires the fewest grasps and supports specifying multiple targets simultaneously. To verify our work, we set the scene to a dining table and augment the REGRAD dataset with a set of common tableware models for network training. Experiments show that our method effectively generates grasping decisions that conform to human requirements and improves execution efficiency compared with existing methods while guaranteeing the success rate.

* 8 pages, 8 figures 
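
A minimal, purely illustrative sketch of how a stacking relationship structure can prioritize grasps: objects resting on the target must be removed first. The `on_top_of` map and the simple recursive ordering below are hypothetical; the paper's refined SRT additionally distinguishes stacking types and lets a POMDP planner choose among valid orders.

```python
# Illustrative only (not the authors' planner): derive a removal order that clears
# everything stacked above the target before grasping the target itself.
from typing import Dict, List, Set

def grasp_order(target: str, on_top_of: Dict[str, Set[str]]) -> List[str]:
    """on_top_of[x] = objects directly resting on x. Returns a removal order ending at target."""
    order, seen = [], set()

    def clear(obj: str):
        if obj in seen:
            return
        seen.add(obj)
        for above in on_top_of.get(obj, ()):  # remove everything above first
            clear(above)
        order.append(obj)

    clear(target)
    return order

# Example: a spoon on a cup on a plate; to get the plate, remove the spoon,
# then the cup, then grasp the plate.
print(grasp_order("plate", {"plate": {"cup"}, "cup": {"spoon"}}))  # ['spoon', 'cup', 'plate']
```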

Greedy based Value Representation for Optimal Coordination in Multi-agent Reinforcement Learning

Nov 22, 2022
Lipeng Wan, Zeyang Liu, Xingyu Chen, Xuguang Lan, Nanning Zheng

Due to the representation limitation of the joint Q value function, multi-agent reinforcement learning methods with linear value decomposition (LVD) or monotonic value decomposition (MVD) suffer from relative overgeneralization. As a result, they cannot ensure optimal consistency (i.e., the correspondence between individual greedy actions and the maximal true Q value). In this paper, we derive the expression of the joint Q value function of LVD and MVD. Based on this expression, we draw a transition diagram in which each self-transition node (STN) is a possible convergence point. To ensure optimal consistency, the optimal node must be the unique STN. Therefore, we propose greedy-based value representation (GVR), which turns the optimal node into an STN via inferior target shaping and further eliminates the non-optimal STNs via superior experience replay. In addition, GVR achieves an adaptive trade-off between optimality and stability. Our method outperforms state-of-the-art baselines in experiments on various benchmarks. Theoretical proofs and empirical results on matrix games demonstrate that GVR ensures optimal consistency under sufficient exploration.

* arXiv admin note: substantial text overlap with arXiv:2112.04454 
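
The representational limit behind relative overgeneralization can be seen in a tiny matrix-game example. The sketch below (not the GVR code; the payoff values are a standard illustrative choice) fits a linear decomposition Q_tot(a1, a2) = Q1(a1) + Q2(a2) by least squares to a nonmonotonic payoff matrix and shows that the resulting individual greedy actions miss the joint optimum.

```python
# Tiny illustration of the representational limit of linear value decomposition
# on a 2-agent matrix game: the additive fit cannot recover the joint optimum.
import numpy as np

# Relative-overgeneralization payoff: the joint optimum (0, 0) is surrounded by
# heavy penalties, so the additive fit prefers the "safe" actions instead.
payoff = np.array([[  8.0, -12.0, -12.0],
                   [-12.0,   0.0,   0.0],
                   [-12.0,   0.0,   0.0]])

# Design matrix with one indicator per (agent, action); solve for Q1 and Q2 jointly.
A = np.zeros((9, 6))
for i in range(3):
    for j in range(3):
        A[3 * i + j, i] = 1.0      # indicator for Q1(a1 = i)
        A[3 * i + j, 3 + j] = 1.0  # indicator for Q2(a2 = j)
theta, *_ = np.linalg.lstsq(A, payoff.ravel(), rcond=None)
q1, q2 = theta[:3], theta[3:]

greedy = (int(np.argmax(q1)), int(np.argmax(q2)))
optimum = tuple(map(int, np.unravel_index(int(payoff.argmax()), payoff.shape)))
print("greedy joint action:", greedy)   # (1, 1) under the additive fit
print("true optimal action:", optimum)  # (0, 0)
```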

Robotic Grasping from Classical to Modern: A Survey

Feb 08, 2022
Hanbo Zhang, Jian Tang, Shiguang Sun, Xuguang Lan

Robotic grasping has always been an active topic in robotics, since grasping is one of the fundamental yet most challenging robot skills. It demands the coordination of robotic perception, planning, and control for robustness and intelligence. However, current solutions are still far behind humans, especially when confronting unstructured scenarios. In this paper, we survey the advances of robotic grasping, from the classical formulations and solutions to the modern ones. By reviewing the history of robotic grasping, we want to provide a complete view of this community and perhaps inspire the combination and fusion of different ideas, which we believe would help explore the essence of robotic grasping problems. In detail, we first give an overview of the analytic methods for robotic grasping. After that, we discuss the state-of-the-art data-driven grasping approaches that have risen in recent years. With the development of computer vision, semantic grasping is being widely investigated and can be the basis of intelligent manipulation and skill learning for autonomous robotic systems in the future; therefore, our survey also briefly reviews recent progress on this topic. Finally, we discuss the open problems and the future research directions that may be important for human-level robustness, autonomy, and intelligence of robots.

Density-based Curriculum for Multi-goal Reinforcement Learning with Sparse Rewards

Sep 24, 2021
Deyu Yang, Hanbo Zhang, Xuguang Lan, Jishiyu Ding

Multi-goal reinforcement learning (RL) aims to enable the agent to accomplish multi-goal tasks, which is of great importance for learning scalable robotic manipulation skills. However, reward engineering always requires strenuous effort in multi-goal RL, and it inevitably introduces bias that makes the final policy suboptimal. Sparse rewards provide a simple yet efficient way to overcome these limits; nevertheless, they harm exploration efficiency and can even hinder the policy from converging. In this paper, we propose a density-based curriculum learning method for efficient exploration with sparse rewards and better generalization to the desired goal distribution. Intuitively, our method encourages the robot to gradually broaden the frontier of its ability in directions that cover the entire desired goal space as fully and quickly as possible. To further improve data efficiency and generality, we augment the goals and transitions within the allowed region during training. Finally, we evaluate our method on diversified variants of benchmark manipulation tasks that are challenging for existing methods. Empirical results show that our method outperforms state-of-the-art baselines in terms of both data efficiency and success rate.

* 8 pages, 7 figures 
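
One way to picture a density-based goal curriculum is sketched below; this is an assumption-laden illustration, not the paper's exact selection criterion. It estimates the density of already-achieved goals with a kernel density estimator and prefers candidate goals of low density, i.e. goals just beyond the current frontier of ability.

```python
# Minimal sketch of density-based goal selection with a KDE over achieved goals.
import numpy as np
from sklearn.neighbors import KernelDensity

def select_curriculum_goals(achieved_goals, candidate_goals, num_select, bandwidth=0.2):
    """Rank candidate goals by estimated density under the achieved-goal distribution
    and return the lowest-density ones (least explored first)."""
    kde = KernelDensity(bandwidth=bandwidth).fit(np.asarray(achieved_goals))
    log_density = kde.score_samples(np.asarray(candidate_goals))
    idx = np.argsort(log_density)[:num_select]
    return np.asarray(candidate_goals)[idx]

# Example with 2-D goals: pick the 5 candidates farthest from what has been achieved.
achieved = np.random.normal(0.0, 0.1, size=(200, 2))
candidates = np.random.uniform(-1.0, 1.0, size=(50, 2))
frontier_goals = select_curriculum_goals(achieved, candidates, num_select=5)
```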

MBDF-Net: Multi-Branch Deep Fusion Network for 3D Object Detection

Aug 29, 2021
Xun Tan, Xingyu Chen, Guowei Zhang, Jishiyu Ding, Xuguang Lan

Point clouds and images can provide complementary information when representing 3D objects, and fusing the two kinds of data usually helps to improve detection results. However, fusing the two modalities is challenging due to their different characteristics and the interference from non-interest areas. To solve this problem, we propose a Multi-Branch Deep Fusion Network (MBDF-Net) for 3D object detection. The proposed detector has two stages. In the first stage, our multi-branch feature extraction network uses Adaptive Attention Fusion (AAF) modules to produce cross-modal fusion features from single-modal semantic features. In the second stage, we use a region-of-interest (RoI)-pooled fusion module to generate enhanced local features for refinement. A novel attention-based hybrid sampling strategy is also proposed for selecting key points in the downsampling process. We evaluate our approach on two widely used benchmark datasets, KITTI and SUN RGB-D. The experimental results demonstrate the advantages of our method over state-of-the-art approaches.
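
The gist of attention-weighted cross-modal fusion can be sketched as follows; the layer sizes, the gating form, and the class name are assumptions rather than the released MBDF-Net implementation. Per-channel gates predicted from both modalities decide how much of the image and point-cloud features enters the fused feature.

```python
# Minimal sketch of an adaptive attention fusion step in the spirit of the AAF module.
import torch
import torch.nn as nn

class AdaptiveAttentionFusion(nn.Module):
    def __init__(self, img_dim: int, pts_dim: int, out_dim: int):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, out_dim)
        self.pts_proj = nn.Linear(pts_dim, out_dim)
        self.gate = nn.Sequential(nn.Linear(2 * out_dim, out_dim), nn.Sigmoid())

    def forward(self, img_feat: torch.Tensor, pts_feat: torch.Tensor) -> torch.Tensor:
        img = self.img_proj(img_feat)
        pts = self.pts_proj(pts_feat)
        g = self.gate(torch.cat([img, pts], dim=-1))  # per-channel modality weights
        return g * img + (1.0 - g) * pts              # attention-weighted fusion

# Example: fuse per-point image features (64-d) with point features (128-d).
fused = AdaptiveAttentionFusion(64, 128, 256)(torch.randn(1024, 64), torch.randn(1024, 128))
```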

Probabilistic Human Motion Prediction via A Bayesian Neural Network

Jul 14, 2021
Jie Xu, Xingyu Chen, Xuguang Lan, Nanning Zheng

Human motion prediction is an important and challenging topic with promising prospects for efficient and safe human-robot interaction systems. Currently, most human motion prediction algorithms are based on deterministic models, which may lead to risky decisions for robots. To solve this problem, we propose a probabilistic model for human motion prediction in this paper. The key idea of our approach is to extend a conventional deterministic motion prediction neural network to a Bayesian one. On the one hand, our model can generate several future motions when given an observed motion sequence. On the other hand, by calculating the epistemic uncertainty and the heteroscedastic aleatoric uncertainty, our model can tell the robot whether the observation has been seen before and also give the optimal result among all possible predictions. We extensively validate our approach on the large-scale benchmark dataset Human3.6M. The experiments show that our approach performs better than deterministic methods. We further evaluate our approach in a human-robot interaction (HRI) scenario; the experimental results show that our approach makes the interaction more efficient and safer.

* Accepted at ICRA 2021 
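
A minimal sketch of the Bayesian extension described above, using Monte Carlo dropout as one common way to realize it (the architecture and uncertainty estimators here are illustrative, not the ICRA 2021 model): several stochastic forward passes yield multiple candidate futures whose spread approximates epistemic uncertainty, while a variance head captures heteroscedastic aleatoric noise.

```python
# Illustrative MC-dropout predictor producing multiple futures plus both
# epistemic and aleatoric uncertainty estimates.
import torch
import torch.nn as nn

class BayesianMotionPredictor(nn.Module):
    def __init__(self, pose_dim: int, hidden: int = 128, p_drop: float = 0.2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(pose_dim, hidden), nn.ReLU(), nn.Dropout(p_drop))
        self.mean_head = nn.Linear(hidden, pose_dim)     # predicted future pose
        self.logvar_head = nn.Linear(hidden, pose_dim)   # aleatoric (data) uncertainty

    def forward(self, pose):
        h = self.encoder(pose)
        return self.mean_head(h), self.logvar_head(h)

    @torch.no_grad()
    def sample_futures(self, pose, num_samples: int = 20):
        self.train()  # keep dropout active at test time (MC dropout)
        means, logvars = zip(*(self(pose) for _ in range(num_samples)))
        means = torch.stack(means)
        epistemic = means.var(dim=0)                        # spread across passes
        aleatoric = torch.stack(logvars).exp().mean(dim=0)  # average predicted variance
        return means, epistemic, aleatoric
```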

REGRAD: A Large-Scale Relational Grasp Dataset for Safe and Object-Specific Robotic Grasping in Clutter

May 31, 2021
Hanbo Zhang, Deyu Yang, Han Wang, Binglei Zhao, Xuguang Lan, Jishiyu Ding, Nanning Zheng

Despite the impressive progress achieved in robust grasp detection, robots are not skilled at sophisticated grasping tasks (e.g., searching for and grasping a specific object in clutter). Such tasks involve not only grasping but also comprehensive perception of the visual world (e.g., the relationships between objects). Recently, advanced deep learning techniques have provided a promising way to understand high-level visual concepts, encouraging robotics researchers to explore solutions to such hard and complicated problems. However, deep learning is usually data-hungry, and the lack of data severely limits the performance of deep-learning-based algorithms. In this paper, we present a new dataset named REGRAD to sustain the modeling of relationships among objects and grasps. We collect annotations of object poses, segmentations, grasps, and relationships in each image for comprehensive perception of grasping. Our dataset is collected in the form of both 2D images and 3D point clouds. Moreover, since all the data are generated automatically, users are free to import their own object models to generate as much data as they want. We have released our dataset and code. A video demonstrating the data-generation process is also available.
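
For a sense of what such annotations might look like in code, here is a hypothetical per-scene record covering the listed annotation types; the field names and shapes are illustrative and do not follow the released REGRAD file layout.

```python
# Hypothetical per-scene annotation record; purely illustrative of the annotation
# types named in the abstract (poses, segmentations, grasps, relationships).
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class SceneAnnotation:
    object_ids: List[str]      # models placed in the scene
    poses: np.ndarray          # (N, 4, 4) object poses in the camera frame
    segmentation: np.ndarray   # (H, W) instance mask over the RGB image
    grasps: List[np.ndarray]   # per-object 6-DoF grasp candidates
    relationships: np.ndarray  # (N, N) matrix of manipulation relationships
```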
