Weijian Xie

Depth Completion with Multiple Balanced Bases and Confidence for Dense Monocular SLAM

Sep 20, 2023
Weijian Xie, Guanyi Chu, Quanhao Qian, Yihao Yu, Hai Li, Danpeng Chen, Shangjin Zhai, Nan Wang, Hujun Bao, Guofeng Zhang

Dense SLAM based on monocular cameras has immense application value in AR/VR, especially when it runs on a mobile device. In this paper, we propose a novel method that integrates a light-weight depth completion network into a sparse SLAM system using a multi-basis depth representation, so that dense mapping can be performed online even on a mobile phone. Specifically, we present a multi-basis depth completion network, called BBC-Net, tailored to the characteristics of traditional sparse SLAM systems. BBC-Net predicts multiple balanced bases and a confidence map from a monocular image with sparse points generated by off-the-shelf keypoint-based SLAM systems. The final depth is a linear combination of the predicted depth bases that can be optimized by tuning the corresponding weights. To seamlessly incorporate the weights into traditional SLAM optimization and ensure efficiency and robustness, we design a set of depth weight factors, which makes our network a versatile plug-in module, facilitating easy integration into various existing sparse SLAM systems and significantly enhancing global depth consistency through bundle adjustment. To verify the portability of our method, we integrate BBC-Net into two representative SLAM systems. Experimental results on various datasets show that the proposed method achieves better performance in monocular dense mapping than state-of-the-art methods. We provide an online demo running on a mobile phone, which verifies the efficiency and mapping quality of the proposed method in real-world scenarios.
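The core idea of the multi-basis representation is that the dense depth map is a weighted sum of predicted bases, with the weights fitted against the sparse SLAM points under the predicted confidence. The sketch below is a minimal, illustrative NumPy version of that per-frame fit (not the authors' code; in the paper the weights are optimized jointly in bundle adjustment via the proposed depth weight factors, and all shapes and the closed-form solve here are assumptions):

```python
import numpy as np

def fuse_depth(bases, confidence, uv, sparse_depth):
    """Illustrative multi-basis depth fusion.
    bases:        (K, H, W) depth bases predicted by the network
    confidence:   (H, W) predicted confidence in [0, 1]
    uv:           (N, 2) integer (x, y) pixels of sparse SLAM points
    sparse_depth: (N,) depths of those points from the SLAM back-end."""
    # Sample each basis and the confidence at the sparse point locations.
    A = bases[:, uv[:, 1], uv[:, 0]].T              # (N, K)
    c = confidence[uv[:, 1], uv[:, 0]]              # (N,)
    # Confidence-weighted least squares: argmin_w || sqrt(c) * (A w - d) ||^2
    W = np.sqrt(c)[:, None]
    w, *_ = np.linalg.lstsq(A * W, sparse_depth * W[:, 0], rcond=None)
    # The dense depth applies the same linear combination to the full bases.
    dense = np.tensordot(w, bases, axes=1)          # (H, W)
    return w, dense
```

Because the depth enters the system only through the low-dimensional weight vector, the same weights can later be refined inside bundle adjustment without re-running the network.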


VIP-SLAM: An Efficient Tightly-Coupled RGB-D Visual Inertial Planar SLAM

Jul 04, 2022
Danpeng Chen, Shuai Wang, Weijian Xie, Shangjin Zhai, Nan Wang, Hujun Bao, Guofeng Zhang


In this paper, we propose a tightly-coupled SLAM system that fuses RGB, depth, IMU and structured plane information. Traditional sparse-point-based SLAM systems maintain a large number of map points to model the environment; this huge number of map points leads to high computational complexity and makes such systems difficult to deploy on mobile devices. Planes, on the other hand, are common structures in man-made environments, especially indoors, and a small number of planes can usually represent a large scene. The main purpose of this article is therefore to reduce the high complexity of sparse-point-based SLAM. We build a lightweight back-end map consisting of a few planes and map points to achieve efficient bundle adjustment (BA) with equal or better accuracy. We use homography constraints to eliminate the parameters of the numerous plane points from the optimization and reduce the complexity of BA. We separate the parameters and measurements in the homography and point-to-plane constraints and compress the measurement part to further improve the speed of BA. We also integrate the plane information into the whole system to realize robust planar feature extraction, data association, and globally consistent planar reconstruction. Finally, we perform an ablation study and compare our method with similar methods on simulated and real-world data. Our system achieves clear advantages in accuracy and efficiency. Even though the plane parameters are involved in the optimization, we effectively simplify the back-end map by using planar structures, and the global bundle adjustment is nearly twice as fast as that of the sparse-point-based SLAM algorithm.
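To make the plane constraints concrete, the snippet below sketches the two standard geometric relations the abstract refers to: a point-to-plane residual, which lets a point on a known plane contribute one scalar constraint instead of three free parameters, and the classic plane-induced homography between two views. This is an illustrative sketch under a chosen plane convention (n·X = d), not the authors' implementation:

```python
import numpy as np

def point_to_plane_residual(X_world, plane_n, plane_d):
    """Scalar residual for a point attached to the plane n·X = d.
    X_world: (3,) point, plane_n: (3,) unit normal, plane_d: plane offset."""
    return float(plane_n @ X_world - plane_d)

def plane_induced_homography(R, t, plane_n, plane_d, K):
    """Standard plane-induced homography H = K (R - t n^T / d) K^{-1}
    mapping pixels of plane points from one view to the other.
    R, t: relative rotation and translation; K: (3, 3) camera intrinsics."""
    H = K @ (R - np.outer(t, plane_n) / plane_d) @ np.linalg.inv(K)
    return H / H[2, 2]   # normalize so the bottom-right entry is 1
```

Replacing per-point 3D parameters with such constraints is what shrinks the back-end state and speeds up BA.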


Procedural Text Understanding via Scene-Wise Evolution

Mar 15, 2022
Jialong Tang, Hongyu Lin, Meng Liao, Yaojie Lu, Xianpei Han, Le Sun, Weijian Xie, Jin Xu


Procedural text understanding requires machines to reason about entity states within dynamic narratives. Current approaches are commonly entity-wise: they track each entity separately and predict the different states of each entity independently. Such an entity-wise paradigm does not consider the interaction between entities and their states. In this paper, we propose a new scene-wise paradigm for procedural text understanding, which jointly tracks the states of all entities in a scene-by-scene manner. Based on this paradigm, we propose the Scene Graph Reasoner (SGR), which introduces a series of dynamically evolving scene graphs to jointly formulate the evolution of entities, states, and their associations throughout the narrative. In this way, the deep interactions between all entities and states can be jointly captured and simultaneously derived from the scene graphs. Experiments show that SGR not only achieves new state-of-the-art performance but also significantly accelerates reasoning.
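As a schematic illustration of the scene-wise idea (this is a data-structure sketch under our own assumptions, not the authors' model), each step of the narrative gets one graph holding all entities, their states, and their associations, and the graph for the next scene is derived from the previous one in a single joint update rather than per entity:

```python
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    states: dict = field(default_factory=dict)   # entity -> state label
    edges: set = field(default_factory=set)      # (entity, relation, entity) triples

def evolve(graphs, sentence, update_fn):
    """Produce the next scene graph from the previous one and a sentence.
    update_fn stands in for the neural reasoner that jointly predicts all
    entity states and associations for the new scene."""
    prev = graphs[-1]
    nxt = SceneGraph(states=dict(prev.states), edges=set(prev.edges))
    update_fn(nxt, sentence)   # mutates states/edges for this scene
    graphs.append(nxt)
    return nxt
```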

* AAAI 2022  
* 9 pages, 2 figures 

From Discourse to Narrative: Knowledge Projection for Event Relation Extraction

Jun 16, 2021
Jialong Tang, Hongyu Lin, Meng Liao, Yaojie Lu, Xianpei Han, Le Sun, Weijian Xie, Jin Xu


Current event-centric knowledge graphs rely heavily on explicit connectives to mine relations between events. Unfortunately, because connectives are sparse, these methods severely limit the coverage of event knowledge graphs (EventKGs). The lack of high-quality labelled corpora further exacerbates the problem. In this paper, we propose a knowledge projection paradigm for event relation extraction: projecting discourse knowledge onto narratives by exploiting the commonalities between them. Specifically, we propose the Multi-tier Knowledge Projection Network (MKPNet), which can effectively leverage multi-tier discourse knowledge for event relation extraction. In this way, the labelled data requirement is significantly reduced, and implicit event relations can be effectively extracted. Intrinsic experiments show that MKPNet achieves new state-of-the-art performance, and extrinsic experiments verify the value of the extracted event relations.
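The projection idea is essentially a transfer setup: a representation trained on connective-rich discourse relation data is shared with the event relation extractor, so the scarce event-labelled data only has to train a thin task head. The PyTorch sketch below is a hedged illustration of that pattern; the layer sizes, head structure, and names are our assumptions, not the released MKPNet architecture:

```python
import torch
import torch.nn as nn

class KnowledgeProjectionNet(nn.Module):
    """Shared encoder tiers with one head per task: discourse relation
    classification (the knowledge source) and event relation extraction
    (the target), so discourse supervision shapes the shared layers."""
    def __init__(self, hidden=768, n_discourse=4, n_event=4):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
        self.discourse_head = nn.Linear(hidden, n_discourse)  # source task
        self.event_head = nn.Linear(hidden, n_event)          # target task

    def forward(self, pair_repr, task="event"):
        h = self.shared(pair_repr)   # tiers shared across both tasks
        head = self.event_head if task == "event" else self.discourse_head
        return head(h)
```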

* ACL 2021  
* 11 pages 

CLUE: A Chinese Language Understanding Evaluation Benchmark

Apr 14, 2020
Liang Xu, Xuanwei Zhang, Lu Li, Hai Hu, Chenjie Cao, Weitang Liu, Junyi Li, Yudong Li, Kai Sun, Yechen Xu, Yiming Cui, Cong Yu, Qianqian Dong, Yin Tian, Dian Yu, Bo Shi, Jun Zeng, Rongzhao Wang, Weijian Xie, Yanting Li, Yina Patterson, Zuoyu Tian, Yiwen Zhang, He Zhou, Shaoweihua Liu, Qipeng Zhao, Cong Yue, Xinrui Zhang, Zhengliang Yang, Zhenzhong Lan


We introduce CLUE, a Chinese Language Understanding Evaluation benchmark. It contains eight different tasks, including single-sentence classification, sentence-pair classification, and machine reading comprehension. We evaluate a number of existing full-network pre-trained models for Chinese on CLUE. We also include a small hand-crafted diagnostic test set designed to probe specific linguistic phenomena, some of which are unique to Chinese, using different models. Along with CLUE, we release a large, clean, crawled raw text corpus that can be used for model pre-training. We release CLUE, the baselines, and the pre-training dataset on GitHub.
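For a sense of how a baseline would be scored on one of the classification tasks, here is a minimal sketch. The Hugging Face hub dataset name ("clue"/"tnews") and its field names are assumptions about a community mirror; the official data and baselines live in the CLUE GitHub repository:

```python
from datasets import load_dataset

def evaluate(predict_fn, task="tnews"):
    """predict_fn maps a sentence string to a label id; returns dev accuracy.
    Dataset/config/field names here are assumed, not guaranteed by CLUE."""
    dev = load_dataset("clue", task, split="validation")
    correct = sum(predict_fn(ex["sentence"]) == ex["label"] for ex in dev)
    return correct / len(dev)
```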

* 9 pages, 4 figures 

Multi-task Sentence Encoding Model for Semantic Retrieval in Question Answering Systems

Nov 18, 2019
Qiang Huang, Jianhui Bu, Weijian Xie, Shengwen Yang, Weijia Wu, Liping Liu


Question Answering (QA) systems automatically provide appropriate responses to users' questions. Sentence matching is an essential task in QA systems and is usually reformulated as a Paraphrase Identification (PI) problem: given a question, find the most similar question in a QA knowledge base. In this paper, we propose a Multi-task Sentence Encoding Model (MSEM) for the PI problem, in which a connected graph is employed to depict the relations between sentences and a multi-task learning model is applied to address both sentence matching and sentence intent classification. In addition, we implement a general semantic retrieval framework that combines our proposed model with Approximate Nearest Neighbor (ANN) search, which enables us to find the most similar question among all available candidates very quickly during online serving. Experiments show the superiority of our proposed method compared with existing sentence matching models.
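The retrieval framework follows a familiar encode-offline, search-online pattern. The sketch below is an illustrative brute-force version under our own assumptions (not the paper's implementation): candidate questions are encoded once into normalized vectors and a query is matched by inner product; a real deployment would swap the brute-force search for an ANN index such as FAISS or HNSW, as the abstract describes:

```python
import numpy as np

class QuestionRetriever:
    def __init__(self, encode_fn, candidates):
        """encode_fn: sentence -> 1-D embedding (e.g. the MSEM encoder);
        candidates: list of knowledge-base questions encoded offline."""
        self.encode_fn = encode_fn
        self.candidates = candidates
        vecs = np.stack([encode_fn(q) for q in candidates])
        # Normalize so inner product equals cosine similarity.
        self.index = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

    def most_similar(self, question, k=5):
        q = self.encode_fn(question)
        q = q / np.linalg.norm(q)
        scores = self.index @ q
        top = np.argsort(-scores)[:k]
        return [(self.candidates[i], float(scores[i])) for i in top]
```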

* IJCNN 2019 - International Joint Conference on Neural Networks, Budapest, Hungary, 14-19 July 2019 