Jian Yao

A boundary-aware point clustering approach in Euclidean and embedding spaces for roof plane segmentation

Sep 07, 2023
Li Li, Qingqing Li, Guozheng Xu, Pengwei Zhou, Jingmin Tu, Jie Li, Jian Yao

Roof plane segmentation from airborne LiDAR point clouds is an important technology for 3D building model reconstruction. A key issue in plane segmentation is designing powerful features that can exactly distinguish adjacent planar patches, since the quality of point features directly determines the accuracy of roof plane segmentation. Most existing approaches extract roof planes using handcrafted features. However, the discriminative ability of these features is relatively low, especially in boundary areas. To solve this problem, we propose a boundary-aware point clustering approach for roof plane segmentation that operates in Euclidean and embedding spaces constructed by a multi-task deep network. We design a three-branch network that predicts semantic labels, predicts point offsets, and extracts deep embedding features. The first branch classifies the input points as non-roof, boundary, or plane points. The second branch predicts point offsets that shift each point toward its respective instance center. The third branch constrains points of the same plane instance to have similar embeddings. The goal is to make points of the same plane instance as close as possible in both the Euclidean and embedding spaces. However, even with the strong representational ability of a deep network, it remains hard to accurately distinguish points near plane instance boundaries. Therefore, we first group plane points into clusters in the two spaces, and then assign the remaining boundary points to their closest clusters to generate the final, complete roof planes. In this way, we effectively reduce the influence of unreliable boundary points. In addition, we construct a synthetic dataset and a real dataset to train and evaluate our approach. Experimental results show that the proposed approach significantly outperforms existing state-of-the-art approaches.
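
For intuition, here is a minimal sketch of the two-stage grouping described above: plane points are clustered in the joint Euclidean-plus-embedding space, and boundary points are then attached to their nearest cluster. The clustering algorithm, the label encoding, and the hyperparameters (`eps`, `min_samples`) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def segment_roof_planes(xyz, offsets, embeddings, labels,
                        eps=0.5, min_samples=10):
    """labels: 0 = non-roof, 1 = boundary, 2 = plane (hypothetical encoding)."""
    plane = labels == 2
    boundary = labels == 1
    # Shift points toward their predicted instance centers, then cluster
    # reliable plane points in the joint Euclidean + embedding space.
    feats = np.concatenate([xyz + offsets, embeddings], axis=1)
    cluster_ids = np.full(len(xyz), -1)
    cluster_ids[np.where(plane)[0]] = DBSCAN(
        eps=eps, min_samples=min_samples).fit_predict(feats[plane])
    # Attach each unreliable boundary point to the closest cluster centroid.
    valid = [c for c in np.unique(cluster_ids) if c >= 0]
    centroids = np.stack([feats[cluster_ids == c].mean(axis=0) for c in valid])
    for i in np.where(boundary)[0]:
        cluster_ids[i] = np.argmin(np.linalg.norm(centroids - feats[i], axis=1))
    return cluster_ids  # plane instance id per point; -1 = non-roof/noise
```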

Active Pose Refinement for Textureless Shiny Objects using the Structured Light Camera

Aug 28, 2023
Jun Yang, Jian Yao, Steven L. Waslander

6D pose estimation of textureless shiny objects has become an essential problem in many robotic applications. Many pose estimators require high-quality depth data, often measured by structured light cameras. However, when objects have shiny surfaces (e.g., metal parts), these cameras fail to sense complete depths from a single viewpoint due to specular reflection, resulting in a significant drop in the final pose accuracy. To mitigate this issue, we present a complete active vision framework for 6D object pose refinement and next-best-view prediction. Specifically, we first develop an optimization-based pose refinement module for the structured light camera. Our system then selects the next best camera viewpoint to collect depth measurements by minimizing the predicted uncertainty of the object pose. Compared to previous approaches, we additionally predict the measurement uncertainties of future viewpoints by online rendering, which significantly improves next-best-view prediction performance. We test our approach on the challenging real-world ROBI dataset. The results demonstrate that our pose refinement method outperforms the traditional ICP-based approach when given the same input depth data, and our next-best-view strategy can achieve high object pose accuracy with significantly fewer viewpoints than heuristic-based policies.
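
The next-best-view rule can be summarized in a few lines. The sketch below assumes a hypothetical `predict_covariance` callable standing in for the paper's render-based uncertainty prediction, and scores candidate views by the trace of the predicted 6-DoF pose covariance (one common scalarization, not necessarily the paper's exact criterion).

```python
import numpy as np

def select_next_best_view(candidate_views, predict_covariance):
    """Pick the viewpoint whose predicted measurement yields the most
    certain pose. predict_covariance(view) -> 6x6 pose covariance,
    here assumed to come from online rendering (placeholder)."""
    scores = [np.trace(predict_covariance(v)) for v in candidate_views]
    return candidate_views[int(np.argmin(scores))]
```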

Policy Space Diversity for Non-Transitive Games

Jun 29, 2023
Jian Yao, Weiming Liu, Haobo Fu, Yaodong Yang, Stephen McAleer, Qiang Fu, Wei Yang

Policy-Space Response Oracles (PSRO) is an influential algorithmic framework for approximating a Nash equilibrium (NE) in multi-agent non-transitive games. Many previous studies have tried to promote policy diversity in PSRO. A major weakness of existing diversity metrics is that a more diverse population (according to those metrics) does not necessarily imply, as we prove in the paper, a better approximation to an NE. To alleviate this problem, we propose a new diversity metric whose improvement guarantees a better approximation to an NE. Meanwhile, we develop a practical and well-justified method to optimize our diversity metric using only state-action samples. By incorporating our diversity regularization into the best-response solving in PSRO, we obtain a new PSRO variant, Policy Space Diversity PSRO (PSD-PSRO). We present the convergence property of PSD-PSRO. Empirically, extensive experiments on various games demonstrate that PSD-PSRO produces significantly less exploitable policies than state-of-the-art PSRO variants.
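
To make the framework concrete, below is a schematic PSRO loop with a diversity bonus folded into the best-response objective. All callables (`payoff`, `solve_meta_ne`, `best_response`, `diversity_bonus`) are placeholders for components the paper defines; the sketch only shows where the diversity regularization enters, not how it is computed.

```python
def psd_psro(init_policy, payoff, solve_meta_ne, best_response,
             diversity_bonus, iterations=10):
    """Schematic PSRO loop; not the paper's implementation."""
    population = [init_policy]
    meta_ne = [1.0]  # trivial mixture over a one-policy population
    for _ in range(iterations):
        # Meta-game: empirical payoff matrix among population members.
        M = [[payoff(p, q) for q in population] for p in population]
        meta_ne = solve_meta_ne(M)  # mixture weights over the population
        # Key PSD-PSRO change: regularize the best-response objective so
        # the new policy also improves population-level diversity.
        new_policy = best_response(population, meta_ne,
                                   regularizer=diversity_bonus)
        population.append(new_policy)
    return population, meta_ne
```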

A Lightweight Reconstruction Network for Surface Defect Inspection

Dec 25, 2022
Chao Hu, Jian Yao, Weijie Wu, Weibin Qiu, Liqiang Zhu

Most current deep learning methods cannot cope with the scarcity of industrial product defect samples and the significant differences in their characteristics. This paper proposes an unsupervised defect detection algorithm based on a reconstruction network, trained using only a large number of easily obtained defect-free samples. The network includes two parts: image reconstruction and surface defect region detection. The reconstruction network is a fully convolutional autoencoder with a lightweight structure. Only a small number of normal samples are used for training, so that the reconstruction network learns to generate defect-free reconstructed images. A loss function combining structural loss and $L_1$ loss is proposed for the reconstruction network to address the poor detection of irregular texture surface defects. The residual between the reconstructed image and the image under test then indicates possible defect regions, and conventional image operations can localize the defect. The proposed unsupervised defect detection algorithm is evaluated on multiple defect image sample sets. Compared with other similar algorithms, the results show that it achieves strong robustness and accuracy.

* Journal of Mathematical Imaging and Vision (JMIV), 2023
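
As a rough illustration of the inference stage described above, the sketch below reconstructs an image with a trained autoencoder, takes the residual against the input, and thresholds it into a defect mask. The `reconstruct` callable and the relative threshold are stand-ins; the paper additionally applies conventional image operations (e.g., morphology) to refine the localization.

```python
import numpy as np

def locate_defects(image, reconstruct, rel_thresh=0.2):
    """image: float array in [0, 1]; reconstruct: trained autoencoder that
    maps an image to its defect-free reconstruction (placeholder)."""
    recon = reconstruct(image)
    residual = np.abs(image - recon)  # defective regions reconstruct poorly
    return residual > rel_thresh * residual.max()  # boolean defect mask
```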

Self-supervised Amodal Video Object Segmentation

Oct 23, 2022
Jian Yao, Yuxin Hong, Chiyu Wang, Tianjun Xiao, Tong He, Francesco Locatello, David Wipf, Yanwei Fu, Zheng Zhang

Amodal perception requires inferring the full shape of an object that is partially occluded. This task is particularly challenging on two levels: (1) it requires more information than is contained in the instantaneous retina or imaging sensor, and (2) it is difficult to obtain enough well-annotated amodal labels for supervision. To this end, this paper develops a new framework for Self-supervised amodal Video object segmentation (SaVos). Our method efficiently leverages the visual information of temporal video sequences to infer the amodal masks of objects. The key intuition is that the occluded part of an object can be explained away if that part is visible in other frames, possibly deformed, as long as the deformation can be reasonably learned. Accordingly, we derive a novel self-supervised learning paradigm that efficiently uses the visible object parts as supervision to guide training on videos. In addition to learning a type prior for completing masks of known types, SaVos also learns a spatiotemporal prior, which is also useful for the amodal task and can generalize to unseen types. The proposed framework achieves state-of-the-art performance on the synthetic amodal segmentation benchmark FISHBOWL and the real-world benchmark KINS-Video-Car. Further, it lends itself well to transfer to novel distributions via test-time adaptation, outperforming existing models even after they are transferred to the new distribution.

* Accepted at NeurIPS 2022
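
The "visible elsewhere" intuition can be sketched as follows: union the current frame's visible mask with visible masks warped in from other frames to form pseudo-amodal supervision. The `warps` callables are a stand-in for the learned motion/deformation model; this toy version deliberately ignores how the warps are obtained.

```python
import numpy as np

def pseudo_amodal_mask(visible_masks, warps, t):
    """visible_masks: list of HxW boolean masks, one per frame;
    warps[s]: callable warping frame s's mask into frame t's coordinates
    (hypothetical interface standing in for the learned deformation)."""
    amodal = visible_masks[t].copy()
    for s, mask in enumerate(visible_masks):
        if s != t:
            # A part visible in another frame belongs to the full shape.
            amodal |= warps[s](mask)
    return amodal
```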

DeepMLE: A Robust Deep Maximum Likelihood Estimator for Two-view Structure from Motion

Oct 11, 2022
Yuxi Xiao, Li Li, Xiaodi Li, Jian Yao

Two-view structure from motion (SfM) is a cornerstone of 3D reconstruction and visual SLAM (vSLAM). Many existing end-to-end learning-based methods formulate it as a brute-force regression problem, but their inadequate use of traditional geometric models makes them less robust in unseen environments. To improve the generalization capability and robustness of end-to-end two-view SfM networks, we formulate the two-view SfM problem as maximum likelihood estimation (MLE) and solve it with the proposed framework, denoted DeepMLE. First, we use deep multi-scale correlation maps to depict the visual similarities of 2D image matches determined by ego-motion. In addition, to increase the robustness of our framework, we model the likelihood of the correlations of 2D image matches as a Gaussian and uniform mixture distribution, which accounts for the uncertainty caused by illumination changes, image noise, and moving objects. Meanwhile, an uncertainty prediction module predicts the pixel-wise distribution parameters. Finally, we iteratively refine the depth and relative camera pose using gradient-like information to maximize the likelihood of the correlations. Extensive experimental results on several datasets show that our method significantly outperforms state-of-the-art end-to-end two-view SfM approaches in accuracy and generalization capability.

* 8 pages; accepted to the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022)
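
The Gaussian-and-uniform mixture can be written down directly. The sketch below evaluates a per-pixel negative log-likelihood of a correlation residual under that mixture; the parameter names (`alpha`, `sigma`, `b`) are assumptions for the inlier weight, Gaussian scale, and uniform support.

```python
import numpy as np

def mixture_nll(residual, alpha, sigma, b):
    """-log p(r), with p(r) = alpha * N(r; 0, sigma^2)
    + (1 - alpha) * Uniform(r; -b, b)."""
    gauss = alpha * np.exp(-0.5 * (residual / sigma) ** 2) \
        / (sigma * np.sqrt(2.0 * np.pi))
    uniform = (1.0 - alpha) * (np.abs(residual) <= b) / (2.0 * b)
    return -np.log(gauss + uniform + 1e-12)  # epsilon guards log(0)
```

The uniform component caps the penalty on outlier residuals (e.g., from moving objects), so a single bad match cannot dominate the objective the way it would under a pure Gaussian.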

Practical Issues and Challenges in CSI-based Integrated Sensing and Communication

Mar 18, 2022
Daqing Zhang, Dan Wu, Kai Niu, Xuanzhi Wang, Fusang Zhang, Jian Yao, Dajie Jiang, Fei Qin

Next-generation mobile communication networks (i.e., 6G) are envisioned to go beyond classical communication functionality and provide integrated sensing and communication (ISAC) capability to enable emerging applications such as smart cities, connected vehicles, AIoT, and health care/elder care. Among the ISAC proposals, the most practical and promising approach is to empower existing wireless networks (e.g., WiFi, 4G/5G) with the ability to sense the surrounding people and environment, evolving them into intelligent communication and sensing networks (e.g., 6G). In this paper, based on our experience with CSI-based wireless sensing using WiFi/4G/5G signals, we identify ten major practical and theoretical problems that hinder the real deployment of ISAC applications and provide possible solutions to these critical challenges. We hope this work will inspire further research on evolving existing WiFi/4G/5G networks into next-generation intelligent wireless networks (i.e., 6G).

* ICC 2022 Workshop on Integrated Sensing and Communication (ISAC)

PatchMVSNet: Patch-wise Unsupervised Multi-View Stereo for Weakly-Textured Surface Reconstruction

Mar 04, 2022
Haonan Dong, Jian Yao

Learning-based multi-view stereo (MVS) has achieved fine reconstructions on popular datasets. However, supervised learning methods require ground truth for training, which is hard to collect, especially for large-scale datasets. Although unsupervised learning methods have been proposed and have obtained gratifying results, they still fail to reconstruct intact results in challenging scenes such as weakly-textured surfaces, as they primarily depend on pixel-wise photometric consistency, which is susceptible to varying illumination. To alleviate matching ambiguity in these challenging scenes, this paper proposes robust loss functions leveraging constraints beneath multi-view images: 1) a patch-wise photometric consistency loss, which expands the receptive field of the features in multi-view similarity measurement; and 2) a robust two-view geometric consistency, which includes cross-view depth consistency checking with minimum occlusion. Our unsupervised strategy can be implemented with arbitrary depth estimation frameworks and trained on arbitrarily large MVS datasets. Experiments show that our method decreases matching ambiguity and particularly improves the completeness of weakly-textured reconstruction. Moreover, it reaches the performance of state-of-the-art methods on popular benchmarks such as DTU, Tanks and Temples, and ETH3D. The code will be released soon.
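
A minimal version of a patch-wise photometric term might look like the following, where box filtering expands each pixel's comparison window to a patch before measuring photometric error. The patch size and masking scheme are illustrative assumptions; the paper's actual similarity measure may differ.

```python
import torch
import torch.nn.functional as F

def patchwise_photometric_loss(ref, warped_src, valid_mask, patch=7):
    """ref, warped_src: (B, C, H, W) images; valid_mask: (B, 1, H, W) float
    marking pixels with valid cross-view correspondence."""
    pad = patch // 2
    # Box filtering aggregates each pixel's patch, so the comparison uses
    # local context rather than a single, illumination-sensitive pixel.
    ref_p = F.avg_pool2d(ref, patch, stride=1, padding=pad)
    src_p = F.avg_pool2d(warped_src, patch, stride=1, padding=pad)
    diff = (ref_p - src_p).abs() * valid_mask
    return diff.sum() / valid_mask.sum().clamp(min=1)
```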

DDR-Net: Learning Multi-Stage Multi-View Stereo With Dynamic Depth Range

Mar 26, 2021
Puyuan Yi, Shengkun Tang, Jian Yao

To obtain high-resolution depth maps, some previous learning-based multi-view stereo methods build a cost volume pyramid in a coarse-to-fine manner. These approaches leverage fixed depth range hypotheses to construct cascaded plane sweep volumes. However, it is inappropriate to set identical range hypotheses for every pixel, since the uncertainties of previous per-pixel depth predictions are spatially varying. Distinct from these approaches, we propose a Dynamic Depth Range Network (DDR-Net) that determines the depth range hypotheses dynamically by applying a range estimation module (REM) to learn the uncertainties of range hypotheses from the former stages. Specifically, in our DDR-Net, we first build an initial depth map at the coarsest resolution of an image across the entire depth range. The range estimation module (REM) then leverages the probability distribution information of the initial depth to estimate the depth range hypotheses dynamically for the following stages. Moreover, we develop a novel loss strategy, which utilizes the learned dynamic depth ranges to generate refined depth maps, to keep the ground-truth value of each pixel covered by the range hypotheses of the next stage. Extensive experimental results show that our method achieves superior performance over other state-of-the-art methods on the DTU benchmark and obtains comparable results on the Tanks and Temples benchmark. The code is available at https://github.com/Tangshengku/DDR-Net.
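
For intuition, a hand-crafted variant of the range update could derive each pixel's next-stage depth interval from the mean and standard deviation of the previous stage's probability volume, as below. The paper's REM learns this mapping from data; the closed-form mean ± k·std rule here is only an illustration of what "dynamic, per-pixel ranges" means.

```python
import torch

def dynamic_depth_range(prob, depth_hyps, k=3.0):
    """prob: (B, D, H, W) softmax over depth hypotheses;
    depth_hyps: (D,) depth value of each hypothesis plane."""
    d = depth_hyps.view(1, -1, 1, 1)
    mean = (prob * d).sum(dim=1)  # per-pixel expected depth
    var = (prob * (d - mean.unsqueeze(1)) ** 2).sum(dim=1)
    std = var.clamp(min=1e-8).sqrt()  # spatially varying uncertainty
    return mean - k * std, mean + k * std  # next-stage range per pixel
```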

Deformable spatial propagation network for depth completion

Jul 08, 2020
Zheyuan Xu, Yingfu Wang, Jian Yao

Depth completion, which aims to recover dense depth maps from sparse depth measurements, has attracted extensive attention recently due to the development of autonomous driving. The convolutional spatial propagation network (CSPN) is one of the state-of-the-art methods for this task; it adopts a linear propagation model to refine coarse depth maps with local context. However, the propagation of each pixel occurs in a fixed receptive field, which may not be optimal for refinement since different pixels need different local contexts. To tackle this issue, we propose a deformable spatial propagation network (DSPN) that adaptively generates a different receptive field and affinity matrix for each pixel. It allows the network to obtain information from far fewer but more relevant pixels for propagation. Experimental results on the KITTI depth completion benchmark demonstrate that our proposed method achieves state-of-the-art performance.

* 5 pages, 3 figures 
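
A toy, single-step version of deformable propagation is sketched below: each pixel re-estimates its depth from K neighbors sampled at learned offsets via bilinear interpolation, weighted by learned affinities. In the actual network both offsets and affinities are predicted; here they are inputs, and the tensor layouts are assumptions.

```python
import torch
import torch.nn.functional as F

def deformable_propagation_step(depth, offsets, affinity):
    """depth: (B, 1, H, W); offsets: (B, K, 2, H, W), normalized to [-1, 1];
    affinity: (B, K, H, W), assumed normalized to sum to 1 over K."""
    B, _, H, W = depth.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H, device=depth.device),
        torch.linspace(-1, 1, W, device=depth.device), indexing="ij")
    base = torch.stack([xs, ys], dim=-1).expand(B, H, W, 2)  # identity grid
    out = torch.zeros_like(depth)
    for k in range(offsets.shape[1]):
        # Sample each pixel's k-th neighbor at its learned offset location.
        grid = base + offsets[:, k].permute(0, 2, 3, 1)
        sampled = F.grid_sample(depth, grid, align_corners=True)
        out = out + affinity[:, k:k + 1] * sampled  # affinity-weighted blend
    return out
```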