Hong Zhang

Are We Ready for Robust and Resilient SLAM? A Framework For Quantitative Characterization of SLAM Datasets

Feb 23, 2022
Islam Ali, Hong Zhang

Reliability of SLAM systems is considered one of the critical requirements in many modern autonomous systems. This has directed efforts toward developing many state-of-the-art systems, creating challenging datasets, and introducing rigorous metrics to measure SLAM system performance. However, the link between datasets and performance in the robustness/resilience context has rarely been explored. To fill this void, characterizing the operating conditions of SLAM systems is essential, as it provides an environment for quantitative measurement of robustness and resilience. In this paper, we argue that for proper evaluation of SLAM performance, the characterization of SLAM datasets serves as a critical first step. The study starts by reviewing previous efforts for quantitative characterization of SLAM datasets. Then, the problem of perturbation characterization is discussed and the linkage to SLAM robustness/resilience is established. After that, we propose a novel, generic, and extendable framework for quantitative analysis and comparison of SLAM datasets. Additionally, a description of different characterization parameters is provided. Finally, we demonstrate the application of our framework by presenting the characterization results of three SLAM datasets (KITTI, EuroC-MAV, and TUM-VI), highlighting the level of insight achieved by the proposed framework.

MonoDistill: Learning Spatial Features for Monocular 3D Object Detection

Jan 26, 2022
Zhiyu Chong, Xinzhu Ma, Hong Zhang, Yuxin Yue, Haojie Li, Zhihui Wang, Wanli Ouyang

3D object detection is a fundamental and challenging task for 3D scene understanding, and monocular-based methods can serve as an economical alternative to stereo-based or LiDAR-based methods. However, accurately detecting objects in 3D space from a single image is extremely difficult due to the lack of spatial cues. To mitigate this issue, we propose a simple and effective scheme to introduce the spatial information from LiDAR signals to monocular 3D detectors, without introducing any extra cost in the inference phase. In particular, we first project the LiDAR signals into the image plane and align them with the RGB images. After that, we use the resulting data to train a 3D detector (LiDAR Net) with the same architecture as the baseline model. Finally, this LiDAR Net can serve as the teacher to transfer the learned knowledge to the baseline model. Experimental results show that the proposed method significantly boosts the performance of the baseline model and ranks first among all monocular-based methods on the KITTI benchmark. In addition, extensive ablation studies are conducted, which further demonstrate the effectiveness of each part of our design and illustrate what the baseline model has learned from the LiDAR Net. Our code will be released at https://github.com/monster-ghost/MonoDistill.

* Accepted by ICLR 2022
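
The projection step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a plain pinhole camera model and LiDAR points already transformed into the camera frame; the function name, the intrinsics arguments (fx, fy, cx, cy), and the "keep the nearest point per pixel" rule are illustrative assumptions, not details taken from the paper.

```python
def project_points_to_depth_map(points_cam, fx, fy, cx, cy, width, height):
    """Project 3D points (camera frame) onto the image plane as a sparse depth map.

    Hypothetical helper: pixels without a LiDAR return stay at 0.0, and when
    several points fall on the same pixel the nearest one is kept.
    """
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points_cam:
        if z <= 0.0:
            continue  # point is behind the camera, so it cannot be imaged
        # Pinhole projection: u = fx * x / z + cx, v = fy * y / z + cy.
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z
    return depth


points = [
    (0.0, 0.0, 10.0),    # lands on pixel (u=2, v=1)
    (0.0, 0.0, 4.0),     # same pixel, nearer, so it replaces the 10.0
    (1.0, 0.0, 50.0),    # projects to u=4, outside a 4-pixel-wide image
    (0.0, 0.0, -1.0),    # behind the camera
    (-0.5, 0.25, 25.0),  # lands on pixel (u=0, v=2)
]
depth_map = project_points_to_depth_map(points, 100.0, 100.0, 2.0, 1.0, 4, 3)
```

The resulting sparse depth map is aligned pixel-for-pixel with the RGB image, which is what lets a network with the same architecture as the image-based baseline consume it.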

Improving Feature Extraction from Histopathological Images Through A Fine-tuning ImageNet Model

Jan 03, 2022
Xingyu Li, Min Cen, Jinfeng Xu, Hong Zhang, Xu Steven Xu

Due to the lack of annotated pathological images, transfer learning has been the predominant approach in the field of digital pathology. Pre-trained neural networks based on the ImageNet database are often used to extract "off-the-shelf" features, achieving great success in predicting tissue types, molecular features, and clinical outcomes. We hypothesize that fine-tuning the pre-trained models using histopathological images could further improve feature extraction and downstream prediction performance. We used 100,000 annotated H&E image patches for colorectal cancer (CRC) to fine-tune a pre-trained Xception model via a two-step approach. The features extracted from the fine-tuned Xception (FTX2048) model and the ImageNet-pretrained (IMGNET2048) model were compared through: (1) tissue classification for H&E images from CRC, the same image type that was used for fine-tuning; (2) prediction of immune-related gene expression; and (3) prediction of gene mutations for lung adenocarcinoma (LUAD). Five-fold cross-validation was used for model performance evaluation. The features extracted from the fine-tuned FTX2048 exhibited significantly higher accuracy for predicting tissue types of CRC compared to the off-the-shelf features taken directly from Xception based on the ImageNet database. In particular, FTX2048 markedly improved the accuracy for stroma from 87% to 94%. Similarly, features from FTX2048 boosted the prediction of transcriptomic expression of immune-related genes in LUAD. For the genes that had significant relationships with image features, the features from the fine-tuned model improved the prediction for the majority of the genes. In addition, features from FTX2048 improved prediction of mutation for 5 out of the 9 most frequently mutated genes in LUAD.

Online Mutual Adaptation of Deep Depth Prediction and Visual SLAM

Nov 27, 2021
Shing Yan Loo, Moein Shakeri, Sai Hong Tang, Syamsiah Mashohor, Hong Zhang

Accurate depth prediction by a CNN remains a major challenge to its wide use in practical visual SLAM applications, such as enhanced camera tracking and dense mapping. This paper sets out to answer the following question: Can we tune a depth prediction CNN with the help of a visual SLAM algorithm, even if the CNN is not trained for the current operating environment, in order to benefit the SLAM performance? To this end, we propose a novel online adaptation framework consisting of two complementary processes: a SLAM algorithm that is used to generate keyframes to fine-tune the depth prediction, and another algorithm that uses the online adapted depth to improve map quality. Once the potentially noisy map points are removed, we perform global photometric bundle adjustment (BA) to improve the overall SLAM performance. Experimental results on both benchmark datasets and a real robot in our own experimental environments show that our proposed method improves the overall SLAM accuracy. We demonstrate the use of regularization in the training loss as an effective means to prevent catastrophic forgetting. In addition, we compare our online adaptation framework against state-of-the-art pre-trained depth prediction CNNs to show that our online adapted depth prediction CNN outperforms depth prediction CNNs that have been trained on a large collection of datasets.

* 10 pages, 7 figures

Online Adaptation of Monocular Depth Prediction with Visual SLAM

Nov 07, 2021
Shing Yan Loo, Moein Shakeri, Sai Hong Tang, Syamsiah Mashohor, Hong Zhang

Accurate depth prediction by a CNN remains a major challenge to its wide use in practical visual SLAM applications, such as enhanced camera tracking and dense mapping. This paper sets out to answer the following question: Can we tune a depth prediction CNN with the help of a visual SLAM algorithm, even if the CNN is not trained for the current operating environment, in order to benefit the SLAM performance? To this end, we propose a novel online adaptation framework consisting of two complementary processes: a SLAM algorithm that is used to generate keyframes to fine-tune the depth prediction, and another algorithm that uses the online adapted depth to improve map quality. Once the potentially noisy map points are removed, we perform global photometric bundle adjustment (BA) to improve the overall SLAM performance. Experimental results on both benchmark datasets and a real robot in our own experimental environments show that our proposed method improves the SLAM reconstruction accuracy. We demonstrate the use of regularization in the training loss as an effective means to prevent catastrophic forgetting. In addition, we compare our online adaptation framework against state-of-the-art pre-trained depth prediction CNNs to show that our online adapted depth prediction CNN outperforms depth prediction CNNs that have been trained on a large collection of datasets.

* 9 pages, 8 figures

Fire Together Wire Together: A Dynamic Pruning Approach with Self-Supervised Mask Prediction

Nov 05, 2021
Sara Elkerdawy, Mostafa Elhoushi, Hong Zhang, Nilanjan Ray

Dynamic model pruning is a recent direction that allows for the inference of a different sub-network for each input sample during deployment. However, current dynamic methods rely on learning a continuous channel gating through regularization by inducing a sparsity loss. This formulation introduces complexity in balancing different losses (e.g., task loss, regularization loss). In addition, regularization-based methods lack a transparent way to select the tradeoff hyperparameter needed to meet a computational budget. Our contribution is two-fold: 1) decoupled task and pruning training; 2) simple hyperparameter selection that enables FLOPs reduction estimation before training. Inspired by the Hebbian theory in neuroscience, "neurons that fire together wire together", we propose to predict a mask to process k filters in a layer based on the activation of its previous layer. We pose the problem as a self-supervised binary classification problem. Each mask predictor module is trained to predict if the log-likelihood for each filter in the current layer belongs to the top-k activated filters. The value k is dynamically estimated for each input based on a novel criterion using the mass of heatmaps. We show experiments on several neural architectures, such as VGG, ResNet, and MobileNet, on the CIFAR and ImageNet datasets. On CIFAR, we reach similar accuracy to SOTA methods with 15% and 24% higher FLOPs reduction. Similarly, on ImageNet, we achieve a lower drop in accuracy with up to 13% improvement in FLOPs reduction.
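
The self-supervised labels described in the abstract above can be sketched as follows: given one score per filter, mark the k highest-scoring filters with 1 and the rest with 0, and use that vector as the binary classification target for a mask predictor. This is only an illustration under stated assumptions. The function name and the tie-breaking rule are hypothetical, the abstract does not say how per-filter scores are computed, and k is passed in directly here rather than estimated from heatmap mass as the paper proposes.

```python
def topk_mask_targets(filter_scores, k):
    """Return a 0/1 list marking the k highest-scoring filters.

    Hypothetical helper: the scores stand in for whatever per-filter
    activation statistic is used; ties go to the lower filter index.
    """
    if not 0 <= k <= len(filter_scores):
        raise ValueError("k must be between 0 and the number of filters")
    # Filter indices ordered by score, highest first.
    order = sorted(range(len(filter_scores)), key=lambda i: (-filter_scores[i], i))
    keep = set(order[:k])
    return [1 if i in keep else 0 for i in range(len(filter_scores))]


# Four filters, keep the two most active ones.
targets = topk_mask_targets([0.1, 0.9, 0.4, 0.9], 2)
```

Because the targets come from the network's own activations, no extra annotation is needed, which is what makes the mask prediction task self-supervised.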

A Retrospective Analysis using Deep-Learning Models for Prediction of Survival Outcome and Benefit of Adjuvant Chemotherapy in Stage II/III Colorectal Cancer

Nov 05, 2021
Xingyu Li, Jitendra Jonnagaddala, Shuhua Yang, Hong Zhang, Xu Steven Xu

Most early-stage colorectal cancer (CRC) patients can be cured by surgery alone, and only certain high-risk early-stage CRC patients benefit from adjuvant chemotherapies. However, very few validated biomarkers are available to accurately predict survival benefit from postoperative chemotherapy. We developed a novel deep-learning algorithm (CRCNet) using whole-slide images from Molecular and Cellular Oncology (MCO) to predict survival benefit of adjuvant chemotherapy in stage II/III CRC. We validated CRCNet both internally through cross-validation and externally using an independent cohort from The Cancer Genome Atlas (TCGA). We showed that CRCNet can accurately predict not only survival prognosis but also the treatment effect of adjuvant chemotherapy. The high-risk subgroup identified by CRCNet benefits most from adjuvant chemotherapy, and significantly longer survival is observed among chemo-treated patients in that subgroup. Conversely, minimal chemotherapy benefit is observed in the CRCNet low- and medium-risk subgroups. Therefore, CRCNet can potentially be of great use in guiding treatments for stage II/III CRC.

Relationship Oriented Affordance Learning through Manipulation Graph Construction

Nov 01, 2021
Chao Tang, Jingwen Yu, Weinan Chen, Hong Zhang

In this paper, we propose the Manipulation Relationship Graph (MRG), a novel affordance representation which captures the underlying manipulation relationships of an arbitrary scene. To construct such a graph from raw visual observations, a deep neural network named AR-Net is introduced. It consists of an Attribute module and a Context module, which guide the relationship learning at the object and subgraph level, respectively. We quantitatively validate our method on a novel manipulation relationship dataset named SMRD. To evaluate the performance of the proposed model and representation, both visual perception and physical manipulation experiments are conducted. Overall, AR-Net along with MRG outperforms all baselines, achieving success rates of 88.89% on task relationship recognition (TRR) and 73.33% on task completion (TC).

Foresight of Graph Reinforcement Learning Latent Permutations Learnt by Gumbel Sinkhorn Network

Oct 23, 2021
Tianqi Shen, Hong Zhang, Ding Yuan, Jiaping Xiao, Yifan Yang

Cooperation is of vital importance in multi-agent environments. As a result, several reinforcement learning algorithms combined with graph neural networks have been proposed to understand the mutual interplay between agents. However, highly complicated and dynamic multi-agent environments require more ingenious graph neural networks, which can comprehensively represent not only the graph topology but also the evolution of that topology as agents emerge, disappear, and move. To tackle these difficulties, we propose Gumbel Sinkhorn graph attention reinforcement learning, in which a graph attention network represents the underlying graph topology of the multi-agent environment and, with the help of a Gumbel Sinkhorn network that learns latent permutations, adapts better to the dynamic topology of the graph. Empirically, simulation results show that our proposed graph reinforcement learning methodology outperforms existing methods in the PettingZoo multi-agent environment by learning latent permutations.
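
For readers unfamiliar with the operator named in the abstract above, the following is a minimal sketch of generic Gumbel-Sinkhorn normalization: Gumbel noise is added to a square score matrix, the result is divided by a temperature, and rows and columns are normalized alternately in log space so that the output approaches a doubly stochastic matrix (a soft permutation). It is a generic illustration, not the network used in the paper. The function name, the default parameters, and the noise_scale argument are assumptions made here.

```python
import math
import random


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def gumbel_sinkhorn(scores, tau=1.0, n_iters=50, noise_scale=1.0, rng=None):
    """Relax an n x n score matrix into an approximately doubly stochastic one.

    Hypothetical helper: lower tau pushes the result toward a hard
    permutation; noise_scale=0.0 disables the Gumbel perturbation.
    """
    rng = rng or random.Random()
    n = len(scores)

    def gumbel():
        # Clamp the uniform sample so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        return -math.log(-math.log(u))

    log_p = [[(scores[i][j] + noise_scale * gumbel()) / tau for j in range(n)]
             for i in range(n)]
    for _ in range(n_iters):
        # Row normalization: each row sums to 1 in probability space.
        for i in range(n):
            lse = _logsumexp(log_p[i])
            log_p[i] = [v - lse for v in log_p[i]]
        # Column normalization: each column sums to 1 in probability space.
        for j in range(n):
            lse = _logsumexp([log_p[i][j] for i in range(n)])
            for i in range(n):
                log_p[i][j] -= lse
    return [[math.exp(v) for v in row] for row in log_p]


# Without noise, a strongly diagonal score matrix yields a near-identity
# soft permutation.
soft_perm = gumbel_sinkhorn(
    [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]],
    tau=0.5, n_iters=20, noise_scale=0.0,
)
```

Working in log space avoids overflow when scores are divided by a small temperature, which is the regime where the output is closest to a hard permutation.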
