Hongyu Chen


Neural Informed RRT* with Point-based Network Guidance for Optimal Sampling-based Path Planning

Sep 26, 2023
Zhe Huang, Hongyu Chen, Katherine Driggs-Campbell

Sampling-based planning algorithms like Rapidly-exploring Random Tree (RRT) are versatile in solving path planning problems. RRT* offers asymptotic optimality but requires growing the tree uniformly over the free space, which leaves room for efficiency improvement. To accelerate convergence, informed approaches sample states from an ellipsoidal subset of the search space determined by the current path cost at each iteration. Learning-based alternatives model the topology of the search space and infer the states close to the optimal path to guide planning. We combine the strengths of both approaches and propose Neural Informed RRT* with Point-based Network Guidance. We introduce a Point-based Network to infer the guidance states and integrate the network into Informed RRT* for guidance state refinement. We use Neural Connect to build connectivity of the guidance state set and further boost performance in challenging planning problems. Our method surpasses previous works on path planning benchmarks while preserving probabilistic completeness and asymptotic optimality. We demonstrate the deployment of our method for mobile robot navigation in the real world.
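
A minimal sketch of the informed (ellipsoidal) sampling step that Informed RRT* and the guidance refinement above build on: once a path of cost c_best is found, new samples are drawn from the prolate hyperspheroid whose focal points are the start and goal. It is written in 2D for brevity with illustrative values; it is not the paper's implementation.

import numpy as np

def sample_informed(start, goal, c_best, rng):
    c_min = np.linalg.norm(goal - start)           # theoretical minimum path cost
    center = (start + goal) / 2.0
    # Rotation aligning the first axis with the start-goal direction.
    a1 = (goal - start) / c_min
    C = np.stack([a1, np.array([-a1[1], a1[0]])], axis=1)
    # Ellipse radii: transverse axis c_best/2, conjugate axis sqrt(c_best^2 - c_min^2)/2.
    r = np.array([c_best / 2.0, np.sqrt(c_best**2 - c_min**2) / 2.0])
    # Uniform sample from the unit disk, then scale, rotate, and translate.
    while True:
        x_ball = rng.uniform(-1.0, 1.0, size=2)
        if np.dot(x_ball, x_ball) <= 1.0:
            return C @ (r * x_ball) + center

rng = np.random.default_rng(0)
start, goal = np.array([0.0, 0.0]), np.array([10.0, 0.0])
print(sample_informed(start, goal, c_best=12.0, rng=rng))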

* 7 pages, 6 figures 

Visual Captioning at Will: Describing Images and Videos Guided by a Few Stylized Sentences

Jul 31, 2023
Dingyi Yang, Hongyu Chen, Xinglin Hou, Tiezheng Ge, Yuning Jiang, Qin Jin

Stylized visual captioning aims to generate image or video descriptions with specific styles, making them more attractive and emotionally appropriate. One major challenge with this task is the lack of paired stylized captions for visual content, so most existing works focus on unsupervised methods that do not rely on parallel datasets. However, these approaches still require training with sufficient examples that have style labels, and the generated captions are limited to predefined styles. To address these limitations, we explore the problem of Few-Shot Stylized Visual Captioning, which aims to generate captions in any desired style using only a few examples as guidance during inference, without requiring further training. We propose a framework called FS-StyleCap for this task, which utilizes a conditional encoder-decoder language model and a visual projection module. Our two-step training scheme proceeds as follows: first, we train a style extractor to generate style representations on an unlabeled text-only corpus. Then, we freeze the extractor and enable our decoder to generate stylized descriptions based on the extracted style vector and projected visual content vectors. During inference, our model can generate the desired stylized captions by deriving the style representation from user-supplied examples. Our automatic evaluation results for few-shot sentimental visual captioning outperform state-of-the-art approaches and are comparable to models that are fully trained on labeled style corpora. Human evaluations further confirm our model's ability to handle multiple styles.
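
A minimal sketch of the few-shot inference flow described above, using hypothetical stand-in module names (StyleExtractor here is a tiny placeholder, not the paper's architecture): the style vectors of the user-supplied examples are averaged and would then condition the frozen decoder together with projected visual features.

import torch
import torch.nn as nn

class StyleExtractor(nn.Module):
    """Maps a tokenized sentence to a fixed-size style vector (stand-in model)."""
    def __init__(self, vocab=30000, dim=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
    def forward(self, tokens):                     # tokens: (B, T) int64
        _, h = self.rnn(self.emb(tokens))
        return h[-1]                               # (B, dim) style vectors

extractor = StyleExtractor().eval()
examples = torch.randint(0, 30000, (3, 12))        # 3 tokenized stylized example sentences
with torch.no_grad():
    style_vec = extractor(examples).mean(dim=0, keepdim=True)  # (1, 256) averaged style
# style_vec would be fed, together with projected visual content vectors,
# into the frozen conditional decoder to produce the stylized caption.
print(style_vec.shape)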

* 9 pages, 6 figures 

Intriguing Properties of Quantization at Scale

May 30, 2023
Arash Ahmadian, Saurabh Dash, Hongyu Chen, Bharat Venkitesh, Stephen Gou, Phil Blunsom, Ahmet Üstün, Sara Hooker

"Emergent properties" has been widely adopted as a term to describe behavior not present in smaller models but observed in larger models. Recent work suggests that the trade-off incurred by quantization is also an emergent property, with sharp drops in performance in models with more than 6B parameters. In this work, we ask: "are quantization cliffs in performance solely a factor of scale?" Against a backdrop of increased research focus on why certain emergent properties surface at scale, this work provides a useful counter-example. We posit that it is possible to optimize for a quantization-friendly training recipe that suppresses large-magnitude activation outliers. Here, we find that outlier dimensions are not an inherent product of scale, but rather are sensitive to the optimization conditions present during pre-training. This both opens up directions for more efficient quantization and poses the question of whether other emergent properties are inherent or can be altered and conditioned by optimization and architecture design choices. We successfully quantize models ranging in size from 410M to 52B parameters with minimal degradation in performance.
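
A minimal sketch of simulated symmetric int8 weight quantization and a crude activation-outlier check, illustrating the kind of effect the paper studies; shapes, thresholds, and the injected outlier are illustrative and not taken from the paper.

import numpy as np

def quantize_dequantize(w, num_bits=8):
    """Symmetric per-tensor quantization followed by dequantization."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(1024, 4096)).astype(np.float32)
err = np.abs(w - quantize_dequantize(w)).mean()
print(f"mean abs quantization error: {err:.2e}")

# Outlier dimensions: activation channels whose magnitude dwarfs the rest
# inflate the quantization scale and hurt resolution for all other channels.
acts = rng.normal(0.0, 1.0, size=(128, 4096)).astype(np.float32)
acts[:, 7] *= 60.0                                  # inject one outlier channel
chan_max = np.abs(acts).max(axis=0)
outliers = np.where(chan_max > 6.0 * np.median(chan_max))[0]
print("outlier channels:", outliers)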

* 32 pages, 14 figures 

Dual Feedback Attention Framework via Boundary-Aware Auxiliary and Progressive Semantic Optimization for Salient Object Detection in Optical Remote Sensing Imagery

Mar 06, 2023
Dejun Feng, Hongyu Chen, Suning Liu, Xingyu Shen, Ziyang Liao, Yakun Xie, Jun Zhu

Salient object detection in optical remote sensing images (ORSI-SOD) has gradually attracted attention thanks to the development of deep learning (DL) and salient object detection in natural scene images (NSI-SOD). However, ORSIs differ from NSIs in many aspects, such as their large coverage, complex backgrounds, and large variations in target types and scales. Therefore, a new dedicated method is needed for ORSI-SOD. In addition, existing methods do not pay sufficient attention to object boundaries, and the completeness of the final saliency maps still needs improvement. To address these issues, we propose a novel method called Dual Feedback Attention Framework via Boundary-Aware Auxiliary and Progressive Semantic Optimization (DFA-BASO). First, a Boundary Protection Calibration (BPC) module is proposed to reduce the loss of edge position information during forward propagation and suppress noise in low-level features. Second, a Dual Feature Feedback Complementary (DFFC) module is proposed based on the BPC module. It aggregates boundary-semantic dual features and provides effective feedback to coordinate features across different layers. Finally, a Strong Semantic Feedback Refinement (SSFR) module is proposed to obtain more complete saliency maps. This module further refines feature representations and eliminates feature differences through a unique feedback mechanism. Extensive experiments on two public datasets show that DFA-BASO outperforms 15 state-of-the-art methods. Furthermore, this paper demonstrates the contribution of DFA-BASO to ORSI-SOD through in-depth analysis of visualization results. All code can be found at https://github.com/YUHsss/DFA-BASO.
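
A minimal, generic sketch of fusing a boundary feature map with a semantic feature map via attention-style gating, loosely in the spirit of the boundary-semantic feedback described above; the module name and structure are illustrative and do not reproduce the paper's BPC/DFFC/SSFR designs.

import torch
import torch.nn as nn

class BoundarySemanticFusion(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.boundary_gate = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),                          # boundary-aware attention map
        )
        self.refine = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, semantic_feat, boundary_feat):
        gate = self.boundary_gate(boundary_feat)       # emphasize edge regions
        gated = semantic_feat * gate + semantic_feat   # residual gating
        return self.refine(torch.cat([gated, boundary_feat], dim=1))

fusion = BoundarySemanticFusion()
sem = torch.randn(1, 64, 32, 32)
bnd = torch.randn(1, 64, 32, 32)
print(fusion(sem, bnd).shape)                          # torch.Size([1, 64, 32, 32])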

An Empirical Study of Low Precision Quantization for TinyML

Mar 10, 2022
Shaojie Zhuo, Hongyu Chen, Ramchalam Kinattinkara Ramakrishnan, Tommy Chen, Chen Feng, Yicheng Lin, Parker Zhang, Liang Shen

Tiny machine learning (tinyML) has emerged over the past few years, aiming to deploy machine learning models to embedded AI processors with highly constrained memory and computation capacity. Low precision quantization is an important model compression technique that can greatly reduce both the memory consumption and computation cost of model inference. In this study, we focus on post-training quantization (PTQ) algorithms that quantize a model to low-bit (less than 8-bit) precision with only a small set of calibration data, and we benchmark them on different tinyML use cases. To achieve a fair comparison, we build a simulated quantization framework to investigate recent PTQ algorithms. Furthermore, we break down those algorithms into essential components and re-assemble them into a generic PTQ pipeline. Through ablation studies on alternative components in the pipeline, we reveal key design choices when performing low precision quantization. We hope this work provides useful data points and sheds light on future research in low precision quantization.
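
A minimal sketch of the min-max calibration step common to post-training quantization pipelines: a small calibration set fixes the scale and zero-point used to map tensors to low-bit integers. The bit-width and tensor shapes are illustrative, not the paper's benchmark settings.

import numpy as np

def calibrate_minmax(calib_acts, num_bits=4):
    """Asymmetric affine quantization parameters from calibration data."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = calib_acts.min(), calib_acts.max()
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point

def fake_quant(x, scale, zero_point, num_bits=4):
    """Simulated quantization: quantize then dequantize in floating point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

rng = np.random.default_rng(0)
calib = rng.normal(0.0, 1.0, size=(256, 64))       # small calibration batch
scale, zp = calibrate_minmax(calib)
test = rng.normal(0.0, 1.0, size=(32, 64))
print("mean abs error:", np.abs(test - fake_quant(test, scale, zp)).mean())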

* tinyML Research Symposium 2022 

2020 CATARACTS Semantic Segmentation Challenge

Oct 21, 2021
Imanol Luengo, Maria Grammatikopoulou, Rahim Mohammadi, Chris Walsh, Chinedu Innocent Nwoye, Deepak Alapatt, Nicolas Padoy, Zhen-Liang Ni, Chen-Chen Fan, Gui-Bin Bian, Zeng-Guang Hou, Heonjin Ha, Jiacheng Wang, Haojie Wang, Dong Guo, Lu Wang, Guotai Wang, Mobarakol Islam, Bharat Giddwani, Ren Hongliang, Theodoros Pissas, Claudio Ravasio, Martin Huber, Jeremy Birch, Joan M. Nunez Do Rio, Lyndon da Cruz, Christos Bergeles, Hongyu Chen, Fucang Jia, Nikhil Kumar Tomar, Debesh Jha, Michael A. Riegler, Pal Halvorsen, Sophia Bano, Uddhav Vaghela, Jianyuan Hong, Haili Ye, Feihong Huang, Da-Han Wang, Danail Stoyanov

Surgical scene segmentation is essential for anatomy and instrument localization, which can be further used to assess tissue-instrument interactions during a surgical procedure. In 2017, the Challenge on Automatic Tool Annotation for cataRACT Surgery (CATARACTS) released 50 cataract surgery videos accompanied by instrument usage annotations, including frame-level instrument presence information. In 2020, we released pixel-wise semantic annotations for anatomy and instruments for 4670 images sampled from 25 videos of the CATARACTS training set. The 2020 CATARACTS Semantic Segmentation Challenge, a sub-challenge of the 2020 MICCAI Endoscopic Vision (EndoVis) Challenge, presented three sub-tasks to assess participating solutions on anatomical structure and instrument segmentation. Their performance was assessed on a hidden test set of 531 images from 10 videos of the CATARACTS test set.
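
A minimal sketch of per-class IoU scoring for semantic segmentation, the kind of measure such challenges typically rely on; the actual challenge metrics and class list are not reproduced here, and the label maps below are random placeholders.

import numpy as np

def per_class_iou(pred, gt, num_classes):
    """Intersection-over-union per class for integer label maps."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        ious.append(inter / union if union > 0 else np.nan)
    return np.array(ious)

rng = np.random.default_rng(0)
pred = rng.integers(0, 3, size=(512, 512))
gt = rng.integers(0, 3, size=(512, 512))
print("mean IoU:", np.nanmean(per_class_iou(pred, gt, num_classes=3)))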

Advanced Mapping Robot and High-Resolution Dataset

Jul 23, 2020
Hongyu Chen, Zhijie Yang, Xiting Zhao, Guangyuan Weng, Haochuan Wan, Jianwen Luo, Xiaoya Ye, Zehao Zhao, Zhenpeng He, Yongxia Shen, Sören Schwertfeger

This paper presents a fully hardware-synchronized mapping robot with support for a hardware-synchronized external tracking system, enabling highly precise timing and localization. Nine high-resolution cameras and two 32-beam 3D lidars were used, along with a professional static 3D scanner for ground truth map collection. With all sensors calibrated on the mapping robot, three datasets are collected to evaluate the performance of mapping algorithms within a room and between rooms. Based on these datasets we generate maps and trajectory data, which are then fed into evaluation algorithms. We provide the datasets for download, and the mapping and evaluation procedures are documented in an easily reproducible manner for maximum comparability. We have also surveyed available robotics-related datasets and compiled a comprehensive table of these datasets and their properties.
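
A minimal sketch of the kind of trajectory evaluation such datasets enable: absolute trajectory error (RMSE) between an estimated and a ground-truth trajectory after a simple translation-only alignment. The trajectories below are synthetic placeholders; the paper's evaluation pipeline is not reproduced.

import numpy as np

def ate_rmse(est, gt):
    """RMSE of positions after removing the mean offset (translation-only alignment)."""
    est_c = est - est.mean(axis=0)
    gt_c = gt - gt.mean(axis=0)
    return np.sqrt(np.mean(np.sum((est_c - gt_c) ** 2, axis=1)))

t = np.linspace(0, 2 * np.pi, 200)
gt = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)      # ground-truth circle
est = gt + np.random.default_rng(0).normal(0, 0.02, gt.shape) + 0.5  # noisy, offset estimate
print(f"ATE RMSE: {ate_rmse(est, gt):.3f} m")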

* arXiv admin note: substantial text overlap with arXiv:1905.09483 

Improving CNN-based Planar Object Detection with Geometric Prior Knowledge

Sep 23, 2019
Jianxiong Cai, Hongyu Chen, Laurent Kneip, Sören Schwertfeger

In this paper, we focus on the question: how might mobile robots take advantage of affordable RGB-D sensors for object detection? Although current CNN-based object detectors have achieved impressive results, they have three main drawbacks for practical use on mobile robots: 1) collecting and annotating large-scale training sets is hard and time-consuming; 2) training usually takes a long time; and 3) CNN-based detectors show significant weakness in predicting locations. We propose a novel approach for the detection of planar objects, which rectifies images with geometric information to compensate for perspective distortion before feeding them to the detector module, typically a CNN-based detector such as YOLO or Mask R-CNN. By dealing with the perspective distortion in advance, we eliminate the need for the detector to learn it. Experiments show that this approach significantly boosts detection performance and effectively reduces the number of training images required. In addition to the novel detection framework, we also release an RGB-D dataset for hazmat sign detection. To the best of our knowledge, this is the first publicly available hazmat sign detection dataset collected with RGB-D sensors.
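
A minimal sketch of the rectification idea: given the four image corners of a planar object (e.g., recovered from RGB-D plane geometry), warp it to a fronto-parallel view before running a CNN detector. The corner coordinates and output size are illustrative; this is not the paper's full pipeline.

import cv2
import numpy as np

image = np.zeros((480, 640, 3), dtype=np.uint8)        # placeholder input frame
src = np.float32([[120, 80], [400, 120], [420, 360], [100, 330]])  # distorted corners
dst = np.float32([[0, 0], [300, 0], [300, 300], [0, 300]])         # fronto-parallel target

H = cv2.getPerspectiveTransform(src, dst)              # homography for rectification
rectified = cv2.warpPerspective(image, H, (300, 300))

# 'rectified' would then be passed to an off-the-shelf detector (e.g., YOLO),
# which no longer has to learn perspective-distorted appearances.
print(rectified.shape)                                 # (300, 300, 3)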

* Both authors are first author and denote equal contribution 

Heterogeneous Multi-sensor Calibration based on Graph Optimization

May 27, 2019
Hongyu Chen, Sören Schwertfeger

Many robotics and mapping systems contain multiple sensors to perceive the environment. Extrinsic parameter calibration, the identification of the position and rotation transform between the frames of the different sensors, is critical for fusing data from different sensors. When obtaining multiple camera-to-camera, lidar-to-camera, and lidar-to-lidar calibration results, inconsistencies are likely. We propose a graph-based method to refine the relative poses of the different sensors. We demonstrate our approach using our mapping robot platform, which features twelve sensors that are to be calibrated. The experimental results confirm that the proposed algorithm performs well.
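
A minimal sketch of graph-based refinement: sensor poses are optimized so that their pairwise relative transforms best agree with noisy, mutually inconsistent pairwise calibrations. For brevity this uses planar (x, y, yaw) poses and made-up measurements; the paper works with full 3D extrinsics.

import numpy as np
from scipy.optimize import least_squares

def relative(a, b):
    """Relative pose of b expressed in the frame of a, as (x, y, yaw)."""
    dx, dy = b[:2] - a[:2]
    c, s = np.cos(-a[2]), np.sin(-a[2])
    return np.array([c * dx - s * dy, s * dx + c * dy, b[2] - a[2]])

# Measured pairwise calibrations (graph edges): (i, j, measured relative pose).
edges = [
    (0, 1, np.array([1.00, 0.00, 0.02])),
    (1, 2, np.array([0.98, 0.51, -0.01])),
    (0, 2, np.array([2.05, 0.48, 0.00])),   # slightly inconsistent loop closure
]

def residuals(x):
    poses = np.vstack([[0.0, 0.0, 0.0], x.reshape(-1, 3)])  # sensor 0 fixed as reference
    return np.concatenate([relative(poses[i], poses[j]) - m for i, j, m in edges])

x0 = np.zeros(2 * 3)                         # initial guess for sensors 1 and 2
sol = least_squares(residuals, x0)
print(sol.x.reshape(-1, 3))                  # refined poses of sensors 1 and 2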

Towards Generation and Evaluation of Comprehensive Mapping Robot Datasets

May 23, 2019
Hongyu Chen, Xiting Zhao, Jianwen Luo, Zhijie Yang, Zehao Zhao, Haochuan Wan, Xiaoya Ye, Guangyuan Weng, Zhenpeng He, Tian Dong, Sören Schwertfeger

This paper presents a fully hardware-synchronized mapping robot with support for a hardware-synchronized external tracking system, enabling highly precise timing and localization. We also employ a professional static 3D scanner for ground truth map collection. Three datasets are generated to evaluate the performance of mapping algorithms within a room and between rooms. Based on these datasets we generate maps and trajectory data, which are then fed into evaluation algorithms. The mapping and evaluation procedures are documented in an easily reproducible manner for maximum comparability. Finally, we draw several conclusions about the tested SLAM algorithms.
