Zhiyong Zhang

Challenges of Indoor SLAM: A multi-modal multi-floor dataset for SLAM evaluation

Jun 14, 2023
Pushyami Kaveti, Aniket Gupta, Dennis Giaya, Madeline Karp, Colin Keil, Jagatpreet Nir, Zhiyong Zhang, Hanumant Singh

Robustness in Simultaneous Localization and Mapping (SLAM) remains one of the key challenges for the real-world deployment of autonomous systems. SLAM research has seen significant progress in the last two and a half decades, yet many state-of-the-art (SOTA) algorithms still struggle to perform reliably in real-world environments. There is a general consensus in the research community that we need challenging real-world scenarios which bring out different failure modes in sensing modalities. In this paper, we present a novel multi-modal indoor SLAM dataset covering challenging common scenarios that a robot will encounter and should be robust to. Our data was collected with a mobile robotics platform across multiple floors at Northeastern University's ISEC building. Such a multi-floor sequence is typical of commercial office spaces characterized by symmetry across floors and, thus, is prone to perceptual aliasing due to similar floor layouts. The sensor suite comprises seven global shutter cameras, a high-grade MEMS inertial measurement unit (IMU), a ZED stereo camera, and a 128-channel high-resolution lidar. Along with the dataset, we benchmark several SLAM algorithms and highlight the problems faced during the runs, such as perceptual aliasing, visual degradation, and trajectory drift. The benchmarking results indicate that parts of the dataset work well with some algorithms, while other data sections are challenging for even the best SOTA algorithms. The dataset is available at https://github.com/neufieldrobotics/NUFR-M3F.
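The benchmarking in this paper compares estimated SLAM trajectories against ground truth, where drift shows up as growing trajectory error. As a minimal illustration of the kind of metric involved (not the dataset's official evaluation code), the sketch below computes the root-mean-square Absolute Trajectory Error after rigid alignment, assuming time-synchronized Nx3 position arrays.

```python
# Minimal sketch of an Absolute Trajectory Error (ATE) computation, the kind of
# metric typically used when benchmarking SLAM output against ground truth.
# This is NOT the dataset's official evaluation code; trajectories are assumed
# to be time-synchronized Nx3 arrays of positions.
import numpy as np

def align_umeyama(est, gt):
    """Rigidly align estimated positions to ground truth (Horn/Umeyama, no scale)."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    cov = (gt - mu_g).T @ (est - mu_e) / est.shape[0]
    U, _, Vt = np.linalg.svd(cov)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ S @ Vt
    t = mu_g - R @ mu_e
    return est @ R.T + t

def ate_rmse(est, gt):
    """Root-mean-square ATE after rigid alignment."""
    aligned = align_umeyama(est, gt)
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))

# Toy example: a drifting estimate of a straight-line trajectory.
gt = np.column_stack([np.linspace(0, 10, 100), np.zeros(100), np.zeros(100)])
est = gt + np.column_stack([np.zeros(100), 0.01 * np.arange(100), np.zeros(100)])
print(f"ATE RMSE: {ate_rmse(est, gt):.3f} m")
```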

An atrium segmentation network with location guidance and siamese adjustment

Jan 11, 2023
Yuhan Xie, Zhiyong Zhang, Shaolong Chen, Changzhen Qiu

The segmentation of atrial scan images is of great significance for three-dimensional reconstruction of the atrium and for surgical positioning. Most existing segmentation networks adopt a 2D structure and take only the original images as input, ignoring the context information of 3D images and the role of prior information. In this paper, we propose LGSANet, an atrium segmentation network with location guidance and siamese adjustment, which takes three adjacent image slices as input and adopts an end-to-end approach to achieve coarse-to-fine atrial segmentation. The location guidance (LG) block uses the prior information of the localization map to guide the encoding features of the fine segmentation stage, and the siamese adjustment (SA) block uses context information to adjust the segmentation edges. Extensive experiments on the ACDC and ASC atrium datasets show that our method adapts to many classic 2D segmentation networks and yields significant performance improvements.

* 17 pages, 9 figures 
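As an illustration of how a localization prior can guide encoder features, the following PyTorch-style sketch shows a hypothetical location-guidance block that upsamples a coarse localization map and uses it to re-weight features in the fine segmentation stage. The block structure and names are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' implementation) of a location-guidance
# style block: a coarse localization map acts as a spatial prior that re-weights
# encoder features in the fine segmentation stage.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocationGuidanceBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, feat, loc_map):
        # feat:    encoder features, shape (B, C, H, W)
        # loc_map: coarse localization map, shape (B, 1, h, w), values in [0, 1]
        prior = F.interpolate(loc_map, size=feat.shape[-2:], mode="bilinear",
                              align_corners=False)
        guided = feat * (1.0 + prior)   # emphasize features inside the located region
        return self.fuse(guided)

# Toy usage: three adjacent slices stacked as channels would feed the encoder;
# here we only exercise the guidance block itself.
feat = torch.randn(2, 64, 32, 32)
loc = torch.sigmoid(torch.randn(2, 1, 16, 16))
out = LocationGuidanceBlock(64)(feat, loc)
print(out.shape)  # torch.Size([2, 64, 32, 32])
```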

Spatiotemporal implicit neural representation for unsupervised dynamic MRI reconstruction

Dec 31, 2022
Jie Feng, Ruimin Feng, Qing Wu, Zhiyong Zhang, Yuyao Zhang, Hongjiang Wei

Supervised Deep-Learning (DL)-based reconstruction algorithms have shown state-of-the-art results for highly-undersampled dynamic Magnetic Resonance Imaging (MRI) reconstruction. However, the requirement for large amounts of high-quality ground-truth data, and the associated generalization problem, hinders their application. Recently, Implicit Neural Representation (INR) has emerged as a powerful DL-based tool for solving inverse problems by characterizing the attributes of a signal as a continuous function of the corresponding coordinates in an unsupervised manner. In this work, we propose an INR-based method to improve dynamic MRI reconstruction from highly undersampled k-space data, which takes only spatiotemporal coordinates as inputs. Specifically, the proposed INR represents the dynamic MRI images as an implicit function and encodes them into neural networks. The weights of the network are learned from the sparsely acquired (k, t)-space data itself, without external training datasets or prior images. Benefiting from the strong implicit continuity regularization of INR together with explicit regularization for low-rankness and sparsity, our proposed method outperforms the compared scan-specific methods at various acceleration factors. For example, experiments on retrospective cardiac cine datasets show an improvement of 5.5 to 7.1 dB in PSNR at extremely high accelerations (up to 41.6-fold). The high quality and inherent continuity of the images provided by INR have great potential to further improve the spatiotemporal resolution of dynamic MRI, without the need for any training data.

* 9 pages, 5 figures 
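To make the INR idea concrete, the sketch below shows a simplified, single-coil version: an MLP maps (x, y, t) coordinates to a complex intensity, and the loss compares the Fourier transform of the predicted frame against the acquired undersampled k-space samples. This is an assumed minimal formulation; the paper's full method also includes explicit low-rank and sparsity regularization, which is omitted here.

```python
# Simplified single-coil sketch of the INR idea: an MLP maps (x, y, t)
# coordinates to image intensity and is trained only against the acquired
# undersampled k-space samples of each frame. Low-rank and sparsity
# regularization used in the paper are omitted for brevity.
import torch
import torch.nn as nn

class CoordMLP(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),           # real and imaginary parts
        )

    def forward(self, coords):               # coords: (N, 3) with (x, y, t) in [-1, 1]
        out = self.net(coords)
        return torch.complex(out[..., 0], out[..., 1])

def dc_loss(model, coords_frame, kspace_frame, mask_frame):
    """Data-consistency loss for one frame: FFT of predicted image vs. acquired k-space."""
    H, W = kspace_frame.shape
    img = model(coords_frame).reshape(H, W)
    k_pred = torch.fft.fftshift(torch.fft.fft2(torch.fft.ifftshift(img), norm="ortho"))
    diff = (k_pred - kspace_frame) * mask_frame
    return diff.abs().pow(2).sum() / mask_frame.sum().clamp(min=1)

# Toy shapes: one 32x32 frame of (all-zero) masked k-space and its coordinate grid.
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 32), torch.linspace(-1, 1, 32), indexing="ij")
coords = torch.stack([xs.flatten(), ys.flatten(), torch.zeros(32 * 32)], dim=-1)
loss = dc_loss(CoordMLP(), coords, torch.zeros(32, 32, dtype=torch.complex64), torch.ones(32, 32))
print(loss.item())
```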

Linguistic-Enhanced Transformer with CTC Embedding for Speech Recognition

Oct 25, 2022
Xulong Zhang, Jianzong Wang, Ning Cheng, Mengyuan Zhao, Zhiyong Zhang, Jing Xiao

The recent emergence of the joint CTC-Attention model has brought significant improvement to automatic speech recognition (ASR). The improvement largely lies in the modeling of linguistic information by the decoder. The decoder, jointly optimized with the acoustic encoder, learns a language model from ground-truth sequences in an auto-regressive manner during training. However, the training corpus of the decoder is limited to the speech transcriptions, which is far smaller than the corpus needed to train an acceptable language model. This leads to poor robustness of the decoder. To alleviate this problem, we propose the linguistic-enhanced transformer, which introduces refined CTC information to the decoder during training so that the decoder can be more robust. Our experiments on the AISHELL-1 speech corpus show that the character error rate (CER) is relatively reduced by up to 7%. We also find that in the joint CTC-Attention ASR model, the decoder is more sensitive to linguistic information than to acoustic information.

* Accepted by ECAISS2022, The Fourth International Workshop on Edge Computing and Artificial Intelligence based Sensor-Cloud System 
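One plausible way to inject refined CTC information into the decoder during training (not necessarily the paper's exact scheme) is to greedily collapse the frame-level CTC output and mix the resulting tokens into the teacher-forced decoder input, as sketched below.

```python
# Hedged illustration (not necessarily the paper's exact scheme) of feeding
# "refined" CTC information to an attention decoder: greedily collapse the CTC
# output (drop blanks and repeats) and mix it with the ground-truth history
# used for teacher forcing.
import random

BLANK = 0

def ctc_greedy_collapse(frame_ids):
    """Collapse a per-frame CTC argmax path into a token sequence."""
    out, prev = [], None
    for t in frame_ids:
        if t != BLANK and t != prev:
            out.append(t)
        prev = t
    return out

def mix_decoder_input(ground_truth, ctc_tokens, p_ctc=0.3):
    """Randomly replace ground-truth tokens with aligned CTC tokens during training."""
    mixed = list(ground_truth)
    for i in range(min(len(mixed), len(ctc_tokens))):
        if random.random() < p_ctc:
            mixed[i] = ctc_tokens[i]
    return mixed

# Toy example: frame-level argmax path -> collapsed tokens -> mixed decoder input.
frames = [0, 5, 5, 0, 0, 7, 7, 7, 0, 9]
print(ctc_greedy_collapse(frames))          # [5, 7, 9]
print(mix_decoder_input([5, 8, 9], ctc_greedy_collapse(frames)))
```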

Automatic segmentation of meniscus based on MAE self-supervision and point-line weak supervision paradigm

May 07, 2022
Yuhan Xie, Kexin Jiang, Zhiyong Zhang, Shaolong Chen, Xiaodong Zhang, Changzhen Qiu

Medical image segmentation based on deep learning often faces the problems of insufficient datasets and time-consuming labeling. In this paper, we introduce the self-supervised method MAE (Masked Autoencoders) to knee joint images to provide a good initial weight for the segmentation model and improve the model's adaptability to small datasets. Secondly, we propose a weakly supervised paradigm for meniscus segmentation based on a combination of point and line annotations to reduce labeling time. Based on the weak labels, we design a region-growing algorithm to generate pseudo-labels. Finally, we train the segmentation network on the pseudo-labels with weights transferred from self-supervision. Extensive experimental results show that our proposed method combining self-supervision and weak supervision approaches the performance of fully supervised models while greatly reducing the required labeling time and dataset size.

* 8 pages, 10 figures 
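The pseudo-label generation step relies on region growing from the point/line annotations. The sketch below shows a generic region-growing routine from a single seed using an intensity tolerance; the actual growing criterion used in the paper may differ.

```python
# Minimal sketch of region growing from a seed point, the kind of procedure
# used to expand point/line weak annotations into pseudo-label masks. The
# intensity-tolerance criterion here is an assumption for illustration.
import numpy as np
from collections import deque

def region_grow(image, seed, tol=10.0):
    """Grow a binary mask from `seed` (row, col) over 4-connected neighbors."""
    h, w = image.shape
    mask = np.zeros((h, w), dtype=bool)
    seed_val = float(image[seed])
    queue = deque([seed])
    mask[seed] = True
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and not mask[rr, cc]:
                if abs(float(image[rr, cc]) - seed_val) <= tol:
                    mask[rr, cc] = True
                    queue.append((rr, cc))
    return mask

# Toy example: a bright square on a dark background grows from a single click.
img = np.zeros((64, 64)); img[20:40, 20:40] = 100
print(region_grow(img, (30, 30), tol=5.0).sum())  # 400 pixels
```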

An Iterative Labeling Method for Annotating Fisheries Imagery

Apr 27, 2022
Zhiyong Zhang, Pushyami Kaveti, Hanumant Singh, Abigail Powell, Erica Fruh, M. Elizabeth Clarke

In this paper, we present a methodology for fisheries-related data that allows us to converge on a labeled image dataset by iterating over the dataset with multiple training and production loops that can exploit crowdsourcing interfaces. We present our algorithm and its results on two separate sets of image data collected using the Seabed autonomous underwater vehicle. The first dataset comprises 2,026 completely unlabeled images, while the second consists of 21,968 images that were point-annotated by experts. Our results indicate that training with a small subset and iterating on that to build a larger set of labeled data allows us to converge to a fully annotated dataset within a small number of iterations. Even for a dataset labeled by experts, a single iteration of the methodology improves the labels by discovering additional complicated examples: labels associated with fish that overlap, are very small, or are obscured by the contrast limitations of underwater imagery.
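The core of the methodology is a loop of training, prediction over the full dataset, and human review of uncertain outputs. The following is a schematic, runnable skeleton of such a loop; the training, inference, and review callables are hypothetical stand-ins for the real detector and crowdsourcing interface.

```python
# Schematic, runnable skeleton of an iterative labeling loop. The detector
# training, inference, and human-review steps are passed in as callables
# (stubbed below with toy functions); they stand in for the real
# training/production loops and crowdsourcing interface.
def iterative_labeling(images, initial_labels, train_fn, predict_fn, review_fn,
                       n_iters=3, conf_threshold=0.5):
    labeled = dict(initial_labels)                     # image_id -> annotations
    for _ in range(n_iters):
        model = train_fn(labeled)                      # training loop
        changed = 0
        for img_id in images:                          # production loop
            preds = predict_fn(model, img_id)
            if any(score < conf_threshold for _, score in preds) or img_id not in labeled:
                labeled[img_id] = review_fn(img_id, preds)   # human-in-the-loop review
                changed += 1
            else:
                labeled[img_id] = preds                # accept confident predictions
        if changed == 0:                               # labels have converged
            break
    return labeled

# Toy run with stub callables: predictions are (label, confidence) pairs.
images = ["img_001", "img_002", "img_003"]
result = iterative_labeling(
    images,
    initial_labels={"img_001": [("fish", 0.9)]},
    train_fn=lambda labeled: {"n_train": len(labeled)},
    predict_fn=lambda model, img_id: [("fish", 0.8)],
    review_fn=lambda img_id, preds: [("fish", 1.0)],
)
print(result)
```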

Towards A COLREGs Compliant Autonomous Surface Vessel in a Constrained Channel

Apr 27, 2022
James Connor Meyers, Thomas Sayre McCord, Zhiyong Zhang, Hanumant Singh

In this paper, we look at the role of autonomous navigation in the maritime domain. Specifically, we examine how an Autonomous Surface Vessel (ASV) can achieve obstacle avoidance based on the Convention on the International Regulations for Preventing Collisions at Sea (1972), or COLREGs, in real-world environments. Our ASV is equipped with a broadband marine radar and an Inertial Navigation System (INS), and uses official Electronic Navigational Charts (ENC). These sensors provide situational awareness and, in a series of well-defined steps, allow us to exclude land objects from the radar data, extract tracks associated with moving vessels within range of the radar, and then use a Kalman Filter to track and predict the motion of other moving vessels in the vicinity. A Constant Velocity model for the Kalman Filter allows us to solve the data association problem and build a consistent model across successive radar scans. We account for multiple COLREGs situations based on the predicted relative motion. Finally, an efficient path planning algorithm is presented to find a path and publish waypoints for real-time COLREGs-compliant autonomous navigation within highly constrained environments. We demonstrate our framework with operational results collected over the course of a 3.4 nautical mile mission on the Charles River in Boston, during which the ASV encountered and successfully navigated multiple scenarios and close-quarters encounters with other moving vessels.
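For concreteness, the sketch below implements a generic constant-velocity Kalman filter of the kind used to track radar contacts: the state is [x, y, vx, vy], only position is measured, and the noise parameters are illustrative rather than values from the paper.

```python
# Minimal constant-velocity Kalman filter for tracking a radar contact.
# State is [x, y, vx, vy]; only position is measured. Noise parameters are
# illustrative, not tuned values from the paper.
import numpy as np

def cv_kalman_step(x, P, z, dt, q=0.1, r=5.0):
    """One predict + update cycle. x: (4,) state, P: (4,4) covariance, z: (2,) position."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1,  0],
                  [0, 0, 0,  1]], dtype=float)
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)
    Q = q * np.eye(4)
    R = r * np.eye(2)
    # Predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update
    y = z - H @ x_pred                      # innovation
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new

# Toy track: a vessel moving east at 2 m/s, observed every second with noise.
x, P = np.zeros(4), np.eye(4) * 100.0
for t in range(1, 6):
    z = np.array([2.0 * t, 0.0]) + np.random.randn(2) * 0.5
    x, P = cv_kalman_step(x, P, z, dt=1.0)
print("estimated velocity:", x[2:])
```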

FakeTransformer: Exposing Face Forgery From Spatial-Temporal Representation Modeled By Facial Pixel Variations

Nov 15, 2021
Yuyang Sun, Zhiyong Zhang, Changzhen Qiu, Liang Wang, Zekai Wang

With the rapid development of generative models, AI-based face manipulation technology, known as DeepFakes, has become more and more realistic. This means of face forgery can attack any target, posing a new threat to personal privacy and property security. Moreover, the misuse of synthetic video shows potential dangers in many areas, such as identity harassment, pornography, and news rumors. Inspired by the fact that the spatial coherence and temporal consistency of physiological signals are destroyed in generated content, we attempt to find inconsistent patterns that can distinguish real videos from synthetic videos based on the variations of facial pixels, which are highly related to physiological information. Our approach first applies Eulerian Video Magnification (EVM) at multiple Gaussian scales to the original video to enlarge the physiological variations caused by changes in facial blood volume, and then transforms the original video and the magnified videos into a Multi-Scale Eulerian Magnified Spatial-Temporal map (MEMSTmap), which can represent time-varying physiological enhancement sequences at different octaves. These maps are then reshaped into frame patches in column units and fed to a vision Transformer to learn frame-level spatio-temporal descriptors. Finally, we aggregate the feature embeddings and output the probability that the video is real or fake. We validate our method on the FaceForensics++ and DeepFake Detection datasets. The results show that our model achieves excellent performance in forgery detection and also shows outstanding generalization capability across data domains.
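The pipeline builds on Eulerian magnification of subtle, pulse-related intensity variations. As a simplified illustration of just the temporal amplification step (the paper uses full EVM at multiple Gaussian scales before forming the spatial-temporal map), the sketch below band-passes each pixel's intensity over time around heart-rate frequencies and amplifies the result.

```python
# Simplified sketch of the Eulerian magnification idea: band-pass each pixel's
# intensity over time (around typical heart-rate frequencies) and amplify the
# result. Only the temporal amplification step is shown here.
import numpy as np

def temporal_bandpass_amplify(frames, fps, low=0.8, high=3.0, alpha=20.0):
    """frames: (T, H, W) grayscale video; returns video with the pulse band magnified."""
    T = frames.shape[0]
    freqs = np.fft.rfftfreq(T, d=1.0 / fps)
    spectrum = np.fft.rfft(frames, axis=0)
    band = (freqs >= low) & (freqs <= high)
    spectrum[~band] = 0.0                           # keep only the pulse band
    filtered = np.fft.irfft(spectrum, n=T, axis=0)
    return frames + alpha * filtered                # amplify subtle variations

# Toy example: a tiny synthetic "face" video with a 1.2 Hz intensity pulse.
fps, T = 30, 150
t = np.arange(T) / fps
video = 100 + 0.2 * np.sin(2 * np.pi * 1.2 * t)[:, None, None] * np.ones((T, 8, 8))
magnified = temporal_bandpass_amplify(video, fps)
print(float(video.std()), float(magnified.std()))   # temporal variation is amplified
```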

Large-scale Transfer Learning for Low-resource Spoken Language Understanding

Aug 13, 2020
Xueli Jia, Jianzong Wang, Zhiyong Zhang, Ning Cheng, Jing Xiao

End-to-end Spoken Language Understanding (SLU) models are made increasingly large and complex to achieve state-of-the-art accuracy. However, the increased complexity of a model can also introduce a high risk of over-fitting, which is a major challenge in SLU tasks due to the limited available data. In this paper, we propose an attention-based SLU model together with three encoder enhancement strategies to overcome the data sparsity challenge. The first strategy focuses on a transfer-learning approach to improve the feature extraction capability of the encoder. It is implemented by pre-training the encoder component with a large quantity of Automatic Speech Recognition annotated data, relying on the standard Transformer architecture, and then fine-tuning the SLU model with a small amount of target labelled data. The second strategy adopts multi-task learning: the SLU model integrates the speech recognition model by sharing the same underlying encoder, thereby improving robustness and generalization ability. The third strategy, learning from the Component Fusion (CF) idea, involves a Bidirectional Encoder Representations from Transformers (BERT) model and aims to boost the capability of the decoder with an auxiliary network. It hence reduces the risk of over-fitting and indirectly augments the ability of the underlying encoder. Experiments on the FluentAI dataset show that the cross-language transfer learning and multi-task strategies improve performance by up to 4.52% and 3.89% respectively, compared to the baseline.

* Will be presented at INTERSPEECH 2020 
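The second strategy, a shared encoder serving both ASR and SLU objectives, can be sketched as follows. The architecture below is an assumed minimal example (layer sizes and the pooled intent head are illustrative, not the paper's exact configuration); in training, a CTC loss on the ASR head would be combined with a cross-entropy loss on the intent head.

```python
# Hedged sketch (not the paper's exact architecture) of the shared-encoder
# multi-task idea: one acoustic encoder feeds both an ASR head (per-frame
# token logits for CTC) and an SLU intent-classification head.
import torch
import torch.nn as nn

class MultiTaskSLU(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, vocab_size=500, n_intents=31):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4,
                                               batch_first=True)
        self.proj = nn.Linear(feat_dim, hidden)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.asr_head = nn.Linear(hidden, vocab_size)     # per-frame token logits (CTC)
        self.intent_head = nn.Linear(hidden, n_intents)   # utterance-level intent logits

    def forward(self, feats):                 # feats: (B, T, feat_dim)
        h = self.encoder(self.proj(feats))    # shared acoustic representation
        asr_logits = self.asr_head(h)         # (B, T, vocab)
        intent_logits = self.intent_head(h.mean(dim=1))   # pooled over time
        return asr_logits, intent_logits

model = MultiTaskSLU()
asr_logits, intent_logits = model(torch.randn(2, 120, 80))
print(asr_logits.shape, intent_logits.shape)
```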

Machine learning driven synthesis of few-layered WTe2

Oct 10, 2019
Manzhang Xu, Bijun Tang, Chao Zhu, Yuhao Lu, Chao Zhu, Lu Zheng, Jingyu Zhang, Nannan Han, Yuxi Guo, Jun Di, Pin Song, Yongmin He, Lixing Kang, Zhiyong Zhang, Wu Zhao, Cuntai Guan, Xuewen Wang, Zheng Liu

Reducing the lateral scale of two-dimensional (2D) materials to one-dimensional (1D) has attracted substantial research interest, not only to achieve competitive electronic device applications but also for the exploration of fundamental physical properties. Controllable synthesis of high-quality 1D nanoribbons (NRs) is thus highly desirable and essential for further study. Traditional exploration of the optimal synthesis conditions of novel materials is based on the trial-and-error approach, which is time-consuming, costly, and laborious. Recently, machine learning (ML) has demonstrated a promising capability to guide material synthesis by effectively learning from past data and then making recommendations. Here, we report the implementation of supervised ML for the chemical vapor deposition (CVD) synthesis of high-quality 1D few-layered WTe2 NRs. The synthesis parameters of the WTe2 NRs are optimized by the trained ML model. On top of that, a growth mechanism for the as-synthesized 1T' few-layered WTe2 NRs is further proposed, which may inspire growth strategies for other 1D nanostructures. Our findings suggest that ML is a powerful and efficient approach to aid the synthesis of 1D nanostructures, opening up new opportunities for intelligent material development.
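As a schematic of how supervised ML can recommend synthesis conditions, the sketch below trains a classifier on past runs (parameters labelled as success or failure) and ranks candidate parameter sets by predicted success probability. The parameter names and toy data are invented for illustration and are not values from the paper.

```python
# Illustrative sketch of supervised ML guiding synthesis: a classifier is
# trained on past CVD runs (parameters -> success/failure) and then used to
# rank candidate parameter sets. Parameter names and the toy data below are
# made up for demonstration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy "historical" runs: [temperature_C, flow_sccm, growth_time_min]
X_past = rng.uniform([600, 20, 5], [900, 200, 60], size=(80, 3))
y_past = (X_past[:, 0] > 750).astype(int)        # toy success rule, for illustration only

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_past, y_past)

# Rank unexplored candidate conditions by predicted probability of success.
candidates = rng.uniform([600, 20, 5], [900, 200, 60], size=(10, 3))
scores = model.predict_proba(candidates)[:, 1]
best = candidates[np.argmax(scores)]
print("recommended next run (T, flow, time):", np.round(best, 1))
```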
