Xue Yang

Embedding Recurrent Layers with Dual-Path Strategy in a Variant of Convolutional Network for Speaker-Independent Speech Separation

Mar 25, 2022
Xue Yang, Changchun Bao

Speaker-independent speech separation has achieved remarkable performance in recent years with the development of deep neural networks (DNNs). Various network architectures, from traditional convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to advanced transformers, have been carefully designed to improve separation performance. However, state-of-the-art models usually suffer from several computational drawbacks, such as large model size, high memory consumption, and high computational complexity. To balance performance against computational efficiency, and to further explore the modeling ability of traditional network structures, we combine RNNs with a newly proposed variant of the convolutional network to tackle the speech separation problem. By embedding two RNNs into the basic block of this variant with the help of the dual-path strategy, the proposed network can effectively learn both local information and global dependencies. In addition, a four-stage structure enables separation to be performed gradually at finer and finer scales as the feature dimension increases. Experimental results on various datasets demonstrate the effectiveness of the proposed method and show that it achieves a good trade-off between separation performance and computational efficiency.

* Submitted to Interspeech 2022
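
The dual-path strategy is commonly implemented (for example in DPRNN-style separators) by cutting a long frame sequence into overlapping chunks, so that one RNN runs within each chunk (local information) and another runs across chunks at a fixed within-chunk position (global dependency). The sketch below shows only that reshaping step in plain Python, under stated assumptions: the chunk size, 50% hop, zero padding, and all function names are illustrative and hypothetical, not the authors' configuration, and the RNNs themselves are omitted.

```python
from typing import List, Tuple

Frames = List[float]


def segment(frames: Frames, chunk: int, hop: int,
            pad_value: float = 0.0) -> Tuple[List[Frames], List[int], int]:
    """Split a frame sequence into overlapping chunks, zero-padding at the end."""
    n = len(frames)
    total = max(n, chunk)
    remainder = (total - chunk) % hop
    if remainder:
        total += hop - remainder  # pad so the last chunk ends exactly at the padded length
    padded = list(frames) + [pad_value] * (total - n)
    starts = list(range(0, total - chunk + 1, hop))
    chunks = [padded[s:s + chunk] for s in starts]
    return chunks, starts, n


def intra_chunk_sequences(chunks: List[Frames]) -> List[Frames]:
    """Sequences for the intra-chunk RNN: each chunk on its own (local context)."""
    return [list(c) for c in chunks]


def inter_chunk_sequences(chunks: List[Frames]) -> List[Frames]:
    """Sequences for the inter-chunk RNN: one position across all chunks (global context)."""
    return [list(column) for column in zip(*chunks)]


def overlap_add(chunks: List[Frames], starts: List[int], n: int) -> Frames:
    """Merge chunks back into one sequence, averaging where chunks overlap."""
    total = starts[-1] + len(chunks[0])
    acc = [0.0] * total
    count = [0] * total
    for s, c in zip(starts, chunks):
        for k, v in enumerate(c):
            acc[s + k] += v
            count[s + k] += 1
    return [acc[i] / count[i] for i in range(n)]


frames = [float(i) for i in range(11)]
chunks, starts, n = segment(frames, chunk=4, hop=2)
print(len(chunks), inter_chunk_sequences(chunks)[0])
print(overlap_add(chunks, starts, n) == frames)
```

With 11 frames, a chunk size of 4, and a hop of 2, the padded sequence yields five chunks; an inter-chunk sequence visits every second frame, which is how a short RNN can still see the whole utterance.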

A Glyph-driven Topology Enhancement Network for Scene Text Recognition

Mar 07, 2022
Tongkun Guan, Chaochen Gu, Jingzheng Tu, Xue Yang, Qi Feng

Attention-based methods, which build one-dimensional (1D) and two-dimensional (2D) attention mechanisms within an encoder-decoder framework, have dominated scene text recognition (STR) tasks owing to their ability to build implicit language representations. However, 1D attention-based mechanisms suffer from alignment drift on later characters, while 2D attention-based mechanisms only roughly focus on the spatial regions of characters without exploiting their detailed topological structures, which degrades visual performance. To mitigate these issues, we propose a novel Glyph-driven Topology Enhancement Network (GTEN) that improves topological feature representations in visual models for STR. Specifically, an unsupervised method is first employed to exploit 1D sequence-aligned attention weights. Second, we construct a supervised segmentation module to capture 2D ordered, pixel-wise topological information of glyphs without extra character-level annotations. Third, the resulting outputs are fused into enhanced topological features that enrich the semantic feature representations for STR. Experiments demonstrate that GTEN achieves competitive performance on the IIIT5K-Words, Street View Text, ICDAR-series, SVT Perspective, and CUTE80 datasets.

The KFIoU Loss for Rotated Object Detection

Feb 01, 2022
Xue Yang, Yue Zhou, Gefan Zhang, Jirui Yang, Wentao Wang, Junchi Yan, Xiaopeng Zhang, Qi Tian

In the well-developed area of horizontal object detection, the computation-friendly IoU-based loss is readily adopted and aligns well with the detection metrics. In contrast, rotation detectors often rely on a more complicated loss based on SkewIoU, which is unfriendly to gradient-based training. In this paper, we argue that an effective alternative is to devise an approximate loss that achieves trend-level alignment with the SkewIoU loss instead of strict value-level identity. Specifically, we model the objects as Gaussian distributions and adopt the Kalman filter to inherently mimic the mechanism of SkewIoU by its definition, and we show its alignment with SkewIoU at the trend level. This contrasts with recent Gaussian-modeling-based rotation detectors, e.g., GWD and KLD, which involve a human-specified distribution distance metric that requires additional hyperparameter tuning. The resulting new loss, called KFIoU, is easier to implement and works better than the exact SkewIoU loss, thanks to its full differentiability and its ability to handle non-overlapping cases. We further extend our technique to the 3-D case, which suffers from the same issues as 2-D detection. Extensive results on various public datasets (2-D/3-D; aerial, text, and face images) with different base detectors show the effectiveness of our approach.

* 19 pages, 5 figures, 11 tables, TensorFlow code: https://github.com/yangxue0827/RotationDetection, PyTorch code: https://github.com/open-mmlab/mmrotate
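
The abstract describes modeling boxes as Gaussians and using a Kalman filter to mimic SkewIoU. Below is a minimal plain-Python sketch of one plausible reading, not the released KFIoU code: the overlap region is approximated by the Kalman-update covariance S3 = S1 - S1(S1 + S2)^-1 S1, each Gaussian's "volume" is taken as 4 * sqrt(det S) (which equals w * h for a box-derived Gaussian), and V3 / (V1 + V2 - V3) plays the role of IoU. The box-to-Gaussian conversion (mean at the box center, covariance R diag(w^2/4, h^2/4) R^T) follows the Gaussian modeling used in this line of work; the function names are hypothetical, and center offsets are deliberately left out.

```python
import math
from typing import List, Tuple

Mat = List[List[float]]


def box_to_gaussian(x: float, y: float, w: float, h: float,
                    theta: float) -> Tuple[Tuple[float, float], Mat]:
    """Rotated box (center, size, angle in radians) -> 2-D Gaussian (mean, covariance)."""
    a, b = w * w / 4.0, h * h / 4.0
    c, s = math.cos(theta), math.sin(theta)
    cov = [[a * c * c + b * s * s, (a - b) * c * s],
           [(a - b) * c * s, a * s * s + b * c * c]]
    return (x, y), cov


def det(m: Mat) -> float:
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv(m: Mat) -> Mat:
    d = det(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


def matmul(p: Mat, q: Mat) -> Mat:
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def volume(cov: Mat) -> float:
    """2^n * sqrt(det(cov)) with n = 2; equals w * h for a box-derived Gaussian."""
    return 4.0 * math.sqrt(max(det(cov), 0.0))


def kf_overlap_ratio(box1, box2) -> float:
    """IoU-like ratio from the Kalman-updated covariance of two box Gaussians.

    Hypothetical sketch: compares shape and angle only; center offsets are ignored.
    """
    _, s1 = box_to_gaussian(*box1)
    _, s2 = box_to_gaussian(*box2)
    total = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]
    gain = matmul(s1, inv(total))  # Kalman gain K = S1 (S1 + S2)^-1
    k_s1 = matmul(gain, s1)
    s3 = [[s1[i][j] - k_s1[i][j] for j in range(2)] for i in range(2)]  # S3 = S1 - K S1
    v1, v2, v3 = volume(s1), volume(s2), volume(s3)
    return v3 / (v1 + v2 - v3)


box = (0.0, 0.0, 10.0, 2.0, 0.0)
for angle in (0.0, 0.1, 0.3):
    print(angle, kf_overlap_ratio(box, (0.0, 0.0, 10.0, 2.0, angle)))
```

Because S3 equals S/2 when the two boxes coincide, this ratio peaks at 1/3 rather than 1 in the sketch; it decreases smoothly as the angle error grows, which is the kind of trend-level behavior the abstract refers to. Any rescaling of that range is left out.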

AlphaRotate: A Rotation Detection Benchmark using TensorFlow

Nov 12, 2021
Xue Yang, Yue Zhou, Junchi Yan

AlphaRotate is an open-source TensorFlow benchmark for performing scalable rotation detection on various datasets. It currently provides more than 18 popular rotation detection models under a single, well-documented API designed for use by both practitioners and researchers. AlphaRotate treats high performance, robustness, sustainability, and scalability as its core design principles, and all models are covered by unit testing, continuous integration, code coverage, maintainability checks, and visual monitoring and analysis. AlphaRotate can be installed from PyPI and is released under the Apache-2.0 License. Source code is available at https://github.com/yangxue0827/RotationDetection.

* 7 pages, 1 figure, 1 table

RSDet++: Point-based Modulated Loss for More Accurate Rotated Object Detection

Sep 24, 2021
Wen Qian, Xue Yang, Silong Peng, Junchi Yan, Xiujuan Zhang

We classify the loss discontinuity in both five-parameter and eight-parameter rotated object detection methods as rotation sensitivity error (RSE), which results in performance degradation. We introduce a novel modulated rotation loss to alleviate this problem and propose a rotation sensitivity detection network (RSDet), which consists of an eight-parameter single-stage rotated object detector and the modulated rotation loss. The proposed RSDet has several advantages: 1) it reformulates rotated object detection as predicting the corners of objects, whereas most previous methods employ five-parameter regression with parameters in different measurement units; and 2) the modulated rotation loss achieves consistent improvement on both five-parameter and eight-parameter rotated object detection methods by resolving the loss discontinuity. To further improve accuracy on objects smaller than 10 pixels, we introduce RSDet++, which consists of a point-based anchor-free rotated object detector and the modulated rotation loss. Extensive experiments demonstrate the effectiveness of both RSDet and RSDet++, which achieve competitive results on the challenging rotated object detection benchmarks DOTA1.0, DOTA1.5, and DOTA2.0. We hope the proposed method can provide a new perspective for designing rotated object detection algorithms and draw more attention to tiny objects. The code and models are available at: https://github.com/yangxue0827/RotationDetection.

* arXiv admin note: substantial text overlap with arXiv:1911.08299
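
Predicting corners makes a plain regression loss depend on which vertex is labeled first: the same quadrilateral listed from a different starting corner produces a large L1 loss, which is one source of the discontinuity described above. The minimal sketch below, written in the spirit of the modulated rotation loss but not its exact formula, takes the minimum L1 distance over all cyclic re-orderings of the target corners. The function names are hypothetical, trying all four shifts is an assumption, and any weighting or normalization the paper applies is omitted.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def l1_corners(pred: Sequence[Point], target: Sequence[Point]) -> float:
    """Plain L1 distance between corners matched by index."""
    return sum(abs(px - tx) + abs(py - ty) for (px, py), (tx, ty) in zip(pred, target))


def cyclic_shifts(corners: Sequence[Point]) -> List[List[Point]]:
    """All rotations of the corner order: same quadrilateral, different starting vertex."""
    n = len(corners)
    return [list(corners[k:]) + list(corners[:k]) for k in range(n)]


def order_invariant_corner_loss(pred: Sequence[Point], target: Sequence[Point]) -> float:
    """Minimum L1 loss over cyclic re-orderings of the target corners."""
    return min(l1_corners(pred, shifted) for shifted in cyclic_shifts(target))


target = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
pred = [(4.0, 0.1), (4.0, 2.0), (0.0, 2.0), (0.0, 0.0)]  # same box, starting at another corner
print(l1_corners(pred, target))
print(order_invariant_corner_loss(pred, target))
```

The naive loss here is about 12.1 even though the prediction is off by only 0.1 in one coordinate; the order-invariant version reports that 0.1.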

An adaptive Origin-Destination flows cluster-detecting method to identify urban mobility trends

Jun 10, 2021
Mengyuan Fang, Luliang Tang, Zihan Kan, Xue Yang, Tao Pei, Qingquan Li, Chaokui Li

Origin-Destination (OD) flow, as an abstract representation of an object's movement or interaction, has been used to reveal urban mobility and human-land interaction patterns. As an important spatial analysis approach, clustering methods for point events have been extended to OD flows to identify the dominant trends and spatial structures of urban mobility. However, existing OD flow cluster-detecting methods are limited to specific spatial scales and produce uncertain results under different parameter settings, which makes it difficult to cluster complicated OD flows under spatial heterogeneity. To address these limitations, in this paper we propose a novel OD flow cluster-detecting method based on the OPTICS algorithm, which can identify OD flow clusters at various aggregation scales. The method can adaptively determine parameter values from the dataset without prior knowledge or manual intervention. Experiments indicate that our method outperforms three state-of-the-art methods, yielding more accurate and complete clusters with less noise. As a case study, our method is applied to identify potential routes for public transport services by detecting OD flow clusters within urban travel data.
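
OPTICS orders the data so that members of a dense cluster appear consecutively with low reachability distances, and a jump in reachability marks the move to a new cluster; reading clusters off this ordering at different thresholds is what allows several aggregation scales. The following is a minimal plain-Python OPTICS sketch with eps set to infinity, treating each OD flow as (origin_x, origin_y, dest_x, dest_y). The flow distance (origin displacement plus destination displacement), the min_pts value, and the toy data are illustrative assumptions; the paper's own flow similarity measure and its adaptive parameter selection are not reproduced.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Flow = Tuple[float, float, float, float]  # (origin_x, origin_y, dest_x, dest_y)


def flow_distance(f: Flow, g: Flow) -> float:
    """Hypothetical OD-flow distance: origin displacement + destination displacement."""
    return math.dist(f[:2], g[:2]) + math.dist(f[2:], g[2:])


def optics_order(flows: Sequence[Flow], min_pts: int) -> Tuple[List[int], List[float]]:
    """Return the OPTICS visiting order and the reachability distance of each visited flow."""
    n = len(flows)
    dist = [[flow_distance(flows[i], flows[j]) for j in range(n)] for i in range(n)]
    # Core distance with eps = inf: distance to the min_pts-th closest flow (self included).
    core = [sorted(row)[min_pts - 1] for row in dist]
    reach = [math.inf] * n
    processed = [False] * n
    order: List[int] = []
    for start in range(n):
        if processed[start]:
            continue
        seeds = [(math.inf, start)]
        while seeds:
            _, p = heapq.heappop(seeds)
            if processed[p]:
                continue  # stale heap entry left behind by an earlier update
            processed[p] = True
            order.append(p)
            for q in range(n):
                if not processed[q]:
                    candidate = max(core[p], dist[p][q])
                    if candidate < reach[q]:
                        reach[q] = candidate
                        heapq.heappush(seeds, (candidate, q))
    return order, [reach[i] for i in order]


cluster_a = [(0, 0, 10, 10), (1, 0, 10, 11), (0, 1, 11, 10), (1, 1, 11, 11)]
cluster_b = [(50, 0, 0, 50), (51, 0, 0, 51), (50, 1, 1, 50), (51, 1, 1, 51)]
order, reachability = optics_order(cluster_a + cluster_b, min_pts=3)
print(order)
print(reachability)
```

In the toy data the four flows of the first group are visited first with small reachability values, and the single large reachability value marks the switch to the second group of flows.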

Learning High-Precision Bounding Box for Rotated Object Detection via Kullback-Leibler Divergence

Jun 04, 2021
Xue Yang, Xiaojiang Yang, Jirui Yang, Qi Ming, Wentao Wang, Qi Tian, Junchi Yan

Existing rotated object detectors mostly inherit the horizontal detection paradigm, as the latter has evolved into a well-developed area. However, these detectors struggle to perform well in high-precision detection due to the limitations of current regression loss designs, especially for objects with large aspect ratios. Taking the perspective that horizontal detection is a special case of rotated object detection, in this paper we are motivated to change the design of the rotation regression loss from an induction paradigm to a deduction methodology, in terms of the relation between rotated and horizontal detection. We show that one essential challenge is how to modulate the coupled parameters in the rotation regression loss so that the estimated parameters can influence each other during dynamic joint optimization in an adaptive and synergetic way. Specifically, we first convert the rotated bounding box into a 2-D Gaussian distribution, and then calculate the Kullback-Leibler Divergence (KLD) between the Gaussian distributions as the regression loss. By analyzing the gradient of each parameter, we show that KLD (and its derivatives) can dynamically adjust the parameter gradients according to the characteristics of the object. In particular, it adjusts the importance (gradient weight) of the angle parameter according to the aspect ratio. This mechanism can be vital for high-precision detection, as a slight angle error would cause a serious accuracy drop for objects with large aspect ratios. More importantly, we prove that KLD is scale invariant. We further show that the KLD loss can degenerate into the popular $l_{n}$-norm loss for horizontal detection. Experimental results on seven datasets using different detectors show its consistent superiority, and the code is available at https://github.com/yangxue0827/RotationDetection.

* 15 pages, 5 figures, 7 tables
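
To make the conversion and the divergence concrete, here is a minimal plain-Python sketch assuming the usual Gaussian parameterization of a rotated box (mean at the box center, covariance R diag(w^2/4, h^2/4) R^T) and the standard closed-form KL divergence between two 2-D Gaussians. It computes the raw divergence only; any nonlinear mapping the paper applies to turn the divergence into the final loss is not reproduced, and the function names are hypothetical. The demo checks two properties stated above: scaling both boxes leaves the value unchanged, and the same angle error costs more as the aspect ratio grows.

```python
import math
from typing import List, Tuple

Mat = List[List[float]]


def box_to_gaussian(x: float, y: float, w: float, h: float,
                    theta: float) -> Tuple[Tuple[float, float], Mat]:
    """Rotated box (center, size, angle in radians) -> 2-D Gaussian (mean, covariance)."""
    a, b = w * w / 4.0, h * h / 4.0
    c, s = math.cos(theta), math.sin(theta)
    cov = [[a * c * c + b * s * s, (a - b) * c * s],
           [(a - b) * c * s, a * s * s + b * c * c]]
    return (x, y), cov


def kld(pred_box, target_box) -> float:
    """KL(N_pred || N_target) between the Gaussians of two rotated boxes."""
    (px, py), sp = box_to_gaussian(*pred_box)
    (tx, ty), st = box_to_gaussian(*target_box)
    det_t = st[0][0] * st[1][1] - st[0][1] * st[1][0]
    det_p = sp[0][0] * sp[1][1] - sp[0][1] * sp[1][0]
    inv_t = [[st[1][1] / det_t, -st[0][1] / det_t],
             [-st[1][0] / det_t, st[0][0] / det_t]]
    dx, dy = px - tx, py - ty
    mahalanobis = (dx * (inv_t[0][0] * dx + inv_t[0][1] * dy)
                   + dy * (inv_t[1][0] * dx + inv_t[1][1] * dy))
    trace_term = sum(inv_t[i][k] * sp[k][i] for i in range(2) for k in range(2))  # tr(St^-1 Sp)
    return 0.5 * (trace_term + mahalanobis - 2.0 + math.log(det_t / det_p))


def scale_box(box, k: float):
    """Scale center and size by k; the angle is unchanged."""
    x, y, w, h, theta = box
    return (x * k, y * k, w * k, h * k, theta)


p = (1.0, 2.0, 10.0, 3.0, 0.3)
t = (1.5, 2.5, 9.0, 3.5, 0.2)
print(kld(p, t), kld(scale_box(p, 5.0), scale_box(t, 5.0)))
for ratio in (1.0, 2.0, 8.0):
    print(ratio, kld((0.0, 0.0, ratio, 1.0, 0.05), (0.0, 0.0, ratio, 1.0, 0.0)))
```

For two boxes with the same center and size that differ only by an angle d, the divergence works out to 0.5 * sin(d)^2 * (r - 1/r)^2 with r = w/h, which is why the angle term dominates for elongated objects and vanishes for squares.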

Optimization for Oriented Object Detection via Representation Invariance Loss

Mar 22, 2021
Qi Ming, Zhiqiang Zhou, Lingjuan Miao, Xue Yang, Yunpeng Dong

Arbitrary-oriented objects exist widely in natural scenes, and thus oriented object detection has received extensive attention in recent years. Mainstream rotation detectors use oriented bounding boxes (OBB) or quadrilateral bounding boxes (QBB) to represent rotated objects. However, these methods suffer from representation ambiguity in the definition of oriented objects, which leads to suboptimal regression optimization and an inconsistency between the loss metric and the localization accuracy of the predictions. In this paper, we propose a Representation Invariance Loss (RIL) to optimize bounding box regression for rotated objects. Specifically, RIL treats multiple representations of an oriented object as multiple equivalent local minima, and hence transforms bounding box regression into an adaptive matching process with these local minima. The Hungarian matching algorithm is then adopted to obtain the optimal regression strategy. We also propose a normalized rotation loss to alleviate the weak correlation between different variables and their unbalanced loss contributions in the OBB representation. Extensive experiments on remote sensing and scene text datasets show that our method achieves consistent and substantial improvements. The source code and trained models are available at https://github.com/ming71/RIDet.

* The code and models are available at https://github.com/ming71/RIDet
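
RIL's key step is matching a prediction to the most favorable of several equivalent target representations. As a hedged illustration for the QBB case, the sketch below treats the four predicted corners and the four ground-truth corners as a bipartite assignment problem and finds the minimum-cost one-to-one matching; for four corners, an exhaustive search over the 24 permutations returns the same optimum the Hungarian algorithm would. The L1 corner cost, the function names, and the toy coordinates are assumptions, not the RIDet code. For larger problems, SciPy's `scipy.optimize.linear_sum_assignment` solves the same assignment directly.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def corner_cost(p: Point, q: Point) -> float:
    """Hypothetical matching cost: L1 distance between two corners."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def optimal_assignment(pred: Sequence[Point],
                       target: Sequence[Point]) -> Tuple[List[int], float]:
    """Minimum-cost one-to-one matching of predicted corners to target corners.

    Exhaustive search over all 4! = 24 permutations; for this tiny problem it finds
    the same optimum as the Hungarian algorithm.
    """
    best_perm: List[int] = []
    best_cost = float("inf")
    for perm in permutations(range(len(target))):
        cost = sum(corner_cost(pred[i], target[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = list(perm), cost
    return best_perm, best_cost


target = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
pred = [(0.2, 2.1), (3.9, 0.1), (0.1, -0.1), (4.1, 1.9)]  # corners listed in a scrambled order
perm, cost = optimal_assignment(pred, target)
print(perm, cost)  # perm[i] is the target corner assigned to predicted corner i
```

The matched cost is small no matter in which order the corners are listed, so the regression target no longer depends on an arbitrary labeling convention.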

Rethinking Rotated Object Detection with Gaussian Wasserstein Distance Loss

Jan 28, 2021
Xue Yang, Junchi Yan, Qi Ming, Wentao Wang, Xiaopeng Zhang, Qi Tian

Boundary discontinuity and its inconsistency with the final detection metric have been the bottleneck in regression loss design for rotated detection. In this paper, we propose a novel regression loss based on the Gaussian Wasserstein distance as a fundamental approach to this problem. Specifically, the rotated bounding box is converted into a 2-D Gaussian distribution, which makes it possible to approximate the non-differentiable rotational IoU-induced loss by the Gaussian Wasserstein distance (GWD), which can be learned efficiently by gradient back-propagation. GWD remains informative for learning even when two rotated bounding boxes do not overlap, which is often the case in small object detection. Thanks to its three unique properties, GWD can also elegantly solve the boundary discontinuity and square-like problems regardless of how the bounding box is defined. Experiments on five datasets using different detectors show the effectiveness of our approach. Code is available at https://github.com/yangxue0827/RotationDetection.

* 15 pages, 6 figures, 9 tables, codes are available at https://github.com/yangxue0827/RotationDetection
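
Under the same Gaussian conversion (mean at the box center, covariance R diag(w^2/4, h^2/4) R^T), the squared 2-Wasserstein distance between two Gaussians has the standard closed form ||m1 - m2||^2 + Tr(S1 + S2 - 2 (S1^1/2 S2 S1^1/2)^1/2). The minimal plain-Python sketch below uses the fact that, for 2x2 matrices, the trace of that matrix square root reduces to sqrt(Tr(S1 S2) + 2 sqrt(det S1 det S2)), so no linear-algebra library is needed. The function names are hypothetical, and only the raw squared distance is computed; any nonlinear transformation the paper applies to turn it into the training loss is omitted.

```python
import math
from typing import List, Tuple

Mat = List[List[float]]


def box_to_gaussian(x: float, y: float, w: float, h: float,
                    theta: float) -> Tuple[Tuple[float, float], Mat]:
    """Rotated box (center, size, angle in radians) -> 2-D Gaussian (mean, covariance)."""
    a, b = w * w / 4.0, h * h / 4.0
    c, s = math.cos(theta), math.sin(theta)
    cov = [[a * c * c + b * s * s, (a - b) * c * s],
           [(a - b) * c * s, a * s * s + b * c * c]]
    return (x, y), cov


def gwd_squared(box1, box2) -> float:
    """Squared 2-Wasserstein distance between the Gaussians of two rotated boxes."""
    (x1, y1), s1 = box_to_gaussian(*box1)
    (x2, y2), s2 = box_to_gaussian(*box2)
    center_term = (x1 - x2) ** 2 + (y1 - y2) ** 2
    tr1 = s1[0][0] + s1[1][1]
    tr2 = s2[0][0] + s2[1][1]
    det1 = s1[0][0] * s1[1][1] - s1[0][1] * s1[1][0]
    det2 = s2[0][0] * s2[1][1] - s2[0][1] * s2[1][0]
    tr12 = sum(s1[i][k] * s2[k][i] for i in range(2) for k in range(2))  # Tr(S1 S2)
    # For a 2x2 PSD matrix M: Tr(sqrt(M)) = sqrt(Tr(M) + 2 sqrt(det(M))), with M = S1^1/2 S2 S1^1/2.
    cross = math.sqrt(max(tr12 + 2.0 * math.sqrt(max(det1 * det2, 0.0)), 0.0))
    return max(center_term + tr1 + tr2 - 2.0 * cross, 0.0)  # clamp tiny negative round-off


print(gwd_squared((0.0, 0.0, 4.0, 2.0, 0.0), (3.0, 4.0, 4.0, 2.0, 0.0)))
print(gwd_squared((0.0, 0.0, 4.0, 2.0, 0.0), (0.0, 0.0, 2.0, 4.0, math.pi / 2)))
print(gwd_squared((0.0, 0.0, 4.0, 2.0, 0.0), (100.0, 0.0, 4.0, 2.0, 0.0)))
```

The checks mirror properties claimed above: two equivalent descriptions of the same box (swapped width and height with a 90-degree angle shift, or an angle shifted by pi) give a distance of zero, and the value keeps growing with separation even when the boxes no longer overlap.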
