Detecting curve text is challenging because of its irregular shapes and varying sizes. In this paper, we first investigate the deficiencies of existing curve text detection methods and then propose a novel Conditional Spatial Expansion (CSE) mechanism to improve the performance of curve text detection. Instead of regarding curve text detection as a polygon regression or a segmentation problem, we treat it as a region expansion process. Our CSE starts with a seed arbitrarily initialized within a text region and progressively merges neighboring regions based on local features extracted by a CNN and the contextual information of the already merged regions. The CSE is highly parameterized and can be seamlessly integrated into existing object detection frameworks. Enhanced by the data-dependent CSE mechanism, our curve text detection system provides robust instance-level text region extraction with minimal post-processing. Our analysis experiments show that CSE can handle text of various shapes, sizes, and orientations, and can effectively suppress false positives arising from text-like textures or unexpected text included in the same RoI. Compared with existing curve text detection algorithms, our method is more robust and has a simpler processing flow. It also sets a new state of the art on curve text benchmarks, with an F-score of up to 78.4$\%$.
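The region expansion idea can be illustrated with a minimal sketch of classical seeded region growing. This is not the CSE mechanism itself: the fixed `threshold` below is an assumed stand-in for the learned, context-conditioned merging decision, and the function name `expand_region` and the toy score grid are hypothetical.

```python
from collections import deque


def expand_region(score, seed, threshold=0.5):
    """Grow a region from `seed` over a 2D grid of text scores.

    A neighbor (4-connectivity) is merged when its score reaches
    `threshold`. In CSE this decision is predicted from local CNN
    features and the context of merged regions; the fixed threshold
    here is only an illustrative assumption.
    """
    rows, cols = len(score), len(score[0])
    merged = {seed}
    frontier = deque([seed])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in merged and score[nr][nc] >= threshold:
                merged.add((nr, nc))
                frontier.append((nr, nc))
    return merged


# Toy score map: a bent text region on the left and a second,
# disconnected high-score blob on the right of the same grid.
grid = [
    [0.9, 0.8, 0.1, 0.9],
    [0.1, 0.7, 0.1, 0.9],
    [0.1, 0.6, 0.1, 0.1],
]
region = expand_region(grid, seed=(0, 0))
```

Because expansion only proceeds through connected neighbors, the blob on the right is never merged into the region grown from the seed. This loosely mirrors the claim that unexpected text in the same RoI can be suppressed.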
In this work, we propose a novel hybrid method for scene text detection, namely the Correlation Propagation Network (CPN). It is an end-to-end trainable framework powered by advanced Convolutional Neural Networks. CPN predicts text objects according to both top-down observations and bottom-up cues. Multiple candidate boxes are assembled by a spatial communication mechanism called Correlation Propagation (CP). The spatial features extracted by the CNN are regarded as node features in a lattice graph, and the Correlation Propagation algorithm runs in a distributed manner on each node to update the hypothesis of the corresponding object centers. The CP process can flexibly handle scale-varying and rotated text objects without using predefined bounding box templates. Benefiting from its distributed nature, CPN is computationally efficient and highly parallelizable. Moreover, we introduce deformable convolution into the backbone network to improve its adaptability to long text. Evaluation on public benchmarks shows that the proposed method achieves state-of-the-art performance and significantly outperforms existing methods in handling multi-scale and multi-oriented text objects, at a much lower computational cost.
A novel framework named Markov Clustering Network (MCN) is proposed for fast and robust scene text detection. MCN predicts instance-level bounding boxes by first converting an image into a Stochastic Flow Graph (SFG) and then performing Markov Clustering on this graph. Our method can detect text objects of arbitrary size and orientation without prior knowledge of object size. The stochastic flow graph encodes objects' local correlation and semantic information. An object is modeled as a set of strongly connected nodes, which allows flexible bottom-up detection of scale-varying and rotated objects. MCN generates bounding boxes without using Non-Maximum Suppression, and it can be fully parallelized on GPUs. Evaluation on public benchmarks shows that our method outperforms existing methods by a large margin in detecting multi-oriented text objects. MCN achieves new state-of-the-art performance on the challenging MSRA-TD500 dataset, with a precision of 0.88, a recall of 0.79, and an F-score of 0.83. MCN also achieves real-time inference at 34 FPS, a $1.5\times$ speedup over the fastest existing scene text detection algorithm.
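To make the clustering step concrete, the following is a minimal sketch of the standard Markov Clustering (MCL) iteration, which alternates expansion (squaring the flow matrix) and inflation (element-wise powering followed by column normalization). It is a generic illustration and not MCN's implementation: MCN predicts the flow graph with a network, whereas the adjacency matrix, the function names, and the `inflation` and `iterations` values here are assumptions chosen for a toy example.

```python
def normalize_columns(m):
    """Rescale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square matrix product for lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(adjacency, inflation=2.0, iterations=30):
    """Cluster graph nodes with the standard MCL iteration."""
    n = len(adjacency)
    # Self-loops keep every column sum positive and stabilize the flow.
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)                                  # expansion
        m = [[x ** inflation for x in row] for row in m]  # inflation
        m = normalize_columns(m)
    # Assign each node (column) to the row that attracts most of its flow.
    clusters = {}
    for j in range(n):
        attractor = max(range(n), key=lambda i: m[i][j])
        clusters.setdefault(attractor, []).append(j)
    return sorted(clusters.values())


# Hypothetical toy graph: two triangles with no edges between them,
# standing in for two text instances.
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = markov_cluster(adjacency)
```

Each strongly connected group of nodes ends up in its own cluster without any box-level post-processing, which matches the role the abstract describes for clustering as a replacement for Non-Maximum Suppression.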