In recent years, scene text recognition is always regarded as a sequence-to-sequence problem. Connectionist Temporal Classification (CTC) and Attentional sequence recognition (Attn) are two very prevailing approaches to tackle this problem while they may fail in some scenarios respectively. CTC concentrates more on every individual character but is weak in text semantic dependency modeling. Attn based methods have better context semantic modeling ability while tends to overfit on limited training data. In this paper, we elaborately design a Rectified Attentional Double Supervised Network (ReADS) for general scene text recognition. To overcome the weakness of CTC and Attn, both of them are applied in our method but with different modules in two supervised branches which can make a complementary to each other. Moreover, effective spatial and channel attention mechanisms are introduced to eliminate background noise and extract valid foreground information. Finally, a simple rectified network is implemented to rectify irregular text. The ReADS can be trained end-to-end and only word-level annotations are required. Extensive experiments on various benchmarks verify the effectiveness of ReADS which achieves state-of-the-art performance.
Chinese scene text reading is one of the most challenging problems in computer vision and has attracted great interest. Different from English text, Chinese has more than 6000 commonly used characters and Chinesecharacters can be arranged in various layouts with numerous fonts. The Chinese signboards in street view are a good choice for Chinese scene text images since they have different backgrounds, fonts and layouts. We organized a competition called ICDAR2019-ReCTS, which mainly focuses on reading Chinese text on signboard. This report presents the final results of the competition. A large-scale dataset of 25,000 annotated signboard images, in which all the text lines and characters are annotated with locations and transcriptions, were released. Four tasks, namely character recognition, text line recognition, text line detection and end-to-end recognition were set up. Besides, considering the Chinese text ambiguity issue, we proposed a multi ground truth (multi-GT) evaluation method to make evaluation fairer. The competition started on March 1, 2019 and ended on April 30, 2019. 262 submissions from 46 teams are received. Most of the participants come from universities, research institutes, and tech companies in China. There are also some participants from the United States, Australia, Singapore, and Korea. 21 teams submit results for Task 1, 23 teams submit results for Task 2, 24 teams submit results for Task 3, and 13 teams submit results for Task 4. The official website for the competition is http://rrc.cvc.uab.es/?ch=12.
Strong gravitational lensing of astrophysical sources by foreground galaxies is a powerful cosmological tool. While such lens systems are relatively rare in the Universe, the number of detectable galaxy-scale strong lenses is expected to grow dramatically with next-generation optical surveys, numbering in the hundreds of thousands, out of tens of billions of candidate images. Automated and efficient approaches will be necessary in order to find and analyze these strong lens systems. To this end, we implement a novel, modular, end-to-end deep learning pipeline for denoising, deblending, searching, and modeling galaxy-galaxy strong lenses (GGSLs). To train and quantify the performance of our pipeline, we create a dataset of 1 million synthetic strong lensing images using state-of-the-art simulations for next-generation sky surveys. When these pretrained modules were used as a pipeline for inference, we found that the classification (searching GGSL) accuracy improved significantly---from 82% with the baseline to 90%, while the regression (modeling GGSL) accuracy improved by 25% over the baseline.
It is a long-standing goal of artificial intelligence (AI) to be superior to human beings in decision making. Games are suitable for testing AI capabilities of making good decisions in non-numerical tasks. In this paper, we develop a new AI algorithm to play the penny-matching game considered in Shannon's "mind-reading machine" (1953) against human players. In particular, we exploit cognitive hierarchy theory and Bayesian learning techniques to continually evolve a model for predicting human player decisions, and let the AI player make decisions according to the model predictions to pursue the best chance of winning. Experimental results show that our AI algorithm beats 27 out of 30 volunteer human players.
For a foreseeable future, autonomous vehicles (AVs) will operate in traffic together with human-driven vehicles. The AV planning and control systems need extensive testing, including early-stage testing in simulations where the interactions among autonomous/human-driven vehicles are represented. Motivated by the need for such simulation tools, we propose a game-theoretic approach to modeling vehicle interactions, in particular, for urban traffic environments with unsignalized intersections. We develop traffic models with heterogeneous (in terms of their driving styles) and interactive vehicles based on our proposed approach, and use them for virtual testing, evaluation, and calibration of AV control systems. For illustration, we consider two AV control approaches, analyze their characteristics and performance based on the simulation results with our developed traffic models, and optimize the parameters of one of them.
Terrain traversability analysis is a fundamental issue to achieve the autonomy of a robot at off-road environments. Geometry-based and appearance-based methods have been studied in decades, while behavior-based methods exploiting learning from demonstration (LfD) are new trends. Behavior-based methods learn cost functions that guide trajectory planning in compliance with experts' demonstrations, which can be more scalable to various scenes and driving behaviors. This research proposes a method of off-road traversability analysis and trajectory planning using Deep Maximum Entropy Inverse Reinforcement Learning. To incorporate vehicle's kinematics while solving the problem of exponential increase of state-space complexity, two convolutional neural networks, i.e., RL ConvNet and Svf ConvNet, are developed to encode kinematics into convolution kernels and achieve efficient forward reinforcement learning. We conduct experiments in off-road environments. Scene maps are generated using 3D LiDAR data, and expert demonstrations are either the vehicle's real driving trajectories at the scene or synthesized ones to represent specific behaviors such as crossing negative obstacles. Four cost functions of traversability analysis are learned and tested at various scenes of capability in guiding the trajectory planning of different behaviors. We also demonstrate the computation efficiency of the proposed method.
n this paper, we describe an integrated framework for autonomous decision making in a dynamic and interactive environment. We model the interactions between the ego agent and its operating environment as a two-player dynamic game, and integrate cognitive behavioral models, Bayesian inference, and receding-horizon optimal control to define a dynamically-evolving decision strategy for the ego agent. Simulation examples representing autonomous vehicle control in three traffic scenarios where the autonomous ego vehicle interacts with a human-driven vehicle are reported.
In this paper we present BPnP, a novel method to do back-propagation through a PnP solver. We show that the gradients of such geometric optimization process can be computed using the Implicit Function Theorem as if it is differentiable. Furthermore, we develop a residual-conformity trick to make end-to-end pose regression using BPnP smooth and stable. We also propose a "march in formation" algorithm which successfully uses BPnP for keypoint regression. Our invention opens a door to vast possibilities. The ability to incorporate geometric optimization in end-to-end learning will greatly boost its power and promote innovations in various computer vision tasks.
In this paper, we describe a framework for autonomous decision making in a dynamic and interactive environment based on cognitive hierarchy theory. We model the interactions between the ego agent and its operating environment as a two-player dynamic game, and integrate cognitive behavioral models, Bayesian inference, and receding-horizon optimal control to define a dynamically-evolving decision strategy for the ego agent. Simulation examples representing autonomous vehicle control in three traffic scenarios where the autonomous ego vehicle interacts with a human-driven vehicle are reported.