
Tong Gao


Towards Automated Error Analysis: Learning to Characterize Errors

Jan 14, 2022
Tong Gao, Shivang Singh, Raymond J. Mooney


Characterizing the patterns of errors that a system makes helps researchers focus future development on increasing its accuracy and robustness. We propose a novel form of "meta learning" that automatically learns interpretable rules that characterize the types of errors that a system makes, and demonstrate these rules' ability to help understand and improve two NLP systems. Our approach works by collecting error cases on validation data, extracting meta-features describing these samples, and finally learning rules that characterize errors using these features. We apply our approach to ViLBERT, for Visual Question Answering, and RoBERTa, for Common Sense Question Answering. Our system learns interpretable rules that provide insights into systemic errors these systems make on the given tasks. Using these insights, we are also able to "close the loop" and modestly improve the performance of these systems.
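
The collect-errors / extract-meta-features / learn-rules loop described in the abstract can be sketched in a few lines. This is an illustrative toy, not the paper's actual method: the feature names, the one-feature rule form, and the support/lift thresholds are all assumptions for demonstration.

```python
# Toy sketch of error characterization: collect validation errors with
# meta-features, then surface simple one-feature rules whose error rate
# is well above the base rate. Names and thresholds are illustrative.
from collections import defaultdict

def mine_error_rules(samples, min_support=2, min_lift=1.5):
    """samples: list of (meta_features: dict, is_error: bool)."""
    base_rate = sum(err for _, err in samples) / len(samples)
    stats = defaultdict(lambda: [0, 0])  # (feature, value) -> [errors, total]
    for feats, err in samples:
        for fv in feats.items():
            stats[fv][0] += err
            stats[fv][1] += 1
    rules = []
    for (feat, val), (errs, total) in stats.items():
        rate = errs / total
        if total >= min_support and rate >= min_lift * base_rate:
            rules.append((feat, val, rate, total))
    return base_rate, sorted(rules, key=lambda r: -r[2])

# Hypothetical VQA-style error log: meta-features plus an error flag.
samples = [
    ({"question_type": "counting", "answer_len": "short"}, True),
    ({"question_type": "counting", "answer_len": "long"}, True),
    ({"question_type": "color", "answer_len": "short"}, False),
    ({"question_type": "color", "answer_len": "short"}, False),
    ({"question_type": "yes/no", "answer_len": "short"}, False),
    ({"question_type": "counting", "answer_len": "short"}, True),
]
base, rules = mine_error_rules(samples)
print(f"base error rate: {base:.2f}")
for feat, val, rate, n in rules:
    print(f"IF {feat} == {val!r} THEN error rate {rate:.2f} (n={n})")
```

On this toy log the miner surfaces the "counting questions fail" pattern, which is the kind of interpretable rule that lets a developer "close the loop" with targeted fixes.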

* 12 pages, 11 figures 

MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding

Aug 14, 2021
Zhanghui Kuang, Hongbin Sun, Zhizhong Li, Xiaoyu Yue, Tsui Hin Lin, Jianyong Chen, Huaqiang Wei, Yiqin Zhu, Tong Gao, Wenwei Zhang, Kai Chen, Wayne Zhang, Dahua Lin


We present MMOCR, an open-source toolbox that provides a comprehensive pipeline for text detection and recognition, as well as downstream tasks such as named entity recognition and key information extraction. MMOCR implements 14 state-of-the-art algorithms, significantly more than any existing open-source OCR project we are aware of to date. To facilitate future research and industrial applications of text-recognition-related problems, we also provide a large number of trained models and detailed benchmarks to give insights into the performance of text detection, recognition, and understanding. MMOCR is publicly released at https://github.com/open-mmlab/mmocr.
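
The detection-to-recognition-to-understanding chaining that such a toolbox pipelines can be illustrated with stand-in functions. Note these are placeholders invented for this sketch, not the real MMOCR API; consult the repository above for actual usage.

```python
# Illustrative sketch of an OCR pipeline: detect text regions, recognize
# each region, then run a downstream task (here, toy key-information
# extraction). All function names and data are hypothetical stand-ins.

def detect_text_regions(image):
    # A real detector returns polygons over pixel data; this stand-in
    # pretends each whitespace-separated token is one detected region.
    return [{"box": (0, 0, 10, 10), "crop": w} for w in image.split()]

def recognize(region):
    # A real recognizer decodes the cropped region into a string.
    return region["crop"]

def extract_key_info(words):
    # Toy key-information extraction: pick out "key:value" pairs.
    pairs = {}
    for w in words:
        if ":" in w:
            k, v = w.split(":", 1)
            pairs[k] = v
    return pairs

image = "total:42 date:2021-08-14 thanks"   # stand-in for pixel data
words = [recognize(r) for r in detect_text_regions(image)]
print(extract_key_info(words))  # {'total': '42', 'date': '2021-08-14'}
```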

* Accepted to ACM MM (Open Source Competition Track) 

Systematic Generalization on gSCAN with Language Conditioned Embedding

Oct 04, 2020
Tong Gao, Qi Huang, Raymond J. Mooney


Systematic Generalization refers to a learning algorithm's ability to extrapolate learned behavior to unseen situations that are distinct but semantically similar to its training data. As shown in recent work, state-of-the-art deep learning models fail dramatically even on tasks for which they are designed when the test set is systematically different from the training data. We hypothesize that explicitly modeling the relations between objects in their contexts while learning their representations will help achieve systematic generalization. Therefore, we propose a novel method that learns objects' contextualized embeddings with dynamic message passing conditioned on the input natural language and end-to-end trainable with other downstream deep learning modules. To our knowledge, this model is the first one that significantly outperforms the provided baseline and reaches state-of-the-art performance on grounded-SCAN (gSCAN), a grounded natural language navigation dataset designed to require systematic generalization in its test splits.
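
The core idea of contextualizing object embeddings via message passing conditioned on the command can be sketched with plain vectors. This is a rough illustration of the mechanism, not the paper's model: the command-modulated scoring rule and the tiny dimensions are assumptions made for readability.

```python
# Rough sketch of language-conditioned message passing: each object's
# new embedding aggregates the other objects' embeddings, weighted by
# how relevant each pair looks under the command embedding.
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def message_passing_step(objects, command):
    """objects: list of embedding vectors; command: language embedding."""
    updated = []
    for obj_i in objects:
        # Score each neighbor j by a command-modulated dot product
        # (an illustrative stand-in for learned attention weights).
        scores = [dot([a * c for a, c in zip(obj_j, command)], obj_i)
                  for obj_j in objects]
        weights = softmax(scores)
        # Aggregate neighbors into a contextualized embedding.
        updated.append([sum(w * obj_j[d] for w, obj_j in zip(weights, objects))
                        for d in range(len(obj_i))])
    return updated

objects = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
command = [1.0, 0.2]     # pretend encoding of "walk to the red square"
print(message_passing_step(objects, command))
```

Because the weights depend on the command, the same scene yields different contextualized embeddings for different instructions, which is the property the abstract argues helps systematic generalization.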

* Accepted by AACL-IJCNLP 2020. Huang and Gao share co-first authorship; the authors contributed equally and are listed in alphabetical order 

Large-Scale Answerer in Questioner's Mind for Visual Dialog Question Generation

Feb 22, 2019
Sang-Woo Lee, Tong Gao, Sohee Yang, Jaejun Yoo, Jung-Woo Ha


Answerer in Questioner's Mind (AQM) is an information-theoretic framework recently proposed for task-oriented dialog systems. AQM benefits from asking the question that maximizes information gain. However, because it explicitly calculates the information gain, AQM has a limitation when the solution space is very large. To address this, we propose AQM+, which can deal with large-scale problems and ask questions that are more coherent with the current context of the dialog. We evaluate our method on GuessWhich, a challenging task-oriented visual dialog problem, where the number of candidate classes is near 10K. Our experimental results and ablation studies show that AQM+ outperforms state-of-the-art models by a remarkable margin with a reasonable approximation. In particular, the proposed AQM+ reduces the error by more than 60% as the dialog proceeds, while the comparative algorithms diminish the error by less than 6%. Based on our results, we argue that AQM+ is a general task-oriented dialog algorithm that can be applied to tasks with non-yes-or-no responses.
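
The "ask the question that maximizes information gain" principle can be illustrated with a tiny Bayesian calculation. The candidate set, questions, and answer model below are made up for illustration; the real systems estimate these distributions with learned neural models over roughly 10K candidates.

```python
# Toy sketch of information-theoretic question selection: choose the
# question whose expected answer most reduces entropy over candidates.
import math

def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def expected_info_gain(prior, answer_model):
    """answer_model[c][a] = p(answer a | candidate c) for one question."""
    gain = entropy(prior)
    n_answers = len(answer_model[0])
    for a in range(n_answers):
        # p(a) and the posterior p(c | a) via Bayes' rule.
        p_a = sum(prior[c] * answer_model[c][a] for c in range(len(prior)))
        if p_a == 0:
            continue
        posterior = [prior[c] * answer_model[c][a] / p_a
                     for c in range(len(prior))]
        gain -= p_a * entropy(posterior)
    return gain

prior = [0.25, 0.25, 0.25, 0.25]          # four candidate targets
questions = {
    # hypothetical yes/no answer distributions per candidate
    "is it an animal?": [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]],
    "is it red?":       [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]],
    "is it big?":       [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],
}
best = max(questions, key=lambda q: expected_info_gain(prior, questions[q]))
print(best)  # never "is it big?", whose answers carry zero information
```

The scalability problem the abstract mentions is visible even here: the posterior update runs over every candidate for every answer to every question, which is why a near-10K candidate space requires the approximations introduced in AQM+.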

* Accepted for ICLR 2019. Camera-ready version. Our code is publicly available: https://github.com/naver/aqm-plus 