Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhibo Yang

Taking a Respite from Representation Learning for Molecular Property Prediction

Oct 07, 2022

Jianyuan Deng, Zhibo Yang, Hehe Wang, Iwao Ojima, Dimitris Samaras, Fusheng Wang

Figure 1 for Taking a Respite from Representation Learning for Molecular Property Prediction

Figure 2 for Taking a Respite from Representation Learning for Molecular Property Prediction

Figure 3 for Taking a Respite from Representation Learning for Molecular Property Prediction

Figure 4 for Taking a Respite from Representation Learning for Molecular Property Prediction

Abstract:Artificial intelligence (AI) has been widely applied in drug discovery with a major task as molecular property prediction. Despite the boom of AI techniques in molecular representation learning, some key aspects underlying molecular property prediction haven't been carefully examined yet. In this study, we conducted a systematic comparison on three representative models, random forest, MolBERT and GROVER, which utilize three major molecular representations, extended-connectivity fingerprints, SMILES strings and molecular graphs, respectively. Notably, MolBERT and GROVER, are pretrained on large-scale unlabelled molecule corpuses in a self-supervised manner. In addition to the commonly used MoleculeNet benchmark datasets, we also assembled a suite of opioids-related datasets for downstream prediction evaluation. We first conducted dataset profiling on label distribution and structural analyses; we also examined the activity cliffs issue in the opioids-related datasets. Then, we trained 4,320 predictive models and evaluated the usefulness of the learned representations. Furthermore, we explored into the model evaluation by studying the effect of statistical tests, evaluation metrics and task settings. Finally, we dissected the chemical space generalization into inter-scaffold and intra-scaffold generalization and measured prediction performance to evaluate model generalizbility under both settings. By taking this respite, we reflected on the key aspects underlying molecular property prediction, the awareness of which can, hopefully, bring better AI techniques in this field.

Via

Access Paper or Ask Questions

Target-absent Human Attention

Jul 04, 2022

Zhibo Yang, Sounak Mondal, Seoyoung Ahn, Gregory Zelinsky, Minh Hoai, Dimitris Samaras

Figure 1 for Target-absent Human Attention

Figure 2 for Target-absent Human Attention

Figure 3 for Target-absent Human Attention

Figure 4 for Target-absent Human Attention

Abstract:The prediction of human gaze behavior is important for building human-computer interactive systems that can anticipate a user's attention. Computer vision models have been developed to predict the fixations made by people as they search for target objects. But what about when the image has no target? Equally important is to know how people search when they cannot find a target, and when they would stop searching. In this paper, we propose the first data-driven computational model that addresses the search-termination problem and predicts the scanpath of search fixations made by people searching for targets that do not appear in images. We model visual search as an imitation learning problem and represent the internal knowledge that the viewer acquires through fixations using a novel state representation that we call Foveated Feature Maps (FFMs). FFMs integrate a simulated foveated retina into a pretrained ConvNet that produces an in-network feature pyramid, all with minimal computational overhead. Our method integrates FFMs as the state representation in inverse reinforcement learning. Experimentally, we improve the state of the art in predicting human target-absent search behavior on the COCO-Search18 dataset

* Accepted to ECCV2022

Via

Access Paper or Ask Questions

Vision-Language Pre-Training for Boosting Scene Text Detectors

Apr 29, 2022

Sibo Song, Jianqiang Wan, Zhibo Yang, Jun Tang, Wenqing Cheng, Xiang Bai, Cong Yao

Figure 1 for Vision-Language Pre-Training for Boosting Scene Text Detectors

Figure 2 for Vision-Language Pre-Training for Boosting Scene Text Detectors

Figure 3 for Vision-Language Pre-Training for Boosting Scene Text Detectors

Figure 4 for Vision-Language Pre-Training for Boosting Scene Text Detectors

Abstract:Recently, vision-language joint representation learning has proven to be highly effective in various scenarios. In this paper, we specifically adapt vision-language joint learning for scene text detection, a task that intrinsically involves cross-modal interaction between the two modalities: vision and language, since text is the written form of language. Concretely, we propose to learn contextualized, joint representations through vision-language pre-training, for the sake of enhancing the performance of scene text detectors. Towards this end, we devise a pre-training architecture with an image encoder, a text encoder and a cross-modal encoder, as well as three pretext tasks: image-text contrastive learning (ITC), masked language modeling (MLM) and word-in-image prediction (WIP). The pre-trained model is able to produce more informative representations with richer semantics, which could readily benefit existing scene text detectors (such as EAST and PSENet) in the down-stream text detection task. Extensive experiments on standard benchmarks demonstrate that the proposed paradigm can significantly improve the performance of various representative text detectors, outperforming previous pre-training approaches. The code and pre-trained models will be publicly released.

* Accepted by CVPR 2022

Via

Access Paper or Ask Questions

Revisiting Document Image Dewarping by Grid Regularization

Mar 31, 2022

Xiangwei Jiang, Rujiao Long, Nan Xue, Zhibo Yang, Cong Yao, Gui-Song Xia

Figure 1 for Revisiting Document Image Dewarping by Grid Regularization

Figure 2 for Revisiting Document Image Dewarping by Grid Regularization

Figure 3 for Revisiting Document Image Dewarping by Grid Regularization

Figure 4 for Revisiting Document Image Dewarping by Grid Regularization

Abstract:This paper addresses the problem of document image dewarping, which aims at eliminating the geometric distortion in document images for document digitization. Instead of designing a better neural network to approximate the optical flow fields between the inputs and outputs, we pursue the best readability by taking the text lines and the document boundaries into account from a constrained optimization perspective. Specifically, our proposed method first learns the boundary points and the pixels in the text lines and then follows the most simple observation that the boundaries and text lines in both horizontal and vertical directions should be kept after dewarping to introduce a novel grid regularization scheme. To obtain the final forward mapping for dewarping, we solve an optimization problem with our proposed grid regularization. The experiments comprehensively demonstrate that our proposed approach outperforms the prior arts by large margins in terms of readability (with the metrics of Character Errors Rate and the Edit Distance) while maintaining the best image quality on the publicly-available DocUNet benchmark.

Via

Access Paper or Ask Questions

ARTS: Eliminating Inconsistency between Text Detection and Recognition with Auto-Rectification Text Spotter

Oct 20, 2021

Humen Zhong, Jun Tang, Wenhai Wang, Zhibo Yang, Cong Yao, Tong Lu

Figure 1 for ARTS: Eliminating Inconsistency between Text Detection and Recognition with Auto-Rectification Text Spotter

Figure 2 for ARTS: Eliminating Inconsistency between Text Detection and Recognition with Auto-Rectification Text Spotter

Figure 3 for ARTS: Eliminating Inconsistency between Text Detection and Recognition with Auto-Rectification Text Spotter

Figure 4 for ARTS: Eliminating Inconsistency between Text Detection and Recognition with Auto-Rectification Text Spotter

Abstract:Recent approaches for end-to-end text spotting have achieved promising results. However, most of the current spotters were plagued by the inconsistency problem between text detection and recognition. In this work, we introduce and prove the existence of the inconsistency problem and analyze it from two aspects: (1) inconsistency of text recognition features between training and testing, and (2) inconsistency of optimization targets between text detection and recognition. To solve the aforementioned issues, we propose a differentiable Auto-Rectification Module (ARM) together with a new training strategy to enable propagating recognition loss back into detection branch, so that our detection branch can be jointly optimized by detection and recognition targets, which largely alleviates the inconsistency problem between text detection and recognition. Based on these designs, we present a simple yet robust end-to-end text spotting framework, termed Auto-Rectification Text Spotter (ARTS), to detect and recognize arbitrarily-shaped text in natural scenes. Extensive experiments demonstrate the superiority of our method. In particular, our ARTS-S achieves 77.1% end-to-end text spotting F-measure on Total-Text at a competitive speed of 10.5 FPS, which significantly outperforms previous methods in both accuracy and inference speed.

Via

Access Paper or Ask Questions

Parsing Table Structures in the Wild

Sep 06, 2021

Rujiao Long, Wen Wang, Nan Xue, Feiyu Gao, Zhibo Yang, Yongpan Wang, Gui-Song Xia

Figure 1 for Parsing Table Structures in the Wild

Figure 2 for Parsing Table Structures in the Wild

Figure 3 for Parsing Table Structures in the Wild

Figure 4 for Parsing Table Structures in the Wild

Abstract:This paper tackles the problem of table structure parsing (TSP) from images in the wild. In contrast to existing studies that mainly focus on parsing well-aligned tabular images with simple layouts from scanned PDF documents, we aim to establish a practical table structure parsing system for real-world scenarios where tabular input images are taken or scanned with severe deformation, bending or occlusions. For designing such a system, we propose an approach named Cycle-CenterNet on the top of CenterNet with a novel cycle-pairing module to simultaneously detect and group tabular cells into structured tables. In the cycle-pairing module, a new pairing loss function is proposed for the network training. Alongside with our Cycle-CenterNet, we also present a large-scale dataset, named Wired Table in the Wild (WTW), which includes well-annotated structure parsing of multiple style tables in several scenes like the photo, scanning files, web pages, \emph{etc.}. In experiments, we demonstrate that our Cycle-CenterNet consistently achieves the best accuracy of table structure parsing on the new WTW dataset by 24.6\% absolute improvement evaluated by the TEDS metric. A more comprehensive experimental analysis also validates the advantages of our proposed methods for the TSP task.

* Accepted to ICCV 2021

Via

Access Paper or Ask Questions

Artificial Intelligence in Drug Discovery: Applications and Techniques

Jun 11, 2021

Jianyuan Deng, Zhibo Yang, Dimitris Samaras, Fusheng Wang

Figure 1 for Artificial Intelligence in Drug Discovery: Applications and Techniques

Figure 2 for Artificial Intelligence in Drug Discovery: Applications and Techniques

Figure 3 for Artificial Intelligence in Drug Discovery: Applications and Techniques

Figure 4 for Artificial Intelligence in Drug Discovery: Applications and Techniques

Abstract:Artificial intelligence (AI) has been transforming the practice of drug discovery in the past decade. Various AI techniques have been used in a wide range of applications, such as virtual screening and drug design. In this perspective, we first give an overview on drug discovery and discuss related applications, which can be reduced to two major tasks, i.e., molecular property prediction and molecule generation. We then discuss common data resources, molecule representations and benchmark platforms. Furthermore, to summarize the progress in AI-driven drug discovery, we present the relevant AI techniques including model architectures and learning paradigms in the surveyed papers. We expect that the perspective will serve as a guide for researchers who are interested in working at this intersected area of artificial intelligence and drug discovery. We also provide a GitHub repository\footnote{\url{https://github.com/dengjianyuan/Survey_AI_Drug_Discovery}} with the collection of papers and codes, if applicable, as a learning resource, which will be regularly updated.

* Minor text revisions

Via

Access Paper or Ask Questions

PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text

May 09, 2021

Wenhai Wang, Enze Xie, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen

Figure 1 for PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text

Figure 2 for PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text

Figure 3 for PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text

Figure 4 for PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text

Abstract:Scene text detection and recognition have been well explored in the past few years. Despite the progress, efficient and accurate end-to-end spotting of arbitrarily-shaped text remains challenging. In this work, we propose an end-to-end text spotting framework, termed PAN++, which can efficiently detect and recognize text of arbitrary shapes in natural scenes. PAN++ is based on the kernel representation that reformulates a text line as a text kernel (central region) surrounded by peripheral pixels. By systematically comparing with existing scene text representations, we show that our kernel representation can not only describe arbitrarily-shaped text but also well distinguish adjacent text. Moreover, as a pixel-based representation, the kernel representation can be predicted by a single fully convolutional network, which is very friendly to real-time applications. Taking the advantages of the kernel representation, we design a series of components as follows: 1) a computationally efficient feature enhancement network composed of stacked Feature Pyramid Enhancement Modules (FPEMs); 2) a lightweight detection head cooperating with Pixel Aggregation (PA); and 3) an efficient attention-based recognition head with Masked RoI. Benefiting from the kernel representation and the tailored components, our method achieves high inference speed while maintaining competitive accuracy. Extensive experiments show the superiority of our method. For example, the proposed PAN++ achieves an end-to-end text spotting F-measure of 64.9 at 29.2 FPS on the Total-Text dataset, which significantly outperforms the previous best method. Code will be available at: https://git.io/PAN.

* Accepted to TPAMI 2021

Via

Access Paper or Ask Questions

MOST: A Multi-Oriented Scene Text Detector with Localization Refinement

Apr 05, 2021

Minghang He, Minghui Liao, Zhibo Yang, Humen Zhong, Jun Tang, Wenqing Cheng, Cong Yao, Yongpan Wang, Xiang Bai

Figure 1 for MOST: A Multi-Oriented Scene Text Detector with Localization Refinement

Figure 2 for MOST: A Multi-Oriented Scene Text Detector with Localization Refinement

Figure 3 for MOST: A Multi-Oriented Scene Text Detector with Localization Refinement

Figure 4 for MOST: A Multi-Oriented Scene Text Detector with Localization Refinement

Abstract:Over the past few years, the field of scene text detection has progressed rapidly that modern text detectors are able to hunt text in various challenging scenarios. However, they might still fall short when handling text instances of extreme aspect ratios and varying scales. To tackle such difficulties, we propose in this paper a new algorithm for scene text detection, which puts forward a set of strategies to significantly improve the quality of text localization. Specifically, a Text Feature Alignment Module (TFAM) is proposed to dynamically adjust the receptive fields of features based on initial raw detections; a Position-Aware Non-Maximum Suppression (PA-NMS) module is devised to selectively concentrate on reliable raw detections and exclude unreliable ones; besides, we propose an Instance-wise IoU loss for balanced training to deal with text instances of different scales. An extensive ablation study demonstrates the effectiveness and superiority of the proposed strategies. The resulting text detection system, which integrates the proposed strategies with a leading scene text detector EAST, achieves state-of-the-art or competitive performance on various standard benchmarks for text detection while keeping a fast running speed.

* Accepted by CVPR21

Via

Access Paper or Ask Questions

Hierarchical Proxy-based Loss for Deep Metric Learning

Mar 25, 2021

Zhibo Yang, Muhammet Bastan, Xinliang Zhu, Doug Gray, Dimitris Samaras

Figure 1 for Hierarchical Proxy-based Loss for Deep Metric Learning

Figure 2 for Hierarchical Proxy-based Loss for Deep Metric Learning

Figure 3 for Hierarchical Proxy-based Loss for Deep Metric Learning

Figure 4 for Hierarchical Proxy-based Loss for Deep Metric Learning

Abstract:Proxy-based metric learning losses are superior to pair-based losses due to their fast convergence and low training complexity. However, existing proxy-based losses focus on learning class-discriminative features while overlooking the commonalities shared across classes which are potentially useful in describing and matching samples. Moreover, they ignore the implicit hierarchy of categories in real-world datasets, where similar subordinate classes can be grouped together. In this paper, we present a framework that leverages this implicit hierarchy by imposing a hierarchical structure on the proxies and can be used with any existing proxy-based loss. This allows our model to capture both class-discriminative features and class-shared characteristics without breaking the implicit data hierarchy. We evaluate our method on five established image retrieval datasets such as In-Shop and SOP. Results demonstrate that our hierarchical proxy-based loss framework improves the performance of existing proxy-based losses, especially on large datasets which exhibit strong hierarchical structure.

Via

Access Paper or Ask Questions