Xiaodan Liang

SODA10M: Towards Large-Scale Object Detection Benchmark for Autonomous Driving

Jun 22, 2021

Jianhua Han, Xiwen Liang, Hang Xu, Kai Chen, Lanqing Hong, Chaoqiang Ye, Wei Zhang, Zhenguo Li, Xiaodan Liang, Chunjing Xu

Abstract: Aiming at facilitating a real-world, ever-evolving and scalable autonomous driving system, we present a large-scale benchmark for standardizing the evaluation of different self-supervised and semi-supervised approaches by learning from raw data, which is the first and largest benchmark to date. Existing autonomous driving systems rely heavily on 'perfect' visual perception models (e.g., detection) trained on extensive annotated data to ensure safety. However, it is unrealistic to elaborately label instances of all scenarios and circumstances (e.g., night, extreme weather, cities) when deploying a robust autonomous driving system. Motivated by recent powerful advances in self-supervised and semi-supervised learning, a promising direction is to learn a robust detection model by jointly exploiting large-scale unlabeled data and a small amount of labeled data. Existing datasets (e.g., KITTI, Waymo) either provide only a small amount of data or cover limited domains with full annotation, hindering the exploration of large-scale pre-trained models. Here, we release a Large-Scale Object Detection benchmark for Autonomous driving, named SODA10M, containing 10 million unlabeled images and 20K images labeled with 6 representative object categories. To improve diversity, the images are collected at one frame every ten seconds within 32 different cities under different weather conditions, periods and location scenes. We provide extensive experiments and deep analyses of existing supervised state-of-the-art detection models and popular self-supervised and semi-supervised approaches, along with some insights about how to develop future models. The data and more up-to-date information have been released at https://soda-2d.github.io.

One Million Scenes for Autonomous Driving: ONCE Dataset

Jun 21, 2021

Jiageng Mao, Minzhe Niu, Chenhan Jiang, Hanxue Liang, Xiaodan Liang, Yamin Li, Chaoqiang Ye, Wei Zhang, Zhenguo Li, Jie Yu(+2 more)

Abstract: Current perception models in autonomous driving have become notorious for relying heavily on a mass of annotated data to cover unseen cases and address the long-tail problem. On the other hand, learning from large-scale unlabeled data and incrementally self-training powerful recognition models have received increasing attention and may become the solution for next-generation, industry-level, powerful and robust perception models in autonomous driving. However, the research community has generally suffered from a shortage of such essential real-world scene data, which hampers the future exploration of fully/semi/self-supervised methods for 3D perception. In this paper, we introduce the ONCE (One millioN sCenEs) dataset for 3D object detection in the autonomous driving scenario. The ONCE dataset consists of 1 million LiDAR scenes and 7 million corresponding camera images. The data is selected from 144 driving hours, which is 20x longer than the largest 3D autonomous driving datasets available (e.g., nuScenes and Waymo), and it is collected across a range of different areas, periods and weather conditions. To facilitate future research on exploiting unlabeled data for 3D detection, we additionally provide a benchmark in which we reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset. We conduct extensive analyses of those methods and provide valuable observations on how their performance relates to the scale of the data used. Data, code, and more information are available at https://once-for-auto-driving.github.io/index.html.

Prototypical Graph Contrastive Learning

Jun 17, 2021

Shuai Lin, Pan Zhou, Zi-Yuan Hu, Shuojia Wang, Ruihui Zhao, Yefeng Zheng, Liang Lin, Eric Xing, Xiaodan Liang

Abstract: Graph-level representations are critical in various real-world applications, such as predicting the properties of molecules. But in practice, precise graph annotations are generally very expensive and time-consuming. To address this issue, graph contrastive learning constructs an instance discrimination task that pulls together positive pairs (augmentation pairs of the same graph) and pushes away negative pairs (augmentation pairs of different graphs) for unsupervised representation learning. However, since the negatives for a query are uniformly sampled from all graphs, existing methods suffer from a critical sampling bias issue: the negatives are likely to share the same semantic structure as the query, leading to performance degradation. To mitigate this sampling bias issue, in this paper, we propose a Prototypical Graph Contrastive Learning (PGCL) approach. Specifically, PGCL models the underlying semantic structure of the graph data by clustering semantically similar graphs into the same group, and simultaneously encourages clustering consistency across different augmentations of the same graph. Then, given a query, it performs negative sampling by drawing graphs from clusters that differ from the cluster of the query, which ensures a semantic difference between the query and its negative samples. Moreover, for a query, PGCL further reweights its negative samples based on the distance between their prototypes (cluster centroids) and the query prototype, such that negatives with a moderate prototype distance receive relatively large weights. This reweighting strategy is proved to be more effective than uniform sampling. Experimental results on various graph benchmarks demonstrate the advantages of our PGCL over state-of-the-art methods.
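
The two sampling steps described in this abstract (drawing negatives only from other clusters, then reweighting them by prototype distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `sigma` parameter, and in particular the Gaussian-shaped weight centred on the mean prototype distance are assumptions chosen only to show what "moderate distance gets a larger weight" could look like.

```python
import math


def select_negatives(query_idx, cluster_ids):
    """Return indices of graphs whose cluster differs from the query's cluster."""
    query_cluster = cluster_ids[query_idx]
    return [i for i, c in enumerate(cluster_ids) if c != query_cluster]


def prototype_weights(query_idx, negatives, cluster_ids, prototypes, sigma=1.0):
    """Weight each negative by how close its prototype distance is to the mean distance.

    ASSUMPTION: the Gaussian bump around the mean distance is a hypothetical
    weighting form; the abstract only states that moderate distances get
    relatively large weights.
    """
    query_proto = prototypes[cluster_ids[query_idx]]
    dists = [math.dist(query_proto, prototypes[cluster_ids[i]]) for i in negatives]
    mean_dist = sum(dists) / len(dists)
    raw = [math.exp(-((d - mean_dist) ** 2) / (2 * sigma ** 2)) for d in dists]
    total = sum(raw)
    return [w / total for w in raw]  # normalised so the weights sum to 1


# Toy data: five graphs assigned to four clusters with 2-D prototypes.
cluster_ids = [0, 0, 1, 2, 3]
prototypes = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (3.0, 0.0)}
negatives = select_negatives(0, cluster_ids)
weights = prototype_weights(0, negatives, cluster_ids, prototypes)
```

In the toy data, graph 1 shares the query's cluster and is therefore never used as a negative, while the negative at the middle prototype distance receives the largest weight.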

Vision-Language Navigation with Random Environmental Mixup

Jun 15, 2021

Chong Liu, Fengda Zhu, Xiaojun Chang, Xiaodan Liang, Yi-Dong Shen

Abstract: Vision-language Navigation (VLN) tasks require an agent to navigate step by step while perceiving visual observations and comprehending a natural language instruction. Large data bias, which is caused by the disparity between the small data scale and the large navigation space, makes the VLN task challenging. Previous works have proposed various data augmentation methods to reduce data bias. However, these works do not explicitly reduce the data bias across different house scenes. Therefore, the agent overfits to the seen scenes and achieves poor navigation performance in unseen scenes. To tackle this problem, we propose the Random Environmental Mixup (REM) method, which generates cross-connected house scenes as augmented data by mixing up environments. Specifically, we first select key viewpoints according to the room connection graph for each scene. Then, we cross-connect the key views of different scenes to construct augmented scenes. Finally, we generate augmented instruction-path pairs in the cross-connected scenes. The experimental results on benchmark datasets demonstrate that the data augmented via REM helps the agent reduce its performance gap between seen and unseen environments and improves the overall performance, making our model the best existing approach on the standard VLN benchmark.
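
The cross-connection step can be pictured as a small graph operation. The sketch below is only an illustration under stated assumptions, not the REM procedure itself: scenes are represented as hypothetical adjacency dictionaries, the key edge of each scene is passed in by hand rather than selected from a room connection graph, and the function name `cross_connect` is invented for this example.

```python
def cross_connect(scene_a, scene_b, edge_a, edge_b):
    """Build a mixed scene by cutting one key edge per scene and re-linking across scenes.

    scene_a, scene_b: dict mapping a viewpoint name to the set of its neighbours
    (undirected graphs). edge_a = (a1, a2) and edge_b = (b1, b2) are the key
    edges to cut. ASSUMPTION: re-linking a1 with b2 and b1 with a2 is one
    plausible way to cross-connect; the abstract does not specify the rule.
    """
    # Prefix every viewpoint with its scene tag so names cannot collide.
    mixed = {("A", v): {("A", n) for n in nbrs} for v, nbrs in scene_a.items()}
    mixed.update({("B", v): {("B", n) for n in nbrs} for v, nbrs in scene_b.items()})

    a1, a2 = ("A", edge_a[0]), ("A", edge_a[1])
    b1, b2 = ("B", edge_b[0]), ("B", edge_b[1])

    # Cut the original key edges inside each scene.
    mixed[a1].discard(a2)
    mixed[a2].discard(a1)
    mixed[b1].discard(b2)
    mixed[b2].discard(b1)

    # Add the cross-scene links.
    mixed[a1].add(b2)
    mixed[b2].add(a1)
    mixed[b1].add(a2)
    mixed[a2].add(b1)
    return mixed


# Toy scenes: a hallway-kitchen pair and a bedroom-bathroom pair.
scene_a = {"hall": {"kitchen"}, "kitchen": {"hall"}}
scene_b = {"bedroom": {"bath"}, "bath": {"bedroom"}}
mixed_scene = cross_connect(scene_a, scene_b, ("hall", "kitchen"), ("bedroom", "bath"))
```

After mixing, a path that starts in scene A's hall continues into scene B's bathroom, which is the kind of cross-scene trajectory for which augmented instruction-path pairs would then be generated.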

GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning

Jun 08, 2021

Jiaqi Chen, Jianheng Tang, Jinghui Qin, Xiaodan Liang, Lingbo Liu, Eric P. Xing, Liang Lin

Abstract: Automatic math problem solving has recently attracted increasing attention as a long-standing AI benchmark. In this paper, we focus on solving geometric problems, which requires a comprehensive understanding of textual descriptions, visual diagrams, and theorem knowledge. However, existing methods depend heavily on handcrafted rules and have only been evaluated on small-scale datasets. Therefore, we propose a Geometric Question Answering dataset, GeoQA, containing 5,010 geometric problems with corresponding annotated programs, which illustrate the solving process of the given problems. Compared with another publicly available dataset, GeoS, GeoQA is 25 times larger, and its program annotations can provide a practical testbed for future research on explicit and explainable numerical reasoning. Moreover, we introduce a Neural Geometric Solver (NGS) to address geometric problems by comprehensively parsing multimodal information and generating interpretable programs. We further add multiple self-supervised auxiliary tasks on NGS to enhance cross-modal semantic representation. Extensive experiments on GeoQA validate the effectiveness of our proposed NGS and auxiliary tasks. However, the results are still significantly lower than human performance, which leaves large room for future research. Our benchmark and code are released at https://github.com/chen-judge/GeoQA .

* Accepted to Findings of ACL 2021

Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning

Jun 01, 2021

Pan Lu, Ran Gong, Shibiao Jiang, Liang Qiu, Siyuan Huang, Xiaodan Liang, Song-Chun Zhu

Abstract: Geometry problem solving has attracted much attention in the NLP community recently. The task is challenging as it requires abstract problem understanding and symbolic reasoning with axiomatic knowledge. However, current datasets are either small in scale or not publicly available. Thus, we construct a new large-scale benchmark, Geometry3K, consisting of 3,002 geometry problems with dense annotation in formal language. We further propose a novel geometry solving approach with formal language and symbolic reasoning, called Interpretable Geometry Problem Solver (Inter-GPS). Inter-GPS first parses the problem text and diagram into formal language automatically via rule-based text parsing and neural object detection, respectively. Unlike implicit learning in existing methods, Inter-GPS incorporates theorem knowledge as conditional rules and performs symbolic reasoning step by step. In addition, a theorem predictor is designed to infer the theorem application sequence fed to the symbolic solver, yielding a more efficient and reasonable search path. Extensive experiments on the Geometry3K and GEOS datasets demonstrate that Inter-GPS achieves significant improvements over existing methods. The project with code and data is available at https://lupantech.github.io/inter-gps.

* Accepted to ACL 2021, 13 pages, 6 figures

Towards Quantifiable Dialogue Coherence Evaluation

Jun 01, 2021

Zheng Ye, Liucun Lu, Lishan Huang, Liang Lin, Xiaodan Liang

Abstract: Automatic dialogue coherence evaluation has attracted increasing attention and is crucial for developing promising dialogue systems. However, existing metrics have two major limitations: (a) they are mostly trained in a simplified two-level setting (coherent vs. incoherent), while humans give Likert-type multi-level coherence scores, dubbed "quantifiable"; (b) their predicted coherence scores cannot align with the actual human rating standards due to the absence of human guidance during training. To address these limitations, we propose Quantifiable Dialogue Coherence Evaluation (QuantiDCE), a novel framework aiming to train a quantifiable dialogue coherence metric that can reflect the actual human rating standards. Specifically, QuantiDCE includes two training stages: Multi-Level Ranking (MLR) pre-training and Knowledge Distillation (KD) fine-tuning. During MLR pre-training, a new MLR loss is proposed to enable the model to learn a coarse judgement of coherence degrees. Then, during KD fine-tuning, the pre-trained model is further fine-tuned to learn the actual human rating standards with only a small amount of human-annotated data. To promote generalizability even with limited fine-tuning data, a novel KD regularization is introduced to retain the knowledge learned at the pre-training stage. Experimental results show that the model trained by QuantiDCE presents stronger correlations with human judgements than the other state-of-the-art metrics.

* Long paper; ACL 2021

TransNAS-Bench-101: Improving Transferability and Generalizability of Cross-Task Neural Architecture Search

May 25, 2021

Yawen Duan, Xin Chen, Hang Xu, Zewei Chen, Xiaodan Liang, Tong Zhang, Zhenguo Li

Abstract: Recent breakthroughs in Neural Architecture Search (NAS) extend the field's research scope towards a broader range of vision tasks and more diversified search spaces. While existing NAS methods mostly design architectures on a single task, algorithms that look beyond single-task search are surging to pursue a more efficient and universal solution across various tasks. Many of them leverage transfer learning and seek to preserve, reuse, and refine network design knowledge to achieve higher efficiency in future tasks. However, the enormous computational cost and experimental complexity of cross-task NAS impose barriers to valuable research in this direction. Existing NAS benchmarks all focus on one type of vision task, i.e., classification. In this work, we propose TransNAS-Bench-101, a benchmark dataset containing network performance across seven tasks, covering classification, regression, pixel-level prediction, and self-supervised tasks. This diversity provides opportunities to transfer NAS methods among tasks and allows for more complex transfer schemes to evolve. We explore two fundamentally different types of search space: cell-level search space and macro-level search space. With 7,352 backbones evaluated on seven tasks, 51,464 trained models with detailed training information are provided. With TransNAS-Bench-101, we hope to encourage the advent of exceptional NAS algorithms that raise cross-task search efficiency and generalizability to the next level. Our dataset file will be available at Mindspore, VEGA.
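
The model count quoted in this abstract follows directly from the backbone and task counts. The short check below simply reproduces that arithmetic; the constant and function names are illustrative and not taken from the benchmark's code.

```python
# Figures stated in the abstract.
NUM_BACKBONES = 7_352
NUM_TASKS = 7


def total_trained_models(num_backbones, num_tasks):
    """Each backbone is trained once per task, so the counts multiply."""
    return num_backbones * num_tasks


total = total_trained_models(NUM_BACKBONES, NUM_TASKS)
print(total)
```

The product matches the 51,464 trained models reported above.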

* Published at CVPR 2021. 8 pages main paper, 13 pages in total

DAGN: Discourse-Aware Graph Network for Logical Reasoning

Apr 08, 2021

Yinya Huang, Meng Fang, Yu Cao, Liwei Wang, Xiaodan Liang

Abstract: Recent QA with logical reasoning questions requires passage-level relations among the sentences. However, current approaches still focus on sentence-level relations among tokens. In this work, we explore aggregating passage-level clues for solving logical reasoning QA by using discourse-based information. We propose a discourse-aware graph network (DAGN) that reasons over the discourse structure of the texts. The model encodes discourse information as a graph with elementary discourse units (EDUs) and discourse relations, and learns discourse-aware features via a graph network for downstream QA tasks. Experiments are conducted on two logical reasoning QA datasets, ReClor and LogiQA, and our proposed DAGN achieves competitive results. The source code is available at https://github.com/Eleanor-H/DAGN.

* Accepted by NAACL 2021

SOON: Scenario Oriented Object Navigation with Graph-based Exploration

Mar 31, 2021

Fengda Zhu, Xiwen Liang, Yi Zhu, Xiaojun Chang, Xiaodan Liang

Abstract: The ability to navigate like a human towards a language-guided target from anywhere in a 3D embodied environment is one of the 'holy grail' goals of intelligent robots. Most visual navigation benchmarks, however, focus on navigating toward a target from a fixed starting point, guided by an elaborate set of instructions that describes the route step by step. This approach deviates from real-world problems in which a human only describes what the object and its surroundings look like and asks the robot to start navigation from anywhere. Accordingly, in this paper, we introduce a Scenario Oriented Object Navigation (SOON) task. In this task, an agent is required to navigate from an arbitrary position in a 3D embodied environment to localize a target following a scene description. To give a promising direction for solving this task, we propose a novel graph-based exploration (GBE) method, which models the navigation state as a graph and introduces a novel graph-based exploration approach to learn knowledge from the graph and stabilize training by learning sub-optimal trajectories. We also propose a new large-scale benchmark named the From Anywhere to Object (FAO) dataset. To avoid target ambiguity, the descriptions in FAO provide rich semantic scene information, including object attribute, object relationship, region description, and nearby region description. Our experiments reveal that the proposed GBE outperforms various state-of-the-art methods on both the FAO and R2R datasets. The ablation studies on FAO validate the quality of the dataset.
