Jian Sun

S$^2$SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers

Mar 14, 2022
Binyuan Hui, Ruiying Geng, Lihan Wang, Bowen Qin, Bowen Li, Jian Sun, Yongbin Li

The task of converting a natural language question into an executable SQL query, known as text-to-SQL, is an important branch of semantic parsing. The state-of-the-art graph-based encoder has been used successfully in this task but does not model the question syntax well. In this paper, we propose S$^2$SQL, which injects Syntax into the question-Schema graph encoder for text-to-SQL parsers and leverages the syntactic dependency information of questions to improve performance. We also employ a decoupling constraint to induce diverse relational edge embeddings, which further improves the network's performance. Experiments on Spider and on the robustness setting Spider-Syn demonstrate that the proposed approach outperforms all existing methods when pre-trained models are used, ranking first on the Spider leaderboard.

* Accepted at ACL 2022 Findings

Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs

Mar 13, 2022
Xiaohan Ding, Xiangyu Zhang, Yizhuang Zhou, Jungong Han, Guiguang Ding, Jian Sun

In this paper we revisit large kernel design in modern convolutional neural networks (CNNs), which has often been neglected in the past few years. Inspired by recent advances in vision transformers (ViTs), we point out that using a few large kernels instead of a stack of small convolutions could be a more powerful paradigm. We therefore summarize 5 guidelines, e.g., applying re-parameterized large depth-wise convolutions, to design efficient high-performance large-kernel CNNs. Following the guidelines, we propose RepLKNet, a pure CNN architecture whose kernel size is as large as 31x31. RepLKNet greatly narrows the performance gap between CNNs and ViTs, e.g., achieving comparable or better results than Swin Transformer on ImageNet and downstream tasks, while the latency of RepLKNet is much lower. Moreover, RepLKNet scales to big data and large models, obtaining 87.8% top-1 accuracy on ImageNet and 56.0% mIoU on ADE20K. Finally, our study further suggests that large-kernel CNNs share several nice properties with ViTs, e.g., much larger effective receptive fields than conventional CNNs, and a higher shape bias rather than texture bias. Code & models at https://github.com/megvii-research/RepLKNet.

* Accepted by CVPR 2022
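
The abstract mentions re-parameterized large depth-wise convolutions without giving details. A common form of structural re-parameterization, assumed here, trains a small-kernel branch in parallel with the large kernel and then folds it into the large kernel for inference. The sketch below illustrates only the underlying identity for a single channel (convolution is linear, so a zero-padded small kernel can be added to the large one). It omits batch normalization, uses pure Python lists, and all function names are hypothetical rather than taken from the RepLKNet code.

```python
def conv2d_same(image, kernel):
    """Zero-padded 'same' cross-correlation of a 2D list with an odd-sized square kernel."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - r, j + dj - r
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def merge_kernels(large, small):
    """Zero-pad `small` to the size of `large` (centered) and add element-wise."""
    offset = (len(large) - len(small)) // 2
    merged = [row[:] for row in large]
    for i, row in enumerate(small):
        for j, value in enumerate(row):
            merged[i + offset][j + offset] += value
    return merged


# Toy single-channel input and kernels (integer values keep the arithmetic exact).
image = [[(3 * i + 5 * j) % 7 for j in range(6)] for i in range(6)]
large = [[(i + 2 * j) % 3 - 1 for j in range(5)] for i in range(5)]  # 5x5 kernel
small = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]                         # 3x3 kernel

# Training-time view: two parallel branches whose outputs are summed.
out_large = conv2d_same(image, large)
out_small = conv2d_same(image, small)
two_branch = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(out_large, out_small)]

# Inference-time view: a single convolution with the merged kernel.
merged_out = conv2d_same(image, merge_kernels(large, small))
```

Under these assumptions `two_branch` and `merged_out` agree element by element, so the extra branch adds no inference cost once merged.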

PETR: Position Embedding Transformation for Multi-View 3D Object Detection

Mar 10, 2022
Yingfei Liu, Tiancai Wang, Xiangyu Zhang, Jian Sun

In this paper, we develop position embedding transformation (PETR) for multi-view 3D object detection. PETR encodes the position information of 3D coordinates into image features, producing 3D position-aware features. Object queries can then perceive the 3D position-aware features and perform end-to-end object detection. PETR achieves state-of-the-art performance (50.4% NDS and 44.1% mAP) on the standard nuScenes dataset and ranks first on the benchmark. It can serve as a simple yet strong baseline for future research.

* Tech Report

Towards Self-Supervised Category-Level Object Pose and Size Estimation

Mar 06, 2022
Yisheng He, Haoqiang Fan, Haibin Huang, Qifeng Chen, Jian Sun

This work presents a self-supervised framework for category-level object pose and size estimation from a single depth image. Unlike previous works that rely on time-consuming and labor-intensive ground truth pose labels for supervision, we leverage the geometric consistency residing in point clouds of the same shape for self-supervision. Specifically, given a normalized category template mesh in the object-coordinate system and the partially observed object instance in the scene, our key idea is to apply differentiable shape deformation, registration, and rendering to enforce geometric consistency between the predicted and the observed scene object point cloud. We evaluate our approach on real-world datasets and find that our approach outperforms the simple traditional baseline by large margins while being competitive with some fully-supervised approaches.

Relieving Long-tailed Instance Segmentation via Pairwise Class Balance

Jan 08, 2022
Yin-Yin He, Peizhen Zhang, Xiu-Shen Wei, Xiangyu Zhang, Jian Sun

Long-tailed instance segmentation is a challenging task due to the extreme imbalance of training samples among classes. This imbalance causes severe biases of the head classes (with the majority of samples) against the tail ones, which makes "how to appropriately define and alleviate the bias" one of the most important issues. Prior works mainly use label distribution or mean score information to indicate a coarse-grained bias. In this paper, we exploit the confusion matrix, which carries fine-grained misclassification details, to relieve pairwise biases, generalizing the coarse one. To this end, we propose a novel Pairwise Class Balance (PCB) method, built upon a confusion matrix that is updated during training to accumulate the ongoing prediction preferences. PCB generates fightback soft labels for regularization during training. In addition, an iterative learning paradigm is developed to support progressive and smooth regularization in this debiasing. PCB can be plugged into any existing method as a complement. Experimental results on LVIS demonstrate that our method achieves state-of-the-art performance without bells and whistles. Superior results across various architectures show its generalization ability.
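
The abstract does not say how the confusion matrix is updated or how the soft labels are derived from it, so the following is only an illustrative guess at the general idea, not the PCB method itself. It assumes an exponential moving average over predicted probabilities per ground-truth class, and a soft label for class c that gives a little target mass to the classes currently being mistaken for c. The function names, `momentum`, and `strength` are all hypothetical.

```python
def update_confusion(matrix, label, probs, momentum=0.9):
    """Assumed update rule: EMA of predicted probabilities in the row of the true class."""
    matrix[label] = [momentum * m + (1.0 - momentum) * p
                     for m, p in zip(matrix[label], probs)]


def soft_label(matrix, label, strength=0.1):
    """Assumed soft-label rule: blend the one-hot target with the normalized
    column `label`, i.e. how strongly other classes leak into `label`."""
    n = len(matrix)
    leak = [matrix[j][label] if j != label else 0.0 for j in range(n)]
    total = sum(leak)
    one_hot = [1.0 if j == label else 0.0 for j in range(n)]
    if total == 0.0:
        return one_hot  # no observed confusion yet, keep the hard label
    return [(1.0 - strength) * o + strength * l / total
            for o, l in zip(one_hot, leak)]


# Three classes: 0 is a head class, 2 is a tail class.
matrix = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0]]

# A tail-class sample (label 2) is mostly predicted as the head class 0.
update_confusion(matrix, label=2, probs=[0.7, 0.1, 0.2])

# The target for a head-class sample now reserves some mass for class 2.
target = soft_label(matrix, label=0)
```

In this toy run the target for class 0 shifts a small amount of probability toward class 2, the class that was being absorbed by it, while remaining a valid distribution.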

GALAXY: A Generative Pre-trained Model for Task-Oriented Dialog with Semi-Supervised Learning and Explicit Policy Injection

Dec 27, 2021
Wanwei He, Yinpei Dai, Yinhe Zheng, Yuchuan Wu, Zheng Cao, Dermot Liu, Peng Jiang, Min Yang, Fei Huang, Luo Si, Jian Sun, Yongbin Li

Pre-trained models have proved to be powerful in enhancing task-oriented dialog systems. However, current pre-training methods mainly focus on enhancing dialog understanding and generation tasks while neglecting the exploitation of dialog policy. In this paper, we propose GALAXY, a novel pre-trained dialog model that explicitly learns dialog policy from limited labeled dialogs and large-scale unlabeled dialog corpora via semi-supervised learning. Specifically, we introduce a dialog act prediction task for policy optimization during pre-training and employ a consistency regularization term to refine the learned representation with the help of unlabeled dialogs. We also implement a gating mechanism to weigh suitable unlabeled dialog samples. Empirical results show that GALAXY substantially improves the performance of task-oriented dialog systems, and achieves new state-of-the-art results on benchmark datasets: In-Car, MultiWOZ2.0 and MultiWOZ2.1, improving their end-to-end combined scores by 2.5, 5.3 and 5.5 points, respectively. We also show that GALAXY has a stronger few-shot ability than existing models under various low-resource settings.

* 7 pages, 5 figures. Accepted by AAAI 2022

Learning to Select the Next Reasonable Mention for Entity Linking

Dec 08, 2021
Jian Sun, Yu Zhou, Chengqing Zong

Entity linking aims to establish a link between entity mentions in a document and the corresponding entities in knowledge graphs (KGs). Previous work has shown the effectiveness of global coherence for entity linking. However, most existing global linking methods based on sequential decisions focus on how to use previously linked entities to enhance later decisions. In those methods, the order of mentions is fixed, so the model cannot adjust the subsequent linking targets according to the previously linked results, and the previous information is therefore not used effectively. To address this problem, we propose a novel model, called DyMen, that dynamically adjusts the subsequent linking target based on the previously linked entities via reinforcement learning, enabling the model to select a link target that can fully use previously linked information. We sample mentions with a sliding window to reduce the action sampling space of reinforcement learning and maintain the semantic coherence of mentions. Experiments conducted on several benchmark datasets show the effectiveness of the proposed model.

* Accepted to AAAI-2022 Workshop on Knowledge Discovery from Unstructured Data in Financial Services

XnODR and XnIDR: Two Accurate and Fast Fully Connected Layers For Convolutional Neural Networks

Nov 21, 2021
Jian Sun, Ali Pourramezan Fard, Mohammad H. Mahoor

Although Capsule Networks show great ability in defining the positional relationship between features in deep neural networks for visual recognition tasks, they are computationally expensive and not suitable for running on mobile devices. The bottleneck is the computational complexity of the Dynamic Routing mechanism used between capsules. On the other hand, neural networks such as XNOR-Net are fast and computationally efficient but have relatively low accuracy because of the information loss in the binarization process. This paper proposes a new class of Fully Connected (FC) layers by xnorizing the linear projector outside or inside the Dynamic Routing within the CapsFC layer. Specifically, our proposed FC layers have two versions, XnODR (Xnorizing Linear Projector Outside Dynamic Routing) and XnIDR (Xnorizing Linear Projector Inside Dynamic Routing). To test their generalization, we insert them into MobileNet V2 and ResNet-50 separately. Experiments on three datasets, MNIST, CIFAR-10, and MultiMNIST, validate their effectiveness. Our experimental results demonstrate that both XnODR and XnIDR help networks achieve high accuracy with lower FLOPs and fewer parameters (e.g., 95.32% accuracy with 2.99M parameters and 311.22M FLOPs on CIFAR-10).

* 12 pages, 2 figures, 3 tables
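
To make the speed and accuracy trade-off of XNOR-Net-style binarization concrete, here is a minimal sketch of the general idea only, not of the XnODR or XnIDR layers. It assumes the usual scheme of approximating a real vector by its signs times the mean absolute value, and computing the dot product of two sign vectors with XNOR plus a bit count. The single input scale `beta` is a simplification, and all function names are hypothetical.

```python
def binarize(values):
    """Return (scale, signs): the mean absolute value and a list of +1/-1 signs."""
    scale = sum(abs(v) for v in values) / len(values)
    signs = [1 if v >= 0 else -1 for v in values]
    return scale, signs


def pack_bits(signs):
    """Pack +1/-1 signs into an integer, with bit i set when signs[i] is +1."""
    bits = 0
    for i, s in enumerate(signs):
        if s > 0:
            bits |= 1 << i
    return bits


def xnor_dot(a_bits, b_bits, n):
    """Dot product of two n-element +1/-1 vectors from their packed bits."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask        # XNOR: 1 where the signs match
    return 2 * bin(agree).count("1") - n     # matches minus mismatches


weights = [0.5, -1.5, 2.0, -1.0]
inputs = [1.0, -2.0, -3.0, -4.0]

alpha, w_signs = binarize(weights)
beta, x_signs = binarize(inputs)

binary_dot = xnor_dot(pack_bits(w_signs), pack_bits(x_signs), len(weights))
approx = alpha * beta * binary_dot                    # binarized estimate
exact = sum(w * x for w, x in zip(weights, inputs))   # full-precision reference
```

The gap between `approx` and `exact` on this toy input is the kind of information loss the abstract attributes to binarization, while the bitwise path replaces multiplications with XNOR and counting.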
