Most existing methods of Out-of-Domain (OOD) intent classification, which rely on extensive auxiliary OOD corpora or specific training paradigms, are underdeveloped in the underlying principle that the models should have differentiated confidence in In- and Out-of-domain intent. In this work, we demonstrate that calibrated subnetworks can be uncovered by pruning the (poor-calibrated) overparameterized model. Calibrated confidence provided by the subnetwork can better distinguish In- and Out-of-domain. Furthermore, we theoretically bring new insights into why temperature scaling can differentiate In- and Out-of-Domain intent and empirically extend the Lottery Ticket Hypothesis to the open-world setting. Extensive experiments on three real-world datasets demonstrate our approach can establish consistent improvements compared with a suite of competitive baselines.
Transformers have made progress in miscellaneous tasks, but suffer from quadratic computational and memory complexities. Recent works propose sparse Transformers with attention on sparse graphs to reduce complexity and remain strong performance. While effective, the crucial parts of how dense a graph needs to be to perform well are not fully explored. In this paper, we propose Normalized Information Payload (NIP), a graph scoring function measuring information transfer on graph, which provides an analysis tool for trade-offs between performance and complexity. Guided by this theoretical analysis, we present Hypercube Transformer, a sparse Transformer that models token interactions in a hypercube and shows comparable or even better results with vanilla Transformer while yielding $O(N\log N)$ complexity with sequence length $N$. Experiments on tasks requiring various sequence lengths lay validation for our graph function well.
As a simple technique to accelerate inference of large-scale pre-trained models, early exiting has gained much attention in the NLP community. It allows samples to exit early at internal classifiers without passing through the entire model. Most existing work usually trains the internal classifiers independently and employs an exiting strategy to decide whether or not to exit based on the confidence of the current internal classifier. However, none of these works takes full advantage of the fact that the internal classifiers are trained to solve the same task therefore can be used to construct an ensemble. In this paper, we show that a novel objective function for the training of the ensemble internal classifiers can be naturally induced from the perspective of ensemble learning and information theory. The proposed training objective consists of two terms: one for accuracy and the other for the diversity of the internal classifiers. In contrast, the objective used in prior work is exactly the accuracy term of our training objective therefore only optimizes the accuracy but not diversity. Further, we propose a simple voting-based strategy that considers predictions of all the past internal classifiers to infer the correct label and decide whether to exit. Experimental results on various NLP tasks show that our proposed objective function and voting-based strategy can achieve better accuracy-speed trade-offs.