In theoretical machine learning, the statistical complexity is a notion that measures the richness of a hypothesis space. In this work, we apply a particular measure of statistical complexity, namely the Rademacher complexity, to the quantum circuit model in quantum computation and study how the statistical complexity depends on various quantum circuit parameters. In particular, we investigate the dependence of the statistical complexity on the resources, depth, width, and the number of input and output registers of a quantum circuit. To study how the statistical complexity scales with resources in the circuit, we introduce a resource measure of magic based on the $(p,q)$ group norm, which quantifies the amount of magic in the quantum channels associated with the circuit. These dependencies are investigated in the following two settings: (i) where the entire quantum circuit is treated as a single quantum channel, and (ii) where each layer of the quantum circuit is treated as a separate quantum channel. The bounds we obtain can be used to constrain the capacity of quantum neural networks in terms of their depths and widths as well as the resources in the network.
The model prediction of the Gaussian process (GP) regression can be significantly biased when the data are contaminated by outliers. We propose a new robust GP regression algorithm that iteratively trims a portion of the data points with the largest deviation from the predicted mean. While the new algorithm retains the attractive properties of the standard GP as a nonparametric and flexible regression method, it can significantly reduce the influence of outliers even in some extreme cases. It is also easier to implement than previous robust GP variants that rely on approximate inference. Applied to various synthetic datasets with contaminations, the proposed method outperforms the standard GP and the popular robust GP variant with the Student's t likelihood, especially when the outlier fraction is high. Lastly, as a practical example in the astrophysical study, we show that this method can determine the main-sequence ridge line precisely in the color-magnitude diagram of star clusters.
Despite the tremendous recent progress on natural language inference (NLI), driven largely by large-scale investment in new datasets (e.g., SNLI, MNLI) and advances in modeling, most progress has been limited to English due to a lack of reliable datasets for most of the world's languages. In this paper, we present the first large-scale NLI dataset (consisting of ~56,000 annotated sentence pairs) for Chinese called the Original Chinese Natural Language Inference dataset (OCNLI). Unlike recent attempts at extending NLI to other languages, our dataset does not rely on any automatic translation or non-expert annotation. Instead, we elicit annotations from native speakers specializing in linguistics. We follow closely the annotation protocol used for MNLI, but create new strategies for eliciting diverse hypotheses. We establish several baseline results on our dataset using state-of-the-art pre-trained models for Chinese, and find even the best performing models to be far outpaced by human performance (~12% absolute performance gap), making it a challenging new resource that we hope will help to accelerate progress in Chinese NLU. To the best of our knowledge, this is the first human-elicited MNLI-style corpus for a non-English language.
We introduce CLUE, a Chinese Language Understanding Evaluation benchmark. It contains eight different tasks, including single-sentence classification, sentence pair classification, and machine reading comprehension. We evaluate CLUE on a number of existing full-network pre-trained models for Chinese. We also include a small hand-crafted diagnostic test set designed to probe specific linguistic phenomena using different models, some of which are unique to Chinese. Along with CLUE, we release a large clean crawled raw text corpus that can be used for model pre-training. We release CLUE, baselines and pre-training dataset on Github.
In this paper, we introduce the NER dataset from CLUE organization (CLUENER2020), a well-defined fine-grained dataset for named entity recognition in Chinese. CLUENER2020 contains 10 categories. Apart from common labels like person, organization, and location, it contains more diverse categories. It is more challenging than current other Chinese NER datasets and could better reflect real-world applications. For comparison, we implement several state-of-the-art baselines as sequence labeling tasks and report human performance, as well as its analysis. To facilitate future work on fine-grained NER for Chinese, we release our dataset, baselines, and leader-board.
In this paper, we introduce the NER dataset from CLUE organization (CLUENER2020), a well-defined fine-grained dataset for name entity recognition in Chinese. CLUENER2020 contains 10 categories. Apart from common labels like person, organization and location, it contains more diverse categories. It is more challenging than current other Chinese NER datasets and could better reflect real-world applications. For comparison, we implement several state-of-the-art baselines as sequence labelling tasks and report human performance, as well as its analysis. To facilitate future work on fine-grained NER for Chinese, we release our dataset, baselines and leader-board.
Automatic dialogue evaluation plays a crucial role in open-domain dialogue research. Previous works train neural networks with limited annotation for conducting automatic dialogue evaluation, which would naturally affect the evaluation fairness as dialogue systems close to the scope of training corpus would have more preference than the other ones. In this paper, we study alleviating this problem from the perspective of continual learning: given an existing neural dialogue evaluator and the next system to be evaluated, we fine-tune the learned neural evaluator by selectively forgetting/updating its parameters, to jointly fit dialogue systems have been and will be evaluated. Our motivation is to seek for a lifelong and low-cost automatic evaluation for dialogue systems, rather than to reconstruct the evaluator over and over again. Experimental results show that our continual evaluator achieves comparable performance with reconstructing new evaluators, while requires significantly lower resources.