Yin Tian

CLUE: A Chinese Language Understanding Evaluation Benchmark

Apr 14, 2020
Liang Xu, Xuanwei Zhang, Lu Li, Hai Hu, Chenjie Cao, Weitang Liu, Junyi Li, Yudong Li, Kai Sun, Yechen Xu, Yiming Cui, Cong Yu, Qianqian Dong, Yin Tian, Dian Yu, Bo Shi, Jun Zeng, Rongzhao Wang, Weijian Xie, Yanting Li, Yina Patterson, Zuoyu Tian, Yiwen Zhang, He Zhou, Shaoweihua Liu, Qipeng Zhao, Cong Yue, Xinrui Zhang, Zhengliang Yang, Zhenzhong Lan

We introduce CLUE, a Chinese Language Understanding Evaluation benchmark. It contains eight different tasks, including single-sentence classification, sentence-pair classification, and machine reading comprehension. We evaluate a number of existing full-network pre-trained models for Chinese on CLUE. We also include a small hand-crafted diagnostic test set designed to probe how different models handle specific linguistic phenomena, some of which are unique to Chinese. Along with CLUE, we release a large, clean, crawled raw-text corpus that can be used for model pre-training. We release CLUE, the baselines, and the pre-training dataset on GitHub.

* 9 pages, 4 figures 
Via arXiv

CLUENER2020: Fine-grained Named Entity Recognition Dataset and Benchmark for Chinese

Jan 20, 2020
Liang Xu, Yu tong, Qianqian Dong, Yixuan Liao, Cong Yu, Yin Tian, Weitang Liu, Lu Li, Caiquan Liu, Xuanwei Zhang

In this paper, we introduce the NER dataset from the CLUE organization (CLUENER2020), a well-defined fine-grained dataset for named entity recognition in Chinese. CLUENER2020 contains 10 categories. Apart from common labels like person, organization, and location, it contains more diverse categories. It is more challenging than other current Chinese NER datasets and could better reflect real-world applications. For comparison, we implement several state-of-the-art baselines as sequence labeling tasks and report human performance, along with an analysis of it. To facilitate future work on fine-grained NER for Chinese, we release our dataset, baselines, and leaderboard.

* 6 pages, 5 tables, 1 figure 
Via arXiv

CLUENER2020: Fine-grained Name Entity Recognition for Chinese

Jan 13, 2020
Liang Xu, Qianqian Dong, Cong Yu, Yin Tian, Weitang Liu, Lu Li, Xuanwei Zhang

In this paper, we introduce the NER dataset from the CLUE organization (CLUENER2020), a well-defined fine-grained dataset for named entity recognition in Chinese. CLUENER2020 contains 10 categories. Apart from common labels like person, organization, and location, it contains more diverse categories. It is more challenging than other current Chinese NER datasets and could better reflect real-world applications. For comparison, we implement several state-of-the-art baselines as sequence labelling tasks and report human performance, along with an analysis of it. To facilitate future work on fine-grained NER for Chinese, we release our dataset, baselines, and leaderboard.

* 6 pages, 5 tables, 1 figure 
Via arXiv