We present the Chinese Judicial Reading Comprehension (CJRC) dataset, which contains approximately 10K documents and almost 50K questions with answers. The documents are drawn from judgment documents, and the questions are annotated by legal experts. The CJRC dataset can help researchers extract elements using reading comprehension technology. Element extraction is an important task in the legal field. However, it is difficult to predefine the element types completely because of the diversity of document types and causes of action. By contrast, machine reading comprehension technology can quickly extract elements by answering various questions about a long document. We build two strong baseline models based on BERT and BiDAF. The experimental results show that there is still considerable room for improvement compared to human annotators.
We observe that the importance of different utterances in the context for selecting the response usually depends on the current query. In this paper, we propose TripleNet, a model that fully models the task with the triple <context, query, response> instead of the pair <context, response> used in previous work. The heart of TripleNet is a novel attention mechanism, named triple attention, that models the relationships within the triple at four levels. The new mechanism updates the representation of each element based on its attention with the other two, concurrently and symmetrically. We match the triple <C, Q, R> centered on the response, from the character level to the context level, for prediction. Experimental results on two large-scale multi-turn response selection datasets show that the proposed model significantly outperforms the state-of-the-art methods. TripleNet source code is available at https://github.com/wtma/TripleNet
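The concurrent and symmetric update described above can be illustrated with a minimal NumPy sketch. This is not TripleNet's actual formulation: the scaled dot-product scoring, the residual averaging, and the names `attend` and `triple_update` are assumptions made purely for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(a, b):
    # For every row of `a`, build a weighted summary of the rows of `b`
    # using scaled dot-product attention (an assumed scoring function).
    scores = a @ b.T / np.sqrt(a.shape[1])
    return softmax(scores, axis=1) @ b

def triple_update(c, q, r):
    # Hypothetical update: each element is refined by attending to the
    # other two. All three updates read the old c, q, r, so they are
    # concurrent, and the same rule is applied to each element, so the
    # update is symmetric in the roles of context, query, and response.
    c_new = c + 0.5 * (attend(c, q) + attend(c, r))
    q_new = q + 0.5 * (attend(q, c) + attend(q, r))
    r_new = r + 0.5 * (attend(r, c) + attend(r, q))
    return c_new, q_new, r_new
```

Here `c`, `q`, and `r` are matrices with one row per token and a shared embedding width. Because the rule is identical for all three elements, swapping two inputs simply swaps the corresponding outputs.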
Machine Reading Comprehension (MRC) with multiple-choice questions requires the machine to read a given passage and select the correct answer from several candidates. In this paper, we propose a novel approach, the Convolutional Spatial Attention (CSA) model, which can better handle MRC with multiple-choice questions. The proposed model can fully extract the mutual information among the passage, the question, and the candidates to form enriched representations. Furthermore, to merge the various attention results, we propose using convolutional operations to dynamically summarize the attention values within regions of different sizes. Experimental results show that the proposed model gives substantial improvements over various state-of-the-art systems on both the RACE and SemEval-2018 Task 11 datasets.
Machine Reading Comprehension (MRC) has become enormously popular recently and has attracted a lot of attention. However, the existing reading comprehension datasets are mostly in English. In this paper, we introduce a span-extraction dataset for Chinese machine reading comprehension to add language diversity to this area. The dataset is composed of nearly 20,000 real questions annotated by humans on Wikipedia paragraphs. We also annotated a challenge set containing questions that require comprehensive understanding and multi-sentence inference throughout the context. With the release of the dataset, we hosted the Second Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2018). We hope the release of the dataset will further accelerate machine reading comprehension research in the Chinese language. The data is available at: https://github.com/ymcui/cmrc2018
This paper describes the system that achieved state-of-the-art results at SemEval-2018 Task 11: Machine Comprehension using Commonsense Knowledge. We present a neural network called the Hybrid Multi-Aspects (HMA) model, which mimics human intuitions in dealing with multiple-choice reading comprehension. In this model, we aim to produce predictions from multiple aspects by calculating attention among the text, question, and choices, and we combine these results for the final predictions. Experimental results show that our HMA model gives substantial improvements over the baseline system and took first place on the final test set leaderboard with an accuracy of 84.13%.
Machine Reading Comprehension (MRC) has become enormously popular recently and has attracted a lot of attention. However, existing reading comprehension datasets are mostly in English. To add diversity to reading comprehension datasets, in this paper we propose a new Chinese reading comprehension dataset for accelerating related research in the community. The proposed dataset contains two different types of data: cloze-style reading comprehension and user query reading comprehension, each associated with large-scale training data as well as human-annotated validation and hidden test sets. Along with this dataset, we also hosted the first Evaluation on Chinese Machine Reading Comprehension (CMRC-2017) and successfully attracted tens of participants, which suggests the potential impact of this dataset.
This paper proposes a bias-compensated normalized maximum correntropy criterion (BCNMCC) algorithm, characterized by its low steady-state misalignment, for system identification with noisy input in an impulsive output noise environment. The normalized maximum correntropy criterion (NMCC) is derived from a correntropy-based cost function, which is rather robust with respect to impulsive noise. To deal with the noisy input, we introduce a bias-compensated vector (BCV) into the NMCC algorithm, and then an unbiasedness criterion and some reasonable assumptions are used to compute the BCV. By taking advantage of the BCV, the bias caused by the input noise can be effectively suppressed. System identification simulation results demonstrate that the proposed BCNMCC algorithm can outperform other related algorithms with noisy input, especially in an impulsive output noise environment.
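To make the robustness of the correntropy-based update concrete, the following is a minimal sketch of one common form of a normalized MCC adaptive filter, in which a Gaussian kernel of the error scales a normalized LMS-style step. It is an assumption-laden illustration: the function name `nmcc_identify`, the step size `mu`, the kernel width `sigma`, and the regularizer `eps` are hypothetical, and the bias-compensated vector is omitted because its form is not given in the text above.

```python
import numpy as np

def nmcc_identify(x, d, num_taps, mu=0.5, sigma=1.0, eps=1e-8):
    """Estimate FIR weights from input x and desired output d.

    Sketch of a normalized MCC-style update (no bias compensation):
        w <- w + mu * exp(-e^2 / (2 sigma^2)) * e * u / (u.u + eps)
    """
    w = np.zeros(num_taps)
    for n in range(num_taps - 1, len(x)):
        # Regressor [x[n], x[n-1], ..., x[n-num_taps+1]].
        u = x[n - num_taps + 1:n + 1][::-1]
        e = d[n] - w @ u
        # The Gaussian kernel shrinks toward zero for large errors, so
        # samples hit by impulsive noise barely move the weights.
        kernel = np.exp(-e**2 / (2.0 * sigma**2))
        w = w + mu * kernel * e * u / (u @ u + eps)
    return w
```

When an output sample is corrupted by an impulse much larger than `sigma`, the kernel factor is effectively zero and the update is skipped, which is the mechanism behind the robustness to impulsive output noise. Input noise would still bias this plain version, which is the problem the BCV is introduced to address.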
Robust diffusion adaptive estimation algorithms based on the maximum correntropy criterion (MCC), including adaptation to combination MCC and combination to adaptation MCC, are developed to deal with distributed estimation over networks in impulsive (long-tailed) noise environments. The cost functions used in distributed estimation are in general based on the mean square error (MSE) criterion, which is desirable when the measurement noise is Gaussian. In non-Gaussian situations, such as the impulsive-noise case, MCC-based methods may achieve much better performance than MSE methods because they take into account higher-order statistics of the error distribution. The proposed methods can also outperform the robust diffusion least mean p-power (DLMP) and diffusion minimum error entropy (DMEE) algorithms. Mean and mean square convergence analyses of the new algorithms are also carried out.
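The adaptation to combination variant can be sketched as follows: each node first takes a local MCC-weighted gradient step, then averages the intermediate estimates of its neighbors. This is a minimal sketch under stated assumptions, not the authors' implementation; the ring topology with uniform weights, the helper `ring_combination_matrix`, the function `atc_diffusion_mcc`, and the values of `mu` and `sigma` are all hypothetical choices for illustration.

```python
import numpy as np

def ring_combination_matrix(num_nodes):
    # Assumed topology: each node averages itself and its two ring
    # neighbors with equal weights. A[l, k] is the weight node k gives
    # to node l; every column sums to 1 (requires num_nodes >= 3).
    A = np.zeros((num_nodes, num_nodes))
    for k in range(num_nodes):
        for l in (k - 1, k, k + 1):
            A[l % num_nodes, k] = 1.0 / 3.0
    return A

def atc_diffusion_mcc(U, D, A, mu=0.05, sigma=1.0):
    """U: (T, N, M) regressors, D: (T, N) measurements, A: (N, N) weights."""
    T, N, M = U.shape
    W = np.zeros((N, M))
    for t in range(T):
        # Local a priori errors, one per node.
        e = D[t] - np.sum(U[t] * W, axis=1)
        # Gaussian kernel of the error: near zero for impulsive samples.
        g = np.exp(-e**2 / (2.0 * sigma**2))
        # Adaptation step: MCC-weighted stochastic gradient update.
        Psi = W + mu * (g * e)[:, None] * U[t]
        # Combination step: w_k = sum_l A[l, k] * psi_l.
        W = A.T @ Psi
    return W
```

Swapping the order of the two steps inside the loop gives the combination to adaptation variant. Replacing `g` with ones recovers an MSE-based diffusion LMS, which has no protection against impulsive measurements.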