Abstract:FB15k-237 mitigates the data leakage issue by excluding inverse and symmetric relationship triples; however, this has led to substantial performance degradation and slow progress. Traditional approaches show limited effectiveness on FB15k-237, primarily because the mechanism by which the dataset's structural features influence model performance remains unexplored. To bridge this gap, we systematically investigate how dataset structural features affect link prediction performance. First, we design a structured subgraph sampling strategy that preserves connectivity while constructing subgraphs with distinct structural features. Then, through correlation and sensitivity analyses across several mainstream models, we observe that the distribution of relationship categories within subgraphs most strongly affects performance, followed by the size of strongly connected components. Further analysis using LIME clarifies the mechanism by which relationship categories influence link prediction performance: they primarily modulate the relative importance of entity embeddings and relationship embeddings, thereby affecting link prediction outcomes. These findings provide theoretical insights for addressing performance bottlenecks on FB15k-237, and the proposed analytical framework offers methodological guidance for future studies of structurally constrained datasets.
Abstract:Entity extraction is a key technology in natural language processing for obtaining information from massive texts. In machine reading comprehension (MRC)-based approaches, however, the interaction between the question and the text does not meet the standard of human reading comprehension, which limits the model's understanding and leads to omission or misjudgment of the answer (i.e., the target entity) when reasoning over the question. To address this, an effective MRC-based entity extraction model, MRC-I2DP, is proposed. It uses the proposed gated attention-attracting mechanism to adjust the representation of each part of the text pair and performs multi-level interactive attention calculations between the question and the text to increase attention to the target entity. It also uses the proposed 2D probability coding module, TALU function, and mask mechanism to strengthen the detection of all possible target entities, thereby improving prediction accuracy. Experiments show that MRC-I2DP achieves overall state-of-the-art results on 7 datasets from the scientific and public domains, improving F1 over the compared models.