Title: AutoBlock: A Hands-off Blocking Framework for Entity Matching

Dec 07, 2019

Wei Zhang, Hao Wei, Bunyamin Sisman, Xin Luna Dong, Christos Faloutsos, David Page

Abstract: Entity matching seeks to identify data records over one or multiple data sources that refer to the same real-world entity. Virtually every entity matching task on large datasets requires blocking, a step that reduces the number of record pairs to be matched. However, most of the traditional blocking methods are learning-free and key-based, and their successes are largely built on laborious human effort in cleaning data and designing blocking keys. In this paper, we propose AutoBlock, a novel hands-off blocking framework for entity matching, based on similarity-preserving representation learning and nearest neighbor search. Our contributions include: (a) Automation: AutoBlock frees users from laborious data cleaning and blocking key tuning. (b) Scalability: AutoBlock has a sub-quadratic total time complexity and can be easily deployed for millions of records. (c) Effectiveness: AutoBlock outperforms a wide range of competitive baselines on multiple large-scale, real-world datasets, especially when datasets are dirty and/or unstructured.

* In The Thirteenth ACM International Conference on Web Search and Data Mining (WSDM '20), February 3-7, 2020, Houston, TX, USA. ACM, Anchorage, Alaska, USA, 9 pages
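
The abstract describes blocking as representation learning followed by nearest neighbor search. The sketch below illustrates only that general idea and is not the AutoBlock model: the learned encoder is swapped for hashed character trigrams, and the neighbor search is brute force (quadratic), where a system aiming for sub-quadratic time would need an approximate nearest neighbor index. The names `embed`, `cosine`, `candidate_pairs`, `DIM`, `k`, and `min_sim`, as well as the sample records, are assumptions made for illustration.

```python
import math
from collections import Counter
from zlib import crc32

DIM = 256  # hypothetical vector size for this toy example


def embed(text, n=3, dim=DIM):
    """Hash character n-grams into a sparse unit vector.

    Stand-in for a learned, similarity-preserving encoder (assumption).
    """
    s = f" {text.lower()} "
    counts = Counter(
        crc32(s[i:i + n].encode("utf-8")) % dim for i in range(len(s) - n + 1)
    )
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {bucket: c / norm for bucket, c in counts.items()}


def cosine(u, v):
    """Dot product of two sparse unit vectors, i.e. their cosine similarity."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(bucket, 0.0) for bucket, w in u.items())


def candidate_pairs(records, k=1, min_sim=0.5):
    """Keep (i, j) when j is among the k most similar records to i.

    Brute-force search, for illustration only.
    """
    vecs = [embed(r) for r in records]
    pairs = set()
    for i, u in enumerate(vecs):
        scored = sorted(
            ((cosine(u, v), j) for j, v in enumerate(vecs) if j != i),
            reverse=True,
        )
        for sim, j in scored[:k]:
            if sim >= min_sim:
                pairs.add((min(i, j), max(i, j)))
    return pairs


# Made-up records: two noisy listings each for two products, plus one unrelated item.
records = [
    "Apple iPhone 11 Pro 64GB Space Gray",
    "iphone 11 pro (64 gb) - space grey, apple",
    "Samsung Galaxy S10 128GB Prism Black",
    "samsung galaxy s10 128 gb prism black",
    "Logitech MX Master 3 Wireless Mouse",
]
pairs = candidate_pairs(records)
print(sorted(pairs))
```

In this toy data the two Samsung listings come out as a candidate pair without any hand-written blocking key, and only the surviving pairs would be passed on to a downstream matcher.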
