Title:

An Adaptive Blocking Approach for Entity Matching with MapReduce

Category:

Thesis and Dissertation Workshop (WTDBD)

Topics of interest:

MapReduce, Entity Matching, Adaptive Window, Sorted Neighborhood Method

Abstract:

Cloud computing has proven to be a powerful ally for the efficient parallel execution of data-intensive tasks such as Entity Matching (EM) in the era of Big Data. For this reason, studying the challenges of running EM in the cloud, and possible solutions to them, has become increasingly important. In this context, we investigate how the MapReduce programming model can be used to perform efficient parallel EM using a variation of the Sorted Neighborhood Method (SNM) that employs a variable-size window. We propose the Distributed Duplicate Count Strategy (DDCS), an efficient MapReduce-based approach for this adaptive SNM that aims to further reduce the execution time of SNM.

Author(s):

Demetrio Gomes Mestre, Carlos Eduardo Pires
