Main Article Content

Abstract

Duplicate detection is the process of identifying multiple representations of the same real-world entities. At present, duplicate detection methods need to process ever larger datasets in ever shorter time: maintaining the quality of a dataset becomes increasingly difficult. This paper presents two novel, progressive duplicate detection algorithms that significantly increase the efficiency of finding duplicates when the execution time is limited: they maximize the gain of the overall process within the time available by reporting most results much earlier than traditional approaches. Comprehensive experiments show that progressive algorithms can double the efficiency over time of traditional duplicate detection and significantly improve upon related work. Data are among the most important assets of a company, but due to data changes and sloppy data entry, errors such as duplicate entries may occur, making data cleansing, and in particular duplicate detection, indispensable. Progressive duplicate detection identifies most duplicate pairs early in the detection process.
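The core idea of "reporting most results much earlier" can be illustrated with a minimal sketch in the spirit of a progressive sorted-neighborhood method. This is not the paper's exact algorithm: the function name `progressive_pairs`, the toy data, and the trivial equality check are all assumptions for illustration only. Records are sorted by a similarity-preserving key, then pairs are compared at rank distance 1 first, then distance 2, and so on, so the most promising comparisons happen earliest:

```python
# Hypothetical sketch of a progressive sorted-neighborhood comparison order.
# Not the authors' implementation; it only illustrates the principle that
# likely duplicates are compared first so results arrive early.

def progressive_pairs(records, key=lambda r: r, max_window=None):
    """Yield candidate pairs in order of increasing sort-key distance."""
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    n = len(order)
    if max_window is None:
        max_window = n - 1
    for dist in range(1, max_window + 1):  # grow the window progressively
        for i in range(n - dist):
            yield records[order[i]], records[order[i + dist]]

# Toy usage: near-identical records sort adjacently, so the duplicate pair
# appears among the very first comparisons, even if time runs out later.
data = ["alice", "bob", "alice", "carol"]
first_hits = []
for a, b in progressive_pairs(data):
    if a == b:  # placeholder similarity check; real systems use fuzzy matching
        first_hits.append((a, b))
print(first_hits)
```

Because candidate pairs are emitted in order of decreasing promise, stopping the loop at any point still returns most of the duplicates found so far, which is the progressive property the abstract describes.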

Article Details

How to Cite
K. E. Eswari, & S. PraveenKumar. (2018). Parallel and Multiple E-Data Distributed Process with Progressive Duplicate Detection Model. International Journal of Intellectual Advancements and Research in Engineering Computations, 6(2), 1632–1635. Retrieved from https://ijiarec.com/ijiarec/article/view/708