Data De-duplication of Election Database Using Windowing Algorithm

  • S.B.Kadus, Student, Department of Computer Engineering, Savitribai Phule Pune University, Imperial College of Engineering and Research, Pune, India.
  • H.A.Sawant, Student, Department of Computer Engineering, Savitribai Phule Pune University, Imperial College of Engineering and Research, Pune, India.
  • S.S.Tilekar, Student, Department of Computer Engineering, Savitribai Phule Pune University, Imperial College of Engineering and Research, Pune, India.
  • H.D.Zendage, Student, Department of Computer Engineering, Savitribai Phule Pune University, Imperial College of Engineering and Research, Pune, India.
Keywords: Data Blocking, Data Linkage, Entity Resolution, IASNM, Scalability, Sorted Neighborhood, Windowing Techniques

Abstract

Record linkage is the task of finding records, in one or more databases, that contain similar information and refer to the same entity. Linking records across databases has attracted growing interest in recent years. Data matching is essential in many fields because the linked records carry valuable information that may be too costly, or impossible, to acquire by other means. In data mining, duplicate data can badly distort the results of a process, so removing duplicates is an important part of data cleaning. When record linkage is applied within a single database, it is called de-duplication. For both record linkage and de-duplication, the cost of matching records has become one of the biggest challenges as today's databases keep growing. Various indexing techniques have been developed in the last few years for record linkage and de-duplication. Their purpose is to reduce the number of record pairs that must be compared by excluding pairs that are unlikely to match, while maintaining high matching quality. To confirm our theory, we take a classical record linkage algorithm, the sorted neighborhood method (SNM), and show how adaptively changing its fixed sliding window size improves performance and accuracy. The complexity of these techniques is analyzed, and their matching quality, scalability, and performance are evaluated on both real and synthetic data sets.
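
The windowing idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the string-similarity measure (`difflib.SequenceMatcher`), the threshold, and the window-growth rule are all assumptions chosen for illustration. The first function is the classical SNM with a fixed window; the second grows the window by one record whenever the record at the window's edge still matches, up to an assumed maximum.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """String similarity in [0, 1]; an assumed stand-in for a real record comparator."""
    return SequenceMatcher(None, a, b).ratio()


def sorted_neighborhood(records, key, window=3, threshold=0.85):
    """Classical SNM: sort by key, compare each record with the next window-1 records."""
    ordered = sorted(records, key=key)
    matches = []
    for i, rec in enumerate(ordered):
        for j in range(i + 1, min(i + window, len(ordered))):
            if similarity(key(rec), key(ordered[j])) >= threshold:
                matches.append((rec, ordered[j]))
    return matches


def adaptive_sorted_neighborhood(records, key, min_window=2, max_window=10,
                                 threshold=0.85):
    """Hypothetical adaptive variant: extend the window while its last record matches."""
    ordered = sorted(records, key=key)
    matches = []
    for i, rec in enumerate(ordered):
        limit = min(i + min_window, len(ordered))  # exclusive end of the window
        j = i + 1
        while j < limit:
            if similarity(key(rec), key(ordered[j])) >= threshold:
                matches.append((rec, ordered[j]))
                if j == limit - 1:
                    # Match at the window edge: look one record further,
                    # but never beyond max_window or the end of the list.
                    limit = min(limit + 1, i + max_window, len(ordered))
            j += 1
    return matches


# Made-up voter names; four entries are exact duplicates of one another.
voters = [
    "anita deshmukh", "rahul patil", "anita deshmukh",
    "sunil kale", "anita deshmukh", "anita deshmukh",
]
fixed_pairs = sorted_neighborhood(voters, key=str.lower, window=2)
adaptive_pairs = adaptive_sorted_neighborhood(voters, key=str.lower)
```

With a fixed window of 2, only adjacent duplicates are paired, so some duplicate pairs in a long run of identical records are missed. The adaptive variant keeps extending the window through the run and stops at the first non-matching record, which is the trade-off between comparison count and matching quality that the abstract describes.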

How to Cite
S.B.Kadus, H.A.Sawant, S.S.Tilekar, & H.D.Zendage. (2015). Data De-duplication of Election Database Using Windowing Algorithm. International Journal of Current Research in Science and Technology, 1(4), 7-10. Retrieved from https://crst.gfer.org/index.php/crst/article/view/16