An Ensemble Approach for Record Matching in Data Linkage
Refereed conference paper presented and published in conference proceedings



Other Information
Abstract: Objectives: To develop and test an optimal ensemble configuration of two complementary probabilistic data matching techniques, namely Fellegi-Sunter (FS) and Jaro-Winkler (JW), with the goal of improving record matching accuracy. Methods: Experiments and comparative analyses were carried out to compare the matching performance of ensemble configurations combining FS and JW against that of each technique applied independently. Results: Our results show that an improvement can be achieved when the FS technique is applied to the remaining unsure and unmatched records after the JW technique has been applied. Discussion: Whilst all data matching techniques rely on the quality of a diverse set of demographic data, the FS technique focuses on aggregating matching accuracy across a number of useful variables, whereas JW looks more closely at matching the data content (spelling in this case) of each field. Hence, these two techniques are shown to be complementary. In addition, the sequence in which the two techniques are applied is critical. Conclusion: We have demonstrated a useful ensemble approach that has the potential to improve data matching accuracy, particularly when the number of demographic variables is limited. This ensemble technique is particularly useful when fields such as names and addresses have multiple acceptable spellings.
Authors: Poon SK, Poon J, Lam MK, Yin QL, Sze DMY, Wu JCY, Mok VCT, Ching JYL, Chan KL, Cheung WHN, Lau AY
Conference name: 24th Australian National Health Informatics Conference (HIC)
Conference start date: 01.01.2016
Conference location: Melbourne
Conference country/region: Australia
Detailed description: To ORKTS: Research from the joint laboratory (ACCLAIM) with the University of Sydney
Publication year: 2016
Volume: 227
Publisher: IOS PRESS
Pages: 113-119
ISBN: 978-1-61499-665-1
eISBN: 978-1-61499-666-8
ISSN: 0926-9630
Language: British English
Keywords: Data Linkage; Fellegi-Sunter; Jaro-Winkler; probabilistic data matching
Web of Science subject category: Medical Informatics
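
The sequence reported in the abstract (JW first, then FS on the records that JW leaves unsure or unmatched) might be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the field names, the m/u probabilities, the thresholds, and the choice to average JW over the two name fields are all hypothetical assumptions made for this example.

```python
import math

def jaro(s1, s2):
    """Standard Jaro similarity between two strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3

def jaro_winkler(s1, s2, p=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

# Hypothetical (m, u) probabilities per field; real values would be estimated from data.
M_U = {
    "given": (0.90, 0.02),
    "surname": (0.95, 0.01),
    "dob": (0.98, 0.003),
    "postcode": (0.90, 0.05),
}

def fs_weight(rec_a, rec_b):
    """Fellegi-Sunter total weight: sum of log2 agreement/disagreement weights."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total

def ensemble_match(rec_a, rec_b, jw_threshold=0.95, fs_upper=10.0, fs_lower=0.0):
    """Stage 1: JW on name fields. Stage 2: FS only on pairs JW did not match."""
    jw = (jaro_winkler(rec_a["given"], rec_b["given"])
          + jaro_winkler(rec_a["surname"], rec_b["surname"])) / 2
    if jw >= jw_threshold:
        return ("match", "JW")
    weight = fs_weight(rec_a, rec_b)
    if weight >= fs_upper:
        return ("match", "FS")
    if weight <= fs_lower:
        return ("non-match", "FS")
    return ("unsure", "FS")
```

In this sketch a pair whose names differ only by a small spelling variation is settled by JW alone, while a pair with a larger name difference (for example a shortened given name) falls through to FS, where agreement on the other demographic fields can still produce a match.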

Last updated: 2020-19-10 at 03:10