Thais Menezes (UCD)
will speak on
A Model-Based Record Linkage Approach Using Household Information to Enhance Individual Matching Across Different Databa
Time: 3:00PM
Date: Thu 29th February 2024
Location: E0.32 (beside Pi restaurant)
Abstract: Record linkage focuses on matching information about the same entity across diverse sources that lack unique identifiers. It is gaining importance in applications ranging from medical record enhancement to the study of population mobility between censuses or surveys. Conventional record linkage models concentrate primarily on direct individual matching and often disregard valuable group-level information inherent in the data. Motivated by recent research indicating enhanced performance when group information is incorporated into the matching process, we propose a novel model-based approach that jointly estimates individual and household match status, while also estimating the feature matching probabilities given the match status of both individuals and their households. To illustrate the methodology, we use the Italian Survey of Household Income and Wealth from 2014 and 2016. Our results, which account for different initialisation methods, show a notable improvement in the F1 score, with values around 80% when household information is considered, compared to approximately 46% for methods that match individuals directly without leveraging group information. Our findings also underscore the model's robustness: it consistently yields favourable outcomes across various initialisation methods and when blocking strategies are applied.
(This talk is part of the Working Group on Statistical Learning series.)