Harmonizing and Synthesizing Health Research Data in R | by Rodrigo M Carrillo Larco, MD, PhD | January, 2025
R code to extract data from individual datasets and combine them into one harmonized dataset ready for analysis

My academic research mainly involves identifying health research datasets, harmonizing them, and combining individual datasets so they can be analyzed together. This means combining datasets across populations, study sites, or countries. It also means harmonizing variables so that they are comparable and can be analyzed together. In other words, I work in data integration, which has been my full-time focus since 2017.
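To make "harmonizing variables" more concrete, here is a minimal base-R sketch. It is not the author's code: it assumes two hypothetical coding schemes for sex (numeric 1/2 and letters M/F) and fasting glucose recorded in either mmol/L or mg/dL. The lookup table and the helpers `harmonize_sex()` and `harmonize_glucose()` are illustrative names only.

```r
# Hypothetical lookup: every sex coding assumed to appear across source
# datasets, mapped to one common label
sex_lookup <- c("1" = "male", "2" = "female",
                "M" = "male", "F" = "female",
                "male" = "male", "female" = "female")

# Hypothetical helper: recode any known sex coding to a common factor.
# Values are converted to character so numeric and text codes share one
# lookup; codes not in the lookup become NA instead of being guessed.
harmonize_sex <- function(x) {
  out <- unname(sex_lookup[as.character(x)])
  factor(out, levels = c("male", "female"))
}

# Hypothetical helper: express fasting glucose in mmol/L.
# Uses the approximate glucose conversion 1 mmol/L = 18 mg/dL.
harmonize_glucose <- function(x, unit = c("mmol/L", "mg/dL")) {
  unit <- match.arg(unit)
  if (unit == "mg/dL") x / 18 else x
}

harmonize_sex(c(1, 2, 9))                      # male, female, NA
harmonize_sex(c("F", "M"))                     # female, male
harmonize_glucose(c(90, 126), unit = "mg/dL")  # 5, 7
```

Sending unknown codes (such as the `9` above) to `NA` keeps coding problems visible, so they can be checked against each study's documentation instead of being silently misclassified.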
In this article, I explain the workflow I follow to extract data from each dataset and combine the individual datasets into a single dataset ready for analysis. It draws on seven years of experience working in academic settings around the world, and all the code is in R.
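The extract-and-combine idea can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the author's workflow: the datasets `survey_2015` and `cohort_x`, the `dictionary` table, and the `extract_one()` helper are all hypothetical. The sketch keeps a data dictionary that maps each dataset's original variable names to common target names, extracts only the mapped columns, tags each row with its source, and stacks the results. It assumes R 4.0 or later.

```r
# Two hypothetical source datasets that store the same information
# under different variable names (values invented for illustration)
survey_2015 <- data.frame(pid = 1:3, age_yrs = c(34, 51, 62),
                          sbp = c(118, 135, 142), extra = "not needed")
cohort_x <- data.frame(ID = 101:102, AGE = c(45, 70), SYS_BP = c(128, 150))

datasets <- list(survey_2015 = survey_2015, cohort_x = cohort_x)

# Hypothetical data dictionary: which source column in which dataset
# maps to which harmonized (target) variable
dictionary <- data.frame(
  dataset = rep(c("survey_2015", "cohort_x"), each = 3),
  source  = c("pid", "age_yrs", "sbp", "ID", "AGE", "SYS_BP"),
  target  = c("id", "age", "sbp", "id", "age", "sbp"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: extract, rename, and tag one dataset
extract_one <- function(name) {
  map <- dictionary[dictionary$dataset == name, ]
  out <- datasets[[name]][, map$source, drop = FALSE]  # keep mapped columns only
  names(out) <- map$target                             # rename to common names
  out$source_dataset <- name                           # record provenance
  out
}

# Apply to every dataset and stack the results (rbind matches columns by name)
pooled <- do.call(rbind, lapply(names(datasets), extract_one))
rownames(pooled) <- NULL
print(pooled)
```

Keeping the mapping in a table rather than hard-coding it per dataset means that adding a new dataset mostly requires new dictionary rows, and the `source_dataset` column preserves where each observation came from for later analysis.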
Data aggregation – what is it?
In most projects, we either collect new data (a primary dataset) or analyze a single dataset that is already available. This single dataset may come from one hospital, a specific population (e.g., a community-based epidemiological study), or a national health survey (i.e., a nationally representative health survey…