Chapter 3 Importing raw data

The raw data of HEIS is available at: https://bit.ly/3a9BTI0 (accessed 23-3-2020). For documentation, we explain the details regarding the importing process in this chapter, but readers uninterested on the details of importing can skip the rest of this chapter and download our converted R version of the raw files (in .rdata format) here: https://bit.ly/3hxwsqh

The raw data files of SCI include a .mdb data file and a questionnaire. The first step is to import the data file into R. This require an R package called “RODBC.” We also need “tidyverse” and “data.table” packages later and if any of them are not installed on your platform you can install by install.packages() function.

install.packages(RODBC)
install.packages(tidyverse)
install.packages(data.table)

3.1 Importing access files to R

With the RODBC packaged the .mdb files can be imported using the following sample code for the 1397 survey. Note than the below code will only work with 32-bit versions of the Microsoft drivers and are not usable on 64-bit version. If you have a 64-bit R and get error with the below code, see this stack-exchange post: https://stackoverflow.com/questions/13070706/how-to-connect-r-with-access-database-in-64-bit-window

library(RODBC) # load package

db <- file.path("F:/Data/HEIS/97/Data97.mdb") # File path to be changed based on the location of data-files

channel<-odbcConnectAccess2007(db) #internal RODBC function

names <-subset(sqlTables(channel),TABLE_TYPE == "TABLE", TABLE_NAME)[[1]]

for (i in names) {
  print(i)
  nam <- paste(i)
  assign(nam, sqlFetch(channel,i))
}

rm("channel","i", "nam", "names","db")

save.image(file=paste0("F:/Data/HEIS/97.RData"))

3.2 Importing code for all years

The below code is written to import the access files of all years and merge them with the summary files provided separately by SCI. Note that for year 1389 and 1390 the access files are provided in .accdb format. The SCI published the summary files up to 1395. As the format of summary files are changing between .dbf, .xls, and .xlsx formats, we need to include packages “foreign” and “readxl” too.

library(RODBC) # load packages
library(foreign) 
library(readxl)

for (j in 63:95) {

  if (j==89|j==90) {
    db <- file.path(paste0("F:/Data/HBS/",j,"/Data-",j,"/Data",j,".accdb") )
  } else {
    db <- file.path(paste0("F:/Data/HBS/",j,"/Data-",j,"/Data",j) )
  }

channel<-odbcConnectAccess2007(db) #internal RODBC function

names <-subset(sqlTables(channel),TABLE_TYPE == "TABLE", TABLE_NAME)[[1]]

for (i in names) {
  print(i)
  nam <- paste(i)
  assign(nam, sqlFetch(channel,i))
}

rm("channel","i", "nam", "names","db")

save.image(file=paste0("F:/Data/HBS/r/raw",j,".RData"))

print(j)

rm(list=ls())
}


## merging summary files

for (j in 63:75) {
  
  assign(paste0("Sum_R",j), read.dbf(paste0("F:/Data/HBS/Summeries/63-75/R",j,"_SUM.dbf")))
  assign(paste0("Sum_U",j), read.dbf(paste0("F:/Data/HBS/Summeries/63-75/U",j,"_SUM.dbf")))
  
  load(paste0("F:/DATA/HBS/r/raw",j,".rdata")) 
  
  save.image(file=paste0("F:/Data/HBS/r/",j,".RData"))
  
  print(j)
  
  rm(list=ls())
}

for (j in rbind(76:87,93,95)) {
  
  assign(paste0("Sum_R",j), read_xls(paste0("F:/Data/HBS/Summeries/sum_",j,"/W_SUM_R",j,".xls")))
  assign(paste0("Sum_U",j), read_xls(paste0("F:/Data/HBS/Summeries/sum_",j,"/W_SUM_U",j,".xls")))
  
  load(paste0("F:/DATA/HBS/r/raw",j,".rdata")) 
  
  save.image(file=paste0("F:/Data/HBS/r/",j,".RData"))
  
  print(j)
  
  rm(list=ls())
}

for (j in rbind(88:92,94)) {
  
  assign(paste0("Sum_R",j), read_xlsx(paste0("F:/Data/HBS/Summeries/sum_",j,"/W_SUM_R",j,".xlsx")))
  assign(paste0("Sum_U",j), read_xlsx(paste0("F:/Data/HBS/Summeries/sum_",j,"/W_SUM_U",j,".xlsx")))
  
  load(paste0("F:/DATA/HBS/r/raw",j,".rdata")) 
  
  save.image(file=paste0("F:/Data/HBS/r/",j,".RData"))
  
  print(j)
  
  rm(list=ls())
}