Chapter 1 Introduction

The Statistical Center of Iran (SCI, https://amar.org.ir) has conducted the Household Expenditure and Income Survey (HEIS), formerly known as the Household Budget Survey, in rural areas since 1342 (1963-64) and in urban areas since 1347 (1968-69). Initially, the survey covered only household expenditures; questions on household income were added to the questionnaire in 1353 (1974-75).

The HEIS aims to provide estimates of average income and expenditure for urban and rural households at the provincial and national levels. Among many applications, the survey enables researchers to estimate the composition and distribution of household income and expenditure, household consumption patterns, and the weight of each commodity in the household consumption basket, as well as to calculate the poverty line and study inequality in household income and facilities.

The SCI has published the raw data of this survey since 1984. In this project, I clean and explore these data in a reproducible way so that researchers can use them with little hassle. Because the raw data of the surveys conducted before 1376 (1997-98) do not include sample multipliers, the first stage of this project focuses on the period from 1376 onward. In the future, I aim to generate sample multipliers for the earlier years using Census data.

A note on the Persian calendar. The reference year of the surveys is based on the Persian calendar, in which the new year (Nowrouz, 1 Farvardin) falls on March 20 or March 21. In this document, survey years are given in the Persian calendar. To convert a Persian year to the Gregorian calendar, add 621 for dates from 1 Farvardin through December 31 (around 10 Dey), and 622 for dates from January 1 through the end of the Persian year. For example, 1400 in the Persian calendar starts in March 2021 and ends in March 2022.
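The year arithmetic above can be sketched in a few lines of Python. The function names `persian_year_to_gregorian` and `survey_year_label` are hypothetical helpers introduced only for illustration; the sketch works at the level of whole years and does not convert individual dates.

```python
from typing import Tuple


def persian_year_to_gregorian(persian_year: int) -> Tuple[int, int]:
    """Return the two Gregorian years that a Persian year overlaps.

    A Persian year starts at Nowrouz (March 20 or 21), so it begins in
    Gregorian year persian_year + 621 and ends in persian_year + 622.
    """
    start = persian_year + 621
    return start, start + 1


def survey_year_label(persian_year: int) -> str:
    """Format a survey year in the style used in this document."""
    start, end = persian_year_to_gregorian(persian_year)
    return f"{persian_year} ({start}-{end % 100:02d})"


print(survey_year_label(1376))  # 1376 (1997-98)
print(survey_year_label(1400))  # 1400 (2021-22)
```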

1.1 Sample design

The SCI uses a multi-stage sample design for this survey, and the target population includes all private and collective settled households in urban and rural areas. The survey uses stratified three-stage cluster sampling, with the latest population census as the sampling frame. In the first stage, census areas are classified and selected. In the second stage, urban and rural blocks are selected, and in the third stage, the sample households are selected. The sample size is optimized for estimating the average annual income and expenditure of households. To make the estimates more representative of the whole year, the sample is evenly distributed across the months of the year.
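To make the three stages concrete, here is a toy sketch in Python. It is not the SCI's procedure: the frame layout, the function name `three_stage_sample`, the use of simple random sampling at every stage, and the round-robin assignment of months are all assumptions made for illustration.

```python
import random


def three_stage_sample(frame, n_areas, n_blocks, n_households, seed=0):
    """Draw a toy stratified three-stage sample.

    frame maps stratum -> area -> block -> list of household ids.
    Simple random sampling at each stage is an assumption of this sketch.
    """
    rng = random.Random(seed)
    sample = []
    for stratum, areas in frame.items():
        # Stage 1: select census areas within the stratum.
        for area in rng.sample(sorted(areas), min(n_areas, len(areas))):
            blocks = areas[area]
            # Stage 2: select blocks within the selected area.
            for block in rng.sample(sorted(blocks), min(n_blocks, len(blocks))):
                households = blocks[block]
                # Stage 3: select households within the selected block.
                k = min(n_households, len(households))
                for household in rng.sample(households, k):
                    sample.append({"stratum": stratum, "area": area,
                                   "block": block, "household": household})
    # Spread the interviews evenly over the 12 months of the year.
    rng.shuffle(sample)
    for i, record in enumerate(sample):
        record["month"] = i % 12 + 1
    return sample


# Hypothetical frame: 2 strata x 6 areas x 4 blocks x 10 households.
frame = {
    stratum: {
        f"{stratum}-area{a}": {
            f"{stratum}-area{a}-block{b}": [
                f"{stratum}-{a}-{b}-hh{h}" for h in range(10)
            ]
            for b in range(4)
        }
        for a in range(6)
    }
    for stratum in ("urban", "rural")
}

sample = three_stage_sample(frame, n_areas=3, n_blocks=2, n_households=4)
print(len(sample))  # 2 strata * 3 areas * 2 blocks * 4 households
```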

1.1.1 Rotating panel feature

Since 1389 (2010-11), the sample has been designed with a rotating panel feature: households are resampled for up to three consecutive years. The first rotating frame covers 1389-1391, the second covers 1392-1396, and the third, which is ongoing, started in 1397.

As an example, the design of the second frame is as follows:

group  Y1392  Y1393  Y1394  Y1395  Y1396
A      A3     -      -      -      -
B      B2     B3     -      -      -
C      C1     C2     C3     -      -
D      -      D1     D2     D3     -
E      -      -      E1     E2     E3
F      -      -      -      F1     F2
G      -      -      -      -      G1
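The layout of this design can be reproduced with a short Python sketch. The function `rotation_schedule` is a hypothetical helper, and the sketch assumes that one new group enters each year and that the number after the group letter is the round (first, second, or third year) of that group. It only mirrors the table and says nothing about how the SCI assigns households to groups.

```python
from string import ascii_uppercase


def rotation_schedule(first_year, last_year, max_rounds=3):
    """Map each group to {year: label} for a rotating panel frame.

    Assumes one new group per year and at most max_rounds consecutive
    years per group; rounds falling outside the frame are dropped.
    """
    n_years = last_year - first_year + 1
    n_groups = n_years + max_rounds - 1
    schedule = {}
    for g in range(n_groups):
        group = ascii_uppercase[g]
        # Early groups are treated as having entered before the frame starts.
        entry_year = first_year - (max_rounds - 1) + g
        rounds = {}
        for r in range(1, max_rounds + 1):
            year = entry_year + r - 1
            if first_year <= year <= last_year:
                rounds[year] = f"{group}{r}"
        schedule[group] = rounds
    return schedule


schedule = rotation_schedule(1392, 1396)
years = range(1392, 1397)
print("group " + " ".join(f"Y{y}" for y in years))
for group, rounds in schedule.items():
    print(f"{group:<5} " + " ".join(f"{rounds.get(y, '-'):<5}" for y in years))
```

Under these assumptions, every year of the frame contains three groups, one in each round.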

1.2 Different rounds

The surveys differ over time in features such as sample size, rotating panel sampling, the number of expenditure and income tables, and the coding of variables. The table below compares the survey years. More details on the differences are presented in the following parts.

Year  Households  Rotating panel  Expenditure tables  Income tables
1376  21950       No              10                  3
1377  17477       No              10                  3
1378  27464       No              10                  3
1379  26941       No              10                  3
1380  26961       No              10                  3
1381  32152       No              10                  3
1382  23134       No              10                  3
1383  24534       No              10                  3
1384  26895       No              14                  3
1385  30910       No              14                  3
1386  31283       No              14                  3
1387  39088       No              14                  3
1388  37045       No              14                  3
1389  38950       Yes             14                  3
1390  38513       Yes             14                  4
1391  38192       Yes             14                  4
1392  38331       Yes             14                  4
1393  39856       Yes             14                  4
1394  39857       Yes             14                  4
1395  39864       Yes             14                  4
1396  37962       Yes             14                  4
1397  38960       Yes             14                  4
1398  38328       Yes             14                  4
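When working with several rounds at once, it helps to know the years in which the survey structure changes. The Python sketch below encodes the table above and lists those years. The names `ROUNDS`, `parse_rounds`, and `design_changes` are hypothetical and exist only for this illustration.

```python
# Year, households, rotating panel, expenditure tables, income tables.
ROUNDS = """\
1376 21950 No 10 3
1377 17477 No 10 3
1378 27464 No 10 3
1379 26941 No 10 3
1380 26961 No 10 3
1381 32152 No 10 3
1382 23134 No 10 3
1383 24534 No 10 3
1384 26895 No 14 3
1385 30910 No 14 3
1386 31283 No 14 3
1387 39088 No 14 3
1388 37045 No 14 3
1389 38950 Yes 14 3
1390 38513 Yes 14 4
1391 38192 Yes 14 4
1392 38331 Yes 14 4
1393 39856 Yes 14 4
1394 39857 Yes 14 4
1395 39864 Yes 14 4
1396 37962 Yes 14 4
1397 38960 Yes 14 4
1398 38328 Yes 14 4
"""


def parse_rounds(text):
    """Parse the whitespace-separated table into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        year, households, panel, exp_tables, inc_tables = line.split()
        rows.append({
            "year": int(year),
            "households": int(households),
            "rotating_panel": panel.lower() == "yes",
            "expenditure_tables": int(exp_tables),
            "income_tables": int(inc_tables),
        })
    return rows


def design_changes(rows):
    """Return (year, feature, old, new) for each change between rounds."""
    features = ("rotating_panel", "expenditure_tables", "income_tables")
    changes = []
    for prev, cur in zip(rows, rows[1:]):
        for key in features:
            if prev[key] != cur[key]:
                changes.append((cur["year"], key, prev[key], cur[key]))
    return changes


rows = parse_rounds(ROUNDS)
for change in design_changes(rows):
    print(change)
# (1384, 'expenditure_tables', 10, 14)
# (1389, 'rotating_panel', False, True)
# (1390, 'income_tables', 3, 4)
```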