Greetings to all (my first post),
I have a question about how to set up svyset for my analysis. The aim is to analyse differences over time in several aspects of household wellbeing. The data come from two household surveys conducted five years apart. The sampling approach was the same in both surveys, with one difference: the first survey used equal sample sizes per village, while the second allocated the village samples proportional to village size.
- In the first survey, data were collected from 10 villages (sampling frame = all households in these 10 villages). The target was 500 household interviews, 50 in each village regardless of the number of households in the village. Each village is divided into sub-villages, and most villages have both coastal and inland sub-villages. One coastal and one inland sub-village were randomly selected from each village (in villages with only coastal or only inland sub-villages, two sub-villages of that type were selected). The 50 interviews were divided between the coastal and inland sub-villages in proportion to size, i.e. the number of households in all coastal (or all inland) sub-villages relative to the total number of households in that village (see table below).
| Village | Sub-village | Coastal (C) / inland (I) | # Households | Assigned interviews | Calculation |
|---|---|---|---|---|---|
| A | | | 550 | | |
| | A-1 | C | 200 | | |
| | A-2 | C | 100 | 27 | 300/550*50 |
| | A-3 | I | 130 | | |
| | A-4 | I | 120 | 23 | 250/550*50 |
| B | | | 690 | | |
| | B-1 | C | 300 | 22 | 300/690*50 |
| | B-2 | I | 250 | | |
| | B-3 | I | 140 | 28 | 390/690*50 |
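To make the allocation rule explicit, here is a minimal sketch (in Python rather than Stata, purely to show the arithmetic) that reproduces the "Assigned interviews" column for villages A and B. The household counts are copied from the table; the function name `allocate` and the use of simple rounding are my own assumptions about how the figures were obtained.

```python
# Household counts per village and stratum, copied from the table above.
villages = {
    "A": {"coastal": 200 + 100, "inland": 130 + 120},
    "B": {"coastal": 300, "inland": 250 + 140},
}
INTERVIEWS_PER_VILLAGE = 50  # survey 1: fixed sample size per village


def allocate(strata_households, n_interviews):
    """Split n_interviews over strata in proportion to household counts."""
    total = sum(strata_households.values())
    return {
        stratum: round(households / total * n_interviews)
        for stratum, households in strata_households.items()
    }


allocation = {
    village: allocate(strata, INTERVIEWS_PER_VILLAGE)
    for village, strata in villages.items()
}

for village, split in allocation.items():
    print(village, split)
# A {'coastal': 27, 'inland': 23}
# B {'coastal': 22, 'inland': 28}
```

Rounding each stratum separately happens to sum to 50 for these two villages; in general that is not guaranteed, so a small adjustment may be needed for other villages.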
- In the second survey, the same procedure was applied, except that the village samples were allocated proportional to village size and the total sample size was doubled to 1,000 household interviews. Between the two surveys the government restructured the original 10 villages into 16, so the 1,000 interviews were divided over 16 villages (these can be mapped back to the old 10-village structure, because former sub-villages became villages).
I wonder whether I would introduce bias into the comparison between the two surveys if I enter the complete sampling structure into svyset for the second survey but cannot do so for the first. If so, would it be "better" not to enter the sampling structure for the second survey either, i.e. to use no pweights (or the same pweight for all villages), since it was a proportional-to-size sample? I know this means the standard errors will not be estimated correctly. I also know it is difficult to say which option is "worse" or "better", but I would still appreciate your "feeling" about the two options: using the same structure for both surveys with sampling information missing, or using different structures.
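To illustrate why I think the village-level pweights differ between the two designs, here is a rough sketch (again in Python, only for the arithmetic). It uses the household totals for villages A and B from the table and deliberately ignores the sub-village stage. The figure of 100 interviews for these two villages is purely illustrative and not from either survey, and treating the weight as households divided by interviews is my own simplifying assumption.

```python
# Household totals for villages A and B from the table; other villages omitted.
households = {"A": 550, "B": 690}


def design_weights(households, sample_sizes):
    """Weight = number of households each interview represents in its village."""
    return {v: households[v] / sample_sizes[v] for v in households}


# Survey 1 style: 50 interviews per village regardless of village size.
equal_alloc = {v: 50 for v in households}

# Survey 2 style: interviews allocated proportional to village size.
# TOTAL_INTERVIEWS is an illustrative figure for these two villages only.
TOTAL_INTERVIEWS = 100
total_households = sum(households.values())
prop_alloc = {
    v: n / total_households * TOTAL_INTERVIEWS for v, n in households.items()
}

w_equal = design_weights(households, equal_alloc)
w_prop = design_weights(households, prop_alloc)

print(w_equal)  # weights differ by village under equal allocation
print(w_prop)   # weights are (approximately) identical under proportional allocation
```

Under equal allocation each interview in the larger village stands for more households (13.8 vs. 11 here), whereas under proportional allocation the weight is the same constant in every village, which is why I thought a single pweight might be defensible for the second survey.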
If it is better to enter the structure for survey 2 (and therefore have different structures for surveys 1 and 2), should I calculate the pweights separately for each survey and then combine them in one column? My dataset has the two surveys stacked below each other, i.e. one column per variable, with an identifier variable for the year. My FPC will also differ between the surveys. Do I do the same there, i.e. calculate it differently for the two surveys and combine the results in one column? I have some further questions about how to set up svyset for survey 2, but I will post those in a separate message so as not to make this one too long.
Hope this interests you, thanks in advance,
Sebastiaan