Channel: Statalist

setting up svyset for baseline and follow-on surveys with missing sampling structure information

Greetings to all (my first post),

I have a question about how to set up svyset for my analysis. The aim is to analyse differences over time in several aspects of household wellbeing. The data come from two household surveys conducted five years apart. The sampling approach was the same in both surveys, with one difference: equal village samples in the first survey vs. village samples proportional to size in the second.
  • In the first survey, data were collected from 10 villages (sample frame = all households in these 10 villages). The aim was 500 household interviews, with 50 interviews in each village regardless of the number of households in that village. Each village is divided into sub-villages, and most villages have both coastal and inland sub-villages. One coastal and one inland sub-village were randomly selected from each village (or two sub-villages of the same type in villages that have only coastal or only inland sub-villages). The 50 interviews were divided between the selected coastal and inland sub-villages proportional to size: # households in all coastal (or all inland) sub-villages of the village relative to the total # households in that village (see table below).
Village  Sub-village  Coastal (C) / inland (I)  # Households  Assigned interviews
A                                               550
         A-1          C                         200
         A-2          C                         100           27  (300/550*50)
         A-3          I                         130
         A-4          I                         120           23  (250/550*50)
B                                               690
         B-1          C                         300           22  (300/690*50)
         B-2          I                         250
         B-3          I                         140           28  (390/690*50)
  • In the second survey, the same procedure was applied, except that the village samples were assigned proportional to size and the total sample size was doubled to 1,000 household interviews. Between the two surveys the government restructured the original 10 villages into 16, so the 1,000 interviews were divided over 16 villages (these can be combined back into the old 10-village structure, as sub-villages became villages).
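To make the within-village allocation of the first survey concrete, here is a minimal sketch in Python (not Stata) that reproduces the numbers in the table for villages A and B. The function name `allocate` and the dictionary layout are only for illustration; simple rounding happens to sum to 50 here, but that is not guaranteed in general.

```python
def allocate(stratum_households, village_sample=50):
    """Split a village's interviews over its coastal/inland strata,
    proportional to the households in each stratum (simple rounding)."""
    total = sum(stratum_households.values())
    return {
        stratum: round(village_sample * households / total)
        for stratum, households in stratum_households.items()
    }

# Household counts per stratum, summed over all sub-villages in the table.
village_a = {"coastal": 200 + 100, "inland": 130 + 120}  # 300 and 250 of 550
village_b = {"coastal": 300, "inland": 250 + 140}        # 300 and 390 of 690

print(allocate(village_a))
print(allocate(village_b))
```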
Not all the sampling information of the first survey was recorded, and for most villages I don't have # households in each sub-village, only the total # households in the village. I cannot accurately estimate the sub-village data because of the village restructuring. I think I can therefore only correct for over/undersampling due to the equal division of the sample over the villages.
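As a rough illustration of the village-level correction I have in mind, the sketch below (Python, not Stata) computes a base weight per village as households divided by interviews. This is only my assumption of how the weight would be built when the sub-village stage is ignored. The household totals are those of villages A and B in the table; the 10% sampling fraction used for the proportional case is hypothetical and not taken from my data.

```python
def village_weights(households, interviews):
    """Base weight per village: number of households that each
    interviewed household represents (village stage only)."""
    return {v: households[v] / interviews[v] for v in households}

households = {"A": 550, "B": 690}

# Survey 1: 50 interviews per village regardless of size,
# so larger villages get larger weights.
equal_sample = village_weights(households, {"A": 50, "B": 50})

# Proportional-to-size sample with a hypothetical 10% sampling fraction:
# the weights come out equal across villages.
proportional_sample = village_weights(households, {"A": 55, "B": 69})

print(equal_sample)
print(proportional_sample)
```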

I wonder whether I would introduce bias into the comparison between the two surveys if I enter the complete sampling structure into svyset for the second survey but cannot do so for the first. If so, would it be "better" not to enter the sampling structure for the second survey, i.e. to add no pweights (or the same pweight for all villages), as it was a proportional-to-size sample? I know this means the standard errors will not be estimated correctly. I also know it is difficult to say which is "worse/better", but I would still appreciate your "feeling" about the two options (using the same structure for both surveys with incomplete sampling information, or using different structures).


If it is better to enter the structure for survey 2 (and therefore have different structures for surveys 1 and 2), should I calculate the pweights separately for each survey and then combine them in one column? (My database has the two surveys stacked below each other, i.e. one column per variable, with an identifier variable for the year.) My FPC will also differ between the surveys. Do I do the same there, i.e. calculate it differently for the two surveys and combine the values in one column? I also have some questions about how to set up svyset for survey 2, but I will post these in a separate message so as not to make this one too long.


Hope this interests you, thanks in advance,

Sebastiaan
