I am trying to run a simulation that fits a logit model on 90% of the sample and then tests it on the remaining 10%, so that I can compare the predictions with the actual outcomes. I'm not sure how to save the predicted outcomes for that comparison. This is what I have so far, but it is not working. Should I save p1 in a separate .dta file together with the actual outcome (DV) and do the comparison as a second step after the simulation has run? If so, how do I do that?
program define simcheck
* drop all variables to create an empty dataset
drop _all
* get dataset
use "F:\Master.dta"
* set sample size. Set to 90% (estimation here based on 200)
generate random = runiform()
sort random
gen group = 1 + (_n > 180)
* retain the variables of interest
keep DV X1 X2 X3 X4 X5 group
* run logit model ON group == 1
logit DV X1 X2 X3 X4 X5 if group == 1
* get predictions from model and test on other 10%. How many errors?
predict p1 if group == 2
generate predicted_error = DV-p1 if group == 2
* close programming language
end
simulate predicted_error, reps(10): simcheck
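For reference, here is a minimal sketch of one way the program above might be restructured so that simulate has something to collect. As far as I understand, simulate gathers scalar results (for example r() values) from each replication rather than a variable such as predicted_error, so the program would need to be declared rclass and return a summary of the holdout errors. The names error_rate, brier, wrong and sqerr, the 0.5 classification cutoff, and the seed are all assumptions made for illustration, and DV is assumed to be coded 0/1.

```stata
capture program drop simcheck
program define simcheck, rclass
    * load a fresh copy of the data on every replication
    use "F:\Master.dta", clear
    keep DV X1 X2 X3 X4 X5
    * random 90/10 split: shuffle, then first 180 of 200 rows are group 1
    generate double random = runiform()
    sort random
    generate byte group = 1 + (_n > 180)
    * fit the logit on the estimation sample only
    logit DV X1 X2 X3 X4 X5 if group == 1
    * predicted probabilities for the holdout sample
    predict double p1 if group == 2, pr
    * assumed rule: classify as 1 when p1 >= 0.5, then flag misclassifications
    generate byte wrong = (DV != (p1 >= 0.5)) if group == 2
    summarize wrong, meanonly
    return scalar error_rate = r(mean)
    * mean squared difference between outcome and predicted probability
    generate double sqerr = (DV - p1)^2 if group == 2
    summarize sqerr, meanonly
    return scalar brier = r(mean)
end

* seed chosen arbitrarily so the replications are reproducible
set seed 12345
simulate error_rate = r(error_rate) brier = r(brier), reps(10): simcheck
summarize error_rate brier
```

After simulate finishes, the data in memory should hold one row per replication with the two summary measures. If the observation-level p1 and DV values are needed rather than per-replication summaries, saving them from inside the program to a separate .dta file and appending across replications is probably the way to go, since simulate itself only keeps the scalars it is asked for.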