Hello. Could anyone help me figure out why the two sets of commands below give different results? Each row of my dataset corresponds to a delivery in a given health facility, and the variable tp_par indicates the type of delivery. I would like to collapse the dataset by health facility (cnes), with a variable holding the total number of type-5 deliveries in each facility.
1st code
Code:
. use SINASC_12a13_all, clear

. keep if tp_par==5
(3598297 observations deleted)

. gen um=1

. collapse (sum) npar_semtp=um, by(cnes)

. su npar_semtp

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
  npar_semtp |      1880    715.2697    1237.922          1      26083
2nd code
Code:
. use SINASC_12a13_all, clear

. gen npar_semtp = (tp_par==5)

. replace npar_semtp=. if tp_par==.
(574065 real changes made, 574065 to missing)

. collapse (sum) npar_semtp, by(cnes)

. su npar_semtp

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
  npar_semtp |      1894    709.9826    1234.857          0      26083
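
For comparison, here is a minimal sketch of the same two approaches on a toy dataset. The facility IDs and tp_par values are made up for illustration and are not taken from SINASC. It shows one possible source of the gap: `keep if tp_par==5` drops every row of a facility that has no type-5 deliveries, so that facility never reaches `collapse`, whereas the indicator approach keeps it, and `collapse (sum)` treats missing values as zero. That would fit the output above, where the number of facilities differs by 14, the minimum is 1 in one case and 0 in the other, and Obs times Mean is about 1,344,707 in both.

```stata
* Toy data (hypothetical values): 3 facilities, tp_par = type of delivery
clear
input cnes tp_par
1 5
1 5
1 2
2 2
2 .
3 5
3 .
end

* Approach 1: keep only type-5 rows, then count rows per facility
preserve
keep if tp_par == 5
gen um = 1
collapse (sum) npar_semtp = um, by(cnes)
* Facility 2 has no type-5 rows, so it is absent from this list
list
restore

* Approach 2: build a 0/1 indicator, then sum it per facility
gen npar_semtp = (tp_par == 5)
replace npar_semtp = . if tp_par == .
collapse (sum) npar_semtp, by(cnes)
* Facility 2 is kept here, with a sum of 0
list
```

If this is what is happening in the real data, running `count if npar_semtp == 0` after the second collapse should return 14.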