Channel: Statalist

collapse of all obs vs collapse of a fraction of obs

Hello. Could anyone help me figure out why the two sets of commands below give different results? Each row of my dataset corresponds to a delivery in a given health facility. The variable tp_par indicates the type of delivery. I would like to collapse my dataset to get a variable with the total number of type-5 deliveries per health facility.

1st code
Code:
. use SINASC_12a13_all, clear

. keep if tp_par==5
(3598297 observations deleted)

. gen um=1

. collapse (sum) npar_semtp=um, by(cnes)

. su npar_semtp

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
  npar_semtp |      1880    715.2697    1237.922          1      26083

2nd code
Code:
. use SINASC_12a13_all, clear

. gen npar_semtp = (tp_par==5)

. replace npar_semtp=. if tp_par==.
(574065 real changes made, 574065 to missing)

. collapse (sum) npar_semtp, by(cnes)

. su npar_semtp

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
  npar_semtp |      1894    709.9826    1234.857          0      26083
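A possible reading of the two logs, offered as a sketch rather than a confirmed diagnosis: both results imply about the same total (1880 × 715.2697 ≈ 1894 × 709.9826 ≈ 1,344,707 deliveries), and the second has a minimum of 0. That pattern would be consistent with 14 facilities that have no type-5 deliveries at all: the first code deletes all of their rows before collapse, while the second keeps them with a sum of 0. The do-file below reproduces the pattern on toy data; the six observations are hypothetical and only the variable names come from the post.

```stata
* Toy data (hypothetical values): facility 2 has no type-5 deliveries
clear
input cnes tp_par
1 5
1 5
1 2
2 1
2 .
3 5
end

* 1st code: drop everything that is not type 5, then count rows
preserve
keep if tp_par == 5
gen um = 1
collapse (sum) npar_semtp = um, by(cnes)
list            // facilities 1 and 3 only; facility 2 has no rows left
restore

* 2nd code: keep all rows and sum a 0/1 indicator
gen npar_semtp = (tp_par == 5)
replace npar_semtp = . if tp_par == .
collapse (sum) npar_semtp, by(cnes)
list            // facility 2 is kept, with npar_semtp = 0
count if npar_semtp == 0
```

On the real data, running count if npar_semtp == 0 after the second collapse would show whether the number of zero-count facilities matches the gap of 14. Note that collapse (sum) returns 0, not missing, for a group whose values are all missing, so a facility where tp_par is always missing would also appear with 0.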
