Hello,
This is my first Statalist post, so please forgive me if I don't explain the question clearly enough.
I have a large dataset with over 4 million observations. The main variables include shopping trip ids (trips are made by different individuals), product, date of purchase, price paid, location, etc. The goal is to run a few regressions with multiple fixed effects (i.e., controlling for product, store, and time together). I couldn't make it a panel dataset using
    xtset product date
because of repeated time values. I could group trip ids and date of purchase to make a unique time id and then use xtset, but I'm not sure that is the best way. Because trip ids are just random numbers, the resulting time ids would not be in chronological order. Will that affect the use of panel data commands?
If not, would you suggest ways to set up the regression with a large non-panel dataset? Thanks a lot.
- Louise
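A minimal sketch of the grouping workaround described in the post, for concreteness. The variable names trip_id, date, product and time_id are placeholders and are not taken from the actual dataset; this only illustrates the idea and is not a recommended setup.

```stata
* Sketch only: trip_id, date, product and time_id are placeholder names.
* Combine trip id and purchase date into one integer id per (trip, date) pair.
* group() numbers the pairs in sorted order of trip_id, then date, so the
* resulting time_id is not in chronological order.
egen long time_id = group(trip_id date)

* Declare the panel with the constructed id as the "time" variable.
* xtset needs numeric variables, and it will still report repeated time
* values if a product appears more than once within the same trip and date.
xtset product time_id
```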