When introducing new testing procedures, the authors generally "use Monte Carlo techniques to evaluate the size and power of their test", usually stating that the nominal size is 5%. However, I have been unable to figure out what this actually involves. Googling unfortunately just gave me a bunch of links on how to determine required sample sizes...
My current hypothesis is that the authors randomly generate data under the null hypothesis (e.g. "no serial correlation") and then, at each iteration, check whether p < nominal size (e.g. p < 0.05). In other words, they count how often the null hypothesis is rejected even though it is true. They then calculate the rejection rate (the proportion of iterations with a rejection), which should be as close to 0.05 as possible. Is this correct? A minimal sketch of what I have in mind is below.
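To make my question concrete, here is a toy simulation of what I think "evaluating size" means. It uses a one-sample t-test of H0: mean = 0 on normal data purely for illustration (not the CIPS test from the paper), and all the numbers (10,000 replications, n = 50) are my own arbitrary choices:

```python
# Toy sketch of a size simulation: generate data under H0, run the test,
# and record how often it rejects at the nominal 5% level.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_reps, n_obs, alpha = 10_000, 50, 0.05

rejections = 0
for _ in range(n_reps):
    # Data generated under the null: the true mean really is 0
    x = rng.normal(loc=0.0, scale=1.0, size=n_obs)
    _, p = stats.ttest_1samp(x, popmean=0.0)
    rejections += p < alpha

# Empirical size = rejection rate under H0; should be close to alpha = 0.05
print(f"empirical size: {rejections / n_reps:.3f}")
```

Is this, in essence, what the papers are doing (with their own test and data-generating process substituted in)?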
As a bonus question, they often also evaluate the "power" of a test. How does that work? Do they again randomly generate data, but now under the alternative hypothesis (Ha), and again count how often the null hypothesis is rejected when p < nominal size? The goal would then be a rejection rate as close to 1 as possible. Again, my guess in code follows.
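And here is the same toy setup, but with the data generated under the alternative (a true mean of 0.3, which is just a hypothetical effect size I picked), so that the rejection rate would estimate power rather than size:

```python
# Toy sketch of a power simulation: same test as above, but the data are
# generated under Ha, so the rejection rate estimates the test's power.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_reps, n_obs, alpha = 10_000, 50, 0.05
true_mean = 0.3  # hypothetical effect size under Ha

rejections = 0
for _ in range(n_reps):
    # Data generated under the alternative: the true mean is not 0
    x = rng.normal(loc=true_mean, scale=1.0, size=n_obs)
    _, p = stats.ttest_1samp(x, popmean=0.0)
    rejections += p < alpha

# Empirical power = rejection rate under Ha; the closer to 1 the better
print(f"empirical power: {rejections / n_reps:.3f}")
```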
As an example, I've attached the paper introducing the CIPS test (see xtcips in Stata); Section VI details the size/power calculations.