level 13
Clustered Standard Errors (CSEs) come into play when some observations in a data set are related to each other. This correlation occurs when an individual trait, such as ability or socioeconomic background, is identical or similar for the observations within a cluster. Panel data (multi-dimensional data collected over time) is the type of data most commonly associated with CSEs.
September 28, 2019, 09:09
For example, say you want to know whether class size affects SAT scores. Specifically, you suspect that smaller classes lead to better SAT scores. You collect panel data for dozens of classes in dozens of schools. Because this is panel data, you almost certainly have clustering: teachers might be more effective in some classes than in others, students may be grouped by ability (e.g. special education classes), or some schools might have better access to computers than others. According to Cameron and Miller, ignoring this clustering leads to:
Standard errors that are misleadingly small (default OLS standard errors understate the true sampling variability).
Confidence intervals that are too narrow.
T-statistics that are too large.
Misleadingly small p-values.
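The consequences listed above can be seen in a small simulation. This is a minimal sketch on made-up data, not Cameron and Miller's code: the function names `simulate` and `ols_slope_ses`, the 50 clusters of 20 observations, and the error structure are all assumptions chosen for illustration. The regressor is constant within a cluster (like class size) and the error contains a shared cluster-level shock, so the default OLS standard error of the slope comes out much smaller than a cluster-robust one. The cluster-robust version uses the common "CR1" small-sample factor G/(G-1) * (N-1)/(N-K).

```python
import math
import random


def simulate(num_clusters=50, cluster_size=20, seed=1):
    """Hypothetical clustered data: x is constant within each cluster, and the
    error is a shared cluster-level shock plus individual noise. True slope = 0."""
    rng = random.Random(seed)
    xs, ys, groups = [], [], []
    for g in range(num_clusters):
        x_g = rng.gauss(0.0, 1.0)    # cluster-level regressor (e.g. class size)
        shock = rng.gauss(0.0, 1.0)  # error component shared by the whole cluster
        for _ in range(cluster_size):
            u = shock + rng.gauss(0.0, 1.0)
            xs.append(x_g)
            ys.append(1.0 + 0.0 * x_g + u)
            groups.append(g)
    return xs, ys, groups


def ols_slope_ses(xs, ys, groups):
    """Fit y = a + b*x by OLS; return b, its default SE, and its cluster-robust SE."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [y - intercept - slope * x for x, y in zip(xs, ys)]

    # Default OLS SE: assumes independent, homoskedastic errors.
    s2 = sum(e * e for e in resid) / (n - 2)
    default_se = math.sqrt(s2 / sxx)

    # Cluster-robust SE: add up the score (x - x_bar) * e within each cluster
    # BEFORE squaring, so within-cluster correlation is kept in the variance.
    scores = {}
    for x, e, g in zip(xs, resid, groups):
        scores[g] = scores.get(g, 0.0) + (x - x_bar) * e
    num_clusters = len(scores)
    meat = sum(s * s for s in scores.values())
    c = num_clusters / (num_clusters - 1) * (n - 1) / (n - 2)  # CR1, K = 2
    cluster_se = math.sqrt(c * meat) / sxx
    return slope, default_se, cluster_se


xs, ys, groups = simulate()
slope, default_se, cluster_se = ols_slope_ses(xs, ys, groups)
print(f"slope={slope:.3f}  default SE={default_se:.3f}  cluster-robust SE={cluster_se:.3f}")
```

With these assumed settings the default SE is roughly a third of the cluster-robust SE, so t-statistics built on it are inflated and p-values too small.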
Clustering violates the assumption of independence required by many estimation methods and statistical tests, and the resulting incorrect standard errors can lead to Type I and Type II errors.
Accurate standard errors are a fundamental component of statistical inference. Therefore, if you have clustering in your data (which, left unaddressed, produces inaccurate SEs), you should adjust for it before running any further analysis on the data.
Hand calculations for clustered standard errors are more complicated than most statistical formulas. For example, The American Economic Review has published the variance formula used to calculate clustered standard errors.
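For reference, a commonly used form of the cluster-robust ("sandwich") variance estimator for OLS is shown below. This is the standard form discussed by Cameron and Miller; the notation in the AER formula may differ. Here G is the number of clusters, X_g and û_g are the regressor matrix and OLS residual vector for cluster g, N is the number of observations, and K is the number of regressors. Setting c = 1 gives the basic "CR0" estimator; the factor c shown is a widely used small-sample adjustment ("CR1").

```latex
\widehat{V}_{\text{clu}}(\hat{\beta})
  = c\,(X'X)^{-1}\left(\sum_{g=1}^{G} X_g'\,\hat{u}_g\,\hat{u}_g'\,X_g\right)(X'X)^{-1},
\qquad
c = \frac{G}{G-1}\cdot\frac{N-1}{N-K}
```

The clustered standard error of each coefficient is the square root of the corresponding diagonal element. Summing X_g'û_g within a cluster before forming the outer product is what lets the estimator capture arbitrary within-cluster correlation.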
One way to control for clustering is to specify a model for the within-cluster error correlation. For example, you could specify a random coefficient model or a hierarchical model. However, the accuracy of any SEs calculated this way depends entirely on specifying the correct model for the within-cluster error correlation. A second option is cluster-robust inference, which does not require you to specify such a model. It does, however, rely on the assumption that the number of clusters approaches infinity (Ibragimov & Müller).
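A minimal sketch of the model-based route, assuming the simplest error model: a random-effects (equicorrelated) structure with equal-sized clusters. Under that assumption, Moulton's approximation says the variance of the OLS slope is inflated by a factor of 1 + ρ_x ρ_u (N_g − 1), where ρ_x and ρ_u are the within-cluster correlations of the regressor and the error and N_g is the cluster size. The function names and numbers below are hypothetical, and the correction is only valid if the assumed model is right.

```python
import math


def moulton_factor(cluster_size, rho_x, rho_u):
    """Approximate variance inflation of an OLS slope under equicorrelated
    (random-effects) errors with equal cluster sizes.
    rho_x: within-cluster correlation of the regressor.
    rho_u: within-cluster correlation of the error."""
    return 1.0 + rho_x * rho_u * (cluster_size - 1)


def model_based_se(default_se, cluster_size, rho_x, rho_u):
    """Scale a default OLS SE by the square root of the Moulton factor."""
    return default_se * math.sqrt(moulton_factor(cluster_size, rho_x, rho_u))


# Hypothetical inputs: classes of 20 students, class size constant within a
# class (rho_x = 1), and an assumed error intraclass correlation of 0.5.
print(moulton_factor(20, 1.0, 0.5))  # 10.5
print(model_based_se(0.045, 20, 1.0, 0.5))
```

If either correlation is zero, the factor is 1 and the default SE is unchanged, which is why clustering matters most when both the regressor and the errors are correlated within clusters.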