Final Stats 341 Assignment 4, Q1

In real life we often count the things around us, and such count data are usually assumed, at least initially, to be Poisson distributed. The Poisson is the natural model for count data. A key assumption of the Poisson regression model is that the variance equals the mean: var(Y) = E(Y) = μ. As the mean increases, the distribution gets closer to normal. However, count data often exhibit overdispersion, with a variance larger than the mean.

Count data are often analyzed with the general linear model. This model assumes additive treatment effects and errors; the errors are assumed to be independently and normally distributed with constant variance, so the response variable Y is also assumed to be normally distributed. With count data, however, neither the response variable nor the error term is normally distributed. If the counts are modeled by either the Poisson distribution or the negative binomial distribution, the variance depends on the mean rather than remaining constant.

The two distributions commonly used to model overdispersed count data arising from experiments with one-way designs, the Poisson and the negative binomial (with k known), are members of the exponential family. k is often called the overdispersion parameter; it can be used to capture overdispersion relative to what the Poisson distribution assumes.

The parameters of the negative binomial distribution are the probability of success in a single trial, p, and the number of failures, r. Like the Poisson distribution, it is useful for modeling count data. It assumes that a Poisson distribution applies to each individual, but that the rates μi, given specific values of the predictors, vary across individuals. In place of the Poisson, the negative binomial distribution is then used to characterize the variance of the residuals.
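A quick way to see whether the Poisson assumption var(Y) = E(Y) is plausible for a sample is to compare the sample variance with the sample mean. The sketch below is only an illustration: the function name dispersion_index and the example counts are made up for this post and are not part of the assignment.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return sample variance / sample mean for a list of counts.

    Values near 1 are consistent with a Poisson model; values well
    above 1 suggest overdispersion.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; the dispersion index is undefined")
    return variance(counts) / m


# Hypothetical counts, invented for illustration only.
counts = [0, 0, 1, 0, 2, 0, 0, 9, 1, 0, 0, 12]
print(round(dispersion_index(counts), 2))
```

A ratio far above 1, as with these invented counts, is the kind of pattern that motivates moving from the Poisson to the negative binomial.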
The variance of the negative binomial distribution is made up of two components: (1) the expected rate μ, as in Poisson regression, plus (2) a second amount that characterizes the additional variance in the rate parameter across individuals, which the Poisson distribution does not account for. The negative binomial model of the errors allows greater variance than Poisson regression permits, thereby accounting for overdispersion in count data, although negative binomial regression may still result in inflated t values. It is more general than the Poisson and, because its variance is greater than its mean, it is often suitable for count data that do not meet the assumptions of the Poisson distribution. In the limit, as the parameter r increases to infinity (with the mean held fixed), the negative binomial distribution approaches the Poisson distribution.

The negative binomial distribution with parameters μ and k has mean E(Y) = μ and variance var(Y) = μ(1 + μ/k) = μ + μ²/k. If 1/k is zero we obtain the Poisson variance; if 1/k > 0 the variance is larger than the mean. Thus the negative binomial distribution is overdispersed relative to the Poisson. The Poisson distribution is the limiting distribution of the negative binomial as k → ∞. Varying the parameters μ and k provides a rich family of distributions to describe count data, with overdispersion increasing as k decreases.

Suppose we are collecting data on the number of auto accidents on a busy highway and would like to model the number of accidents per day. Because these are count data, and because there are a very large number of cars and a small probability of an accident for any specific car, the Poisson distribution may be used. However, the probability of having an accident is likely to vary from day to day as the weather and the amount of traffic change, so the assumptions needed for the Poisson distribution are not met.
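The variance formula var(Y) = μ + μ²/k and the Poisson limit as k grows can be checked numerically. The sketch below assumes the mean/dispersion (μ, k) form of the negative binomial probability mass function; the function names and the values of μ and k are chosen only for illustration.

```python
import math


def nb_variance(mu, k):
    """Variance of a negative binomial with mean mu and dispersion k."""
    return mu + mu ** 2 / k


def nb_pmf(y, mu, k):
    """P(Y = y) under the (mu, k) parameterization, computed on the log scale."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu))
             + y * math.log(mu / (k + mu)))
    return math.exp(log_p)


def poisson_pmf(y, mu):
    """P(Y = y) for a Poisson distribution with mean mu."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


mu = 3.0  # illustrative mean, not taken from any data set
for k in (0.5, 5.0, 500.0, 50000.0):
    # Largest pointwise difference between the two pmfs over y = 0..29.
    gap = max(abs(nb_pmf(y, mu, k) - poisson_pmf(y, mu)) for y in range(30))
    print(f"k={k:>8}: variance={nb_variance(mu, k):.4f}, max pmf gap={gap:.2e}")
```

As k increases, the printed variance should move toward μ and the gap between the two probability mass functions should shrink, which is the limiting behaviour described above.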
In particular, the variance of this type of count data sometimes exceeds the mean by a large amount: most days have few or no accidents and a few days have a large number, so overdispersion appears. It is therefore preferable to use a negative binomial distribution.

Finally, we conclude that the negative binomial distribution can be a good model for count data that are overdispersed relative to the Poisson model.
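The highway example can be imitated with a small simulation. This is a sketch under stated assumptions, not an analysis of real accident data: the daily accident rate is assumed to follow a gamma distribution with mean μ and shape k (the standard way a negative binomial arises from Poisson counts with varying rates), and μ = 2 and k = 0.5 are invented values.

```python
import math
import random
from statistics import mean, variance


def poisson_draw(rng, rate):
    """Draw one Poisson count with the given rate (Knuth's multiplication method)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_accidents(days, mu, k=None, seed=0):
    """Simulate daily accident counts.

    k=None: the rate is mu on every day, so counts are Poisson.
    Otherwise each day's rate is drawn from a gamma distribution with
    mean mu and shape k, so counts are negative binomial.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(days):
        rate = mu if k is None else rng.gammavariate(k, mu / k)
        counts.append(poisson_draw(rng, rate))
    return counts


for label, k in (("constant rate", None), ("varying rate", 0.5)):
    c = simulate_accidents(5000, mu=2.0, k=k)
    print(label, "mean:", round(mean(c), 2), "variance:", round(variance(c), 2))
```

With a constant rate the sample variance should stay close to the sample mean, while with a varying rate it should be clearly larger, in line with μ + μ²/k.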
2007-05-05 14:05