This is a very basic question. Why do we use a chi-square distribution? What does this distribution mean? Why is it the distribution used to build a confidence interval for the variance?

Everywhere I search for an explanation on Google just states this as a fact, explaining when to use chi-square but not why to use it or why it looks the way it does.

Many thanks to anyone who can point me in the right direction, which is really understanding why I am using chi-square when I build a confidence interval for the variance.

variance chi-squared

— nafrtiti

You use it because, when the data are normal, $Q = (n-1)\frac{s^2}{\sigma^2}\sim \chi^2_{n-1}$. (This makes $Q$ a pivotal quantity.)

— Glen_b -Reinstate Monica

See also stats.stackexchange.com/questions/15711/… and its links.

— Nick Cox

For those interested in applications of, or further research on, $\chi^2$: you should pay attention to the distinction between a $\chi^2$ ("chi-squared") distribution and a $\chi$ ("chi") distribution (which is the distribution of the square root of a $\chi^2$ variable, unsurprisingly).

— whuber

Quick answer

The reason is that, assuming the data are i.i.d. with $X_i\sim N(\mu,\sigma^2)$ and defining

$\begin{eqnarray*} \bar{X}&=&\sum^N \frac{X_i}{N}\\ S^2 &=& \sum^{N} \frac{(\bar{X}-X_i)^2}{N-1} \end{eqnarray*}$

when forming confidence intervals, the sampling distribution associated with the sample variance ($S^2$, remember, a random variable!) is a chi-square distribution ($S^2(N-1)/\sigma^2 \sim \chi^2_{N-1}$), just as the sampling distribution associated with the sample mean is a standard normal distribution ($(\bar{X}-\mu)\sqrt{N}/\sigma \sim Z(0,1)$) when you know the variance, and a Student's t distribution when you don't ($(\bar{X}-\mu)\sqrt{N}/S \sim T_{N-1}$).

Long answer

First, we will prove that $S^2(N-1)/\sigma^2$ follows a chi-square distribution with $N-1$ degrees of freedom. After that, we will see how this proof is useful when deriving confidence intervals for the variance, and how the chi-square distribution appears (and why it is so useful!). Let's begin.

The proof

For this, you may need to get familiar with the chi-square distribution, for example through its Wikipedia article. This distribution has only one parameter, the degrees of freedom $\nu$, and happens to have a Moment Generating Function (MGF) given by:

$\begin{equation*} m_{\chi^2_\nu}(t)=(1-2t)^{-\nu/2}. \end{equation*}$

If we can show that the distribution of $S^2(N-1)/\sigma^2$ has a moment generating function like this one, but with $\nu=N-1$, then we have shown that $S^2(N-1)/\sigma^2$ follows a chi-square distribution with $N-1$ degrees of freedom. To show this, note two facts:

1. If we define

$\begin{equation*} Y = \sum \frac{(X_i-\mu)^2}{\sigma^2} = \sum Z_i^2, \end{equation*}$

where $Z_i\sim N(0,1)$, i.e., independent standard normal random variables, the moment generating function of $Y$ is given by

$\begin{eqnarray*} m_Y(t) &=& \mathbb{E}[e^{tY}]\\ &=&\mathbb{E}[e^{tZ_1^2}]\times \mathbb{E}[e^{tZ_2^2}]\times \cdots \times \mathbb{E}[e^{tZ_N^2}]\\ &=&m_{Z_1^2}(t)\times m_{Z_2^2}(t)\times \cdots \times m_{Z_N^2}(t). \end{eqnarray*}$

The MGF of $Z^2$ is given by

$\begin{eqnarray*} m_{Z^2}(t) &=& \int_{-\infty}^{\infty} f(z)\exp(tz^2)dz\\ &=&(1-2t)^{-1/2}, \end{eqnarray*}$

where I have used the PDF of the standard normal, $f(z)=e^{-z^2/2}/\sqrt{2\pi}$ (the integral converges for $t<1/2$), and, hence,

$\begin{equation*} m_Y(t)=(1-2t)^{-N/2}, \end{equation*}$

which implies that $Y$ follows a chi-square distribution with $N$ degrees of freedom.

2. If two random variables $Y_1$ and $Y_2$ are independent and follow chi-square distributions with $\nu_1$ and $\nu_2$ degrees of freedom, respectively, then $W=Y_1+Y_2$ follows a chi-square distribution with $\nu_1+\nu_2$ degrees of freedom (this follows from taking the MGF of $W$; do this!).
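As a sketch of the two computations these facts rely on (not part of the original answer; it assumes $t<1/2$ so that the integrals converge, and independence of $Y_1$ and $Y_2$ so that the expectation factors), the Gaussian integral and the MGF of the sum can be written out as:

```latex
\begin{align*}
m_{Z^2}(t) &= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, e^{t z^2}\, dz
            = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-(1-2t) z^2/2}\, dz \\
           &= (1-2t)^{-1/2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du
            = (1-2t)^{-1/2}, \qquad u = z\sqrt{1-2t}, \\
m_{W}(t)   &= \mathbb{E}\left[e^{t(Y_1+Y_2)}\right] = m_{Y_1}(t)\, m_{Y_2}(t)
            = (1-2t)^{-\nu_1/2}\,(1-2t)^{-\nu_2/2} = (1-2t)^{-(\nu_1+\nu_2)/2}.
\end{align*}
```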

With the above facts, note that if you multiply the sample variance by $N-1$, you obtain (after some algebra: write $X_i-\bar{X}=(X_i-\mu)-(\bar{X}-\mu)$ and use $\sum (X_i-\mu)=N(\bar{X}-\mu)$),

$\begin{equation*} (N-1)S^2 = -N(\bar{X}-\mu)^2+\sum(X_i-\mu)^2, \end{equation*}$

and, hence, dividing by $\sigma^2$,

$\begin{equation*} \frac{(N-1)S^2}{\sigma^2}+\frac{(\bar{X}-\mu)^2}{\sigma^2/N}=\sum \frac{(X_i-\mu)^2}{\sigma^2}. \end{equation*}$

Note that the second term on the left-hand side follows a chi-square distribution with 1 degree of freedom, and the sum on the right-hand side follows a chi-square distribution with $N$ degrees of freedom. For normal data $\bar{X}$ and $S^2$ are independent, so the MGF of the right-hand side is the product of the MGFs of the two terms on the left, which leaves $(1-2t)^{-(N-1)/2}$ for the first term. Therefore, $S^2(N-1)/\sigma^2$ follows a chi-square distribution with $N-1$ degrees of freedom.
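The result can also be checked numerically. The following is a minimal sketch, not part of the original answer: it draws many normal samples, computes $S^2(N-1)/\sigma^2$ for each, and compares the simulated mean and variance with the values $N-1$ and $2(N-1)$ expected for a chi-square with $N-1$ degrees of freedom. The function name `simulate_q` and the values of `n`, `mu`, `sigma`, and `reps` are arbitrary choices for illustration.

```python
import random
import statistics


def simulate_q(n, mu, sigma, reps, seed=0):
    """Return `reps` simulated values of (n - 1) * S^2 / sigma^2 for normal data."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        s2 = statistics.variance(sample)  # sample variance, divides by n - 1
        values.append((n - 1) * s2 / sigma**2)
    return values


n = 10  # hypothetical sample size
qs = simulate_q(n=n, mu=3.0, sigma=2.0, reps=20000)

mean_q = statistics.fmean(qs)    # should be close to n - 1
var_q = statistics.variance(qs)  # should be close to 2 * (n - 1)

print(f"simulated mean {mean_q:.3f}, chi-square mean {n - 1}")
print(f"simulated variance {var_q:.3f}, chi-square variance {2 * (n - 1)}")
```

Matching two moments is only a sanity check, not a proof of the distribution; a histogram of `qs` against the chi-square density would be a more complete comparison.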

Calculating the Confidence Interval for the variance.

When looking for a confidence interval for the variance, you want to know the limits $L_1$ and $L_2$ in

$\begin{equation*} \mathbb{P}\left(L_1\leq \sigma^2 \leq L_2\right) = 1-\alpha. \end{equation*}$

Let's play with the inequality inside the parentheses. First, divide by $S^2(N-1)$,

$\begin{equation*} \frac{L_1}{S^2(N-1)}\leq \frac{\sigma^2}{S^2(N-1)} \leq \frac{L_2}{S^2(N-1)}. \end{equation*}$

And then remember two things: (1) the statistic $S^2(N-1)/\sigma^2$ has a chi-squared distribution with $N-1$ degrees of freedom and (2) variances are always greater than zero, which implies that you can invert the inequalities, because

$\begin{eqnarray*} \frac{L_1}{S^2(N-1)}\leq \frac{\sigma^2}{S^2(N-1)} &\Rightarrow& \frac{S^2(N-1)}{\sigma^2}\leq \frac{S^2(N-1)}{L_1},\\ \frac{\sigma^2}{S^2(N-1)} \leq \frac{L_2}{S^2(N-1)} &\Rightarrow& \frac{S^2(N-1)}{L_2} \leq \frac{S^2(N-1)}{\sigma^2}, \end{eqnarray*}$

hence, the probability we are looking for is:

$\begin{equation*} \mathbb{P}\left(\frac{S^2(N-1)}{L_2} \leq \frac{S^2(N-1)}{\sigma^2}\leq \frac{S^2(N-1)}{L_1}\right) = 1-\alpha. \end{equation*}$

Note that $S^2(N-1)/\sigma^2 \sim \chi^2(N-1)$. A common choice is to leave the same probability in each tail, so we want

$\begin{eqnarray*} \int_{\frac{S^2(N-1)}{L_2}}^{m}p_{\chi^2}(x)dx &=& (1-\alpha)/2,\\ \int_{m}^{\frac{S^2(N-1)}{L_1}}p_{\chi^2}(x)dx &=& (1-\alpha)/2 \end{eqnarray*}$

(we split the probability at the median $m$ of a chi-squared random variable with $N-1$ degrees of freedom, which lies slightly below its expected value $N-1$ because the distribution is right-skewed) or, equivalently,

$\begin{eqnarray*} \int_{0}^{\frac{S^2(N-1)}{L_2}}p_{\chi^2}(x)dx&=&\alpha/2,\\ \int_{\frac{S^2(N-1)}{L_1}}^{\infty}p_{\chi^2}(x)dx&=&\alpha/2. \end{eqnarray*}$

Calling $\chi^2_{\alpha/2}=\frac{S^2(N-1)}{L_2}$ and $\chi^2_{1-\alpha/2}= \frac{S^2(N-1)}{L_1}$, where the values $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ can be found in chi-square tables (in computers mainly!), and solving for $L_1$ and $L_2$,

$\begin{eqnarray*} L_1 &=& \frac{S^2(N-1)}{\chi^2_{1-\alpha/2}},\\ L_2 &=& \frac{S^2(N-1)}{\chi^2_{\alpha/2}}. \end{eqnarray*}$

Hence, your confidence interval for the variance is

$\begin{equation*} C.I.=\left(\frac{S^2(N-1)}{\chi^2_{1-\alpha/2}}, \frac{S^2(N-1)}{\chi^2_{\alpha/2}}\right). \end{equation*}$
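A minimal sketch of how this interval might be computed, assuming nothing beyond the Python standard library. The helpers `chi2_cdf`, `chi2_quantile`, and `variance_ci` are hypothetical names written for this illustration, and the data are made up. In practice the quantiles would normally come from a statistics library (for instance `scipy.stats.chi2.ppf`); here they are obtained by bisection on a series expansion of the chi-square CDF so that the example is self-contained.

```python
import math


def chi2_cdf(x, df):
    """Chi-square CDF, via the series for the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_quantile(p, df):
    """Value q with chi2_cdf(q, df) = p, found by bisection."""
    lo, hi = 0.0, float(df)
    while chi2_cdf(hi, df) < p:  # expand the bracket until it contains q
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def variance_ci(sample, alpha=0.05):
    """Return (S^2, L1, L2), the equal-tailed interval derived above.

    Assumes the data are i.i.d. normal.
    """
    n = len(sample)
    mean = sum(sample) / n
    s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)
    lower = (n - 1) * s2 / chi2_quantile(1 - alpha / 2, n - 1)
    upper = (n - 1) * s2 / chi2_quantile(alpha / 2, n - 1)
    return s2, lower, upper


# Made-up measurements, used only to exercise the function.
data = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4, 5.0, 5.6]
s2, lower, upper = variance_ci(data, alpha=0.05)
print(f"S^2 = {s2:.4f}, 95% CI for the variance: ({lower:.4f}, {upper:.4f})")
```

Note that the larger quantile gives the lower limit and the smaller quantile gives the upper limit, and that the interval is not symmetric around $S^2$ because the chi-square distribution is skewed.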

— Néstor

Simply because $S^2$ does not follow a centered chi-square distribution, while $S^2(N-1)/\sigma^2$ does and, therefore, it's easier to work with. Are you asking for a derivation of that? (i.e., you want someone to show you that $S^2(N-1)/\sigma^2$ follows a chi-square distribution with $N-1$ degrees of freedom?)

— Néstor

It would be helpful to modify this answer to include the very strong but unstated assumption that the sample variance follows a chi-squared distribution when the underlying data are independent and follow a normal distribution. Unlike the theory of the distribution of the sample mean, where in practice its sampling distribution will be approximately Normal to reasonable accuracy in many situations, this same asymptotic behavior tends not to happen with the sample variance (until sample sizes become extremely large).

— whuber
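The sensitivity to the normality assumption described in this comment can be illustrated with a small simulation. This is a sketch under stated assumptions, not a result from the thread: it uses samples of size 20, the standard table quantiles of the chi-square distribution with 19 degrees of freedom (8.907 and 32.852), and compares the fraction of nominal 95% intervals that contain the true variance for standard normal data and for exponential data with rate 1 (both have variance 1). The names `covers` and `coverage` are hypothetical.

```python
import random

N = 20                            # sample size used in this illustration
CHI2_LO, CHI2_HI = 8.907, 32.852  # 2.5% and 97.5% chi-square quantiles, 19 df


def covers(sample, true_var):
    """True if the chi-square interval for the variance contains true_var."""
    mean = sum(sample) / len(sample)
    ss = sum((x - mean) ** 2 for x in sample)  # equals (N - 1) * S^2
    return ss / CHI2_HI <= true_var <= ss / CHI2_LO


def coverage(draw, true_var, reps=4000, seed=1):
    """Fraction of simulated intervals that contain the true variance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(N)]
        hits += covers(sample, true_var)
    return hits / reps


cov_normal = coverage(lambda rng: rng.gauss(0.0, 1.0), true_var=1.0)
cov_expo = coverage(lambda rng: rng.expovariate(1.0), true_var=1.0)

print(f"coverage with normal data:      {cov_normal:.3f}")
print(f"coverage with exponential data: {cov_expo:.3f}")
```

With normal data the coverage should be close to the nominal 95%; with the heavier-tailed exponential data it should come out noticeably lower, which is the behavior the comment warns about.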

Oops. So, so true! This actually came from a problem solution that I handed out to some students, where I state on the question all these assumptions. I edited the answer now.

— Néstor

@user34756 The reason we don't use the distribution of $S^2$ directly is that its distribution depends on the value of a parameter. You may find it useful to investigate the use of pivotal quantities in constructing confidence intervals.

— Glen_b -Reinstate Monica

Isn't $f(z) = e^{-z^2/2}$ instead of $f(z) = e^{-z^2}$?

— Benoît Legat

Why is chi-square used when creating a confidence interval for the variance?
