Distribution of the sum of squared errors for linear regression?

I know that the distribution of the sample variance,

$\sum\frac{(X_i-\bar{X})^2}{\sigma^2}\sim \chi^2_{(n-1)}$

$\sum\frac{(X_i-\bar{X})^2}{n-1}\sim \frac{\sigma^2}{n-1}\chi^2_{(n-1)}$

follows from the fact that $(X-\bar{X})^2$ can be expressed in matrix form, $x'Ax$ (where $A$ is symmetric), which can in turn be expressed as $x'QDQ'x$ (where $Q$ is orthonormal and $D$ is a diagonal matrix).

What about $\sum(Y_i-\hat{\beta}_0-\hat{\beta}_1X_i)^2$, given the assumption $(Y - \beta_0 - \beta_1X)\sim \mathcal{N}(0, \sigma^2)$?

I think that

$\sum\frac{(Y_i-\hat{\beta}_0-\hat{\beta}_1X_i)^2}{\sigma^2}\sim \chi^2_{(n-2)}.$

But I have no idea how to prove or show it.

Is it distributed exactly as $\chi^2_{(n-2)}$?

— KH Kim

Is this homework? If so, please use the Homework tag.

— MånsT

No, it isn't. I think it's true because, after all, the sum of squares is the square of a linear combination of Y, given constant X. But is that all there is to it? A simple proof like this one would be appreciated! math.stackexchange.com/questions/47009/...

— KH Kim

The descriptions you give in the question and in your comment are a bit confusing. Have you written down what your matrix $A$ must be for the sample variance? Does that help you see how to generalize?

— cardeal

Corrected to D. I think the critical point is that the diagonal of D must be something like (1,1,1, ..., 1,0,0). Is there any way to prove that? Or is there any way to show that $\chi^2(n)=\chi^2(n-2)+\chi^2(1)+\chi^2(1)$, where SSE$/\sigma^2 \sim \chi^2(n-2)$ and $\sum{e_i^2}/\sigma^2 \sim \chi^2(n)$?

— KH Kim

We can prove this for the more general case of $p$ variables by using the "hat matrix" and some of its useful properties. These results are usually much harder to state in non-matrix terms because of the use of the spectral decomposition.

Now, in the matrix version of least squares, the hat matrix is $H=X(X^TX)^{-1}X^T$, where $X$ has $n$ rows and $p+1$ columns (including a column of ones for $\beta_0$). Assume full column rank for convenience; otherwise you could replace $p+1$ by the column rank of $X$ in what follows. We can write the fitted values as $\hat{Y}_i=\sum_{j=1}^nH_{ij}Y_j$, or in matrix notation $\hat{Y}=HY$. Using this, we can write the sum of squares as:

$\frac{\sum_{i=1}^n(Y_i-\hat{Y}_i)^2}{\sigma^2}=\frac{(Y-\hat{Y})^T(Y-\hat{Y})}{\sigma^2}=\frac{(Y-HY)^T(Y-HY)}{\sigma^2}$

$=\frac{Y^T(I_n-H)Y}{\sigma^2}$

where $I_n$ is the identity matrix of order $n$. The last step follows from the fact that $H$ is a symmetric idempotent matrix, as

$H^2=[X(X^TX)^{-1}X^T][X(X^TX)^{-1}X^T]=X(X^TX)^{-1}X^T=H=HH^T=H^TH$
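As a quick numerical sanity check of the two facts used so far, here is a minimal NumPy sketch. The sizes `n` and `p`, the seed, and the random design are illustrative assumptions, not part of the argument. It checks that $H$ is symmetric and idempotent, and that the quadratic form $Y^T(I_n-H)Y$ matches the residual sum of squares from an explicit least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 12, 3  # illustrative sizes (assumption)

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = rng.normal(size=n)

# Hat matrix H = X (X^T X)^{-1} X^T, computed via a linear solve.
H = X @ np.linalg.solve(X.T @ X, X.T)

is_symmetric = np.allclose(H, H.T)
is_idempotent = np.allclose(H @ H, H)

# Residual sum of squares two ways: quadratic form vs. explicit fit.
rss_quadratic = Y @ (np.eye(n) - H) @ Y
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
rss_fit = np.sum((Y - X @ beta_hat) ** 2)

print(is_symmetric, is_idempotent, np.isclose(rss_quadratic, rss_fit))
```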

Now a neat property of idempotent matrices is that all of their eigenvalues must be equal to zero or one. Letting $e$ denote a normalised eigenvector of $H$ with eigenvalue $l$, we can prove this as follows:

$He=le\implies H(He)=H(le)$

$LHS=H^2e=He=le\;\;\; RHS=lHe=l^2e$

$\implies le=l^2e\implies l=0\text{ or }1$

(note that $e$ cannot be zero as it must satisfy $e^Te=1$). Now because $H$ is idempotent, $I_n-H$ also is, because

$(I_n-H)(I_n-H)=I_n-I_nH-HI_n+H^2=I_n-2H+H=I_n-H$

We also have the property that the sum of the eigenvalues equals the trace of the matrix, and

$tr(I_n-H)=tr(I_n)-tr(H)=n-tr(X(X^TX)^{-1}X^T)=n-tr((X^TX)^{-1}X^TX)$

$=n-tr(I_{p+1})=n-p-1$

Hence $I-H$ must have $n-p-1$ eigenvalues equal to $1$ and $p+1$ eigenvalues equal to $0$ .
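The eigenvalue count can also be checked numerically. The sketch below uses an arbitrary random design with `n = 10` and `p = 2` (assumed values chosen only for illustration) and counts how many eigenvalues of $I_n-H$ are numerically 1 and how many are numerically 0.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 10, 2  # illustrative sizes (assumption)

X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)  # M = I_n - H

# M is symmetric, so eigvalsh applies and returns real eigenvalues.
eigenvalues = np.linalg.eigvalsh(M)
n_ones = int(np.sum(np.isclose(eigenvalues, 1.0)))
n_zeros = int(np.sum(np.isclose(eigenvalues, 0.0)))
trace = np.trace(M)

print(n_ones, n_zeros, trace)
```

With these sizes one would expect $n-p-1=7$ unit eigenvalues and $p+1=3$ zero eigenvalues, up to floating-point error.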

Now we can use the spectral decomposition $I-H=ADA^T$, where $D=\begin{pmatrix}I_{n-p-1} & 0_{[n-p-1]\times[p+1]}\\0_{[p+1]\times [n-p-1]} & 0_{[p+1]\times [p+1]}\end{pmatrix}$ and $A$ is orthogonal (because $I-H$ is symmetric). A further useful property is that $HX=X$. This helps narrow down the $A$ matrix:

$HX=X\implies(I-H)X=0\implies ADA^TX=0\implies DA^TX=0$

$\implies (A^TX)_{ij}=0\;\;\;i=1,\dots,n-p-1\;\;\; j=1,\dots,p+1$
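To make the decomposition concrete, the following sketch builds $A$ and $D$ from `numpy.linalg.eigh`, reordering the eigenvectors so that those with eigenvalue 1 come first as in the block form of $D$. The design matrix, sizes, and seed are assumptions for illustration only. It then checks that $ADA^T$ reproduces $I-H$ and that the first $n-p-1$ rows of $A^TX$ vanish.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 9, 2  # illustrative sizes (assumption)

X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)  # M = I - H

# eigh returns eigenvalues in ascending order with orthonormal eigenvectors;
# reverse the order so the eigenvalue-1 eigenvectors come first.
w, V = np.linalg.eigh(M)
order = np.argsort(w)[::-1]
A = V[:, order]
D = np.diag(np.round(w[order]))  # eigenvalues rounded to exactly 0 or 1

k = n - p - 1
is_orthogonal = np.allclose(A.T @ A, np.eye(n))
reconstructs = np.allclose(A @ D @ A.T, M)
top_rows_vanish = np.allclose((A.T @ X)[:k], 0.0)

print(is_orthogonal, reconstructs, top_rows_vanish)
```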

and we get:

$\frac{\sum_{i=1}^n(Y_i-\hat{Y}_i)^2}{\sigma^2}=\frac{Y^TADA^TY}{\sigma^2}=\frac{\sum_{i=1}^{n-p-1}(A^TY)_i^2}{\sigma^2}$

Now, under the model we have $Y\sim N(X\beta,\sigma^2I)$, and using standard normal theory we have $A^TY\sim N(A^TX\beta,\sigma^2A^TA)= N(A^TX\beta,\sigma^2I)$, showing that the components of $A^TY$ are independent. Using the result above that the first $n-p-1$ rows of $A^TX$ are zero, we have $(A^TY)_i\sim N(0,\sigma^2)$ for $i=1,\dots,n-p-1$, so each $(A^TY)_i/\sigma$ is standard normal. The chi-square distribution with $n-p-1$ degrees of freedom for the sum of squared errors follows immediately.
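For the simple regression in the question, $p=1$ and this gives $n-2$ degrees of freedom. A rough Monte Carlo check is sketched below; the coefficients, $\sigma$, sample size, seed, and number of replications are all made-up values for illustration. It compares the simulated mean and variance of SSE$/\sigma^2$ with the chi-square values $n-p-1$ and $2(n-p-1)$. This is only a consistency check on two moments, not a proof of the distribution.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma = 20, 1, 2.0        # simple regression: intercept plus one slope
n_sims = 20000                  # number of simulated data sets (assumption)

X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta = np.array([1.0, -0.5])    # hypothetical true coefficients
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)  # M = I_n - H

# Each row of Y is one simulated response vector from Y ~ N(X beta, sigma^2 I).
Y = X @ beta + sigma * rng.normal(size=(n_sims, n))

# Row-wise quadratic form Y_r^T M Y_r, scaled by sigma^2.
scaled_sse = np.sum((Y @ M) * Y, axis=1) / sigma**2

df = n - p - 1
print("simulated mean:", scaled_sse.mean(), "chi-square mean:", df)
print("simulated var: ", scaled_sse.var(), "chi-square var: ", 2 * df)
```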

— probabilityislogic

Wow, thank you very much. It really is magnificent! Matrix form really pays off! In summary, SSE$/\sigma^2 = Y^T(I-H)Y/\sigma^2$ and $I-H$ is idempotent. Idempotent matrices have eigenvalues equal to either 0 or 1, so the sum of the eigenvalues is the number of eigenvalues equal to 1. Since $tr(AB)=tr(BA)$, we have $tr(I_n-H)= tr(I_n)-tr(H)=tr(I_n)-tr(X(X^T X)^{-1} X^T)=tr(I_n)-tr((X^T X)^{-1} X^T X)$, so $tr(I_n-H)$ becomes $n-p-1$, and the sum of the eigenvalues of a matrix is the trace of the matrix. Finally, $I-H$ can be expressed as $ADA^T$, so $Y^T(I-H)Y$ becomes $Y^TADA^TY$, where $D$ has only $n-p-1$ ones on its diagonal.

— KH Kim

Great answer!! Just to present the last step another way, we can define the transformed multivariate normal variable $v := A'Y$. By the affine property of the normal distribution it has covariance $\sigma^{2}I$, and its first $\operatorname{tr}D$ components have mean zero (because $DA'X=0$), so those components are i.i.d. $\mathcal{N}\left(0, \sigma^{2}\right)$. Then the last fraction becomes

$\frac{Y'ADA'Y}{\sigma^{2}} = \frac{v'Dv}{\sigma^{2}} = \frac{v'\begin{bmatrix} I & 0\\0 & 0\end{bmatrix}v}{\sigma^{2}}= \sum_{i=1}^{\operatorname{tr}D} \left(\frac{v_{i}}{\sigma}\right)^{2}$.

— Daeyoung Lim