How to derive the variance-covariance matrix of coefficients in linear regression

36

I am reading a book on linear regression and have some trouble understanding the variance-covariance matrix of $\mathbf{b}$:

[Image: the variance-covariance matrix of $\mathbf{b}$ as printed in the book]

The diagonal items are easy enough, but the off-diagonal ones are a bit more difficult. What puzzles me is that

$\sigma(b_0, b_1) = E(b_0 b_1) - E(b_0)E(b_1) = E(b_0 b_1) - \beta_0 \beta_1$

but there is no trace of $\beta_0$ and $\beta_1$ here.

regression

— qed

3

Related question: stats.stackexchange.com/questions/44838/…

— ocram

2

Which book is it?

— Konstantinos

Neter et al., Applied Linear Regression Models, 1983, page 216. You can find the same material in Applied Linear Statistical Models, 5th Edition, page 207.

— akavalar

53

This is actually an interesting question that challenges your basic understanding of a regression.

First, clear up any initial confusion about notation. We are looking at the regression:

$y=b_0+b_1x+\hat{u}$

where $b_0$ and $b_1$ are the estimators of the true $\beta_0$ and $\beta_1$, and $\hat{u}$ are the residuals of the regression. Note that the underlying true and unobserved regression is thus denoted as:

$y=\beta_0+\beta_1x+u$

with expectation $E[u]=0$ and variance $E[u^2]=\sigma^2$. Some books denote $b$ as $\hat{\beta}$, and we adopt this convention here. We also make use of matrix notation, where $b$ is the 2x1 vector that holds the estimators of $\beta=[\beta_0, \beta_1]'$, namely $b=[b_0, b_1]'$. (Also, for the sake of clarity, I treat $X$ as fixed in the following calculations.)

Now to your question. Your formula for the covariance is indeed correct, that is:

$\sigma(b_0, b_1) = E(b_0 b_1) - E(b_0)E(b_1) = E(b_0 b_1) - \beta_0 \beta_1$

I think you want to know why the true unobserved coefficients $\beta_0, \beta_1$ appear in this formula. They cancel out if we take it a step further and expand the formula. To see this, note that the population variance of the estimator is given by:

$Var(\hat\beta)=\sigma^2(X'X)^{-1}$

This matrix holds the variances in the diagonal elements and the covariances in the off-diagonal elements.
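As a small numeric illustration of this matrix, here is a sketch for the one-regressor case in plain Python. The function names (`xtx`, `inverse_2x2`, `coef_covariance`) and the sample values are made up for this example, and $\sigma^2$ is treated as known, which it usually is not in practice.

```python
def xtx(xs):
    """Return X'X for a design matrix with an intercept column and one regressor."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    return [[n, sx], [sx, sxx]]


def inverse_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("X'X is singular (are all x values identical?)")
    return [[d / det, -b / det], [-c / det, a / det]]


def coef_covariance(xs, sigma2):
    """Return sigma^2 (X'X)^{-1}: variances on the diagonal, covariance off it."""
    inv = inverse_2x2(xtx(xs))
    return [[sigma2 * v for v in row] for row in inv]


# Hypothetical data: five x values and an assumed error variance of 2.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
cov = coef_covariance(xs, sigma2=2.0)

print("Var(b0)     =", cov[0][0])
print("Var(b1)     =", cov[1][1])
print("Cov(b0, b1) =", cov[0][1])
```

Note that no value of $\beta_0$ or $\beta_1$ is needed anywhere in this calculation.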

To arrive at the above formula, let's generalize your claim by using matrix notation. Let us therefore denote variance with $Var[\cdot]$ and expectation with $E[\cdot]$, and write $b^2$ as shorthand for the outer product $bb'$:

$Var[b]=E[b^2]-E[b]E[b']$

Essentially, we have the general variance formula, just in matrix notation. The equation resolves when we substitute the standard expression for the estimator, $b=(X'X)^{-1}X'y$. Assume also that $b$ is an unbiased estimator, so that $E[b]=\beta$. Hence, we obtain:

$E[((X'X)^{-1}X'y)^2] - \underset{2 \times 2}{\beta^2}$

Note that the $\beta^2$ on the right-hand side is a 2x2 matrix, namely $\beta\beta'$, but you may at this point already guess what will happen with this term shortly.

Replacing $y$ with our expression for the true underlying data generating process above, we have:

$\begin{aligned} E\Big[\Big((X'X)^{-1}X'y\Big)^2\Big] - \beta^2 &= E\Big[\Big((X'X)^{-1}X'(X\beta+u)\Big)^2\Big]-\beta^2 \\ &= E\Big[\Big(\underbrace{(X'X)^{-1}X'X}_{=I}\beta+(X'X)^{-1}X'u\Big)^2\Big]-\beta^2 \\ &= E\Big[\Big(\beta+(X'X)^{-1}X'u\Big)^2\Big]-\beta^2 \\ &= \beta^2+E\Big[\Big((X'X)^{-1}X'u\Big)^2\Big]-\beta^2 \end{aligned}$

where the cross terms vanish since $E[u]=0$ and $X$ is fixed. Furthermore, the quadratic $\beta^2$ term cancels out as anticipated.

Thus we have:

$Var[b]=((X'X)^{-1}X')^2E[u^2]$

by linearity of expectations. Note that by assumption $E[u^2]=\sigma^2$, and that $((X'X)^{-1}X')^2$, read as the matrix times its own transpose, is $(X'X)^{-1}X'X((X'X)')^{-1}=(X'X)^{-1}$, since $X'X$ is a $K\times K$ symmetric matrix and thus equal to its transpose. Finally we arrive at

$Var[b]=\sigma^2(X'X)^{-1}$
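For readers who find the squared shorthand hard to follow, the same steps can be sketched with explicit transposes. This version assumes homoskedastic, uncorrelated errors, $E[uu']=\sigma^2 I_n$ (a matrix form of the $E[u^2]=\sigma^2$ assumption, which the answer does not state explicitly), and uses $b-\beta=(X'X)^{-1}X'u$ from the expansion above:

$\begin{aligned} Var[b] &= E\big[(b-\beta)(b-\beta)'\big] \\ &= E\big[(X'X)^{-1}X'\,uu'\,X(X'X)^{-1}\big] \\ &= (X'X)^{-1}X'\,E[uu']\,X(X'X)^{-1} \\ &= \sigma^2\,(X'X)^{-1}X'X(X'X)^{-1} \\ &= \sigma^2(X'X)^{-1} \end{aligned}$

The third line relies on $X$ being fixed, so it can be moved outside the expectation.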

We have now got rid of all $\beta$ terms. Intuitively, the variance of the estimator is independent of the value of the true underlying coefficient, as this is not a random variable per se. The result holds for every element of the variance-covariance matrix shown in the book, including the off-diagonal elements, where the $\beta_0\beta_1$ term cancels out. The only problem was that you had applied the general formula for the covariance, which does not show this cancellation at first.
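The independence from $\beta$ can also be checked by simulation. The sketch below is only an illustration, not part of the book's argument: it assumes normally distributed errors, a fixed and made-up set of x values, and hypothetical helper names (`ols_simple`, `simulated_cov`). It draws the same error terms for two different choices of $(\beta_0, \beta_1)$ and compares the sample covariance of $(b_0, b_1)$ with the theoretical off-diagonal entry of $\sigma^2(X'X)^{-1}$.

```python
import random


def ols_simple(xs, ys):
    """Closed-form OLS estimates (b0, b1) for a single regressor with intercept."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


def simulated_cov(xs, beta0, beta1, sigma, reps, seed):
    """Sample covariance of (b0, b1) over repeated draws of u, with X held fixed."""
    rng = random.Random(seed)
    b0s, b1s = [], []
    for _ in range(reps):
        ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
        b0, b1 = ols_simple(xs, ys)
        b0s.append(b0)
        b1s.append(b1)
    m0 = sum(b0s) / reps
    m1 = sum(b1s) / reps
    return sum((a - m0) * (b - m1) for a, b in zip(b0s, b1s)) / (reps - 1)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # fixed design (assumed values)
sigma = 1.0

# Theoretical off-diagonal entry of sigma^2 (X'X)^{-1}: -xbar * sigma^2 / Sxx
xbar = sum(xs) / len(xs)
sxx = sum((x - xbar) ** 2 for x in xs)
theory = -xbar * sigma ** 2 / sxx

# Same seed, hence the same error draws, for two different true coefficient vectors.
cov_a = simulated_cov(xs, beta0=0.0, beta1=0.0, sigma=sigma, reps=20000, seed=1)
cov_b = simulated_cov(xs, beta0=10.0, beta1=-3.0, sigma=sigma, reps=20000, seed=1)

print("theory:", theory)
print("simulated, beta = (0, 0):  ", cov_a)
print("simulated, beta = (10, -3):", cov_b)
```

Because $b-\beta=(X'X)^{-1}X'u$ does not involve $\beta$, reusing the seed makes the two simulated covariances agree up to floating-point error, and both should lie close to the theoretical value.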

Ultimately, the variance of the coefficients reduces to $\sigma^2(X'X)^{-1}$ and is independent of $\beta$. But what does this mean? (I believe you also asked for a more general understanding of the covariance matrix.)

Look at the formula in the book. It simply asserts that the variance of the estimator increases when the true underlying error term is noisier ($\sigma^2$ increases), but decreases when the spread of $X$ increases, because observations that are more spread out let you, in general, build an estimator that is more accurate and thus closer to the true $\beta$. On the other hand, the covariance terms on the off-diagonal become practically relevant when testing joint hypotheses such as $\beta_0=\beta_1=0$. Other than that they are a bit of a fudge, really. Hope this clarifies all questions.
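To make the effect of noise and spread concrete, here is a short sketch using the closed forms that result from inverting $X'X$ for a single regressor: $Var(b_1)=\sigma^2/\sum(x_i-\bar{x})^2$ and $Var(b_0)=\sigma^2\big(1/n+\bar{x}^2/\sum(x_i-\bar{x})^2\big)$. The function name `coef_variances` and both data sets are invented for illustration.

```python
def coef_variances(xs, sigma2):
    """Return (Var(b0), Var(b1)) for simple linear regression with fixed xs."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    var_b1 = sigma2 / sxx
    var_b0 = sigma2 * (1.0 / n + xbar ** 2 / sxx)
    return var_b0, var_b1


# Two hypothetical designs with the same mean (5) but different spread.
narrow = [4.0, 4.5, 5.0, 5.5, 6.0]
wide = [1.0, 3.0, 5.0, 7.0, 9.0]

narrow_b0, narrow_b1 = coef_variances(narrow, sigma2=1.0)
wide_b0, wide_b1 = coef_variances(wide, sigma2=1.0)
noisy_b0, noisy_b1 = coef_variances(wide, sigma2=4.0)

print("narrow x:       Var(b0) =", narrow_b0, " Var(b1) =", narrow_b1)
print("wide x:         Var(b0) =", wide_b0, " Var(b1) =", wide_b1)
print("wide x, 4x var: Var(b0) =", noisy_b0, " Var(b1) =", noisy_b1)
```

Spreading the x values out shrinks both variances, while raising $\sigma^2$ scales them up proportionally.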

— Majte

And when you keep the spread constant and decrease the x's, the standard error of the intercept becomes smaller, which makes sense.

— Theta30

I don't follow the expansion of the square. Why is it not simplified to $((X'X)^{-1}X')^2 = ((X'X)^{-1}X')((X'X)^{-1}X') = X^{-2}$?

— David

2

In your case we have

$X'X=\begin{bmatrix}n & \sum X_i\\\sum X_i & \sum X_i^2\end{bmatrix}$

Invert this matrix and you will get the desired result.
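As a sketch of that inversion (assuming the book's matrix, which is not reproduced here, is $\sigma^2(X'X)^{-1}$ for a single regressor, and writing $\bar{X}=\frac{1}{n}\sum X_i$):

$\begin{aligned} (X'X)^{-1} &= \frac{1}{n\sum X_i^2-\big(\sum X_i\big)^2}\begin{bmatrix}\sum X_i^2 & -\sum X_i\\ -\sum X_i & n\end{bmatrix}, \qquad n\sum X_i^2-\Big(\sum X_i\Big)^2 = n\sum (X_i-\bar{X})^2 \\ \sigma(b_0,b_1) &= \sigma^2\,\frac{-\sum X_i}{n\sum (X_i-\bar{X})^2} = \frac{-\bar{X}\,\sigma^2}{\sum (X_i-\bar{X})^2} \end{aligned}$

The off-diagonal entry depends only on $\sigma^2$ and the $X_i$; neither $\beta_0$ nor $\beta_1$ appears in it.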

— mpiktas

1

It appears that $\beta_0 \beta_1$ are the predicted values (expected values). They make the switch between $E(b_0)=\beta_0$ and $E(b_1)=\beta_1$ .

— Drew75

$\beta_0$ and $\beta_1$ are generally unknown, what can they switch to?

— qed

I think I understand the confusion, and I think they perhaps should have written $\beta_0^*$ rather than $\beta_0$. Here's another post that goes through the calculation: link

— Drew75

2

@qed: to sample estimates of the unknown quantities.

— Glen_b -Reinstate Monica