The intuition about the "plus" signs related to the variance (from the fact that even when we calculate the variance of a difference of independent random variables, we add their variances) is correct but fatally incomplete: if the random variables involved are not independent, then covariances are also involved, and covariances may be negative. There exists an expression that is almost what the OP (and I) thought the expression in the question "should" be: the variance of the prediction error. Denote the prediction error by $e_0 = y_0 - \hat y_0$, where $y_0 = \beta_0 + \beta_1 x_0 + u_0$:
$$\operatorname{Var}(e_0) = \sigma^2\cdot\left(1 + \frac 1n + \frac{(x_0-\bar x)^2}{S_{xx}}\right)$$
The critical difference between the variance of the prediction error and the variance of the estimation error (i.e. of the residual) is that the error term of the predicted observation is not correlated with the estimator, since the value $y_0$ was not used in constructing the estimator and calculating the estimates, being an out-of-sample value.
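
As a sanity check on the prediction-error variance, here is a minimal Monte Carlo sketch in Python with NumPy. It is not part of the original derivation: the function names, the choice of normal errors, and the parameter values are all assumptions made for illustration. It repeatedly draws a sample, fits OLS, predicts an out-of-sample $y_0$, and compares the empirical variance of $e_0$ with the closed-form expression.

```python
import numpy as np

def prediction_error_variance(x, x0, sigma2):
    """Closed form: sigma^2 * (1 + 1/n + (x0 - xbar)^2 / Sxx)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    sxx = np.sum((x - xbar) ** 2)
    return sigma2 * (1.0 + 1.0 / n + (x0 - xbar) ** 2 / sxx)

def simulate_prediction_errors(x, x0, beta0, beta1, sigma2, reps, seed=0):
    """Draw `reps` samples with fixed regressor x, fit OLS on each,
    and return the out-of-sample prediction errors e0 = y0 - y0_hat.
    Normal errors are an illustrative assumption."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    sxx = np.sum((x - xbar) ** 2)
    sigma = np.sqrt(sigma2)
    # In-sample errors and responses, one row per replication.
    u = rng.normal(0.0, sigma, size=(reps, n))
    y = beta0 + beta1 * x + u
    ybar = y.mean(axis=1)
    # OLS slope and intercept for every replication.
    b1 = ((x - xbar) * (y - ybar[:, None])).sum(axis=1) / sxx
    b0 = ybar - b1 * xbar
    # Out-of-sample observation: its error u0 is independent of the sample.
    u0 = rng.normal(0.0, sigma, size=reps)
    y0 = beta0 + beta1 * x0 + u0
    return y0 - (b0 + b1 * x0)

x = np.linspace(0.0, 10.0, 20)
errors = simulate_prediction_errors(x, x0=12.0, beta0=1.0, beta1=2.0,
                                    sigma2=4.0, reps=50_000)
print("simulated  :", errors.var())
print("closed form:", prediction_error_variance(x, 12.0, 4.0))
```

The two printed numbers should agree up to simulation noise; the agreement relies on `u0` being drawn independently of the sample used for fitting.
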
The algebra for both proceeds in exactly the same way up to a point (using $0$ instead of $i$), but then diverges. Specifically:
In the simple linear regression $y_i = \beta_0 + \beta_1 x_i + u_i$, with $\operatorname{Var}(u_i) = \sigma^2$, the variance of the estimator $\hat\beta = (\hat\beta_0, \hat\beta_1)'$ is still
$$\operatorname{Var}(\hat\beta) = \sigma^2 (X'X)^{-1}$$
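We have
$$X'X = \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}$$
and so
$$(X'X)^{-1} = \begin{bmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{bmatrix} \cdot \left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]^{-1}$$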
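We have
$$\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right] = \left[n\sum x_i^2 - n^2\bar x^2\right] = n\left[\sum x_i^2 - n\bar x^2\right] = n\sum \left(x_i^2 - \bar x^2\right) = n\sum (x_i - \bar x)^2 \equiv nS_{xx}$$
So
$$(X'X)^{-1} = \begin{bmatrix} (1/n)\sum x_i^2 & -\bar x \\ -\bar x & 1 \end{bmatrix} \cdot (1/S_{xx})$$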
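which means that
$$\operatorname{Var}(\hat\beta_0) = \sigma^2\left(\frac 1n \sum x_i^2\right)\cdot (1/S_{xx}) = \frac{\sigma^2}{n}\cdot\frac{S_{xx} + n\bar x^2}{S_{xx}} = \sigma^2\left(\frac 1n + \frac{\bar x^2}{S_{xx}}\right)$$
$$\operatorname{Var}(\hat\beta_1) = \sigma^2 (1/S_{xx})$$
$$\operatorname{Cov}(\hat\beta_0, \hat\beta_1) = -\sigma^2 (\bar x / S_{xx})$$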
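
The three closed forms above can be checked numerically against $\sigma^2 (X'X)^{-1}$. The following is a small sketch in Python with NumPy; the function names and the example data are hypothetical, chosen only for illustration.

```python
import numpy as np

def ols_coefficient_covariance(x, sigma2):
    """sigma^2 * (X'X)^{-1} for the design matrix X = [1, x]."""
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    return sigma2 * np.linalg.inv(X.T @ X)

def closed_form_covariance(x, sigma2):
    """The same 2x2 matrix, assembled from the scalar formulas."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    sxx = np.sum((x - xbar) ** 2)
    var_b0 = sigma2 * (1.0 / n + xbar ** 2 / sxx)
    var_b1 = sigma2 / sxx
    cov_b0_b1 = -sigma2 * xbar / sxx
    return np.array([[var_b0, cov_b0_b1],
                     [cov_b0_b1, var_b1]])

x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])  # arbitrary example regressor
print(np.allclose(ols_coefficient_covariance(x, 2.5),
                  closed_form_covariance(x, 2.5)))
```

Note the negative covariance whenever $\bar x > 0$: this is the kind of term that the "always add variances" intuition misses.
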
The i-th residual is defined as
$$\hat u_i = y_i - \hat y_i = (\beta_0 - \hat\beta_0) + (\beta_1 - \hat\beta_1)x_i + u_i$$
The actual coefficients are treated as constants, the regressor is fixed (or conditional on it), and has zero covariance with the error term, but the estimators are correlated with the error term, because the estimators contain the dependent variable, and the dependent variable contains the error term. So we have
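$$\operatorname{Var}(\hat u_i) = \Big[\operatorname{Var}(u_i) + \operatorname{Var}(\hat\beta_0) + x_i^2\operatorname{Var}(\hat\beta_1) + 2x_i\operatorname{Cov}(\hat\beta_0, \hat\beta_1)\Big] + 2\operatorname{Cov}\big([(\beta_0 - \hat\beta_0) + (\beta_1 - \hat\beta_1)x_i],\, u_i\big)$$
$$= \Big[\sigma^2 + \sigma^2\left(\frac 1n + \frac{\bar x^2}{S_{xx}}\right) + x_i^2\sigma^2 (1/S_{xx}) - 2x_i\sigma^2(\bar x/S_{xx})\Big] + 2\operatorname{Cov}\big([(\beta_0 - \hat\beta_0) + (\beta_1 - \hat\beta_1)x_i],\, u_i\big)$$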
Pack it up a bit to obtain
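$$\operatorname{Var}(\hat u_i) = \left[\sigma^2\cdot\left(1 + \frac 1n + \frac{(x_i - \bar x)^2}{S_{xx}}\right)\right] + 2\operatorname{Cov}\big([(\beta_0 - \hat\beta_0) + (\beta_1 - \hat\beta_1)x_i],\, u_i\big)$$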
The term in the big parentheses has exactly the same structure as the variance of the prediction error, the only change being that instead of $x_i$ we will have $x_0$ (and the variance will be that of $e_0$ and not of $\hat u_i$). The last covariance term is zero for the prediction error, because $y_0$ and hence $u_0$ is not included in the estimators, but it is not zero for the estimation error, because $y_i$ and hence $u_i$ is part of the sample and so is included in the estimator. We have
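$$2\operatorname{Cov}\big([(\beta_0 - \hat\beta_0) + (\beta_1 - \hat\beta_1)x_i],\, u_i\big) = 2E\big([(\beta_0 - \hat\beta_0) + (\beta_1 - \hat\beta_1)x_i]\,u_i\big)$$
$$= -2E(\hat\beta_0 u_i) - 2x_iE(\hat\beta_1 u_i) = -2E\big([\bar y - \hat\beta_1\bar x]\,u_i\big) - 2x_iE(\hat\beta_1 u_i)$$
the last substitution following from how $\hat\beta_0$ is calculated. Continuing, and writing $j$ for the summation index to keep it distinct from the fixed observation $i$,
$$\dots = -2E(\bar y u_i) - 2(x_i - \bar x)E(\hat\beta_1 u_i) = -2\frac{\sigma^2}{n} - 2(x_i - \bar x)E\left[\frac{\sum_j (x_j - \bar x)(y_j - \bar y)}{S_{xx}}\,u_i\right]$$
$$= -2\frac{\sigma^2}{n} - 2\frac{(x_i - \bar x)}{S_{xx}}\left[\sum_j (x_j - \bar x)E(y_ju_i - \bar y u_i)\right]$$
$$= -2\frac{\sigma^2}{n} - 2\frac{(x_i - \bar x)}{S_{xx}}\left[-\frac{\sigma^2}{n}\sum_{j\neq i}(x_j - \bar x) + (x_i - \bar x)\sigma^2\left(1 - \frac 1n\right)\right]$$
$$= -2\frac{\sigma^2}{n} - 2\frac{(x_i - \bar x)}{S_{xx}}\left[-\frac{\sigma^2}{n}\sum_j (x_j - \bar x) + (x_i - \bar x)\sigma^2\right]$$
$$= -2\frac{\sigma^2}{n} - 2\frac{(x_i - \bar x)}{S_{xx}}\Big[0 + (x_i - \bar x)\sigma^2\Big] = -2\frac{\sigma^2}{n} - 2\sigma^2\frac{(x_i - \bar x)^2}{S_{xx}}$$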
Inserting this into the expression for the variance of the residual, we obtain
$$\operatorname{Var}(\hat u_i) = \sigma^2\cdot\left(1 - \frac 1n - \frac{(x_i - \bar x)^2}{S_{xx}}\right)$$
So hats off to the text the OP is using.
(I have skipped some algebraic manipulations, no wonder OLS algebra is taught less and less these days...)
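
For readers who would rather verify the residual-variance result than redo the skipped algebra, here is a short numerical sketch in Python with NumPy. It relies on a standard matrix identity that the derivation above does not use: the residual vector is $\hat u = (I - H)u$ with $H = X(X'X)^{-1}X'$, so $\operatorname{Var}(\hat u) = \sigma^2 (I - H)$. The function names and example data are hypothetical.

```python
import numpy as np

def residual_variances_matrix(x, sigma2):
    """Diagonal of sigma^2 * (I - H), where H = X (X'X)^{-1} X'
    is the projection ("hat") matrix of the design X = [1, x]."""
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    return sigma2 * (1.0 - np.diag(H))

def residual_variances_closed_form(x, sigma2):
    """sigma^2 * (1 - 1/n - (x_i - xbar)^2 / Sxx) for every i."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    sxx = np.sum((x - xbar) ** 2)
    return sigma2 * (1.0 - 1.0 / n - (x - xbar) ** 2 / sxx)

x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])  # arbitrary example regressor
print(residual_variances_matrix(x, 1.0))
print(residual_variances_closed_form(x, 1.0))
```

The two printed vectors should coincide. Their entries sum to $\sigma^2(n-2)$, which is the familiar degrees-of-freedom correction seen from another angle.
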
SOME INTUITION
So it appears that what works "against" us (larger variance) when predicting, works "for us" (lower variance) when estimating. This is a good starting point for one to ponder why an excellent fit may be a bad sign for the prediction abilities of the model (however counter-intuitive this may sound...).
The fact that we are estimating the expected value of the regressor decreases the variance by $\sigma^2/n$. Why? Because by estimating, we "close our eyes" to some error-variability existing in the sample, since we are essentially estimating an expected value. Moreover, the larger the deviation of an observation of a regressor from the regressor's sample mean, the smaller the variance of the residual associated with this observation will be... the more deviant the observation, the less deviant its residual... It is the variability of the regressors that works for us, by "taking the place" of the unknown error-variability.
But that's good for estimation. For prediction, the same things turn against us: now, by not taking into account, however imperfectly, the variability in $y_0$ (since we want to predict it), our imperfect estimators obtained from the sample show their weaknesses. We estimated the sample mean and we don't know the true expected value, so the variance increases. We have an $x_0$ that is far away from the sample mean as calculated from the other observations: too bad, our prediction error variance gets another boost, because the predicted $\hat y_0$ will tend to go astray... In more scientific language, "optimal predictors in the sense of reduced prediction error variance, represent a shrinkage towards the mean of the variable under prediction". We do not try to replicate the dependent variable's variability; we just try to stay "close to the average".
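
To make the contrast concrete, here is a small illustrative sketch in Python with NumPy that evaluates both variance formulas at the same value of the regressor. The function name and the data are hypothetical. The residual formula applies to an in-sample observation at that point, the prediction formula to a new out-of-sample observation at the same point.

```python
import numpy as np

def residual_and_prediction_variance(x, point, sigma2):
    """Return (residual variance, prediction error variance) at `point`.
    Both share the term 1/n + (point - xbar)^2 / Sxx, with opposite signs."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    sxx = np.sum((x - xbar) ** 2)
    shared = 1.0 / n + (point - xbar) ** 2 / sxx
    return sigma2 * (1.0 - shared), sigma2 * (1.0 + shared)

x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])  # one point far from the rest
for xi in x:
    res_var, pred_var = residual_and_prediction_variance(x, xi, sigma2=1.0)
    print(f"x = {xi:5.1f}  residual var = {res_var:.3f}  prediction var = {pred_var:.3f}")
```

Moving away from the sample mean shrinks the first number and inflates the second by the same amount, which is the "works for us / works against us" symmetry described above.
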