Regression residual distributional assumptions

12

Why is it necessary to place the distributional assumption on the errors, i.e.

$y_i = X\beta + \epsilon_{i}$, with $\epsilon_{i} \sim \mathcal{N}(0,\sigma^{2})$.

Why not write

$y_i = X\beta + \epsilon_{i}$, with $y_i \sim \mathcal{N}(X\hat{\beta},\sigma^{2})$,

where in either case $\epsilon_i = y_i - \hat{y}$.
I have seen it stressed that the distributional assumptions are placed on the errors, not the data, but without explanation.

I don't really understand the difference between these two formulations. In some places I see distributional assumptions placed on the data (mostly in the Bayesian literature, it seems), but most of the time the assumptions are placed on the errors.

When modelling, why would or should one choose to begin with assumptions on one or the other?

— bill_e

First, it isn't "necessary"; it depends on what you intend to do. There are some good answers here, but I think the crux is the underlying assumption of causality, in the sense that the Xs "cause" the y. If you look at it that way, you see that the distribution of y is "caused" by the distribution of the right-hand side, i.e. the Xs and the errors (if any). You can do a lot of econometrics with very limited distributional assumptions and, in particular, without normality. Thank God.

— precisa saber é o seguinte

3

$\hat y$ is not $X\beta$, and the population mean of the $y$'s is not the same as the sample estimate of it. That is to say, the second formulation is not really the same as the first, but if you replace $\hat y$ with its expectation ($E(\hat y) = E(y) = X\beta$), the two would be equivalent.
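
The distinction between $\hat y$ and $X\beta$ can be checked by simulation. The following is a minimal sketch in R, assuming a made-up design, made-up "true" coefficients `beta`, and normal errors; none of these values come from the thread.

```r
# Hypothetical setup: the design, coefficients, and sigma are illustrative only.
set.seed(1)
n     <- 50
x     <- seq(0, 10, length.out = n)
X     <- cbind(1, x)          # design matrix with an intercept column
beta  <- c(2, 0.5)            # assumed population coefficients
sigma <- 1
mu    <- drop(X %*% beta)     # X beta: the population conditional mean

# Refit the model on many simulated samples and keep the fitted values.
fits <- replicate(2000, {
  y <- mu + rnorm(n, 0, sigma)
  fitted(lm(y ~ x))
})

# In a single sample the fitted values differ from X beta ...
max(abs(fits[, 1] - mu))
# ... but their average over repeated samples is close to X beta.
max(abs(rowMeans(fits) - mu))
```

The first number is the sampling error of one fit; the second shrinks toward zero as the number of replications grows, which is the sense in which $E(\hat y) = X\beta$.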

— Glen_b -Reinstate Monica

What is $\hat{y}$? And if $y_i$ varies with $i$, why doesn't $X\beta$ vary? Please decide which notation you want to use, vector or matrix. Now, if we assume that $\hat{y}=X\hat\beta$, your notation is more than bizarre: $y_i\sim N(x_i'(\sum x_jx_j')^{-1}\sum x_jy_j,\sigma^2)$, i.e. you define the distribution of $y_i$ in terms of itself and all the other observations $y_j$!
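
The dependence of each fitted value on all the responses can be seen numerically. This is a small sketch in R on made-up data (the coefficients and sample size are assumptions for illustration); it builds $x_i'(\sum x_jx_j')^{-1}\sum x_jy_j$ through the hat matrix and compares it with `lm`.

```r
# Hypothetical data, made up for illustration.
set.seed(1)
n <- 20
x <- runif(n)
y <- 1 + 2 * x + rnorm(n)

X <- cbind(1, x)                          # rows are the x_i'
H <- X %*% solve(crossprod(X)) %*% t(X)   # hat matrix, so that y_hat = H y
y_hat <- drop(H %*% y)

# The hand-computed fitted values agree with those from lm().
all.equal(y_hat, unname(fitted(lm(y ~ x))))

# Perturb a single observation and count how many fitted values move.
y2 <- y
y2[1] <- y2[1] + 5
y_hat2 <- drop(H %*% y2)
sum(abs(y_hat2 - y_hat) > 1e-8)
```

Because every $\hat y_i$ is a weighted sum of all the $y_j$, a mean of $\hat y_i$ cannot serve as a fixed parameter of the distribution of $y_i$.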

— Mvctas

1

I downvoted the question because I think the notation is confusing, and this has already resulted in several subtly conflicting answers.

— Mvctas

9

In a linear regression setting, it is common to do analysis and derive results conditional on $X$, that is, conditional on "the data". Thus, what you need is for $y\mid X$ to be normal, that is, you need $\epsilon$ to be normal. As Peter Flom's example illustrates, one can have normality of $\epsilon$ without having normality of $y$, and since what you need is normality of $\epsilon$, that is the sensible assumption.

— ekvall

8

I would write the second definition as

$y_i \sim \mathcal{N}(X_i\beta, \sigma^2)$

or (as Karl Oskar suggests, +1)

$y_i|X_i \sim \mathcal{N}(X_i\beta, \sigma^2)$

i.e. the modelling assumption is that the response variable is normally distributed around the regression line (which is an estimate of the conditional mean), with constant variance $\sigma^2$. This is not the same as suggesting that the $y_i$ are normally distributed, because the mean of the distribution depends on $X_i$.

I think I have seen similar formulations in the machine learning literature. As far as I can see, this is equivalent to the first definition; all I have done is re-express the second formulation slightly differently to eliminate the $\epsilon_i$'s and $\hat{y}$'s.
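
The equivalence can be spelled out in one step. This is a sketch that assumes the $X_i$ are treated as fixed (or conditioned on) and that $\beta$ denotes the population coefficients rather than an estimate:

$$
y_i = X_i\beta + \epsilon_i,\quad \epsilon_i \mid X_i \sim \mathcal{N}(0,\sigma^2)
\;\Longleftrightarrow\;
y_i \mid X_i \sim \mathcal{N}(X_i\beta,\sigma^2),
$$

because adding the constant $X_i\beta$ to a normal variable shifts its mean and leaves its variance unchanged. The marginal distribution of $y_i$, by contrast, mixes over the values of $X_i$ and need not be normal.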

— Dikran Marsupial

3

The difference is easiest to illustrate with an example. Here's a simple one:

Suppose Y is bimodal, with the modality accounted for by an independent variable. For example, suppose Y is height and your sample (for whatever reason) consists of jockeys and basketball players. For example, in R:

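set.seed(123)
# Simulated heights: 100 basketball players ("B") and 100 jockeys ("H")
tall <- rnorm(100, 78, 3)
short <- rnorm(100, 60, 3)

height <- c(tall, short)
sport <- c(rep("B", 100), rep("H", 100))

# Marginal density of height
plot(density(height))

# Regress height on sport and draw the standard residual diagnostics
m1 <- lm(height ~ sport)
plot(m1)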

The first density is very non-normal. But the residuals from the model are extremely close to normal.
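
The same contrast can be put into numbers rather than plots. The sketch below reuses the simulated data from the example and compares Shapiro-Wilk statistics; the choice of `shapiro.test` as the summary is an assumption made here for illustration, not part of the original example.

```r
# Same simulated data as in the example above.
set.seed(123)
tall   <- rnorm(100, 78, 3)
short  <- rnorm(100, 60, 3)
height <- c(tall, short)
sport  <- c(rep("B", 100), rep("H", 100))
m1 <- lm(height ~ sport)

# Shapiro-Wilk W is close to 1 for normal-looking data and smaller otherwise.
w_raw   <- shapiro.test(height)$statistic
w_resid <- shapiro.test(residuals(m1))$statistic
c(raw = unname(w_raw), residuals = unname(w_resid))
```

The statistic for the raw heights should fall well below the one for the residuals, since the bimodality disappears once the group means are removed.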

As to why the restrictions are placed this way, I'll let someone else answer that one.

— Peter Flom - Reinstate Monica

1

Thanks! I see what you mean with a bimodal distribution. Follow-up question: what if the variances of the data differ (heteroscedasticity?). Say all jockeys are small, but basketball players' heights vary a great deal. Perhaps for them, tall <- rnorm(100,78,10). How does a situation like this change your assumptions about $y_i$ or $\epsilon_i$?

— bill_e

In that case, heteroscedasticity would be a problem and you would need to use some other form of regression, or possibly some transformation, or you could add another variable (in this silly example, position played in basketball might do it).

— Peter Flom - Reinstate Monica
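
A minimal sketch of the situation raised in the follow-up question, using the suggested `rnorm(100, 78, 10)` for the basketball players. Using Bartlett's test as the check is an assumption made here for illustration; the thread does not prescribe a particular diagnostic.

```r
# Modified simulation: basketball players ("B") now have sd = 10, jockeys ("H") sd = 3.
set.seed(123)
tall   <- rnorm(100, 78, 10)
short  <- rnorm(100, 60, 3)
height <- c(tall, short)
sport  <- c(rep("B", 100), rep("H", 100))
m2 <- lm(height ~ sport)

# Residual spread per group: a constant-variance model expects these to be similar.
resid_sd <- tapply(residuals(m2), sport, sd)
resid_sd

# Bartlett's test of equal variances across the two groups.
bt <- bartlett.test(height, factor(sport))
bt$p.value
```

With group standard deviations this far apart, the single $\sigma^2$ in $\epsilon_i \sim \mathcal{N}(0,\sigma^2)$ no longer describes both groups, whichever of the two formulations is written down.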

I'm not sure the formulation is intended to suggest that the ys are normally distributed, just that they have a normal conditional distribution.

— Dikran Marsupial

2

The second formulation amounts to $y_i\sim\mathcal N(\hat y_i,\sigma^2_\varepsilon)$. Here $\hat y$ depends on $\bf x_i$: specifically, $\hat y_i$ is $\bf x_i\boldsymbol{\hat\beta}$. This leads to the formulation @DikranMarsupial presents:

$y_i\sim\mathcal N({\bf x_i}\boldsymbol{\hat\beta},\sigma^2_\varepsilon)$

It is worth recognizing that this is exactly the same as your first formulation, because both stipulate normal distributions and the expected values are equal. That is:

$\begin{aligned} E[{\bf x_i}\boldsymbol{\hat\beta}] &= E[{\bf x_i}\boldsymbol{\hat\beta} + E[\mathcal N(0, \sigma^2_\varepsilon)]] \\ &= E[{\bf x_i}\boldsymbol{\hat\beta} + 0] \\ &= E[{\bf x_i}\boldsymbol{\hat\beta}] \end{aligned}$

(And obviously the variances are equal.) In other words, this is not a difference in assumptions, but simply a notational difference.

So the question becomes, is there a reason to prefer presenting the idea using the first formulation?

I think the answer is yes for two reasons:

1. People often confuse whether the raw data should be normally distributed (i.e., $Y$), or whether the data conditional on $\bf X$ / the errors should be normally distributed (i.e., $Y|\bf X$ / $\varepsilon$). For example, see: "What if residuals are normally distributed, but y is not?"
2. People also often confuse what is supposed to be independent, the raw data or the errors. Moreover, we often mention that something should be iid (independent and identically distributed); if you are thinking in terms of $Y|\bf X$ this can be another potential source of confusion, as $Y|\bf X$ can be independent, but cannot be identically distributed unless the null hypothesis holds (because the mean would vary).

I believe these confusions are more likely with the second formulation than with the first.

— gung - Reinstate Monica

1

@Glen_b, I don't follow your comment. My claim is not that $\hat y$ is equal to $X\beta$, but rather that $\hat y_i$ is equal to $\bf x_i\boldsymbol{\hat\beta}$. The subscripted $_i$ indexing the observations is relevant. The idea is that the predicted value, $\hat y_i$, for a given observation is $\bf x_i\boldsymbol{\hat\beta}$. This has nothing to do w/ the population mean of $Y$. (It appears that I had forgotten to add hats to my betas, though; I've corrected that now.)

— gung - Reinstate Monica

@Glen_b if it were the sample mean it would be $\bar{y}$ rather than $\hat{y}$. I initially found the notation confusing as well, but the fact that $\hat{y} = X\beta$ follows from the statements that $y_i = X\beta + \epsilon_i$ and $\epsilon_i = y_i - \hat{y}$. For both of these things to be true, $\hat{y}$ can only be $X\beta$.

— Dikran Marsupial