The theory behind the weights argument in R when using lm()

After a year in graduate school, my understanding of "weighted least squares" is the following: let $\mathbf{y} \in \mathbb{R}^n$, let $\mathbf{X}$ be some $n \times p$ design matrix, let $\boldsymbol\beta \in \mathbb{R}^p$ be a parameter vector, and let $\boldsymbol\epsilon \in \mathbb{R}^n$ be an error vector such that $\boldsymbol\epsilon \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{V})$, where $\mathbf{V} = \text{diag}(v_1, v_2, \dots, v_n)$ and $\sigma^2 > 0$. Then the model

$$\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\epsilon$$

under these assumptions is called the "weighted least squares" model. The WLS problem ends up being to find

$$\arg\min_{\boldsymbol \beta}\left(\mathbf{y}-\mathbf{X}\boldsymbol\beta\right)^{T}\mathbf{V}^{-1}\left(\mathbf{y}-\mathbf{X}\boldsymbol\beta\right)\text{.}$$

Suppose that

$$\mathbf{y} = \begin{bmatrix} y_1 & \dots & y_n\end{bmatrix}^{T}\text{,}$$

$$\boldsymbol\beta = \begin{bmatrix} \beta_1 & \dots & \beta_p\end{bmatrix}^{T}\text{,}$$

and

$$\mathbf{X} = \begin{bmatrix} x_{11} & \cdots & x_{1p} \\ x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & \vdots \\ x_{n1} & \cdots & x_{np} \end{bmatrix} = \begin{bmatrix} \mathbf{x}_{1}^{T} \\ \mathbf{x}_{2}^{T} \\ \vdots \\ \mathbf{x}_{n}^{T} \end{bmatrix}\text{.}$$

Each $\mathbf{x}_i^{T}\boldsymbol\beta\in \mathbb{R}^1$, so

$$\mathbf{y}-\mathbf{X}\boldsymbol\beta = \begin{bmatrix} y_1-\mathbf{x}_{1}^{T}\boldsymbol\beta \\ y_2-\mathbf{x}_{2}^{T}\boldsymbol\beta \\ \vdots \\ y_n-\mathbf{x}_{n}^{T}\boldsymbol\beta \end{bmatrix}\text{.}$$

This gives

$$\begin{aligned} (\mathbf{y}-\mathbf{X}\boldsymbol\beta)^{T}\mathbf{V}^{-1} &= \begin{bmatrix} y_1-\mathbf{x}_{1}^{T}\boldsymbol\beta & y_2-\mathbf{x}_{2}^{T}\boldsymbol\beta & \cdots & y_n-\mathbf{x}_{n}^{T}\boldsymbol\beta \end{bmatrix}\text{diag}(v_1^{-1}, v_2^{-1}, \dots, v_n^{-1}) \\ &= \begin{bmatrix} v_1^{-1}(y_1-\mathbf{x}_{1}^{T}\boldsymbol\beta) & v_2^{-1}(y_2-\mathbf{x}_{2}^{T}\boldsymbol\beta) & \cdots & v_n^{-1}(y_n-\mathbf{x}_{n}^{T}\boldsymbol\beta) \end{bmatrix}\text{,} \end{aligned}$$

thus giving

$$\arg\min_{\boldsymbol \beta}\left(\mathbf{y}-\mathbf{X}\boldsymbol\beta\right)^{T}\mathbf{V}^{-1}\left(\mathbf{y}-\mathbf{X}\boldsymbol\beta\right) = \arg\min_{\boldsymbol \beta}\sum_{i=1}^{n}v_i^{-1}(y_i-\mathbf{x}_{i}^{T}\boldsymbol\beta)^2\text{.}$$

$\boldsymbol\beta$ is estimated using

$$\hat{\boldsymbol\beta} = (\mathbf{X}^{T}\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{V}^{-1}\mathbf{y}\text{.}$$

This is the extent of what I know. I was never taught how $v_1, v_2, \dots, v_n$ should be chosen, although it seems that, judging by here, usually $\text{Var}(\boldsymbol\epsilon) = \text{diag}(\sigma^2_1, \sigma^2_2, \dots, \sigma^2_n)$, which makes intuitive sense. (Give highly variable observations less weight in the WLS problem, and give observations with less variability more weight.)

What I am particularly curious about is how R handles weights in the lm() function when the weights are assigned as integers. From ?lm:

Non-NULL weights can be used to indicate that different observations have different variances (with the values in weights being inversely proportional to the variances); or equivalently, when the elements of weights are positive integers $w_i$, that each response $y_i$ is the mean of $w_i$ unit-weight observations (including the case that there are $w_i$ observations equal to $y_i$ and the data have been summarized).

I have re-read this paragraph several times, and it makes no sense to me. Using the framework that I developed above, suppose I have the following simulated values:

x <- c(0, 1, 2)
y <- c(0.25, 0.75, 0.85)
weights <- c(50, 85, 75)

lm(y~x, weights = weights)

Call:
lm(formula = y ~ x, weights = weights)

Coefficients:
(Intercept)            x  
     0.3495       0.2834

Using the framework that I developed above, how are these parameters derived? Here is my attempt at doing this by hand: assuming $\mathbf{V} = \text{diag}(50, 85, 75)$, we have

$$\begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \end{bmatrix} = \left(\begin{bmatrix} 1 & 1\\ 1 & 1\\ 1 & 1 \end{bmatrix}^{T}\text{diag}(1/50, 1/85, 1/75)\begin{bmatrix} 1 & 1\\ 1 & 1\\ 1 & 1 \end{bmatrix} \right)^{-1}\begin{bmatrix} 1 & 1\\ 1 & 1\\ 1 & 1 \end{bmatrix}^{T}\text{diag}(1/50, 1/85, 1/75)\begin{bmatrix} 0.25 \\ 0.75 \\ 0.85 \end{bmatrix}$$

and doing this in R gives (note that invertibility does not work in this case, so I used a generalized inverse):

X <- matrix(rep(1, times = 6), byrow = TRUE, nrow = 3, ncol = 2)
V_inv <- diag(c(1/50, 1/85, 1/75))
y <- c(0.25, 0.75, 0.85)

library(MASS)
ginv(t(X) %*% V_inv %*% X) %*% t(X) %*% V_inv %*% y

         [,1]
[1,] 0.278913
[2,] 0.278913

These do not match the values in the lm() output. What am I doing wrong?

r linear-model weighted-regression

— Clarinetist

The matrix $\mathbf{X}$ should be

$$\begin{bmatrix} 1 & 0\\ 1 & 1\\ 1 & 2 \end{bmatrix},$$

not

$$\begin{bmatrix} 1 & 1\\ 1 & 1\\ 1 & 1 \end{bmatrix}.$$

Also, your V_inv should be diag(weights), not diag(1/weights).

x <- c(0, 1, 2)
y <- c(0.25, 0.75, 0.85)
weights <- c(50, 85, 75)
X <- cbind(1, x)

solve(t(X) %*% diag(weights) %*% X, t(X) %*% diag(weights) %*% y)
       [,1]
  0.3495122
x 0.2834146
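As a hand check of the output above, the same numbers can be worked out from the normal equations. Here $\mathbf{W} = \text{diag}(50, 85, 75)$ is shorthand introduced only for this check (it is not notation from the question) and stands for diag(weights).

```latex
$$\begin{aligned}
\mathbf{X}^{T}\mathbf{W}\mathbf{X}
  &= \begin{bmatrix} \sum_i w_i & \sum_i w_i x_i \\ \sum_i w_i x_i & \sum_i w_i x_i^2 \end{bmatrix}
   = \begin{bmatrix} 210 & 235 \\ 235 & 385 \end{bmatrix},
\qquad
\mathbf{X}^{T}\mathbf{W}\mathbf{y}
   = \begin{bmatrix} \sum_i w_i y_i \\ \sum_i w_i x_i y_i \end{bmatrix}
   = \begin{bmatrix} 140 \\ 191.25 \end{bmatrix}, \\
\det(\mathbf{X}^{T}\mathbf{W}\mathbf{X}) &= 210 \cdot 385 - 235^2 = 25625, \\
\begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \end{bmatrix}
  &= \frac{1}{25625}\begin{bmatrix} 385 & -235 \\ -235 & 210 \end{bmatrix}
     \begin{bmatrix} 140 \\ 191.25 \end{bmatrix}
   = \frac{1}{25625}\begin{bmatrix} 8956.25 \\ 7262.5 \end{bmatrix}
   \approx \begin{bmatrix} 0.3495 \\ 0.2834 \end{bmatrix}.
\end{aligned}$$
```

These agree with the coefficients reported by lm(y ~ x, weights = weights) in the question.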

— mark999

Thanks for clearing up the incorrect design matrix, especially! I am quite rusty on this material. So, as a last question, does this mean that $\text{Var}(\boldsymbol\epsilon) = \text{diag}(1/\text{weights})$ in the WLS assumptions?

— Clarinetist

Yes, although the weights only have to be proportional to 1 / variance, not necessarily equal to it. For example, if you use weights <- c(50, 85, 75)/2 in your example, you get the same result.
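A minimal sketch of that check, using the data from the question. The names w, fit_full and fit_half are illustrative only and do not appear in the thread.

```r
# Data from the question
x <- c(0, 1, 2)
y <- c(0.25, 0.75, 0.85)
w <- c(50, 85, 75)

# Fit once with the original weights and once with every weight halved
fit_full <- lm(y ~ x, weights = w)
fit_half <- lm(y ~ x, weights = w / 2)

# A common factor in the weights cancels in the normal equations,
# so both fits give the same coefficient estimates
all.equal(coef(fit_full), coef(fit_half))
```

Only the relative sizes of the weights affect the coefficient estimates, which is why the weights need to be known only up to the constant $\sigma^2$.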

— mark999

To answer this more concisely, weighted least squares regression using weights in R makes the following assumptions: suppose we have weights = c(w_1, w_2, ..., w_n). Let $\mathbf{y} \in \mathbb{R}^n$, let $\mathbf{X}$ be an $n \times p$ design matrix, let $\boldsymbol\beta\in\mathbb{R}^p$ be a parameter vector, and let $\boldsymbol\epsilon \in \mathbb{R}^n$ be an error vector with mean $\mathbf{0}$ and variance matrix $\sigma^2\mathbf{V}$, where $\sigma^2 > 0$ and

$$\mathbf{V} = \text{diag}(1/w_1, 1/w_2, \dots, 1/w_n)\text{.}$$

Then,

$$\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\epsilon\text{.}$$

Following the same steps of the derivation in the original post, we have

$$\begin{aligned} \arg\min_{\boldsymbol \beta}\left(\mathbf{y}-\mathbf{X}\boldsymbol\beta\right)^{T}\mathbf{V}^{-1}\left(\mathbf{y}-\mathbf{X}\boldsymbol\beta\right)&= \arg\min_{\boldsymbol \beta}\sum_{i=1}^{n}(1/w_i)^{-1}(y_i-\mathbf{x}^{T}_i\boldsymbol\beta)^2 \\ &= \arg\min_{\boldsymbol \beta}\sum_{i=1}^{n}w_i(y_i-\mathbf{x}^{T}_i\boldsymbol\beta)^2 \end{aligned}$$

and, from the GLS assumptions, $\boldsymbol\beta$ is estimated using

$$\hat{\boldsymbol\beta} = (\mathbf{X}^{T}\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{V}^{-1}\mathbf{y}\text{.}$$
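The objective $\sum_i w_i (y_i - \mathbf{x}_i^{T}\boldsymbol\beta)^2$ can be reached in two other ways, and checking them may help with the ?lm passage quoted in the question. The sketch below uses the question's data; the names b_weighted, b_rescaled and b_expanded are illustrative only. It compares the weighted fit with ordinary least squares on data rescaled by $\sqrt{w_i}$, and with an unweighted fit on an expanded data set in which observation $i$ appears $w_i$ times (the "data have been summarized" case for integer weights).

```r
# Data from the question
x <- c(0, 1, 2)
y <- c(0.25, 0.75, 0.85)
w <- c(50, 85, 75)

# (1) Weighted fit, as in the question
b_weighted <- coef(lm(y ~ x, weights = w))

# (2) Ordinary least squares on rescaled data: multiply the response and
#     every column of the design matrix (intercept included) by sqrt(w_i)
X <- cbind(1, x)
b_rescaled <- coef(lm.fit(sqrt(w) * X, sqrt(w) * y))

# (3) Unweighted fit on expanded data in which row i appears w_i times
x_rep <- rep(x, times = w)
y_rep <- rep(y, times = w)
b_expanded <- coef(lm(y_rep ~ x_rep))

# All three minimise the same weighted sum of squares, so the estimates
# agree up to floating-point rounding
rbind(weighted = b_weighted, rescaled = b_rescaled, expanded = b_expanded)
```

Only the coefficient estimates coincide across the three fits. The expanded fit has 208 residual degrees of freedom while the weighted fit has 1, so standard errors and tests are not interchangeable between them.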

— Clarinetist