Prove the relation between Mahalanobis distance and leverage?

I have seen formulas on Wikipedia that relate Mahalanobis distance and leverage:

Mahalanobis distance is closely related to the leverage statistic, $h$, but has a different scale:
$D^2 = (N - 1)(h - \tfrac{1}{N}).$

In a linked article, Wikipedia describes $h$ in these terms:

In the linear regression model, the leverage score for the $i^{th}$ data unit is defined as:
$h_{ii}=(H)_{ii},$ the $i^{th}$ diagonal element of the hat matrix $H=X(X^{\top}X)^{-1}X^{\top}$, where $^{\top}$ denotes the matrix transpose.

I can't find a proof anywhere. I tried to start from the definitions, but I couldn't make any progress. Can anyone give a hint?

— dave2d

My description of the Mahalanobis distance at "Bottom to top explanation of the Mahalanobis distance?" includes two key results:

1. By definition, it does not change when the regressors are uniformly shifted.
2. The squared Mahalanobis distance between vectors $x$ and $y$ is given by
$D^2(x,y) = (x-y)^\prime \Sigma^{-1}(x-y)$ where $\Sigma$ is the covariance of the data.

Result (1) enables us to assume the means of the regressors are all zero. It remains to compute $h_i$. However, for the claim to be true, we need to add one more assumption:

3. The model must include an intercept.
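The shift invariance can be checked numerically before going further. The sketch below is only an illustration: the three `mtcars` columns and the shift constants are arbitrary choices, not part of the argument. It suggests that both the squared Mahalanobis distances from the centroid and the leverages of a model with an intercept are unaffected when each regressor is shifted by a constant.

```r
# Illustrative regressors (an arbitrary choice of columns from mtcars).
X <- as.matrix(mtcars[, c("wt", "qsec", "drat")])

# Shift each regressor by an arbitrary constant.
shift <- c(10, -50, 1000)
X_shifted <- sweep(X, 2, shift, "+")

# Squared Mahalanobis distances from the centroid, before and after the shift.
D2 <- mahalanobis(X, colMeans(X), cov(X))
D2_shifted <- mahalanobis(X_shifted, colMeans(X_shifted), cov(X_shifted))
all.equal(D2, D2_shifted)          # TRUE

# Leverages of the model with an intercept, before and after the shift.
h <- hat(X, intercept = TRUE)
h_shifted <- hat(X_shifted, intercept = TRUE)
all.equal(h, h_shifted)            # TRUE
```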

Allowing for this, let there be $k \ge 0$ regressors and $n$ observations, writing the value of regressor $j$ for observation $i$ as $x_{ij}$. Let the column vector of these $n$ values for regressor $j$ be written $\mathbf{x}_{,j}$ and the row vector of these $k$ values for observation $i$ be written $\mathbf{x}_i$. Then the model matrix is

$X = \pmatrix{ 1 &x_{11} &\cdots &x_{1k} \\ 1 &x_{21} &\cdots &x_{2k} \\ \vdots &\vdots &\vdots &\vdots \\ 1 &x_{n1} &\cdots &x_{nk}}$

and, by definition, the hat matrix is

$H = X(X^\prime X)^{-1} X^\prime,$

whence entry $i$ along the diagonal is

$h_i = h_{ii} = (1; \mathbf{x}_i) (X^\prime X)^{-1} (1; \mathbf{x}_i)^\prime.\tag{1}$
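As a rough check of this expression, the following sketch evaluates the quadratic form row by row and compares it with the diagonal of the full hat matrix and with the output of `hat()`. The regressors are an arbitrary illustrative choice from `mtcars`.

```r
X <- as.matrix(mtcars[, c("wt", "qsec", "drat")])   # illustrative regressors
X1 <- cbind(1, X)                                   # model matrix with a column of ones

# Central matrix (X'X)^{-1}.
XtX_inv <- solve(crossprod(X1))

# h_i = (1; x_i) (X'X)^{-1} (1; x_i)', evaluated for every row at once.
h_formula <- unname(rowSums((X1 %*% XtX_inv) * X1))

# Diagonal of the full n x n hat matrix, for comparison.
H <- X1 %*% XtX_inv %*% t(X1)
h_diag <- unname(diag(H))

all.equal(h_formula, h_diag)                        # TRUE
all.equal(h_formula, hat(X, intercept = TRUE))      # TRUE
```

Computing the row-wise quadratic form avoids building the full $n \times n$ matrix $H$ when only its diagonal is needed.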

There is nothing for it but to work out that central matrix inverse. By virtue of the first key result this is easy, especially when we write it in block-matrix form:

$X^\prime X = n\pmatrix{1 & \mathbf{0}^\prime \\ \mathbf{0} & C}$

where $\mathbf{0} = (0,0,\ldots,0)^\prime$ and

$C_{jk} = \frac{1}{n} \sum_{i=1}^n x_{ij} x_{ik} = \frac{n-1}{n}\operatorname{Cov}(\mathbf{x}_j, \mathbf{x}_k) = \frac{n-1}{n}\Sigma_{jk}.$

(I have written $\Sigma$ for the sample covariance matrix of the regressors.) Because this is block diagonal, its inverse can be found simply by inverting the blocks:

$(X^\prime X)^{-1} = \frac{1}{n}\pmatrix{1 & \mathbf{0}^\prime \\ \mathbf{0} & C^{-1}} = \pmatrix{\frac{1}{n} & \mathbf{0}^\prime \\ \mathbf{0} & \frac{1}{n-1}\Sigma^{-1}}.$
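The block structure can be verified numerically. This is a sketch under the assumption that the regressors have been centered first (which the shift invariance permits); the data are again an arbitrary selection of `mtcars` columns.

```r
X <- as.matrix(mtcars[, c("wt", "qsec", "drat")])   # illustrative regressors
n <- nrow(X)
k <- ncol(X)

# Center the regressors, then prepend the column of ones.
Xc <- scale(X, center = TRUE, scale = FALSE)
X1 <- cbind(1, Xc)

Sigma <- cov(X)                  # sample covariance (divisor n - 1)
XtX <- unname(crossprod(X1))

# The off-diagonal block vanishes; the lower-right block is n * C = (n - 1) * Sigma.
all.equal(rep(0, k), XtX[1, -1])                        # TRUE
all.equal(XtX[-1, -1], unname((n - 1) * Sigma))         # TRUE

# Inverse assembled block by block, compared with direct inversion.
block_inv <- unname(rbind(c(1 / n, rep(0, k)),
                          cbind(0, solve(Sigma) / (n - 1))))
all.equal(solve(XtX), block_inv)                        # TRUE
```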

From the definition $(1)$ we obtain

$\eqalign{ h_i &= (1; \mathbf{x}_i) \pmatrix{\frac{1}{n} & \mathbf{0}^\prime \\ \mathbf{0} & \frac{1}{n-1}\Sigma^{-1}}(1; \mathbf{x}_i)^\prime \\ &=\frac{1}{n} + \frac{1}{n-1}\mathbf{x}_i \Sigma^{-1}\mathbf{x}_i^\prime \\ &=\frac{1}{n} + \frac{1}{n-1} D^2(\mathbf{x}_i, \mathbf{0}). }$

Solving for the squared Mahalanobis length $D_i^2 = D^2(\mathbf{x}_i, \mathbf{0})$ yields

$D_i^2 = (n-1)\left(h_i - \frac{1}{n}\right),$

QED.
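For a single regressor ($k = 1$) the squared Mahalanobis distance is just the squared z-score, so the result reduces to the familiar simple-regression leverage $h_i = \frac{1}{n} + \frac{(x_i - \bar x)^2}{\sum_j (x_j - \bar x)^2}$. A minimal sketch of that special case, using `wt` and `mpg` from `mtcars` purely for illustration:

```r
x <- mtcars$wt                   # a single illustrative regressor
n <- length(x)

# In one dimension the squared Mahalanobis distance is the squared z-score.
D2 <- (x - mean(x))^2 / var(x)

# Leverage implied by the result: h_i = 1/n + D_i^2 / (n - 1).
h_from_D2 <- 1 / n + D2 / (n - 1)

# Leverages reported for the fitted simple regression.
fit <- lm(mpg ~ wt, data = mtcars)
h_lm <- unname(hatvalues(fit))

all.equal(h_from_D2, h_lm)       # TRUE
```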

Looking back, we may trace the additive term $1/n$ to the presence of an intercept, which introduced the column of ones into the model matrix $X$ . The multiplicative term $n-1$ appeared after assuming the Mahalanobis distance would be computed using the sample covariance estimate (which divides the sums of squares and products by $n-1$ ) rather than the covariance matrix of the data (which divides the sum of squares and products by $n$ ).
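Both remarks can be illustrated with a short sketch (the regressors are an arbitrary choice from `mtcars`). Rescaling the covariance to use the divisor $n$ turns the multiplier into $n$, and dropping the column of ones breaks the relation altogether.

```r
X <- as.matrix(mtcars[, c("wt", "qsec", "drat")])   # illustrative regressors
n <- nrow(X)
h <- hat(X, intercept = TRUE)

# Covariance with divisor n instead of n - 1.
Sigma_n <- cov(X) * (n - 1) / n
D2_n <- unname(mahalanobis(X, colMeans(X), Sigma_n))

# The multiplier becomes n rather than n - 1.
all.equal(D2_n, n * (h - 1 / n))                    # TRUE

# Without an intercept, the leverages no longer satisfy the relation.
h0 <- hat(X, intercept = FALSE)
D2 <- unname(mahalanobis(X, colMeans(X), cov(X)))
isTRUE(all.equal(D2, (n - 1) * (h0 - 1 / n)))       # FALSE
```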

The chief value of this analysis is to impart a geometric interpretation to the leverage, which measures how much a unit change in the response at observation $i$ will change the fitted value at that observation: high-leverage observations are at large Mahalanobis distances from the centroid of the regressors, exactly as a mechanically efficient lever operates at a large distance from its fulcrum.
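The sensitivity interpretation can be seen directly by perturbing one response and refitting. In this sketch the model formula and the observation index `i` are arbitrary illustrative choices.

```r
# Illustrative model with an intercept.
fit <- lm(mpg ~ wt + qsec + drat, data = mtcars)
h <- hatvalues(fit)

i <- 15                                   # an arbitrary observation
bumped <- mtcars
bumped$mpg[i] <- bumped$mpg[i] + 1        # unit change in the response at observation i

fit_bumped <- lm(mpg ~ wt + qsec + drat, data = bumped)

# The fitted value at observation i changes by h_i.
change <- unname(fitted(fit_bumped)[i] - fitted(fit)[i])
all.equal(change, unname(h[i]))           # TRUE
```

Because the fitted values are linear in the response, $\hat y = Hy$, the same check works for any size of perturbation, scaled accordingly.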

R code to show that the relation indeed holds:

x <- mtcars

# Compute leverages (h) and squared Mahalanobis distances (M)
h <- hat(x, intercept = TRUE); names(h) <- rownames(mtcars)
M <- mahalanobis(x, colMeans(x), cov(x))

# Compute D^2 of the question
n <- nrow(x); D2 <- (n-1)*(h - 1/n)

# Compare.
all.equal(M, D2)               # TRUE
print(signif(cbind(M, D2), 3))

— whuber

Excellent answer, very well rounded with rigor and intuition. Cheers!

— cgrudz

Thanks for the post @whuber! As a sanity check, here is R code to show that the relation indeed holds: x <- mtcars; rownames(x) <- NULL; colnames(x) <- NULL; n <- nrow(x); h <- hat(x, T); mahalanobis(x, colMeans(x), cov(x)); (n-1)*(h - 1/n); all.equal(mahalanobis(x, colMeans(x), cov(x)), (n-1)*(h - 1/n))

— Tal Galili

@Tal I didn't think I needed a sanity check--but thank you for the code. :-) I have made modifications to clarify it and its output a little.

— whuber

@whuber, I wanted an example that shows how to make the equality work (making clear to me that I got the assumptions right). I've also extended the relevant Wiki entry: en.wikipedia.org/wiki/… (feel free to also expand on it there, as you see fit :) )

— Tal Galili