What is an example of perfect multicollinearity?


12

What is an example of perfect collinearity in terms of the design matrix $X$?

I would like an example where $\hat\beta = (X'X)^{-1}X'Y$ cannot be calculated because $(X'X)$ is not invertible.


I looked at the recommended post on collinearity and found it sufficient for understanding the concept, but a simple example using data would add clarity.
TsTeaTime

2
What do you mean by "in terms of X and Y"? Collinearity exists between the X variables; Y has nothing to do with it.
gung - Reinstate Monica

1
I have edited the question to be more specific regarding X.
TsTeaTime

1
Since multicollinearity shows itself in the singularity of $X'X$, you could also read this question: stats.stackexchange.com/q/70899/3277
ttnphns

Answers:


10

Here is an example with 3 variables, $y$, $x_1$ and $x_2$, related by the equation

$$y = x_1 + x_2 + \varepsilon$$

where $\varepsilon \sim N(0,1)$.

The specific data are

         y x1 x2
1 4.520866  1  2
2 6.849811  2  4
3 6.539804  3  6

So it is evident that $x_2$ is a multiple of $x_1$, hence we have perfect collinearity.

We can write the model as

$$Y = X\beta + \varepsilon$$

where:

$$Y = \begin{bmatrix} 4.52 \\ 6.85 \\ 6.54 \end{bmatrix}$$

$$X = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 2 & 4 \\ 1 & 3 & 6 \end{bmatrix}$$

Then we have

$$XX' = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 2 & 4 \\ 1 & 3 & 6 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix} = \begin{bmatrix} 6 & 11 & 16 \\ 11 & 21 & 31 \\ 16 & 31 & 46 \end{bmatrix}$$

Now we calculate the determinant of $XX'$ (note that since $X$ is square here, $\det(XX') = \det(X)^2 = \det(X'X)$, so if $XX'$ is singular then $X'X$ cannot be inverted either):

$$\det XX' = 6\begin{vmatrix} 21 & 31 \\ 31 & 46 \end{vmatrix} - 11\begin{vmatrix} 11 & 31 \\ 16 & 46 \end{vmatrix} + 16\begin{vmatrix} 11 & 21 \\ 16 & 31 \end{vmatrix} = 0$$

In R, we can show this as follows:

> x1 <- c(1,2,3)

create x2, a multiple of x1

> x2 <- x1*2

create y, a linear combination of x1, x2 and some randomness

> y <- x1 + x2 + rnorm(3,0,1)

notice that

> summary(m0 <- lm(y~x1+x2))

fails to estimate a value for the x2 coefficient:

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.9512     1.6457   2.401    0.251
x1            1.0095     0.7618   1.325    0.412
x2                NA         NA      NA       NA

Residual standard error: 0.02583 on 1 degrees of freedom
Multiple R-squared:      1,     Adjusted R-squared:  0.9999 
F-statistic: 2.981e+04 on 1 and 1 DF,  p-value: 0.003687

The model matrix $X$ is:

> (X <- model.matrix(m0))

(Intercept) x1 x2
1           1  1  2
2           1  2  4
3           1  3  6

Then $XX'$ is

> (XXdash <- X %*% t(X))
   1  2  3
1  6 11 16
2 11 21 31
3 16 31 46

which is not invertible, as the following shows:

> solve(XXdash)
Error in solve.default(XXdash) : 
  Lapack routine dgesv: system is exactly singular: U[3,3] = 0

Or:

> det(XXdash)
[1] 0
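
For completeness, since the question is phrased in terms of $X'X$ rather than $XX'$: a quick sketch reusing the X defined above confirms that $X'X$ is singular too (the rounding guards against floating-point noise):

> round(det(t(X) %*% X), digits = 9)
[1] 0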


21

Here are some quite common scenarios producing perfect multicollinearity, i.e. situations in which the columns of the design matrix are linearly dependent. Recall from linear algebra that this means there is a linear combination of columns of the design matrix (with coefficients not all zero) that equals zero. I have included some practical examples to explain why this pitfall strikes so often (I have encountered almost all of them!).

  1. One variable is a multiple of another, regardless of whether there is an intercept term: perhaps because you recorded the same variable twice using different units (e.g. "length in centimeters" is precisely 100 times larger than "length in meters") or because you recorded a variable once as a raw number and once as a proportion or percentage, when the denominator is fixed (e.g. "area of Petri dish colonized" and "percentage of Petri dish colonized" will be exact multiples of each other if the area of each Petri dish is the same). We have collinearity because if $w_i = a x_i$, where $w$ and $x$ are variables (columns of your design matrix) and $a$ is a scalar constant, then $1(w) - a(x)$ is a linear combination of variables that equals zero.

  2. There is an intercept term and one variable differs from another by a constant: this will happen if you center a variable ($w_i = x_i - \bar{x}$) and include both the raw $x$ and the centered $w$ in your regression. It will also happen if your variables are measured in different unit systems that differ by a constant, e.g. if $w$ is "temperature in kelvin" and $x$ is "temperature in °C", then $w_i = x_i + 273.15$. If we regard the intercept term as a variable that is always $1$ (represented as a column of ones, $1_n$, in the design matrix), then having $w_i = x_i + k$ for some constant $k$ means that $1(w) - 1(x) - k(1_n)$ is a linear combination of the $w$, $x$ and $1$ columns of the design matrix that equals zero.

  3. There is an intercept term and one variable is given by an affine transformation of another: i.e. you have variables $w$ and $x$ related by $w_i = a x_i + b$, where $a$ and $b$ are constants. For example, this happens if you standardize a variable as $z_i = \frac{x_i - \bar{x}}{s_x}$ and include both the raw $x$ and the standardized $z$ variables in your regression. It also happens if you record $w$ as "temperature in °F" and $x$ as "temperature in °C", since those unit systems do not share a common zero but are related by $w_i = 1.8 x_i + 32$. Or in a business context, suppose there is a fixed cost $b$ (e.g. covering delivery) for each order, as well as a cost $a$ per unit sold; then if $w_i$ is the cost of order $i$ and $x_i$ is the number of units ordered, we have $w_i = a x_i + b$. The linear combination of interest is $1(w) - a(x) - b(1_n) = 0$. Note that if $a = 1$, then (3) includes (2) as a special case; if $b = 0$, then (3) includes (1) as a special case.

  4. There is an intercept term and the sum of several variables is fixed (e.g. in the famous "dummy variable trap"): for example if you have "percentage of satisfied customers", "percentage of dissatisfied customers" and "percentage of customers neither satisfied nor dissatisfied" then these three variables will always (barring rounding error) sum to 100. One of these variables — or alternatively, the intercept term — needs to be dropped from the regression to prevent collinearity. The "dummy variable trap" occurs when you use indicator variables (more commonly but less usefully called "dummies") for every possible level of a categorical variable. For instance, suppose vases are produced in red, green or blue color schemes. If you recorded the categorical variable "color" by three indicator variables (red, green and blue would be binary variables, stored as 1 for "yes" and 0 for "no") then for each vase only one of the variables would be a one, and hence red + green + blue = 1. Since there is a vector of ones for the intercept term, the linear combination 1(red) + 1(green) + 1(blue) - 1(1) = 0. The usual remedy here is either to drop the intercept, or drop one of the indicators (e.g. leave out red) which becomes a baseline or reference level. In this case, the regression coefficient for green would indicate the change in the mean response associated with switching from a red vase to a green one, holding other explanatory variables constant. (A sketch of how R's lm() deals with the dummy variable trap appears after this list.)

  5. At least two subsets of variables each have a fixed sum, regardless of whether there is an intercept term: for instance, suppose the vases in (4) also came in three sizes, recorded by indicator variables with large + medium + small = 1. Since red + green + blue = 1 as well, the linear combination 1(large) + 1(medium) + 1(small) - 1(red) - 1(green) - 1(blue) = 0. More generally, if you have variables $u, v, w, x$ with $u_i + v_i = k_1$ and $w_i + x_i = k_2$, then $k_2(u) + k_2(v) - k_1(w) - k_1(x) = 0$.

  6. One variable is defined as a linear combination of several other variables: for instance, if you record the length $l$, width $w$ and perimeter $p$ of each rectangle, then $p_i = 2l_i + 2w_i$ so we have the linear combination $1(p) - 2(l) - 2(w) = 0$. An example with an intercept term: suppose a mail-order business has two product lines, and we record that order $i$ consisted of $u_i$ of the first product at unit cost $a$ and $v_i$ of the second at unit cost $b$, with fixed delivery charge $c$. If we also include the order cost $x$ as an explanatory variable, then $x_i = a u_i + b v_i + c$ and so $1(x) - a(u) - b(v) - c(1_n) = 0$. This is an obvious generalization of (3). It also gives us a different way of thinking about (4): once we know all bar one of the subset of variables whose sum is fixed, then the remaining one is their complement and so can be expressed as a linear combination of them and their sum. If we know 50% of customers were satisfied and 20% were dissatisfied, then 100% - 50% - 20% = 30% must be neither satisfied nor dissatisfied; if we know the vase is not red (red = 0) and it is green (green = 1) then we know it is not blue (blue = 1(1) - 1(red) - 1(green) = 1 - 0 - 1 = 0).

  7. One variable is constant and zero, regardless of whether there is an intercept term: in an observational study, a variable will be constant if your sample does not exhibit sufficient (any!) variation. There may be variation in the population that is not captured in your sample, e.g. if there is a very common modal value: perhaps your sample size is too small and was therefore unlikely to include any values that differed from the mode, or your measurements were insufficiently accurate to detect small variations from the mode. Alternatively, there may be theoretical reasons for the lack of variation, particularly if you are studying a sub-population. In a study of new-build properties in Los Angeles, it would not be surprising that every data point has AgeOfProperty = 0 and State = California! In an experimental study, you may have measured an independent variable that is under experimental control. Should one of your explanatory variables x be both constant and zero, then we have immediately that the linear combination 1(x) (with coefficient zero for any other variables) is 0.

  8. There is an intercept term and at least one variable is constant: if $x$ is constant so that each $x_i = k \neq 0$, then the linear combination $1(x) - k(1_n) = 0$.

  9. At least two variables are constant, regardless of whether there is an intercept term: if each $w_i = k_1 \neq 0$ and $x_i = k_2 \neq 0$, then the linear combination $k_2(w) - k_1(x) = 0$.

  10. Number of columns of design matrix, $k$, exceeds number of rows, $n$: even when there is no conceptual relationship between your variables, it is mathematically necessitated that the columns of your design matrix will be linearly dependent when $k > n$. It simply isn't possible to have $k$ linearly independent vectors in a space with a number of dimensions lower than $k$: for instance, while you can draw two independent vectors on a sheet of paper (a two-dimensional plane, $\mathbb{R}^2$), any further vector drawn on the page must lie within their span, and hence be a linear combination of them. Note that an intercept term contributes a column of ones to the design matrix, so counts as one of your $k$ columns. (This scenario is often called the "large $p$, small $n$" problem: see also this related CV question.)
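
As a quick illustration of how the dummy variable trap in (4) plays out in R's lm() (a sketch with invented data): given all three color indicators plus an intercept, lm() cannot estimate all the coefficients and reports NA for the aliased one, while encoding color as a factor avoids the trap because R drops a reference level automatically.

# Invented illustrative data: five vases, exactly one color indicator each
red   <- c(1, 0, 0, 1, 0)
green <- c(0, 1, 0, 0, 1)
blue  <- c(0, 0, 1, 0, 0)
y     <- c(2.1, 3.4, 5.0, 2.3, 3.6)

# Intercept plus all three indicators: red + green + blue = 1,
# so one coefficient comes back NA
coef(lm(y ~ red + green + blue))

# Encoding color as a factor lets R pick a baseline level itself
color <- factor(c("red", "green", "blue", "red", "green"))
coef(lm(y ~ color))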

Data examples with R code

Each example gives a design matrix $X$, the matrix $X'X$ (note this is always square and symmetrical) and $\det(X'X)$. Note that if $X'X$ is singular (zero determinant, hence not invertible) then we cannot estimate $\hat\beta = (X'X)^{-1}X'y$. The condition that $X'X$ be non-singular is equivalent to the condition that $X$ has full rank so its columns are linearly independent: see this Math SE question, or this one and its converse.
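
Before working through the examples, note that a general-purpose alternative to checking the determinant is to compare the rank of the design matrix against its number of columns, which is numerically more robust for larger matrices. A minimal sketch (the helper name is ours):

# Hypothetical helper: TRUE if the columns of X are linearly dependent
is_collinear <- function(X) {
  qr(X)$rank < ncol(X)
}

# Try it on the design matrix from scenario (1) below
is_collinear(matrix(c(2, 4, 1, 2, 3, 6, 2, 4), ncol = 2, byrow = TRUE))
#TRUE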

(1) One column is a multiple of another

# x2 = 2 * x1
# Note no intercept term (column of 1s) is needed
X <- matrix(c(2, 4, 1, 2, 3, 6, 2, 4), ncol = 2, byrow=TRUE)

X
#     [,1] [,2]
#[1,]    2    4
#[2,]    1    2
#[3,]    3    6
#[4,]    2    4


t(X) %*% X
#     [,1] [,2]
#[1,]   18   36
#[2,]   36   72

round(det(t(X) %*% X), digits = 9)
#0

(2) Intercept term and one variable differs from another by constant

# x1 represents intercept term
# x3 = x2 + 2
X <- matrix(c(1, 2, 4, 1, 1, 3, 1, 3, 5, 1, 0, 2), ncol = 3, byrow=TRUE)

X
#     [,1] [,2] [,3]
#[1,]    1    2    4
#[2,]    1    1    3
#[3,]    1    3    5
#[4,]    1    0    2


t(X) %*% X
#     [,1] [,2] [,3]
#[1,]    4    6   14
#[2,]    6   14   26
#[3,]   14   26   54

round(det(t(X) %*% X), digits = 9)
#0

# NB if we drop the intercept, cols now linearly independent
# x2 = x1 + 2 with no intercept column
X <- matrix(c(2, 4, 1, 3, 3, 5, 0, 2), ncol = 2, byrow=TRUE)

X
#     [,1] [,2]
#[1,]    2    4
#[2,]    1    3
#[3,]    3    5
#[4,]    0    2


t(X) %*% X
#     [,1] [,2]
#[1,]   14   26
#[2,]   26   54
# Can you see how this matrix is related to the previous one, and why?

round(det(t(X) %*% X), digits = 9)
#80
# Non-zero determinant so X'X is invertible

(3) Intercept term and one variable is affine transformation of another

# x1 represents intercept term
# x3 = 2*x2 - 3
X <- matrix(c(1, 2, 1, 1, 1, -1, 1, 3, 3, 1, 0, -3), ncol = 3, byrow=TRUE)

X
#     [,1] [,2] [,3]
#[1,]    1    2    1
#[2,]    1    1   -1
#[3,]    1    3    3
#[4,]    1    0   -3


t(X) %*% X
#     [,1] [,2] [,3]
#[1,]    4    6    0
#[2,]    6   14   10
#[3,]    0   10   20

round(det(t(X) %*% X), digits = 9)
#0

# NB if we drop the intercept, cols now linearly independent
# x2 = 2*x1 - 3 with no intercept column
X <- matrix(c(2, 1, 1, -1, 3, 3, 0, -3), ncol = 2, byrow=TRUE)

X
#     [,1] [,2]
#[1,]    2    1
#[2,]    1   -1
#[3,]    3    3
#[4,]    0   -3


t(X) %*% X
#     [,1] [,2]
#[1,]   14   10
#[2,]   10   20
# Can you see how this matrix is related to the previous one, and why?

round(det(t(X) %*% X), digits = 9)
#180
# Non-zero determinant so X'X is invertible

(4) Intercept term and sum of several variables is fixed

# x1 represents intercept term
# x2 + x3 = 10
X <- matrix(c(1, 2, 8, 1, 1, 9, 1, 3, 7, 1, 0, 10), ncol = 3, byrow=TRUE)

X
#     [,1] [,2] [,3]
#[1,]    1    2    8
#[2,]    1    1    9
#[3,]    1    3    7
#[4,]    1    0   10


t(X) %*% X
#     [,1] [,2] [,3]
#[1,]    4    6   34
#[2,]    6   14   46
#[3,]   34   46  294

round(det(t(X) %*% X), digits = 9)
#0

# NB if we drop the intercept, then columns now linearly independent
# x1 + x2 = 10 with no intercept column
X <- matrix(c(2, 8, 1, 9, 3, 7, 0, 10), ncol = 2, byrow=TRUE)

X
#     [,1] [,2]
#[1,]    2    8
#[2,]    1    9
#[3,]    3    7
#[4,]    0   10

t(X) %*% X
#     [,1] [,2]
#[1,]   14   46
#[2,]   46  294
# Can you see how this matrix is related to the previous one, and why?

round(det(t(X) %*% X), digits = 9)
#2000
# Non-zero determinant so X'X is invertible

(4a) Intercept term with dummy variable trap

# x1 represents intercept term
# x2 + x3 + x4 = 1
X <- matrix(c(1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0), ncol = 4, byrow=TRUE)

X
#     [,1] [,2] [,3] [,4]
#[1,]    1    0    0    1
#[2,]    1    1    0    0
#[3,]    1    0    1    0
#[4,]    1    1    0    0
#[5,]    1    0    1    0

t(X) %*% X
#     [,1] [,2] [,3] [,4]
#[1,]    5    2    2    1
#[2,]    2    2    0    0
#[3,]    2    0    2    0
#[4,]    1    0    0    1
# This matrix has a very natural interpretation - can you work it out?

round(det(t(X) %*% X), digits = 9)
#0

# NB if we drop the intercept, then columns now linearly independent
# x1 + x2 + x3 = 1 with no intercept column
X <- matrix(c(0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0), ncol = 3, byrow=TRUE)  

X
#     [,1] [,2] [,3]
#[1,]    0    0    1
#[2,]    1    0    0
#[3,]    0    1    0
#[4,]    1    0    0
#[5,]    0    1    0

t(X) %*% X
#     [,1] [,2] [,3]
#[1,]    2    0    0
#[2,]    0    2    0
#[3,]    0    0    1
# Can you see how this matrix is related to the previous one?

round(det(t(X) %*% X), digits = 9)
#4
# Non-zero determinant so X'X is invertible

(5) Two subsets of variables with fixed sum

# No intercept term needed
# x1 + x2 = 1
# x3 + x4 = 1
X <- matrix(c(0,1,0,1,1,0,0,1,0,1,1,0,1,0,0,1,1,0,1,0,0,1,1,0), ncol = 4, byrow=TRUE)

X
#     [,1] [,2] [,3] [,4]
#[1,]    0    1    0    1
#[2,]    1    0    0    1
#[3,]    0    1    1    0
#[4,]    1    0    0    1
#[5,]    1    0    1    0
#[6,]    0    1    1    0

t(X) %*% X
#     [,1] [,2] [,3] [,4]
#[1,]    3    0    1    2
#[2,]    0    3    2    1
#[3,]    1    2    3    0
#[4,]    2    1    0    3
# This matrix has a very natural interpretation - can you work it out?

round(det(t(X) %*% X), digits = 9)
#0

(6) One variable is linear combination of others

# No intercept term
# x3 = x1 + 2*x2
X <- matrix(c(1,1,3,0,2,4,2,1,4,3,1,5,1,2,5), ncol = 3, byrow=TRUE)

X
#     [,1] [,2] [,3]
#[1,]    1    1    3
#[2,]    0    2    4
#[3,]    2    1    4
#[4,]    3    1    5
#[5,]    1    2    5

t(X) %*% X
#     [,1] [,2] [,3]
#[1,]   15    8   31
#[2,]    8   11   30
#[3,]   31   30   91

round(det(t(X) %*% X), digits = 9)
#0

(7) One variable is constant and zero

# No intercept term
# x3 = 0
X <- matrix(c(1,1,0,0,2,0,2,1,0,3,1,0,1,2,0), ncol = 3, byrow=TRUE)

X
#     [,1] [,2] [,3]
#[1,]    1    1    0
#[2,]    0    2    0
#[3,]    2    1    0
#[4,]    3    1    0
#[5,]    1    2    0

t(X) %*% X
#     [,1] [,2] [,3]
#[1,]   15    8    0
#[2,]    8   11    0
#[3,]    0    0    0

round(det(t(X) %*% X), digits = 9)
#0

(8) Intercept term and one constant variable

# x1 is intercept term, x3 = 5
X <- matrix(c(1,1,5,1,2,5,1,1,5,1,1,5,1,2,5), ncol = 3, byrow=TRUE)

X
#     [,1] [,2] [,3]
#[1,]    1    1    5
#[2,]    1    2    5
#[3,]    1    1    5
#[4,]    1    1    5
#[5,]    1    2    5

t(X) %*% X
#     [,1] [,2] [,3]
#[1,]    5    7   25
#[2,]    7   11   35
#[3,]   25   35  125

round(det(t(X) %*% X), digits = 9)
#0

(9) Two constant variables

# No intercept term, x2 = 2, x3 = 5
X <- matrix(c(1,2,5,2,2,5,1,2,5,1,2,5,2,2,5), ncol = 3, byrow=TRUE)

X
#     [,1] [,2] [,3]
#[1,]    1    2    5
#[2,]    2    2    5
#[3,]    1    2    5
#[4,]    1    2    5
#[5,]    2    2    5

t(X) %*% X
#     [,1] [,2] [,3]
#[1,]   11   14   35
#[2,]   14   20   50
#[3,]   35   50  125

round(det(t(X) %*% X), digits = 9)
#0

(10) k>n

# Design matrix has 4 columns but only 3 rows
X <- matrix(c(1,1,1,1,1,2,4,8,1,3,9,27), ncol = 4, byrow=TRUE)

X
#     [,1] [,2] [,3] [,4]
#[1,]    1    1    1    1
#[2,]    1    2    4    8
#[3,]    1    3    9   27

t(X) %*% X
#     [,1] [,2] [,3] [,4]
#[1,]    3    6   14   36
#[2,]    6   14   36   98
#[3,]   14   36   98  276
#[4,]   36   98  276  794

round(det(t(X) %*% X), digits = 9)
#0
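
To see scenario (10) via lm() (a sketch; the response values are invented): the design matrix above is exactly what R builds for a cubic fit to the points x = 1, 2, 3, and with only three observations the fourth coefficient cannot be estimated.

# Cubic regression on 3 points: intercept, x, x^2, x^3 gives k = 4 > n = 3
x <- c(1, 2, 3)
y <- c(2.0, 1.5, 4.0)              # invented response values
coef(lm(y ~ x + I(x^2) + I(x^3)))
# The I(x^3) coefficient is NA: the rank of the design matrix is at most n = 3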

4

Some trivial examples to help intuition:

  1. $x_1$ is height in centimeters. $x_2$ is height in meters. Then:
    • $x_1 = 100 x_2$, and your design matrix $X$ will not have linearly independent columns.
  2. $x_1 = 1$ (i.e. you include a constant in your regression), $x_2$ is temperature in Fahrenheit, and $x_3$ is temperature in Celsius. Then:
    • $x_2 = \frac{9}{5} x_3 + 32 x_1$, and your design matrix $X$ will not have linearly independent columns.
  3. Everyone starts school at age 5, $x_1 = 1$ (i.e. constant value of 1 across all observations), $x_2$ is years of schooling, $x_3$ is age, and no one has left school. Then:
    • $x_2 = x_3 - 5 x_1$, and your design matrix $X$ will not have linearly independent columns.

There are a multitude of ways such that one column of data will be a linear function of your other data. Some of them are obvious (e.g. meters vs. centimeters) while others can be more subtle (e.g. age and years of schooling for younger children).

Notational notes: let $x_1$ denote the first column of $X$, $x_2$ the second column, etc., and let $\mathbf{1}$ denote a vector of ones, which is what is included in the design matrix $X$ if you include a constant in your regression.
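
A minimal R sketch of example 3 (the ages are invented), showing the rank deficiency directly:

# Children who all started school at age 5 and are still enrolled
age <- c(7, 9, 11, 13)
schooling <- age - 5               # x2 = x3 - 5*x1
X <- cbind(constant = 1, schooling, age)
qr(X)$rank                         # rank 2 < 3 columns: not full rank
#2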


1
The schooling and age example is a really nice one, though it is worth pointing out that the relationship only holds while everyone is still in school! The logical extension of it is when you have age, years of schooling and years in work, which can continue the relationship beyond graduation. (Of course, in practice such multicollinearity rarely tends to be perfect: there are always exceptions, like children who started school at a different age because they came from a different country, but it is often quite severe.)
Silverfish

@Silverfish good points! I have just made some edits/corrections.
Matthew Gunn
Licensed under cc by-sa 3.0 with attribution required.