Does correlation assume stationarity of the data?

Inter-market analysis is a method of modeling market behavior by finding relationships between different markets. Often a correlation is computed between two markets, say the S&P 500 and 30-year US Treasuries. These computations are usually based on price data, which obviously does not fit the definition of a stationary time series.

Possible solutions aside (using returns instead), is the computation of correlation whose data is non-stationary even a valid statistical calculation?

Would you say that such a correlation calculation is merely unreliable, or just plain nonsense?

correlation stationarity

— Milktrader

What do you mean by "valid statistical calculation"? You should say a valid statistical calculation (estimate) of something, and here the something is very important. Correlation is a valid calculation of the linear relationship between two data sets. I don't see why you need stationarity; did you mean autocorrelation?

— robin girard

There is a new site that might be more appropriate for your question: quant.stackexchange.com. As it stands, you are clearly confusing calculation with interpretation.

— mpiktas

@mpiktas, the quant community is settled on using returns rather than prices because returns are stationary and prices are not. I'm asking here for something more than an intuitive explanation of why this should be so.

— Milktrader

@robin, there are several things that might make you question a statistical analysis. Sample size comes to mind, as do more obvious things such as manipulated data. Does non-stationarity of the data call a correlation calculation into question?

— Milktrader

Not the calculation, maybe the interpretation, if the correlation is not high. If it is high, it means high correlation (i.e. a strong linear relationship); two non-stationary time series, say $(X_t)$ and $(Y_t)$, can potentially be highly correlated (for example when $X_t=Y_t$).

— robin Girard

Answers:

Correlation measures linear relationship. In an informal context, a relationship means something stable. When we calculate the sample correlation for stationary variables and increase the number of available data points, this sample correlation tends to the true correlation.

It can be shown that for prices, which are usually random walks, the sample correlation tends to a random variable. This means that no matter how much data we have, the result will always be different.

Note: I tried to express the mathematical intuition without the math. From a mathematical point of view the explanation is very clear: sample moments of stationary processes converge in probability to constants. Suitably normalized sample moments of random walks converge to integrals of Brownian motion, which are random variables. Since a relationship is usually expressed as a number and not as a random variable, the reason for not calculating correlation for non-stationary variables becomes evident.
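The claim can be checked with a small simulation. The sketch below is an illustration in Python with NumPy, not part of the original argument; the Gaussian steps, the function name `sample_corr_spread`, and the replication counts are assumptions chosen for convenience. It measures how much the sample correlation varies across independent replications, once for two independent white-noise series and once for their cumulative sums (two independent random walks).

```python
import numpy as np

def sample_corr_spread(T, n_rep=500, seed=0):
    """Return the std. dev. across replications of the sample correlation of
    (a) two independent white-noise series and (b) their cumulative sums."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((n_rep, T))   # assumed: i.i.d. N(0, 1) steps
    v = rng.standard_normal((n_rep, T))
    x, y = u.cumsum(axis=1), v.cumsum(axis=1)  # independent random walks

    def row_corr(a, b):
        # Pearson correlation computed separately for each replication (row).
        a = a - a.mean(axis=1, keepdims=True)
        b = b - b.mean(axis=1, keepdims=True)
        return (a * b).sum(axis=1) / np.sqrt((a**2).sum(axis=1) * (b**2).sum(axis=1))

    return row_corr(u, v).std(), row_corr(x, y).std()

for T in (100, 500, 2000):
    s_noise, s_walk = sample_corr_spread(T)
    print(f"T={T:5d}  sd(stationary)={s_noise:.3f}  sd(random walk)={s_walk:.3f}")
```

If the reasoning above holds, the first column should shrink roughly like $1/\sqrt{T}$, while the second should stay about the same size however large $T$ becomes.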

Update: Since we are interested in the correlation between two variables, assume first that they come from a stationary process $Z_t=(X_t,Y_t)$. Stationarity implies that $EZ_t$ and $cov(Z_t,Z_{t-h})$ do not depend on $t$. So the correlation

$corr(X_t,Y_t)=\frac{cov(X_t,Y_t)}{\sqrt{DX_tDY_t}}$

also does not depend on $t$, since all the quantities in the formula come from the matrix $cov(Z_t)$, which does not depend on $t$. So the calculation of the sample correlation

$\hat{\rho}=\frac{\frac{1}{T}\sum_{t=1}^T(X_t-\bar{X})(Y_t-\bar{Y})}{\sqrt{\frac{1}{T^2}\sum_{t=1}^T(X_t-\bar{X})^2\sum_{t=1}^T(Y_t-\bar{Y})^2}}$

makes sense, since we may have a reasonable hope that the sample correlation will estimate $\rho=corr(X_t,Y_t)$. It turns out that this hope is not unfounded, since for stationary processes satisfying certain conditions we have that $\hat{\rho}\to\rho$ in probability as $T\to\infty$. Furthermore, $\sqrt{T}(\hat{\rho}-\rho)\to N(0,\sigma_{\rho}^2)$ in distribution, so we can test hypotheses about $\rho$.
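To make the testing remark concrete, here is a minimal Python sketch under a much stronger assumption than the answer requires: the observations are i.i.d. bivariate normal, in which case the asymptotic standard deviation is $\sigma_\rho=1-\rho^2$. The function name `corr_z_test` and all simulation settings are hypothetical. For a general stationary process $\sigma_\rho$ would have to be estimated differently.

```python
import numpy as np

def corr_z_test(x, y, rho0):
    """z-statistic for H0: corr = rho0.

    Assumes i.i.d. bivariate normal data, for which
    sqrt(T) * (rho_hat - rho0) is approximately N(0, (1 - rho0**2)**2).
    """
    T = len(x)
    rho_hat = np.corrcoef(x, y)[0, 1]
    return np.sqrt(T) * (rho_hat - rho0) / (1.0 - rho0**2)

rng = np.random.default_rng(1)
rho, T, n_rep = 0.5, 500, 2000
cov = [[1.0, rho], [rho, 1.0]]

# Simulate the statistic under H0 (the true correlation equals rho).
z = np.empty(n_rep)
for i in range(n_rep):
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=T)
    z[i] = corr_z_test(xy[:, 0], xy[:, 1], rho)

# If the normal approximation is adequate, about 5% of |z| exceed 1.96.
print("rejection rate at the 5% level:", np.mean(np.abs(z) > 1.96))
```

The same statistic computed on price levels would not be comparable to normal quantiles, since no such limit theorem is available there.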

Now suppose that $Z_t$ is not stationary. Then $corr(X_t,Y_t)$ may depend on $t$. So when we observe a sample of size $T$ we potentially need to estimate $T$ different correlations $\rho_t$. This is of course infeasible, so in the best-case scenario we can only estimate some functional of $\rho_t$, such as its mean or variance. But the result may not have a sensible interpretation.

Now let us examine what happens to the correlation of probably the most studied non-stationary process, the random walk. We call the process $Z_t=(X_t,Y_t)$ a random walk if $Z_t=\sum_{s=1}^t(U_s,V_s)$, where $C_t=(U_t,V_t)$ is a stationary process. For simplicity assume that $EC_t=0$. Then

$\begin{align} corr(X_t,Y_t)=\frac{EX_tY_t}{\sqrt{DX_tDY_t}}=\frac{E\sum_{s=1}^tU_s\sum_{s=1}^tV_s}{\sqrt{D\sum_{s=1}^tU_s\,D\sum_{s=1}^tV_s}} \end{align}$

To simplify matters further, assume that $C_t=(U_t,V_t)$ is white noise. This means that all correlations $E(C_tC_{t+h})$ are zero for $h>0$. Note that this does not restrict $corr(U_t,V_t)$ to zero.

Then

$\begin{align} corr(X_t,Y_t)=\frac{tEU_tV_t}{\sqrt{t^2DU_tDV_t}}=corr(U_0,V_0). \end{align}$
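The step from the double sum to $tEU_tV_t$ can be spelled out as follows, using only the zero-mean and white-noise assumptions stated above (the second summation index $r$ is introduced here for illustration). All cross terms with $s\neq r$ vanish, so

$\begin{align} EX_tY_t=\sum_{s=1}^t\sum_{r=1}^tEU_sV_r=\sum_{s=1}^tEU_sV_s=tEU_0V_0, \end{align}$

and by the same argument $DX_t=tDU_0$ and $DY_t=tDV_0$. The factors of $t$ cancel in the ratio, which is why the theoretical correlation of the levels equals that of the increments.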

So far so good: though the process is not stationary, correlation makes sense, although we had to make some restrictive assumptions.

Now to see what happens to the sample correlation we will need to use the following fact about random walks, called the functional central limit theorem:

$\begin{align} \frac{1}{\sqrt{T}}Z_{[Ts]}=\frac{1}{\sqrt{T}}\sum_{t=1}^{[Ts]}C_t\to (cov(C_0))^{1/2}W_s, \end{align}$

in distribution, where $s\in[0,1]$ and $W_s=(W_{1s},W_{2s})$ is a bivariate Brownian motion (two-dimensional Wiener process). For convenience introduce the definition $M_s=(M_{1s},M_{2s})=(cov(C_0))^{1/2}W_s$.

Again for simplicity let us define the sample correlation as

$\begin{align} \hat{\rho}=\frac{\frac{1}{T}\sum_{t=1}^TX_{t}Y_t}{\sqrt{\frac{1}{T}\sum_{t=1}^TX_t^2\frac{1}{T}\sum_{t=1}^TY_t^2}} \end{align}$

Let us start with the variances. We have

$\begin{align} E\frac{1}{T}\sum_{t=1}^TX_t^2=\frac{1}{T}E\sum_{t=1}^T\left(\sum_{s=1}^tU_s\right)^2=\frac{1}{T}\sum_{t=1}^Tt\sigma_U^2=\sigma_U^2\frac{T+1}{2}. \end{align}$

This goes to infinity as $T$ increases, so we hit the first problem, sample variance does not converge. On the other hand continuous mapping theorem in conjunction with functional central limit theorem gives us

$\begin{align} \frac{1}{T^2}\sum_{t=1}^TX_t^2=\sum_{t=1}^T\frac{1}{T}\left(\frac{1}{\sqrt{T}}\sum_{s=1}^tU_s\right)^2\to \int_0^1M_{1s}^2ds \end{align}$

where convergence is convergence in distribution, as $T\to \infty$.
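One way to see why the continuous mapping theorem applies here is to rewrite the normalized sum as an integral of the rescaled step process $s\mapsto T^{-1/2}X_{[Ts]}$. A sketch of that step, with the convention $X_0=0$ (an assumption consistent with defining the random walk as a sum starting at $s=1$):

$\begin{align} \frac{1}{T^2}\sum_{t=1}^TX_t^2=\int_0^1\left(\frac{1}{\sqrt{T}}X_{[Ts]}\right)^2ds+\frac{1}{T}\left(\frac{1}{\sqrt{T}}X_T\right)^2. \end{align}$

The last term is of order $1/T$ in probability, because $X_T/\sqrt{T}$ converges in distribution, so it vanishes in the limit. The first term is a continuous functional of the rescaled process, which converges to $M_{1s}$ by the functional central limit theorem, and that gives the limit $\int_0^1M_{1s}^2ds$.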

Similarly we get

$\begin{align} \frac{1}{T^2}\sum_{t=1}^TY_t^2\to \int_0^1M_{2s}^2ds \end{align}$

and

$\begin{align} \frac{1}{T^2}\sum_{t=1}^TX_tY_t\to \int_0^1M_{1s}M_{2s}ds \end{align}$
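The two scalings of $\sum_tX_t^2$ used above can be compared numerically. The following Python sketch assumes i.i.d. Gaussian steps with variance $\sigma_U^2$; the function name `second_moment_stats` and the replication count are hypothetical. It reports the Monte Carlo mean of $\frac{1}{T}\sum_tX_t^2$ next to $\sigma_U^2(T+1)/2$, and the mean and standard deviation of $\frac{1}{T^2}\sum_tX_t^2$.

```python
import numpy as np

def second_moment_stats(T, sigma=1.0, n_rep=2000, seed=0):
    """Simulate n_rep random walks X_t = U_1 + ... + U_t with i.i.d.
    N(0, sigma^2) steps and summarise sum_t X_t^2 under both scalings."""
    rng = np.random.default_rng(seed)
    x = (sigma * rng.standard_normal((n_rep, T))).cumsum(axis=1)
    s = (x**2).sum(axis=1)  # sum_t X_t^2, one value per simulated path
    return {
        "mean of S/T": float((s / T).mean()),       # compare with "theory"
        "theory": sigma**2 * (T + 1) / 2,            # sigma^2 (T + 1) / 2
        "mean of S/T^2": float((s / T**2).mean()),   # roughly sigma^2 / 2
        "sd of S/T^2": float((s / T**2).std()),      # spread across paths
    }

for T in (50, 200, 800):
    print(T, {k: round(v, 3) for k, v in second_moment_stats(T).items()})
```

Under these assumptions the first quantity should grow linearly in $T$, while the spread of the second should stay roughly constant instead of shrinking, which is what convergence to a non-degenerate random variable looks like in a simulation.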

So finally for the sample correlation of our random walk we get

$\begin{align} \hat{\rho}\to \frac{\int_0^1M_{1s}M_{2s}ds}{\sqrt{\int_0^1M_{1s}^2ds\int_0^1M_{2s}^2ds}} \end{align}$

in distribution as $T\to \infty$.

So although correlation is well defined, sample correlation does not converge towards it, as in stationary process case. Instead it converges to a certain random variable.

— mpiktas

The mathematical point of view explanation is what I was looking for. It gives me something to contemplate and explore further. Thanks.

— Milktrader

This response seems to sidestep the original question: Aren't you just saying that yes, calculating correlation makes sense for stationary processes?

— whuber

@whuber, I was answering the question having in mind the comment, but I reread the question again and as far as I understand the OP asks about calculation of correlation for non-stationary data. Calculation of correlation for stationary processes makes sense, all the macroeconometric analysis (VAR, VECM) relies on that.

— mpiktas

I'll try to clarify my question with a response.

— whuber

@whuber my take away from the answer is that a correlation based on non-stationary data yields a random variable, which may or may not be useful. Correlation based on stationary data converges to a constant. This may explain why traders are attracted to "x-day rolling correlation" because the correlated behavior is fleeting and spurious. Whether "x-day rolling correlation" is valid or useful is for another question.

— Milktrader

...is the computation of correlation whose data is non-stationary even a valid statistical calculation?

Let $W$ be a discrete random walk. Pick a positive number $h$ . Define the processes $P$ and $V$ by $P(0) = 1$ , $P(t+1) = -P(t)$ if $V(t) > h$ , and otherwise $P(t+1) = P(t)$ ; and $V(t) = P(t)W(t)$ . In other words, $V$ starts out identical to $W$ but every time $V$ rises above $h$ , it switches signs (otherwise emulating $W$ in all respects).

[Figure: sample paths of $W$ and $V$]

(In this figure (for $h=5$ ) $W$ is blue and $V$ is red. There are four switches in sign.)

In effect, over short periods of time $V$ tends to be either perfectly correlated with $W$ or perfectly anticorrelated with it; however, using a correlation function to describe the relationship between $V$ and $W$ wouldn't be useful (a word that perhaps more aptly captures the problem than "unreliable" or "nonsense").

Mathematica code to produce the figure:

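With[{h = 5},
  (* one update step: flip the sign p if the previous v exceeded h, then set v = p w *)
  pv[{p_, v_}, w_] := With[{q = If[v > h, -p, p]}, {q, q w}];
  (* random walk with steps drawn from {-1, 0, 1} *)
  w = Accumulate[RandomInteger[{-1, 1}, 25 h^2]];
  {p, v} = FoldList[pv, {1, 0}, w] // Transpose;
  ListPlot[{w, v}, Joined -> True]]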
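For readers without Mathematica, the same construction can be sketched in Python with NumPy. This is a port under assumptions, not the code used for the figure: the names `switching_pair` and `rolling_corr`, the seed, and the 20-step window are hypothetical choices. It also computes a full-sample correlation and rolling-window correlations, to illustrate how the two can disagree.

```python
import numpy as np

def switching_pair(n_steps, h, seed=0):
    """Random walk w and its sign-switching copy v = p * w.

    The sign p flips whenever the previous value of v exceeded h.
    """
    rng = np.random.default_rng(seed)
    w = np.cumsum(rng.integers(-1, 2, size=n_steps))  # steps in {-1, 0, 1}
    p = np.empty(n_steps, dtype=int)
    v = np.empty(n_steps, dtype=int)
    sign, prev_v = 1, 0
    for t in range(n_steps):
        if prev_v > h:
            sign = -sign
        p[t], v[t] = sign, sign * w[t]
        prev_v = v[t]
    return w, v, p

def rolling_corr(a, b, window):
    """Sample correlation over trailing windows; NaN where undefined."""
    out = np.full(len(a), np.nan)
    for t in range(window, len(a) + 1):
        xa, xb = a[t - window:t], b[t - window:t]
        if xa.std() > 0 and xb.std() > 0:
            out[t - 1] = np.corrcoef(xa, xb)[0, 1]
    return out

w, v, p = switching_pair(n_steps=625, h=5, seed=0)
r = rolling_corr(w, v, window=20)
valid = ~np.isnan(r)

print("sign switches:", int(np.sum(p[1:] != p[:-1])))
print("full-sample correlation:", np.corrcoef(w, v)[0, 1])
print("share of windows with |corr| > 0.99:", np.mean(np.abs(r[valid]) > 0.99))
```

Any window that contains no sign switch has correlation exactly $+1$ or $-1$ by construction, whereas the full-sample figure depends on how many switches happened to occur and when.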

— whuber

It is good that your answer points that out, but I wouldn't say the processes are correlated; I would say they are dependent. This is the point. Calculation of correlation is valid, and here it will say "no correlation", and we all know this does not mean "no dependence".

— robin girard

@robin That's a good point, but I constructed this example specifically so that for potentially long periods of time these two processes are perfectly correlated. The issue is not one of dependence versus correlation but inherently is related to a subtler phenomenon: that the relationship between the processes changes at random periods. That, in a nutshell, is exactly what can happen in real markets (or at least we ought to worry that it can happen!).

— whuber

@whuber yes, and this is a very good example showing that there are processes that have very high correlation for potentially long periods of time and still are not correlated at all (but highly dependent) when viewed on the larger temporal scale.

— robin girard

@robin girard, I think the key here is that for non-stationary processes the theoretical correlation varies with time, when for the stationary processes theoretical correlation stays the same. So with sample correlation which basically is one number, it is impossible to capture the variation of true correlations in case of non-stationary processes.

— mpiktas