How is the inverse gamma distribution related to

Given that the posterior estimate $\sigma'^{2}$ from a normal likelihood and an inverse gamma prior on $\sigma^2$ is:

$\sigma'^{2}\sim\textrm{IG}\left(\alpha + \frac{n}{2}, \beta +\frac{\sum_{i=1}^n{(y_i-\mu)^2}}{2}\right)$

which is equivalent to

$\sigma'^{2}\sim\textrm{IG}\left( \frac{n}{2}, \frac{n\sigma^2}{2}\right)$

since a weak $\textrm{IG}(\alpha, \beta)$ prior on $\sigma^2$ removes $\alpha$ and $\beta$ from eqn 1:

$\sigma'^{2}\sim\textrm{IG}\left( \frac{n}{2}, \frac{\sum_{i=1}^n{(y_i-\mu)^2}}{2}\right)$

It is apparent that the posterior estimate of $\sigma^2$ is a function of the sample size and the sum of squares of the likelihood. But what does this mean? There is a derivation on Wikipedia that I don't quite follow.

I have the following questions:

1. Can I arrive at this second equation without invoking Bayes' rule? I am curious whether there is something inherent in the parameters of an IG that is related to the mean and variance, independent of the normal likelihood.
2. Can I use the sample size and standard deviation from a previous study to estimate an informed prior on $\sigma^2$, and then update that prior with new data? This seems straightforward, but I can't find any examples of doing so, or a justification for why this would be a legitimate approach, other than what can be seen in the posterior.
3. Is there a popular probability or statistics textbook that I can consult for further explanation?

bayesian prior conjugate-prior

— Abe

Don't you mean an inverse gamma likelihood and an inverse gamma prior?

— Neil G

First of all, I see several misunderstandings in your question: from Bayes' theorem you don't get posterior estimates, but the whole posterior distribution. The second point is that this posterior distribution does not depend on "the sum of squares of the likelihood". It simply depends on your sample size (i.e. $n$) and on the sample values, which is perfectly natural and reasonable. These dependencies affect your posterior estimates of the mean, variance, etc. For example, the posterior mean of the variance parameter is equal to

$\frac{1}{n-2}\sum \left ( y_{i}-\mu \right )^{2}$

— Tomas

@Thomas by estimate, I meant the estimate of the posterior distribution. Isn't the sum-of-squares term in the posterior exactly the same calculation as the SS term in the normal likelihood?

— Abe

@Abe I recently asked (and answered) a question related to your question no. 2: given the SD and the SD of the SD, how do you calculate the corresponding gamma prior on the precision of a normal distribution? The question is here: stats.stackexchange.com/questions/41187/…

— Rasmus Bååth

Answers:

I think it is more correct to speak of the posterior distribution of your parameter $\sigma'^{2}$ rather than its posterior estimate. For clarity of notation, I will drop the prime in $\sigma'^{2}$ in what follows.

Suppose that $X$ is distributed as $\mathcal{N}(0, \sigma^2)$ (I drop $\mu$ for now to make a heuristic example), and that $1/\sigma^2 = \sigma^{-2}$ is distributed as $\Gamma(\alpha, \beta)$ and is independent of $X$.

The pdf of $X$ given $\sigma^{-2}$ is Gaussian, i.e.

$f(x|\sigma^{-2}) = \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left(-\frac{x^2}{2\sigma^2}\right).$

The joint pdf of $(X, \sigma^{-2})$, $f(x,\sigma^{-2})$, is obtained by multiplying $f(x|\sigma^{-2})$ by $g(\sigma^{-2})$, the pdf of $\sigma^{-2}$. This comes out as

$f(x, \sigma^{-2}) = \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left(-\frac{x^2}{2\sigma^2}\right) \frac{\beta^{\alpha}}{\Gamma(\alpha)}\exp \left(-\frac{\beta}{ \sigma^2}\right)\frac{1}{\sigma^{2(\alpha-1)}}.$

We can group similar terms and rewrite this as follows

$f(x, \sigma^{-2}) \propto \sigma^{-2(\alpha-1/2)} \exp\left(-\sigma^{-2} \left(\beta + x^2/2 \right)\right).$

The posterior distribution of $\sigma^{-2}$ is by definition the pdf of $\sigma^{-2}$ given $x$, which is $f(x, \sigma^{-2}) / f(x)$ by Bayes' formula. To answer your question 1: I don't think there is a way to express $f(\sigma^{-2}|x)$ from $f(x, \sigma^{-2})$ without using Bayes' formula. On with the computation: we recognize in the formula above the kernel of a $\Gamma$ density in $\sigma^{-2}$, so integrating $\sigma^{-2}$ out to get $f(x)$ is fairly easy.

$f(x) \propto (\beta + x^2/2)^{-(\alpha+1/2)},$

so by dividing we get

$f(\sigma^{-2}|x) \propto \left(\beta + x^2/2 \right) \left( \sigma^{-2} \left(\beta + x^2/2 \right) \right)^{\alpha-1/2} \exp\left(-\sigma^{-2} \left(\beta + x^2/2 \right)\right) \\ \propto \left( \sigma^{-2} \left(\beta + x^2/2 \right) \right)^{\alpha-1/2} \exp\left(-\sigma^{-2} \left(\beta + x^2/2 \right)\right).$

And here in the last formula we recognize a $\Gamma$ distribution with parameters $(\alpha + 1/2, \beta + x^2/2)$ .
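As a sanity check on this single-observation result, here is a minimal numerical sketch; it is not part of the derivation itself. It assumes the rate parameterisation of the $\Gamma$ distribution (density proportional to $\tau^{\alpha-1}e^{-\beta\tau}$ with $\tau = \sigma^{-2}$). The function names and the values of `alpha`, `beta` and `x` are made up for illustration. It normalises prior times likelihood on a grid and compares the result with the $\Gamma(\alpha + 1/2, \beta + x^2/2)$ density.

```python
import numpy as np
from scipy import stats


def closed_form_posterior(alpha, beta, x):
    """Shape and rate of the Gamma posterior of the precision after one N(0, 1/tau) draw."""
    return alpha + 0.5, beta + 0.5 * x**2


def grid_posterior(alpha, beta, x, tau):
    """Posterior density on an evenly spaced grid `tau`, via prior * likelihood."""
    prior = stats.gamma.pdf(tau, a=alpha, scale=1.0 / beta)  # rate beta -> scale 1/beta
    likelihood = stats.norm.pdf(x, loc=0.0, scale=1.0 / np.sqrt(tau))
    unnormalised = prior * likelihood
    step = tau[1] - tau[0]
    return unnormalised / (unnormalised.sum() * step)  # Riemann-sum normalisation


# Illustrative values only (not taken from the question).
alpha, beta, x = 2.0, 3.0, 1.5
tau = np.linspace(1e-6, 20.0, 200_001)

shape, rate = closed_form_posterior(alpha, beta, x)
exact = stats.gamma.pdf(tau, a=shape, scale=1.0 / rate)
numeric = grid_posterior(alpha, beta, x, tau)
max_abs_diff = float(np.max(np.abs(exact - numeric)))
print(shape, rate, max_abs_diff)
```

If the two densities agree up to discretisation error, the grid computation supports the conjugate form; a visible gap would point to a mismatch in parameterisation (rate versus scale).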

If you have an IID sample $((x_1, \sigma_1^{-2}), ..., (x_n, \sigma^{-2}_n))$ , by integrating out all the $\sigma_i^{-2}$ , you would get $f(x_1, ..., x_n)$ and then $f(\sigma_1^{-2}, ..., \sigma_n^{-2}|x_1, ..., x_n)$ as a product of the following terms:

$f(\sigma_1^{-2}, ..., \sigma_n^{-2}|x_1, ..., x_n) \propto \prod_{i=1}^n \left( \sigma_i^{-2} \left(\beta + x_i^2/2 \right) \right)^{\alpha-1/2} \exp\left(-\sigma_i^{-2} \left(\beta + x_i^2/2 \right)\right),$

which is a product of $\Gamma$ densities. We are stuck here because of the multiplicity of the $\sigma_i^{-2}$. Besides, the distribution of the mean of those independent $\Gamma$ variables is not straightforward to compute.

However, if we assume that all the observations $x_i$ share the same value of $\sigma^{-2}$ (which seems to be your case) i.e. that the value of $\sigma^{-2}$ was drawn only once from a $\Gamma(\alpha, \beta)$ and that all $x_i$ were then drawn with that value of $\sigma^{-2}$ , we obtain

$f(x_1, ..., x_n, \sigma^{-2}) \propto \sigma^{-2 (\alpha + n/2 - 1)} \exp\left(-\sigma^{-2} \left(\beta + \frac{1}{2} \sum_{i=1}^n x_i^2\right) \right),$

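from which, by applying Bayes' formula, we obtain a $\Gamma\left(\alpha + n/2, \beta + \frac{1}{2}\sum_{i=1}^n x_i^2\right)$ posterior distribution for $\sigma^{-2}$. This is your equation 1, stated for the precision $\sigma^{-2}$ instead of the variance $\sigma^2$, with $\mu = 0$.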
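The shared-variance update can be written in a few lines. This is a sketch under the assumptions used here: known mean `mu`, an $\textrm{IG}(\alpha, \beta)$ prior on the variance (equivalently $\Gamma(\alpha, \beta)$ with rate $\beta$ on the precision), and made-up data. The helper name `update_variance_posterior` is hypothetical; `scipy.stats.invgamma` and `scipy.stats.gamma` are only used to read off the posterior means.

```python
import numpy as np
from scipy import stats


def update_variance_posterior(alpha, beta, y, mu):
    """Return (shape, scale) of the inverse gamma posterior of the variance.

    Prior: variance ~ IG(alpha, beta); data: y_i ~ N(mu, variance), mu known.
    """
    y = np.asarray(y, dtype=float)
    sum_sq = np.sum((y - mu) ** 2)
    return alpha + y.size / 2.0, beta + sum_sq / 2.0


# Illustrative values only.
y = [1.0, 2.0, 3.0]
shape, scale = update_variance_posterior(alpha=2.0, beta=1.0, y=y, mu=2.0)

posterior_variance = stats.invgamma(a=shape, scale=scale)       # distribution of sigma^2
posterior_precision = stats.gamma(a=shape, scale=1.0 / scale)   # distribution of sigma^-2
print(shape, scale, posterior_variance.mean(), posterior_precision.mean())
```

Both objects describe the same posterior: one is the distribution of $\sigma^2$, the other of $\sigma^{-2}$.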

The posterior distribution of $\sigma^{-2}$ is a $\Gamma$ that depends on $\alpha$ and $\beta$ , your prior parameters, the sample size $n$ and the observed sum of squares. The prior mean of $\sigma^{-2}$ is $\alpha/\beta$ and the variance is $\alpha/\beta^2$ , so if $\alpha = \beta$ and the value is very small, the prior carries very little information about $\sigma^{-2}$ because the variance becomes huge. The values being small, you can drop them from the above equations and you end up with your equation 3.
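To see the weak-prior limit numerically, the sketch below (made-up data, hypothetical helper name, known mean assumed) sets $\alpha = \beta = \varepsilon$ and shrinks $\varepsilon$. The posterior mean of the variance, $(\beta + \frac{1}{2}\sum (y_i - \mu)^2) / (\alpha + n/2 - 1)$, should approach $\sum (y_i - \mu)^2 / (n - 2)$.

```python
import numpy as np


def posterior_mean_variance(eps, y, mu):
    """Posterior mean of the variance under an IG(eps, eps) prior, known mean mu."""
    y = np.asarray(y, dtype=float)
    shape = eps + y.size / 2.0
    scale = eps + np.sum((y - mu) ** 2) / 2.0
    return scale / (shape - 1.0)  # mean of IG(shape, scale); requires shape > 1


y = [0.5, -1.2, 2.0, 0.3, -0.7]  # made-up data
mu = 0.0
limit = np.sum((np.asarray(y) - mu) ** 2) / (len(y) - 2)  # sum of squares / (n - 2)

for eps in (1.0, 0.1, 1e-3, 1e-6):
    print(eps, posterior_mean_variance(eps, y, mu), limit)
```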

In that case the posterior distribution becomes independent of the prior. This formula says that the inverse of the variance has a $\Gamma$ distribution that depends only on the sample size and the sum of squares. You can show that for Gaussian variables of known mean, $S^2$, the estimator of the variance, has a distribution of the same ($\Gamma$) family, except that it is a function of the sample size and the true value of the parameter $\sigma^2$. In the Bayesian case, this is the distribution of the parameter; in the frequentist case, it is the distribution of the estimator.

Regarding your question 2, you can of course use the values obtained in a previous experiment as your prior. Because we established a parallel between the Bayesian and frequentist interpretations above, we can elaborate and say that it is like computing a variance from a small sample and then collecting more data points: you would update your estimate of the variance rather than throw away the first data points.
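One way to make this concrete is to turn a previous study's sample size $n_0$ and standard deviation $s_0$ into an $\textrm{IG}(n_0/2, n_0 s_0^2/2)$ prior, following the form of the question's second equation, and then update it with new data. The sketch below rests on loud assumptions: the mean `mu` is known, $s_0^2$ is the mean squared deviation from `mu` (not the usual $n-1$ sample variance), the data are made up, and the helper names are hypothetical. Under those assumptions, updating sequentially gives the same posterior as pooling all observations under the weak-prior limit.

```python
import numpy as np


def prior_from_previous_study(n0, s0):
    """IG(shape, scale) prior for the variance implied by n0 earlier points with SD s0.

    Assumes s0**2 is the mean squared deviation from the known mean mu.
    """
    return n0 / 2.0, n0 * s0**2 / 2.0


def update(shape, scale, y, mu):
    """Conjugate update of an IG(shape, scale) prior with new N(mu, variance) data."""
    y = np.asarray(y, dtype=float)
    return shape + y.size / 2.0, scale + np.sum((y - mu) ** 2) / 2.0


mu = 2.0
old = np.array([1.0, 3.0])        # made-up "previous study"
new = np.array([2.0, 4.0, 0.0])   # made-up new data

s0 = np.sqrt(np.mean((old - mu) ** 2))
sequential = update(*prior_from_previous_study(old.size, s0), new, mu)
pooled = update(0.0, 0.0, np.concatenate([old, new]), mu)  # weak-prior limit, all data
print(sequential, pooled)
```

If the previous study only reports the usual sample SD around its own sample mean, the mapping to prior parameters is no longer exact and would need to be adjusted.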

Regarding your question 3, I like Introduction to Mathematical Statistics by Hogg, McKean and Craig, which usually gives the details of how to derive these equations.

— gui11aume

For question 1, the second equation follows from Bayes' rule as you point out, and I don't see how to avoid that.

For question 2, yes, you can do this. Just use a prior of the same form as your second equation.

For question 3, I would look for something about exponential families. Maybe someone will recommend a good resource.

— Neil G