Kullback-Leibler divergence between two gamma distributions



Choosing to parametrize the gamma distribution Γ(b,c) by the pdf

$$g(x;b,c)=\frac{1}{\Gamma(c)\,b^{c}}\,x^{c-1}e^{-x/b},$$

the Kullback-Leibler divergence between Γ(b_q,c_q) and Γ(b_p,c_p) is given by [1] as

$$\mathrm{KL}_{\mathrm{Ga}}(b_q,c_q;b_p,c_p)=(c_q-1)\Psi(c_q)-\log b_q-c_q-\log\Gamma(c_q)+\log\Gamma(c_p)+c_p\log b_p-(c_p-1)\left(\Psi(c_q)+\log b_q\right)+\frac{b_q c_q}{b_p}$$

I am assuming that Ψ(x) := Γ′(x)/Γ(x) is the digamma function.

This is stated without derivation. I cannot find any reference that derives it. Any help? A good reference would be sufficient. The hard part is integrating log x against a gamma pdf.

[1] W. D. Penny, KL-Divergences of Normal, Gamma, Dirichlet and Wishart Densities, available at: www.fil.ion.ucl.ac.uk/~wpenny/publications/densities.ps


Taking the derivative of the pdf with respect to c introduces the factor of log(x) you are looking for: that is why the digamma appears.
whuber

If you come across Pierre Baldi and Laurent Itti (2010), "Of bits and wows: A Bayesian theory of surprise with applications to attention", Neural Networks 23: 649-666, you will find Equation 73, which gives a KL divergence between two gamma pdfs. Be careful, though: the formula appears to have been printed incorrectly.
Clarinet

I am looking for a solution to the same problem, and finding this one is helpful.
Yi Yang

Answers:



A divergência KL é uma diferença de integrais da forma

$$\eqalign{
I(a,b,c,d) &= \int_0^{\infty} \log\left(\frac{e^{-x/a}x^{b-1}}{a^b\Gamma(b)}\right)\frac{e^{-x/c}x^{d-1}}{c^d\Gamma(d)}\,dx \cr
&= -\frac{1}{a}\int_0^\infty \frac{x^d e^{-x/c}}{c^d\Gamma(d)}\,dx - \log(a^b\Gamma(b))\int_0^\infty \frac{e^{-x/c}x^{d-1}}{c^d\Gamma(d)}\,dx \cr
&\quad + (b-1)\int_0^\infty \log(x)\,\frac{e^{-x/c}x^{d-1}}{c^d\Gamma(d)}\,dx \cr
&= -\frac{cd}{a} - \log(a^b\Gamma(b)) + (b-1)\int_0^\infty \log(x)\,\frac{e^{-x/c}x^{d-1}}{c^d\Gamma(d)}\,dx
}$$

We just have to deal with the right hand integral, which is obtained by observing

$$\Gamma'(d) = \frac{d}{dd}\int_0^\infty e^{-x/c}\left(\frac{x}{c}\right)^{d-1}\frac{dx}{c} = \int_0^\infty e^{-x/c}\left(\frac{x}{c}\right)^{d-1}\log\left(\frac{x}{c}\right)\frac{dx}{c} = \int_0^\infty \log(x)\,e^{-x/c}\left(\frac{x}{c}\right)^{d-1}\frac{dx}{c} - \log(c)\,\Gamma(d).$$

Whence

$$\frac{b-1}{\Gamma(d)}\int_0^\infty \log(x)\,e^{-x/c}\left(\frac{x}{c}\right)^{d-1}\frac{dx}{c} = (b-1)\frac{\Gamma'(d)}{\Gamma(d)} + (b-1)\log(c).$$

Plugging into the preceding yields

$$I(a,b,c,d) = -\frac{cd}{a} - \log(a^b\Gamma(b)) + (b-1)\frac{\Gamma'(d)}{\Gamma(d)} + (b-1)\log(c).$$

The KL divergence between Γ(c,d) and Γ(a,b) equals I(c,d,c,d)I(a,b,c,d), which is straightforward to assemble.
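The assembled closed form can be cross-checked against direct quadrature of $\int_0^\infty q(x)\log\bigl(q(x)/p(x)\bigr)\,dx$. Here is a Python sketch of that check (scipy assumed available; the function names `kl_gamma` and `kl_numeric` are illustrative, and the parameter names `a,b,c,d` with scale/shape roles follow the derivation above):

```python
import numpy as np
from scipy.special import gammaln, psi
from scipy.integrate import quad

def I(a, b, c, d):
    # I(a,b,c,d) = -cd/a - log(a^b Gamma(b)) + (b-1)(psi(d) + log(c))
    return -c * d / a - b * np.log(a) - gammaln(b) + (b - 1) * (psi(d) + np.log(c))

def kl_gamma(a, b, c, d):
    # KL between Gamma(scale=c, shape=d) and Gamma(scale=a, shape=b),
    # assembled as I(c,d,c,d) - I(a,b,c,d)
    return I(c, d, c, d) - I(a, b, c, d)

def kl_numeric(a, b, c, d):
    # brute-force quadrature of q * log(q/p) as a sanity check
    logq = lambda x: -x / c + (d - 1) * np.log(x) - d * np.log(c) - gammaln(d)
    logp = lambda x: -x / a + (b - 1) * np.log(x) - b * np.log(a) - gammaln(b)
    f = lambda x: np.exp(logq(x)) * (logq(x) - logp(x))
    val, _ = quad(f, 0, np.inf)
    return val

print(kl_gamma(2.0, 3.0, 1.0, 5.0), kl_numeric(2.0, 3.0, 1.0, 5.0))
```

The two values should agree to quadrature precision, and the closed form should vanish when both parameter pairs coincide.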


Implementation Details

Gamma functions grow rapidly, so to avoid overflow don't compute Gamma and take its logarithm: instead use the log-Gamma function that will be found in any statistical computing platform (including Excel, for that matter).

The ratio Γ′(d)/Γ(d) is the logarithmic derivative of Γ, generally called ψ, the digamma function. If it's not available to you, there are relatively simple ways to approximate it, as described in the Wikipedia article.
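For instance, combining the recurrence ψ(x) = ψ(x+1) − 1/x with the standard asymptotic series gives a serviceable approximation. A Python sketch (`digamma_approx` is an illustrative name, not a library function):

```python
import math

def digamma_approx(x, shift=6.0):
    # Shift the argument upward using psi(x) = psi(x+1) - 1/x, then apply
    # the asymptotic series
    #   psi(x) ~ log x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6),
    # which is accurate once x is moderately large (here, >= 6).
    acc = 0.0
    while x < shift:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return acc + math.log(x) - 0.5 / x - series
```

With the shift set to 6, the truncation error is on the order of 1e-8, e.g. `digamma_approx(1.0)` recovers −γ ≈ −0.5772157.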

Here, to illustrate, is a direct R implementation of the formula in terms of I. This does not exploit an opportunity to simplify the result algebraically, which would make it a little more efficient (by eliminating a redundant calculation of ψ).

#
# `b` and `d` are Gamma shape parameters and
# `a` and `c` are scale parameters.
# (All, therefore, must be positive.)
#
KL.gamma <- function(a,b,c,d) {
  i <- function(a,b,c,d)
    - c * d / a - b * log(a) - lgamma(b) + (b-1)*(psigamma(d) + log(c))
  i(c,d,c,d) - i(a,b,c,d)
}
print(KL.gamma(1/114186.3, 202, 1/119237.3, 195), digits=12)

Good answer. Thanks! I believe that there is a sign error however in the fourth equality. Also, your gamma pdf should have an extra factor of 'c' in the denominator. Would you like me to edit it?
Ian Langmore

@Ian You're right; I usually write the measure as dx/x and by not doing that I omitted that extra factor of c. Good catch on the sign mistake. If you would like to make the edits, feel free!
whuber

I made the corrections.
Ian Langmore


The Gamma distribution is in the exponential family because its density can be expressed as:

$$f(x\mid\theta)=\exp\!\left(\eta(\theta)^{\top}T(x)-g(\theta)+h(x)\right)$$

Looking at the Gamma density function, its log-normalizer is

$$g(\theta)=\log(\Gamma(c))+c\log(b)$$
with natural parameters
$$\theta=\begin{bmatrix}c-1\\ -\frac{1}{b}\end{bmatrix}$$

All distributions in the exponential family have KL divergence:

$$\mathrm{KL}(q;p)=g(\theta_p)-g(\theta_q)-\left(\theta_p-\theta_q\right)^{\top}\nabla g(\theta_q).$$

There's a really nice proof of that in:

Frank Nielsen, École Polytechnique, and Richard Nock, Entropies and cross-entropies of exponential families.
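To illustrate, a Python sketch of this formula for two gamma distributions (scipy assumed; `kl_expfam` and `kl_direct` are illustrative names), using the natural parameters θ = [c−1, −1/b] above and noting that ∇g(θ) is the vector of expectations of the sufficient statistics (log x, x):

```python
import numpy as np
from scipy.special import gammaln, psi

def nat(b, c):
    # natural parameters of Gamma(scale=b, shape=c)
    return np.array([c - 1.0, -1.0 / b])

def g(theta):
    # log-normalizer log Gamma(c) + c log b, rewritten in terms of theta
    c = theta[0] + 1.0
    b = -1.0 / theta[1]
    return gammaln(c) + c * np.log(b)

def grad_g(theta):
    # gradient of g: (E[log x], E[x]) = (psi(c) + log b, c*b)
    c = theta[0] + 1.0
    b = -1.0 / theta[1]
    return np.array([psi(c) + np.log(b), c * b])

def kl_expfam(bq, cq, bp, cp):
    # KL(q; p) = g(theta_p) - g(theta_q) - (theta_p - theta_q) . grad g(theta_q)
    tq, tp = nat(bq, cq), nat(bp, cp)
    return g(tp) - g(tq) - (tp - tq) @ grad_g(tq)

def kl_direct(bq, cq, bp, cp):
    # the closed form quoted in the question, for comparison
    return ((cq - 1) * psi(cq) - np.log(bq) - cq - gammaln(cq)
            + gammaln(cp) + cp * np.log(bp)
            - (cp - 1) * (psi(cq) + np.log(bq)) + bq * cq / bp)
```

Expanding the exponential-family expression term by term reproduces the closed form in the question, so the two functions should agree for any positive parameters.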


Didn't know this. Just a quick question - the g(.) function, does it have to be the same for θp as for θq? So for example, would the above formula be valid for KL divergence of normal pdf from gamma pdf?
probabilityislogic

1
Yes, this formula is for two distributions in the same exponential family.
Neil G
Licensed under cc by-sa 3.0 with attribution required.