I have read that HMMs, particle filters, and Kalman filters are special cases of dynamic Bayes networks. However, I only know HMMs, and I don't see how they differ from dynamic Bayes networks.

Could somebody please explain?

It would be nice if your answer could be structured like the following, but for Bayes networks:

## Hidden Markov Models

A Hidden Markov Model (HMM) is a 5-tuple $\lambda =(S,O,A,B,\mathrm{\Pi})$:

- $S\ne \mathrm{\varnothing}$: a set of states (e.g. "beginning of phoneme", "middle of phoneme", "end of phoneme")
- $O\ne \mathrm{\varnothing}$: a set of possible observations (audio signals)
- $A\in {\mathbb{R}}^{|S|\times |S|}$: a stochastic matrix whose entry ${a}_{ij}$ is the probability of moving from state $i$ to state $j$
- $B\in {\mathbb{R}}^{|S|\times |O|}$: a stochastic matrix whose entry ${b}_{kl}$ is the probability of seeing observation $l$ while in state $k$
- $\mathrm{\Pi}\in {\mathbb{R}}^{|S|}$: the initial distribution over the states
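
To make the 5-tuple concrete, here is a minimal Python sketch. The class name `HMM`, the method `is_valid`, the two observation symbols, and all of the probabilities are made up for illustration; real audio observations would not be two discrete symbols.

```python
from dataclasses import dataclass


@dataclass
class HMM:
    states: list        # S
    observations: list  # O
    A: list             # transition matrix, |S| x |S|
    B: list             # emission matrix, |S| x |O|
    pi: list            # initial distribution, length |S|

    def is_valid(self, tol=1e-9):
        """Check the shapes and that pi and every row of A and B is a distribution."""
        n, m = len(self.states), len(self.observations)
        shapes_ok = (
            len(self.pi) == n
            and len(self.A) == n and all(len(row) == n for row in self.A)
            and len(self.B) == n and all(len(row) == m for row in self.B)
        )
        rows = [self.pi, *self.A, *self.B]
        sums_ok = all(abs(sum(row) - 1.0) <= tol for row in rows)
        non_negative = all(p >= 0 for row in rows for p in row)
        return shapes_ok and sums_ok and non_negative


# Hypothetical left-to-right phoneme model; every number is an assumption.
phoneme = HMM(
    states=["begin", "middle", "end"],
    observations=["quiet", "loud"],
    A=[[0.6, 0.4, 0.0], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]],
    B=[[0.9, 0.1], [0.3, 0.7], [0.8, 0.2]],
    pi=[1.0, 0.0, 0.0],
)
print(phoneme.is_valid())
```

The zeros in `A` encode the fixed topology: an edge that is absent from the state graph is a transition with probability zero.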

It is usually displayed as a directed graph, where each node corresponds to one state $s\in S$ and the transition probabilities are denoted on the edges.

Hidden Markov Models are called "hidden", because the current state is hidden. The algorithms have to guess it from the observations and the model itself. They are called "Markov", because for the next state only the current state matters.

For HMMs, you give a fixed topology (number of states, possible edges). Then there are 3 possible tasks:

- **Evaluation**: given an HMM $\lambda $, how likely is it to get the observations ${o}_{1},\dots ,{o}_{t}$? (Forward algorithm)
- **Decoding**: given an HMM $\lambda $ and observations ${o}_{1},\dots ,{o}_{t}$, what is the most likely sequence of states ${s}_{1},\dots ,{s}_{t}$? (Viterbi algorithm)
- **Learning**: learn $A,B,\mathrm{\Pi}$ (Baum-Welch algorithm, which is a special case of expectation maximization)
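
The first two tasks can be sketched in a few lines of plain Python. This is a minimal sketch, not a production implementation: the two-state model and its numbers are invented, observations are assumed to be integer indices into the columns of $B$, and nothing is done about numerical underflow on long sequences (log probabilities or scaling would be needed there).

```python
def forward(A, B, pi, obs):
    """Evaluation: P(o_1, ..., o_t | lambda) via the forward algorithm."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    return sum(alpha)


def viterbi(A, B, pi, obs):
    """Decoding: most likely state sequence and its joint probability."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        prev = delta
        # Best predecessor of each state j at this time step.
        ptr = [max(range(n), key=lambda i: prev[i] * A[i][j]) for j in range(n)]
        delta = [prev[ptr[j]] * A[ptr[j]][j] * B[j][o] for j in range(n)]
        backpointers.append(ptr)
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]


# Hypothetical two-state, two-symbol model (all numbers are assumptions).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
obs = [0, 1, 1]

print(forward(A, B, pi, obs))
print(viterbi(A, B, pi, obs))
```

Both functions do $O(|S|^2)$ work per observation, instead of enumerating all $|S|^t$ state sequences.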

## Bayes networks

Bayes networks are directed acyclic graphs (DAGs) $G=(\mathcal{X},\mathcal{E})$. The nodes represent random variables $X\in \mathcal{X}$. For every $X$, there is a probability distribution which is conditioned on the parents of $X$: $P(X\mid \mathrm{parents}(X))$
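
As far as I understand the standard definition, these per-node distributions together define the joint distribution over all variables as a product over the graph. Writing the variables as ${X}_{1},\dots ,{X}_{n}$ (my notation, not from a source):

$$P({X}_{1},\dots ,{X}_{n})=\prod_{i=1}^{n}P({X}_{i}\mid \mathrm{parents}({X}_{i}))$$

For a hypothetical three-node chain $A\to B\to C$, this gives

$$P(A,B,C)=P(A)\,P(B\mid A)\,P(C\mid B)$$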

There seem to be (please clarify) two tasks:

- **Inference**: given the values of some variables, find the most likely values of the other variables. Exact inference is NP-hard; for approximate inference, you can use MCMC.
- **Learning**: how you learn those distributions depends on the exact problem (source):
    - known structure, fully observable: maximum likelihood estimation (MLE)
    - known structure, partially observable: Expectation Maximization (EM) or Markov Chain Monte Carlo (MCMC)
    - unknown structure, fully observable: search through model space
    - unknown structure, partially observable: EM + search through model space
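
To check my understanding of the two tasks, here is a small Python sketch on an invented three-node network (Rain → WetGrass ← Sprinkler). The structure, the probabilities, and the sample data are all assumptions for illustration. Inference is done by brute-force enumeration, which is only feasible because the network is tiny. Learning covers only the simplest case (known structure, fully observable), where MLE reduces to relative frequencies.

```python
from collections import Counter
from itertools import product

# Hypothetical conditional distributions; every number is an assumption.
P_RAIN = 0.2
P_SPRINKLER = 0.3
P_WET = {(0, 0): 0.05, (0, 1): 0.8, (1, 0): 0.9, (1, 1): 0.99}  # P(W=1 | R, S)


def bern(p, value):
    """Probability that a Bernoulli(p) variable takes `value` (0 or 1)."""
    return p if value == 1 else 1.0 - p


def joint(r, s, w):
    """P(R=r, S=s, W=w) as the product of each node's conditional distribution."""
    return bern(P_RAIN, r) * bern(P_SPRINKLER, s) * bern(P_WET[(r, s)], w)


def posterior_rain_given_wet():
    """Exact inference by enumeration: P(R=1 | W=1), summing out S."""
    numerator = sum(joint(1, s, 1) for s in (0, 1))
    evidence = sum(joint(r, s, 1) for r, s in product((0, 1), repeat=2))
    return numerator / evidence


def mle_wet_cpt(samples):
    """MLE of P(W=1 | R, S) from fully observed (r, s, w) samples."""
    totals, wet = Counter(), Counter()
    for r, s, w in samples:
        totals[(r, s)] += 1
        wet[(r, s)] += w
    return {parents: wet[parents] / n for parents, n in totals.items()}


# Made-up fully observed data set of (rain, sprinkler, wet) triples.
samples = [(1, 0, 1), (1, 0, 1), (1, 0, 0), (0, 1, 1), (0, 0, 0), (0, 0, 0)]

print(posterior_rain_given_wet())
print(mle_wet_cpt(samples))
```

Parent configurations that never occur in the data, here `(1, 1)`, get no estimate at all, so a real implementation would need smoothing or a prior.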

## Dynamic Bayes networks

I guess dynamic Bayes networks (DBNs) are also directed probabilistic graphical models. The variability seems to come from the network changing over time. However, it seems to me that this is equivalent to copying the same network once per time step and connecting every node at time $t$ with the corresponding node at time $t+1$. Is that the case?
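
To make the "copy and connect" picture explicit, here is a minimal Python sketch of what I mean by unrolling. The function `unroll` and the edge-list representation are my own invention, not taken from any library. The edges between slices are passed as a separate list so that the sketch does not assume which nodes get connected across time steps. As an example, I encode an HMM as one hidden node `S` and one observed node `O` per slice, which is my guess at how an HMM would look as a DBN.

```python
def unroll(intra_edges, inter_edges, num_slices):
    """Copy a slice template `num_slices` times and link consecutive slices.

    Nodes of the unrolled graph are (name, t) pairs.
    intra_edges: (u, v) edges inside one time slice.
    inter_edges: (u, v) edges from u at time t to v at time t + 1.
    """
    edges = []
    for t in range(num_slices):
        edges += [((u, t), (v, t)) for u, v in intra_edges]
        if t + 1 < num_slices:
            edges += [((u, t), (v, t + 1)) for u, v in inter_edges]
    return edges


# Assumed DBN view of an HMM: hidden state S emits O, and S_t feeds S_{t+1}.
hmm_intra = [("S", "O")]
hmm_inter = [("S", "S")]

for edge in unroll(hmm_intra, hmm_inter, 3):
    print(edge)
```

Since every edge stays inside a slice or points one slice forward in time, the unrolled graph is a DAG whenever the slice template itself is acyclic.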