Sequential Data and Markov Models
Most machine learning algorithms assume that data points are independent and identically distributed (i.i.d.). However, in many real-world scenarios—like speech recognition, stock prices, or DNA sequencing—the order of the data matters.
1. The Markov Property
A sequence of random variables (s_1, s_2, …) satisfies the First-Order Markov Property if the probability of the next state depends only on the current state:

P(s_{t+1} | s_1, s_2, …, s_t) = P(s_{t+1} | s_t)

This drastically simplifies the modeling of sequential data, as we only need to specify a Transition Matrix A, where A_{ij} = P(s_{t+1} = j | s_t = i).
Markov Chain State Transitions
In a Markov model, the probability of transitioning to the next state S_{t+1} depends only on the current state S_t, not the entire history.
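As a minimal sketch, a first-order Markov chain can be simulated with nothing but a transition matrix: each step samples the next state from the row of the current state. The two-state "weather" chain and its numbers below are hypothetical.

```python
import random

# Hypothetical 2-state chain: state 0 = "sunny", state 1 = "rainy".
# Row i gives the distribution over the next state given current state i.
A = [
    [0.8, 0.2],  # sunny -> sunny 0.8, sunny -> rainy 0.2
    [0.4, 0.6],  # rainy -> sunny 0.4, rainy -> rainy 0.6
]

def step(state, rng):
    """Sample the next state using only the current state (Markov property)."""
    return 0 if rng.random() < A[state][0] else 1

def simulate(n_steps, start=0, seed=0):
    """Generate a state sequence of length n_steps + 1 from `start`."""
    rng = random.Random(seed)
    states = [start]
    for _ in range(n_steps):
        states.append(step(states[-1], rng))
    return states

path = simulate(10)
```

Note that `step` never looks at earlier states; that locality is exactly what the Markov property buys us.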
2. Hidden Markov Models (HMM)
In many cases, the state we are interested in is hidden (latent), and we only see noisy observations generated by that state.
An HMM is defined by:
- Hidden States: z_t ∈ {1, …, K}, the latent variables we want to infer.
- Observations: x_t, the visible data generated at each time step.
- Transition Probabilities: A_{ij} = P(z_{t+1} = j | z_t = i).
- Emission Probabilities: B_i(x) = P(x_t = x | z_t = i).
- Initial Distribution: π_i = P(z_1 = i).
Hidden Markov Model (HMM) Architecture
An HMM consists of hidden states (z) that evolve over time and observations (x) that are generated by those states.
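Concretely, a small HMM is just three tables of probabilities. This sketch writes one out as plain Python lists; the 2-state, 2-symbol setup and all numbers are made up for illustration.

```python
# Hypothetical 2-state, 2-symbol HMM, written out as plain lists.
# Hidden states: z in {0, 1}; observation symbols: x in {0, 1}.
pi = [0.6, 0.4]    # initial distribution: pi[i] = P(z_1 = i)
A = [[0.7, 0.3],   # transitions: A[i][j] = P(z_{t+1} = j | z_t = i)
     [0.4, 0.6]]
B = [[0.9, 0.1],   # emissions:   B[i][k] = P(x_t = k | z_t = i)
     [0.2, 0.8]]

# Sanity check: every row is a valid probability distribution.
assert abs(sum(pi) - 1.0) < 1e-9
assert all(abs(sum(row) - 1.0) < 1e-9 for row in A + B)
```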
3. The Three Fundamental Problems for HMMs
- Evaluation: Given the model, how likely is a certain sequence of observations? (Solved by the Forward Algorithm).
- Decoding: Given the observations, what is the most likely sequence of hidden states? (Solved by the Viterbi Algorithm).
- Learning: Given the observations, how do we adjust the model parameters to maximize likelihood? (Solved by the Baum-Welch Algorithm, a specific form of EM).
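The Evaluation problem can be sketched in a few lines. The forward recursion below computes α_t(i) = P(x_1..x_t, z_t = i) and sums out the final state; the 2-state, 2-symbol parameters are hypothetical.

```python
# Minimal forward algorithm for the Evaluation problem.
# Hypothetical 2-state, 2-symbol HMM parameters:
pi = [0.6, 0.4]    # pi[i]   = P(z_1 = i)
A = [[0.7, 0.3],   # A[i][j] = P(z_{t+1} = j | z_t = i)
     [0.4, 0.6]]
B = [[0.9, 0.1],   # B[i][k] = P(x_t = k | z_t = i)
     [0.2, 0.8]]

def forward(obs):
    """Return P(obs) under the HMM in O(T * K^2) time."""
    K = len(pi)
    # alpha[i] = P(x_1..x_t, z_t = i), initialized at t = 1
    alpha = [pi[i] * B[i][obs[0]] for i in range(K)]
    for x in obs[1:]:
        # Sum over predecessors, then weight by the emission probability.
        alpha = [sum(alpha[i] * A[i][j] for i in range(K)) * B[j][x]
                 for j in range(K)]
    return sum(alpha)
```

As a sanity check, summing `forward` over every possible observation sequence of a fixed length yields 1, since it computes a proper marginal likelihood.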
Dynamic Programming: All the algorithms mentioned above (Forward, Viterbi, Baum-Welch) rely on dynamic programming to avoid the exponential O(K^T) cost of enumerating all possible state sequences, reducing it to O(T K^2).
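The same dynamic-programming idea solves the Decoding problem: the Viterbi recursion below replaces the forward algorithm's sum over predecessors with a max, and keeps backpointers to recover the best path. The parameters are again hypothetical.

```python
# Minimal Viterbi decoder, with hypothetical 2-state, 2-symbol parameters.
pi = [0.6, 0.4]    # pi[i]   = P(z_1 = i)
A = [[0.7, 0.3],   # A[i][j] = P(z_{t+1} = j | z_t = i)
     [0.4, 0.6]]
B = [[0.9, 0.1],   # B[i][k] = P(x_t = k | z_t = i)
     [0.2, 0.8]]

def viterbi(obs):
    """Return the most likely hidden state sequence for obs."""
    K = len(pi)
    # delta[i] = max over paths ending in state i of the joint probability
    delta = [pi[i] * B[i][obs[0]] for i in range(K)]
    back = []  # back[t][j] = best predecessor of state j at step t+1
    for x in obs[1:]:
        prev, delta, ptr = delta, [], []
        for j in range(K):
            best = max(range(K), key=lambda i: prev[i] * A[i][j])
            ptr.append(best)
            delta.append(prev[best] * A[best][j] * B[j][x])
        back.append(ptr)
    # Trace the best path backwards through the stored pointers.
    state = max(range(K), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

Swapping max back to sum recovers the forward algorithm, which is why the two share the same O(T K^2) cost.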