Machine Learning
Sequential Data & Markov Models

Sequential Data and Markov Models

Most machine learning algorithms assume that data points are independent and identically distributed (i.i.d.). However, in many real-world scenarios—like speech recognition, stock prices, or DNA sequencing—the order of the data matters.


1. The Markov Property

A sequence of random variables $S_1, S_2, \dots, S_T$ satisfies the First-Order Markov Property if the probability of the next state depends only on the current state:

$p(S_t \mid S_{t-1}, S_{t-2}, \dots, S_1) = p(S_t \mid S_{t-1})$

This drastically simplifies the modeling of sequential data, as we only need to specify a Transition Matrix $A$, where $A_{ij} = p(S_t = j \mid S_{t-1} = i)$.
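A transition matrix is all it takes to simulate such a chain. The sketch below uses NumPy and a hypothetical two-state weather chain (the states and probabilities are illustrative, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state weather chain: state 0 = SUN, state 1 = RAIN.
# A[i, j] = p(S_t = j | S_{t-1} = i); each row must sum to 1.
A = np.array([[0.8, 0.2],
              [0.2, 0.8]])

def sample_chain(A, start, T):
    """Sample a state sequence of length T from a first-order Markov chain."""
    states = [start]
    for _ in range(T - 1):
        # The next state depends only on the current one: the Markov property.
        states.append(int(rng.choice(len(A), p=A[states[-1]])))
    return states

seq = sample_chain(A, start=0, T=10)
```

Note that the sampler never looks further back than `states[-1]`; the entire history is summarized by the current state.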

Markov Chain State Transitions

In a Markov model, the probability of transitioning to the next state $S_{t+1}$ depends only on the current state $S_t$, not the entire history.

[Diagram: two states, SUN (State 1) and RAIN (State 2), connected by transition probabilities 0.2 and 0.8.]
2. Hidden Markov Models (HMM)

In many cases, the state we are interested in is hidden (latent), and we only see noisy observations generated by that state.

An HMM is defined by:

  1. Hidden States: $z_1, z_2, \dots, z_T$
  2. Observations: $x_1, x_2, \dots, x_T$
  3. Transition Probabilities: $p(z_t \mid z_{t-1})$
  4. Emission Probabilities: $p(x_t \mid z_t)$
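The four ingredients above can be written down concretely as a set of NumPy arrays and used to generate data. All numbers here are illustrative (a made-up 2-state, 3-symbol HMM), not parameters from the notes:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 2-state, 3-symbol HMM (all values illustrative).
pi = np.array([0.6, 0.4])          # initial distribution p(z_1)
A  = np.array([[0.7, 0.3],         # transitions p(z_t | z_{t-1})
               [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1],    # emissions p(x_t | z_t)
               [0.1, 0.3, 0.6]])

def sample_hmm(pi, A, B, T):
    """Generate hidden states z and observations x from the HMM."""
    z = [int(rng.choice(len(pi), p=pi))]
    x = [int(rng.choice(B.shape[1], p=B[z[0]]))]
    for _ in range(1, T):
        z.append(int(rng.choice(len(pi), p=A[z[-1]])))   # hidden world changes
        x.append(int(rng.choice(B.shape[1], p=B[z[-1]])))  # evidence is emitted
    return z, x

z, x = sample_hmm(pi, A, B, T=5)
```

In a real task we would observe only `x`; the whole point of HMM inference is to reason about the unobserved `z`.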

Hidden Markov Model (HMM) Architecture

An HMM consists of hidden states (z) that evolve over time and observations (x) that are generated by those states.

[Diagram: a chain of hidden states $z_1 \to z_2 \to z_3 \to \dots$, each emitting an observation $x_1, x_2, x_3, \dots$]

Transitions: $p(z_t \mid z_{t-1})$ - How the hidden world changes.
Emissions: $p(x_t \mid z_t)$ - How the hidden world produces evidence.

3. The Three Fundamental Problems for HMMs

  1. Evaluation: Given the model, how likely is a certain sequence of observations? (Solved by the Forward Algorithm).
  2. Decoding: Given the observations, what is the most likely sequence of hidden states? (Solved by the Viterbi Algorithm).
  3. Learning: Given the observations, how do we adjust the model parameters to maximize likelihood? (Solved by the Baum-Welch Algorithm, a specific form of EM).
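Problem 1 (Evaluation) is solved by the Forward Algorithm. A minimal NumPy sketch, reusing the hypothetical 2-state, 3-symbol parameters from before (all numbers illustrative):

```python
import numpy as np

def forward(pi, A, B, x):
    """Forward algorithm: returns the likelihood p(x_1, ..., x_T).

    alpha[t, i] = p(x_1..x_t, z_t = i). Each step costs O(K^2),
    so the total is O(T K^2) instead of the O(K^T) brute-force sum.
    """
    T, K = len(x), len(pi)
    alpha = np.zeros((T, K))
    alpha[0] = pi * B[:, x[0]]                       # initialize with p(z_1) p(x_1 | z_1)
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, x[t]]   # propagate, then weight by emission
    return alpha[-1].sum()                           # marginalize over the final state

pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
likelihood = forward(pi, A, B, x=[0, 2, 1])
```

For long sequences, practical implementations work in log space (or rescale `alpha` at each step) to avoid numerical underflow.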
🧠 Dynamic Programming: All the algorithms mentioned above (Forward, Viterbi, etc.) rely on dynamic programming to avoid the exponential cost of searching through all possible state sequences.
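The same dynamic-programming idea solves Problem 2 (Decoding): the Viterbi Algorithm replaces the Forward Algorithm's sum over previous states with a max, and keeps backpointers to recover the best path. A sketch under the same hypothetical parameters as before:

```python
import numpy as np

def viterbi(pi, A, B, x):
    """Viterbi decoding: the most likely hidden-state path for observations x.

    delta[t, j] = max over paths ending in state j at time t of the joint
    probability p(z_1..z_t, x_1..x_t); psi stores the argmax for backtracking.
    """
    T, K = len(x), len(pi)
    delta = np.zeros((T, K))
    psi = np.zeros((T, K), dtype=int)
    delta[0] = pi * B[:, x[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A      # scores[i, j]: best path into i, then i -> j
        psi[t] = scores.argmax(axis=0)          # remember the best predecessor of each j
        delta[t] = scores.max(axis=0) * B[:, x[t]]
    path = [int(delta[-1].argmax())]            # best final state
    for t in range(T - 1, 0, -1):               # follow backpointers to t = 1
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
best_path = viterbi(pi, A, B, x=[0, 2, 1])
```

Structurally this is the forward recursion with `sum` swapped for `max`, which is why both run in $O(TK^2)$ time rather than enumerating all $K^T$ paths.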