Machine Learning
Sequential Data & Markov Models

Sequential Data and Markov Models

Most machine learning algorithms assume that data points are independent and identically distributed (i.i.d.). However, in many real-world scenarios—like speech recognition, stock prices, or DNA sequencing—the order of the data matters.


1. The Markov Property

A sequence of random variables $S_1, S_2, \dots, S_T$ satisfies the First-Order Markov Property if the probability of the next state depends only on the current state:

$p(S_t \mid S_{t-1}, S_{t-2}, \dots, S_1) = p(S_t \mid S_{t-1})$

This drastically simplifies the modeling of sequential data, as we only need to specify a Transition Matrix $A$, where $A_{ij} = p(S_t = j \mid S_{t-1} = i)$.
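A transition matrix is all it takes to simulate such a chain. The sketch below uses NumPy and a hypothetical two-state weather chain (the states and probabilities are illustrative, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state weather chain: state 0 = SUN, state 1 = RAIN.
# A[i, j] = p(S_t = j | S_{t-1} = i); each row must sum to 1.
A = np.array([[0.8, 0.2],
              [0.2, 0.8]])

def sample_chain(A, start, T):
    """Sample a state sequence of length T from a first-order Markov chain."""
    states = [start]
    for _ in range(T - 1):
        # The next state depends only on the current one: the Markov property.
        states.append(int(rng.choice(len(A), p=A[states[-1]])))
    return states

seq = sample_chain(A, start=0, T=10)
```

Note that the sampler never looks further back than `states[-1]`; the entire history is summarized by the current state.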

Markov Chain State Transitions

In a Markov model, the probability of transitioning to the next state $S_{t+1}$ depends only on the current state $S_t$, not the entire history.

[Diagram: two states, SUN (State 1) and RAIN (State 2), connected by transition probabilities 0.2 and 0.8.]
2. Hidden Markov Models (HMM)

In many cases, the state we are interested in is hidden (latent), and we only see noisy observations generated by that state.

An HMM is defined by:

  1. Hidden States: $z_1, z_2, \dots, z_T$
  2. Observations: $x_1, x_2, \dots, x_T$
  3. Transition Probabilities: $p(z_t \mid z_{t-1})$
  4. Emission Probabilities: $p(x_t \mid z_t)$
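The four ingredients above can be written down concretely as a set of NumPy arrays and used to generate data. All numbers here are illustrative (a made-up 2-state, 3-symbol HMM), not parameters from the notes:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 2-state, 3-symbol HMM (all values illustrative).
pi = np.array([0.6, 0.4])          # initial distribution p(z_1)
A  = np.array([[0.7, 0.3],         # transitions p(z_t | z_{t-1})
               [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1],    # emissions p(x_t | z_t)
               [0.1, 0.3, 0.6]])

def sample_hmm(pi, A, B, T):
    """Generate hidden states z and observations x from the HMM."""
    z = [int(rng.choice(len(pi), p=pi))]
    x = [int(rng.choice(B.shape[1], p=B[z[0]]))]
    for _ in range(1, T):
        z.append(int(rng.choice(len(pi), p=A[z[-1]])))   # hidden world changes
        x.append(int(rng.choice(B.shape[1], p=B[z[-1]])))  # evidence is emitted
    return z, x

z, x = sample_hmm(pi, A, B, T=5)
```

In a real task we would observe only `x`; the whole point of HMM inference is to reason about the unobserved `z`.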

Hidden Markov Model (HMM) Architecture

An HMM consists of hidden states (z) that evolve over time and observations (x) that are generated by those states.

[Diagram: a chain of hidden states $z_1 \to z_2 \to z_3 \to \dots$, each emitting an observation $x_1, x_2, x_3, \dots$]

Transitions: $p(z_t \mid z_{t-1})$ - How the hidden world changes.
Emissions: $p(x_t \mid z_t)$ - How the hidden world produces evidence.

3. The Three Fundamental Problems for HMMs

  1. Evaluation: Given the model, how likely is a certain sequence of observations? (Solved by the Forward Algorithm).
  2. Decoding: Given the observations, what is the most likely sequence of hidden states? (Solved by the Viterbi Algorithm).
  3. Learning: Given the observations, how do we adjust the model parameters to maximize likelihood? (Solved by the Baum-Welch Algorithm, a specific form of EM).
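Problem 1 (Evaluation) is solved by the Forward Algorithm. A minimal NumPy sketch, reusing the hypothetical 2-state, 3-symbol parameters from before (all numbers illustrative):

```python
import numpy as np

def forward(pi, A, B, x):
    """Forward algorithm: returns the likelihood p(x_1, ..., x_T).

    alpha[t, i] = p(x_1..x_t, z_t = i). Each step costs O(K^2),
    so the total is O(T K^2) instead of the O(K^T) brute-force sum.
    """
    T, K = len(x), len(pi)
    alpha = np.zeros((T, K))
    alpha[0] = pi * B[:, x[0]]                       # initialize with p(z_1) p(x_1 | z_1)
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, x[t]]   # propagate, then weight by emission
    return alpha[-1].sum()                           # marginalize over the final state

pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
likelihood = forward(pi, A, B, x=[0, 2, 1])
```

For long sequences, practical implementations work in log space (or rescale `alpha` at each step) to avoid numerical underflow.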
🧠 Dynamic Programming: All the algorithms mentioned above (Forward, Viterbi, etc.) rely on dynamic programming to avoid the exponential cost of searching through all possible state sequences.
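The same dynamic-programming idea solves Problem 2 (Decoding): the Viterbi Algorithm replaces the Forward Algorithm's sum over previous states with a max, and keeps backpointers to recover the best path. A sketch under the same hypothetical parameters as before:

```python
import numpy as np

def viterbi(pi, A, B, x):
    """Viterbi decoding: the most likely hidden-state path for observations x.

    delta[t, j] = max over paths ending in state j at time t of the joint
    probability p(z_1..z_t, x_1..x_t); psi stores the argmax for backtracking.
    """
    T, K = len(x), len(pi)
    delta = np.zeros((T, K))
    psi = np.zeros((T, K), dtype=int)
    delta[0] = pi * B[:, x[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A      # scores[i, j]: best path into i, then i -> j
        psi[t] = scores.argmax(axis=0)          # remember the best predecessor of each j
        delta[t] = scores.max(axis=0) * B[:, x[t]]
    path = [int(delta[-1].argmax())]            # best final state
    for t in range(T - 1, 0, -1):               # follow backpointers to t = 1
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
best_path = viterbi(pi, A, B, x=[0, 2, 1])
```

Structurally this is the forward recursion with `sum` swapped for `max`, which is why both run in $O(TK^2)$ time rather than enumerating all $K^T$ paths.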