
Kullback-Leibler information number and log-likelihood

Assume that our data follow the true distribution g(y), and that f(y) is the statistical model used to approximate g(y).

Kullback-Leibler information number: a ruler for measuring how well the statistical model approximates the true distribution.
\begin{displaymath}
I(g;f) = \text{E}_Y \log \left\{ \frac{g(Y)}{f(Y)} \right\}
 = \int_{-\infty}^{\infty} \log \left\{ \frac{g(y)}{f(y)} \right\} g(y) \, dy
\end{displaymath} (17)
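As a concrete illustration (this example is not in the original notes), take both the true distribution and the model to be unit-variance normal densities, $g = N(\mu_0, 1)$ and $f = N(\mu_1, 1)$. Substituting into (17) and taking the expectation under $g$ gives
\begin{displaymath}
I(g;f) = \text{E}_Y \left\{ \frac{(Y - \mu_1)^2}{2} - \frac{(Y - \mu_0)^2}{2} \right\}
 = \frac{(\mu_0 - \mu_1)^2}{2},
\end{displaymath}
which is zero exactly when $\mu_1 = \mu_0$ and grows as the model mean moves away from the true mean.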

Properties of the KL information number:
1. $I(g;f) \geq 0$;
2. $I(g;f) = 0$ if and only if $g(y) = f(y)$.

Pf. For simplicity, we only consider the discrete case here. Assume that p and q are probability distributions satisfying $p_i > 0, q_i > 0 \; (i = 1, \cdots, m)$, $\sum_{i=1}^{m} p_i = 1$ and $\sum_{i=1}^{m} q_i = 1$.

Let $h(\eta) = \log \eta - \eta + 1$ for $\eta > 0$. $h(\eta)$ attains its maximum value 0 only at $\eta = 1$. Thus $\log \eta \leq \eta - 1$, and the equality holds only when $\eta = 1$. By putting $\eta = q_i / p_i$, we have

\begin{displaymath}
\log \frac{q_i}{p_i} \leq \frac{q_i}{p_i} - 1 \; (i = 1, \cdots, m).\end{displaymath}

It follows that
\begin{displaymath}
I(p;q) = \sum_{i=1}^{m} p_i \log \frac{p_i}{q_i}
 = - \sum_{i=1}^{m} p_i \log \frac{q_i}{p_i}
 \geq - \sum_{i=1}^{m} p_i \left( \frac{q_i}{p_i} - 1 \right)
 = - \sum_{i=1}^{m} q_i + \sum_{i=1}^{m} p_i = 0.
\end{displaymath}
The equality holds only when $p_i = q_i$ for all $i$. Q.E.D.
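A small numerical check of these properties (a sketch added here for illustration; the probability values are made up):
\begin{verbatim}
# Discrete KL information number I(p;q) = sum_i p_i * log(p_i / q_i)
import math

def kl_information(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]   # plays the role of the true distribution g
q = [0.4, 0.4, 0.2]   # plays the role of the model f

print(kl_information(p, p))   # 0.0     (the equality case p = q)
print(kl_information(p, q))   # ~0.025  (positive whenever p differs from q)
\end{verbatim}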

Thus we know $f \rightarrow g$ as $I(g;f) \rightarrow 0$.

It can be shown that $-I(g;f)$ is the entropy; minimizing the KL information number is therefore equivalent to maximizing the entropy.

How to estimate I(g;f)? From (17) we have
\begin{displaymath}
I(g;f) = \text{E}_Y \log g(Y) - \text{E}_Y \log f(Y).\end{displaymath} (20)
Only the second term depends on the model $f(y)$, so it is the term that matters when evaluating the model. By the law of large numbers it can be approximated by
\begin{displaymath}
\text{E}_Y \log f(Y) \approx \frac{1}{N} \sum_{n=1}^{N} \log f(y_n)
\end{displaymath}
as $N \rightarrow \infty$, where $y_1, \cdots, y_N$ are observations drawn from $g$. Therefore, $\sum_{n=1}^{N} \log f(y_n)$ can replace the KL information number as a criterion for evaluating models: one wants the model with the greatest possible $\sum_{n=1}^{N} \log f(y_n)$.
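A minimal simulation sketch of this approximation (added for illustration; the true distribution and the candidate means below are arbitrary choices, not from the notes):
\begin{verbatim}
# Approximate E_Y log f(Y) by the sample mean of log f(y_n),
# where the y_n are drawn from the true distribution g = N(0, 1).
import math, random

def log_normal_density(y, mu):
    return -0.5 * math.log(2 * math.pi) - 0.5 * (y - mu) ** 2

random.seed(1)
N = 10000
sample = [random.gauss(0.0, 1.0) for _ in range(N)]   # data from g

for mu in (0.0, 0.5, 2.0):                            # candidate models f(y|mu)
    avg = sum(log_normal_density(y, mu) for y in sample) / N
    print(mu, avg)   # the model closest to g attains the largest average
\end{verbatim}
The candidate with mean 0 (the true mean) attains the largest average log-density, in line with the criterion of maximizing $\sum_{n=1}^{N} \log f(y_n)$.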

Log-likelihood
\begin{displaymath}
\ell = \sum_{n=1}^{N} \log f(y_n)
 \end{displaymath} (21)
Likelihood
\begin{displaymath}
L = e^{\ell} = \prod_{n=1}^{N} f(y_n)
 \end{displaymath} (22)

Example: Consider $f(y\vert\mu) \sim N(\mu,1)$, i.e.,
\begin{displaymath}
f(y\vert\mu) = \frac{1}{\sqrt{2 \pi}} \exp \{ -\frac{(y - \mu)^2}{2} \}.\end{displaymath} (23)
The log-likelihood function is
\begin{displaymath}
\ell(\mu) = -\frac{N}{2} \log 2\pi -
 \frac{1}{2} \sum_{i=1}^{N} (y_i - \mu)^2.\end{displaymath} (24)
The maximization of $\ell(\mu)$ is equivalent to the minimization of $S(\mu) = \sum_{i=1}^{N} (y_i - \mu)^2$. For the normal distribution, maximum likelihood estimation and least-squares fitting therefore give identical results.
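To make the equivalence explicit, differentiate (24) with respect to $\mu$ and set the derivative to zero:
\begin{displaymath}
\frac{d \ell(\mu)}{d \mu} = \sum_{i=1}^{N} (y_i - \mu) = 0
 \quad \Longrightarrow \quad
 \hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} y_i = \bar{y},
\end{displaymath}
which is exactly the minimizer of $S(\mu)$, since $dS(\mu)/d\mu = -2 \sum_{i=1}^{N} (y_i - \mu) = 0$ yields the same $\hat{\mu} = \bar{y}$.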

