Next: AIC Up: Akaike Information Criterion (AIC) Previous: A fitting problem

## Kullback-Leibler information number and log-likelihood

Assume that our data follow the true distribution g(y), and our statistical model to approximate g(y) is f(y).

**Kullback-Leibler information number.** A ruler to measure the similarity between the statistical model and the true distribution:

$$ I(g;f) = E_g\!\left[\log\frac{g(Y)}{f(Y)}\right] = \int g(y)\,\log\frac{g(y)}{f(y)}\,dy \tag{17} $$
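For discrete distributions the integral in (17) becomes a sum, which is easy to compute directly. A minimal sketch in Python; the distributions `p` and `q` are made-up illustrations:

```python
import math

def kl_information(p, q):
    """Kullback-Leibler information number I(p; q) for two discrete
    distributions given as probability lists (assumes q[i] > 0
    wherever p[i] > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]   # "true" distribution g (made-up numbers)
q = [0.4, 0.4, 0.2]   # model f (made-up numbers)
print(kl_information(p, q))   # positive, since f differs from g
print(kl_information(p, p))   # 0.0, since the model matches g exactly
```

The two printed values preview the properties proved next: the number is nonnegative, and it vanishes exactly when the model coincides with the true distribution.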

Properties of the KL information number:

1. $I(g;f) \ge 0$;
2. $I(g;f) = 0$ if and only if $g = f$.

Pf. For simplicity, we only consider the discrete case here. Assume that $p$ and $q$ are probability distributions satisfying $\sum_i p_i = 1$ and $\sum_i q_i = 1$.

Let $h(x) = \log x - x + 1$ for $x > 0$. $h(x)$ attains its maximum value 0 only at $x = 1$. Thus $\log x \le x - 1$, and the equality holds only when $x = 1$. By putting $x = q_i/p_i$, we have

$$ \log\frac{q_i}{p_i} \le \frac{q_i}{p_i} - 1. $$

It follows that

$$ I(p;q) = \sum_i p_i \log\frac{p_i}{q_i} = -\sum_i p_i \log\frac{q_i}{p_i} \ge -\sum_i p_i\left(\frac{q_i}{p_i} - 1\right) = \sum_i p_i - \sum_i q_i = 0. $$

The equality only holds when $p_i = q_i$ for all $i$. Q.E.D.

Thus we know that $f$ gets closer to $g$ as $I(g;f)$ decreases toward 0.

It can be shown that $-I(g;f)$ is the (Boltzmann) entropy. Minimizing the KL information number is thus equivalent to maximizing the entropy.

How to estimate $I(g;f)$? From (17) we have

$$ I(g;f) = E_g[\log g(Y)] - E_g[\log f(Y)]. \tag{20} $$

Only the second term is important in evaluating the statistical model $f(y)$. It can be estimated as

$$ E_g[\log f(Y)] \approx \frac{1}{N}\sum_{i=1}^{N} \log f(y_i), $$

since, by the law of large numbers, $\frac{1}{N}\sum_{i=1}^{N}\log f(y_i) \to E_g[\log f(Y)]$ as $N \to \infty$. Therefore $\sum_{i=1}^{N}\log f(y_i)$ can replace the KL information number as a criterion for evaluating models: one wants to find the greatest possible $\sum_{i=1}^{N}\log f(y_i)$.
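The law-of-large-numbers approximation can be illustrated numerically. In this sketch both the data-generating distribution $g$ and the model $f$ are taken to be $N(0,1)$ (an assumption made for the example), so the limit is $E_g[\log f(Y)] = -\tfrac{1}{2}\log(2\pi) - \tfrac{1}{2} \approx -1.419$:

```python
import math
import random

random.seed(0)  # fixed seed for reproducibility

def log_f(y, mu=0.0, sigma=1.0):
    # log density of the model f = N(mu, sigma^2)
    return -0.5 * math.log(2 * math.pi * sigma**2) - (y - mu)**2 / (2 * sigma**2)

# draw data from the true distribution g = N(0, 1); here the model matches g
for n in (100, 10000):
    ys = [random.gauss(0.0, 1.0) for _ in range(n)]
    mean_ll = sum(log_f(y) for y in ys) / n
    print(n, mean_ll)   # approaches E_g[log f(Y)] ≈ -1.419 as n grows
```

With more data the sample mean of the log density settles near the expectation, which is what justifies using the total log-likelihood as a model-selection criterion.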

Log-likelihood:

$$ \ell = \sum_{i=1}^{N} \log f(y_i) \tag{21} $$

Likelihood:

$$ L = \prod_{i=1}^{N} f(y_i) \tag{22} $$
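Since the log-likelihood (21) is the logarithm of the likelihood (22), the product over observations becomes a sum of logs. A small sketch with an exponential model density and made-up data (both are illustrative assumptions, not from the text):

```python
import math

def f(y, lam=1.0):
    # exponential model density f(y) = lam * exp(-lam * y), for y >= 0
    return lam * math.exp(-lam * y)

data = [0.3, 1.2, 0.7]                          # made-up observations
L = math.prod(f(y) for y in data)               # likelihood (22)
ell = sum(math.log(f(y)) for y in data)         # log-likelihood (21)
print(L, ell)                                   # ell ≈ -2.2 here, and ell = log(L)
```

Working with the sum rather than the product is also numerically safer: for large $N$ the product underflows, while the sum of logs does not.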

Example: Consider $y \sim N(\mu, \sigma^2)$, i.e.,

$$ f(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y - \mu)^2}{2\sigma^2}\right). \tag{23} $$

The log-likelihood function is

$$ \ell(\mu, \sigma^2) = -\frac{N}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(y_i - \mu)^2. \tag{24} $$

The maximization of (24) over $\mu$ is equivalent to the minimization of $\sum_{i=1}^{N}(y_i - \mu)^2$. For the normal distribution, maximum likelihood estimation and least squares fitting give identical results.
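The equivalence can be illustrated numerically: minimizing the negative normal log-likelihood over $\mu$ (here by a brute-force grid search, with $\sigma$ fixed and made-up data) lands on the sample mean, which is exactly the least-squares estimate:

```python
import math

def neg_log_likelihood(mu, ys, sigma=1.0):
    # negative of (24) for a normal model with known sigma
    n = len(ys)
    return (0.5 * n * math.log(2 * math.pi * sigma**2)
            + sum((y - mu)**2 for y in ys) / (2 * sigma**2))

ys = [2.1, 1.9, 2.4, 1.6]                      # made-up observations
grid = [i / 1000 for i in range(1000, 3001)]   # candidate values of mu
mu_hat = min(grid, key=lambda m: neg_log_likelihood(m, ys))
print(mu_hat, sum(ys) / len(ys))   # mu_hat matches the sample mean
```

The grid search is only for illustration; in practice the minimizer has the closed form $\hat{\mu} = \frac{1}{N}\sum_i y_i$, precisely because the $\mu$-dependent part of (24) is the least-squares criterion.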
