In practical data analysis, we usually do not know which function should be used to fit the observed data. We tend to pick a familiar function, but how do we know that it is better than the other possible choices?

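Consider *N* data points, (*x*_{i}, *y*_{i}) for *i* = 1, ..., *N*, and a model *y* = *f*(*x*; *a*_{1}, ..., *a*_{K}) with *K* parameters, *a*_{1}, ..., *a*_{K}. An example is a polynomial function *f*(*x*) = *a*_{1} + *a*_{2} *x* + ... + *a*_{K} *x*^{K-1}.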

The parameters are usually estimated by minimizing the *sum of squared residuals* (SSR), defined by

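$$
\mathrm{SSR}_{K} = \sum_{i=1}^{N} \left[ y_{i} - f(x_{i};\, a_{1}, \ldots, a_{K}) \right]^{2} \tag{15}
$$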
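As a minimal sketch of how the SSR of a *K*-parameter polynomial can be computed, the following uses NumPy's linear least-squares solver. The function name `ssr_polynomial` and the synthetic data are assumptions made for illustration; they do not come from the text.

```python
import numpy as np

def ssr_polynomial(x, y, K):
    """Fit a polynomial with K coefficients (degree K-1) by linear least
    squares and return (coefficients, SSR). Hypothetical helper."""
    # Design matrix with columns 1, x, ..., x^(K-1)
    A = np.vander(x, N=K, increasing=True)
    # Coefficients a_1, ..., a_K that minimize the sum of squared residuals
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coeffs
    return coeffs, float(np.sum(residuals**2))

# Synthetic data (illustrative assumption): a noisy quadratic
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.1 * rng.standard_normal(x.size)

coeffs, ssr = ssr_polynomial(x, y, K=3)
print("coefficients:", coeffs)
print("SSR:", ssr)
```

The SSR is computed from the residuals directly instead of being read from the solver's return value, so the code stays valid even when the design matrix is rank deficient.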

An important problem is to select the optimal number of parameters, *K*^{*}.

A merit function such as *SSR*_{K} in linear least-squares fitting satisfies the following relationship:

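$$
\mathrm{SSR}_{1} \ge \mathrm{SSR}_{2} \ge \cdots \ge \mathrm{SSR}_{K} \ge \mathrm{SSR}_{K+1} \ge \cdots \tag{16}
$$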
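For nested linear models such as polynomials of increasing order, adding a parameter can never increase the minimized SSR, so the SSR by itself always favors the largest *K*. The sketch below checks this numerically on synthetic data; the data, the range of *K*, and the variable names are assumptions chosen only for illustration.

```python
import numpy as np

# Synthetic data (illustrative assumption): a noisy sine curve
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 30)
y = np.sin(2.0 * x) + 0.1 * rng.standard_normal(x.size)

ssr_values = []
for K in range(1, 7):
    # Polynomial with K coefficients: columns 1, x, ..., x^(K-1)
    A = np.vander(x, N=K, increasing=True)
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    ssr_values.append(float(np.sum((y - A @ coeffs) ** 2)))

for K, ssr in enumerate(ssr_values, start=1):
    print(f"K = {K}: SSR = {ssr:.4f}")
```

Because the printed SSR values do not increase with *K*, choosing *K*^{*} requires some criterion beyond the SSR alone.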