In practical data analysis, we usually don't know which function should be used to fit the observed data. We tend to use a familiar function, but how do we know it is better than other possible choices?
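Consider $N$ data points $(x_i, y_i)$ for $i = 1, \dots, N$, and a model $f(x; a_1, \dots, a_K)$ with $K$ parameters $a_1, \dots, a_K$. An example is a polynomial function $f(x) = a_1 + a_2 x + \cdots + a_K x^{K-1}$.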
The parameters are usually estimated by minimizing the sum of squared residuals (SSR), defined by

$$\mathrm{SSR}_K = \sum_{i=1}^{N} \left[ y_i - f(x_i; a_1, \dots, a_K) \right]^2 .$$
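As a minimal sketch of this fitting step, the following Python snippet fits a polynomial with $K$ coefficients by linear least squares and returns the resulting SSR. The function name `fit_polynomial`, the synthetic data, and the noise level are assumptions made for illustration; they do not come from the original text.

```python
import numpy as np


def fit_polynomial(x, y, K):
    """Least-squares fit of a polynomial with K coefficients (degree K - 1).

    Returns the coefficients (lowest order first) and the SSR.
    """
    # Design matrix with columns 1, x, x^2, ..., x^(K-1)
    A = np.vander(x, K, increasing=True)
    # Solve min ||A a - y||^2 for the coefficient vector a
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coeffs
    ssr = float(np.sum(residuals ** 2))
    return coeffs, ssr


# Hypothetical data: a quadratic plus Gaussian noise (illustration only)
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
y = 1.0 + 2.0 * x - 0.5 * x ** 2 + 0.1 * rng.standard_normal(x.size)

coeffs, ssr = fit_polynomial(x, y, K=3)
print("coefficients:", coeffs)
print("SSR:", ssr)
```

Building the design matrix explicitly keeps the link to the SSR definition visible; `np.polyfit` would give an equivalent fit with the coefficients in the opposite order.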
An important problem is to select the optimal number of parameters $K^*$.
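A merit function such as $\mathrm{SSR}_K$ in linear least-squares fitting satisfies the following relationship when the models are nested, that is, when each model with $K$ parameters is a special case of the model with $K+1$ parameters:

$$\mathrm{SSR}_1 \ge \mathrm{SSR}_2 \ge \cdots \ge \mathrm{SSR}_K \ge \mathrm{SSR}_{K+1} \ge \cdots$$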
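For nested polynomial models, the least-squares SSR cannot increase as $K$ grows, which is why the SSR alone cannot be used to pick $K^*$. The sketch below checks this numerically. The helper `ssr_for_k`, the sine-shaped test data, and the range of $K$ are assumptions chosen for illustration only.

```python
import numpy as np


def ssr_for_k(x, y, K):
    """SSR of the least-squares polynomial fit with K coefficients."""
    # Design matrix with columns 1, x, ..., x^(K-1)
    A = np.vander(x, K, increasing=True)
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ coeffs) ** 2))


# Hypothetical noisy data (illustration only)
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 30)
y = np.sin(2.0 * x) + 0.1 * rng.standard_normal(x.size)

# SSR for K = 1, ..., 8 parameters
ssr_values = [ssr_for_k(x, y, K) for K in range(1, 9)]
for K, ssr in enumerate(ssr_values, start=1):
    print(f"K = {K}: SSR = {ssr:.6f}")
```

The printed SSR values should be non-increasing in $K$ (up to rounding error), even once the extra coefficients are only fitting noise.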