Sunday, February 08, 2009

Noise and regularization in function approximation

Problem: function approximation using a standard kernel method; which kernel should I use? Given a finite training set of data, you can easily overfit by choosing a narrow kernel, which amounts to just memorizing the data points.
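As a quick sketch of the memorization effect (with a sine target and Gaussian kernel chosen purely for illustration): when the kernel width is much smaller than the spacing of the training inputs, the kernel matrix is essentially the identity, so the expansion y(x) = sum_i alpha_i K(x_i, x) reproduces the noisy training targets exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss_kernel(a, b, width):
    # Gaussian kernel matrix between point sets a and b.
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * width ** 2))

# Noisy samples of a smooth target (a sine, as an illustrative choice).
x_train = np.linspace(0, 1, 20)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.3, size=20)

def fit_predict(width, x_test):
    # Solve K alpha = y (tiny jitter for numerical stability),
    # then evaluate y(x) = sum_i alpha_i K(x_i, x).
    K = gauss_kernel(x_train, x_train, width)
    alpha = np.linalg.solve(K + 1e-9 * np.eye(len(x_train)), y_train)
    return gauss_kernel(x_test, x_train, width) @ alpha

# A very narrow kernel (width 0.005 vs. input spacing ~0.05) makes
# K nearly the identity, so the fit passes through every noisy target:
narrow = fit_predict(0.005, x_train)
print(np.max(np.abs(narrow - y_train)))  # essentially zero: memorization
```

The fit is perfect on the training set and spiky everywhere else, which is exactly the overfitting described above.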
If we know there is noise in the system, we can do a little better at choosing the right kernel by means of regularization. The idea is that similar points should be mapped to similar points by the function; this generalizes continuity in the topological-space sense. The noise in the input space can itself be used to define similarity in that space. For real-valued functions, the system has the form Y = f(x + N), where N is the input noise, and you approximate it with y(x) = sum_i alpha_i K(x_i, x). The noise variable inside the true function makes the conditional expectation smoother than the original: E[Y | x] = E[f(x + N)] is f convolved with the noise density, so even a discontinuous f yields a smooth regression target.
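A minimal sketch of that smoothing effect, assuming Gaussian input noise and a step function as the true f (both choices are mine, for illustration). The Monte Carlo average of f(x + N) approximates E[f(x + N)], which turns the hard step into a gradual sigmoid-like curve:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # A discontinuous "true" function: a unit step at 0.
    return (np.asarray(x) > 0).astype(float)

# Input noise N ~ Normal(0, sigma^2). The conditional expectation
# E[f(x + N)] is f convolved with the noise density.
sigma = 0.5
xs = np.linspace(-2, 2, 41)
noise = rng.normal(0.0, sigma, size=10_000)
smoothed = np.array([f(x + noise).mean() for x in xs])

# The original f jumps from 0 to 1 in a single grid step of 0.1;
# the smoothed version changes only gradually between grid points.
print(np.max(np.abs(np.diff(f(xs)))))      # 1.0: the full jump
print(np.max(np.abs(np.diff(smoothed))))   # much smaller per-step change
```

For a Gaussian N the smoothed curve is just the Gaussian CDF centered at the step, so its steepest slope is bounded by 1/(sigma * sqrt(2*pi)) no matter how sharp f is.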

However, additive output noise does not work the same way: for Y = f(x) + N, the conditional expectation is E[Y | x] = f(x) + E[N], so the noise does not smooth the function at all. Only when f is linear are the two noise models equivalent, since then f(x + N) = f(x) + aN for slope a, i.e., input noise reduces to additive output noise.
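The contrast can be checked directly with the same illustrative step function and Gaussian noise as above: averaging Y = f(x) + N over the noise just returns f(x) (plus the near-zero sample mean of N), so the discontinuity survives intact.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # The same discontinuous step function.
    return (np.asarray(x) > 0).astype(float)

xs = np.linspace(-2, 2, 41)
noise = rng.normal(0.0, 0.5, size=10_000)

# Additive output noise: Y = f(x) + N. Averaging over the noise gives
# E[Y | x] = f(x) + E[N] = f(x): the step is untouched.
additive = np.array([(f(x) + noise).mean() for x in xs])
print(np.max(np.abs(additive - f(xs))))  # ~ |sample mean of N|, near 0
```

The averaged curve still jumps from 0 to 1 across the origin, unlike the input-noise case.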

The same argument carries over when the output space consists of class labels or cluster assignments.
