Why is input distribution not important in regression?
The maximum likelihood estimate of the parameters w given the data is
w* = argmax_w p(y|x,w),
that is, you are searching for the set of parameters that maximizes the probability of observing y, given the inputs x. Note that the probability distribution of x itself plays no role here. The randomness that gives rise to the likelihood function is the noise (or error) term.
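As a quick sanity check of this point, here is a minimal sketch, assuming a one-parameter linear model y = w*x + noise with Gaussian noise (in which case the maximum likelihood estimate of w is the least-squares slope). The names `fit_slope` and `simulate`, the true value w = 2, and the two input distributions are all illustrative choices, not something from the argument above.

```python
import random

def fit_slope(xs, ys):
    """Least-squares slope for y = w*x (no intercept).

    Under i.i.d. Gaussian noise this is also the maximum likelihood estimate of w.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx

def simulate(sample_x, w_true=2.0, noise_std=0.5, n=20000, seed=0):
    """Draw inputs from sample_x, generate noisy outputs, and estimate w."""
    rng = random.Random(seed)
    xs = [sample_x(rng) for _ in range(n)]
    ys = [w_true * x + rng.gauss(0.0, noise_std) for x in xs]
    return fit_slope(xs, ys)

# Two very different input distributions, same conditional model p(y|x,w).
w_uniform = simulate(lambda rng: rng.uniform(-1.0, 1.0))
w_expo = simulate(lambda rng: rng.expovariate(1.0))

print("w estimated with uniform inputs:    ", w_uniform)
print("w estimated with exponential inputs:", w_expo)
```

Both estimates should land close to the true w, even though the inputs were drawn very differently. The input distribution does affect how precise the estimate is, but not what is being estimated.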
Now consider a generative model that also takes the distribution of the input into account: (X,Y) = f(e; w). This problem is no longer regression; it is parametric joint distribution estimation (or model fitting). Here, e is not noise; it is the source of all randomness.
For regression, you don't really need the entire joint distribution. The conditional is enough!
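One way to see why the conditional is enough is the following short derivation. It is a sketch under an added assumption, namely that the input distribution p(x) does not depend on the parameters w.

```latex
\begin{aligned}
p(x, y \mid w) &= p(y \mid x, w)\, p(x), \\
\arg\max_w \log p(x, y \mid w)
  &= \arg\max_w \big[ \log p(y \mid x, w) + \log p(x) \big] \\
  &= \arg\max_w \log p(y \mid x, w).
\end{aligned}
```

The term log p(x) is a constant with respect to w, so it drops out of the maximization. If p(x) did share parameters with the conditional, this step would no longer hold.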
P.S. Of course, if you are doing linear regression where you have non-linear functions or non-Gaussian noise, this is not true. Even the expected MSE cost in a regression problem cannot be computed without the input distribution.
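To illustrate the last remark, here is a small sketch that computes the expected squared error of one fixed predictor under two different input distributions. Everything in it is an assumption made for illustration: the target y = x^2, the linear predictor f(x) = x, the two uniform input ranges, and the helper name `expected_mse`.

```python
def expected_mse(predict, target, lo, hi, steps=100000):
    """Expected squared error under x ~ Uniform(lo, hi).

    Uses midpoint-rule integration: averaging the squared error over
    evenly spaced midpoints approximates the expectation under the
    uniform density on [lo, hi].
    """
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        total += (target(x) - predict(x)) ** 2
    return total / steps

def target(x):
    return x * x   # assumed non-linear "true" function

def predict(x):
    return x       # fixed linear predictor

mse_narrow = expected_mse(predict, target, 0.0, 1.0)  # x ~ Uniform(0, 1)
mse_wide = expected_mse(predict, target, 0.0, 2.0)    # x ~ Uniform(0, 2)

print("expected MSE, x ~ U(0,1):", mse_narrow)  # analytically 1/30
print("expected MSE, x ~ U(0,2):", mse_wide)    # analytically 8/15
```

The predictor and the target are identical in both cases; only the input distribution changes, and the expected MSE changes with it.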