Friday, September 19, 2008

You don't need the input distribution for regression.

Why is the input distribution not important in regression?

Let's consider a parametric regression of the data pairs {(x(i), y(i))} with the model Y = f(X; w) + e, where f is the function with parameter w, and e is the (unobservable) noise term.
Maximum likelihood estimation of the w given the data is
w* = argmax_w p(y|x,w),
that is, you are searching for the set of parameters that maximizes the probability of observing y, given the inputs x. Note that the probability distribution of x itself plays no role here. The randomness that generates the likelihood function comes from the noise (or error) term.
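To make this explicit (a short derivation of my own, assuming the input distribution p(x) does not depend on w), factor the joint likelihood:

```latex
\begin{align}
p(y, x \mid w) &= p(y \mid x, w)\, p(x) \\
\arg\max_w \, p(y, x \mid w)
  &= \arg\max_w \left[ \log p(y \mid x, w) + \log p(x) \right] \\
  &= \arg\max_w \, \log p(y \mid x, w)
\end{align}
```

Since log p(x) is a constant with respect to w, maximizing the joint likelihood and maximizing the conditional likelihood yield the same w*.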

Now, consider a generative model that also takes the input distribution into account: (X, Y) = f(e; w). This problem is no longer a regression; it is a parametric joint distribution estimation (or model fitting). Here, e is not noise; it is the source of all randomness.

For regression, you don't really need the entire joint distribution. The conditional is enough!
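As a quick sanity check (a minimal sketch of mine, not from the post): with Gaussian noise, maximum likelihood reduces to least squares, and the same parameters are recovered whether the inputs are drawn uniformly or from a wide Gaussian; the estimator never looks at p(x).

```python
import numpy as np

rng = np.random.default_rng(0)
w_true, b_true = 2.0, 1.0  # hypothetical ground-truth parameters

def fit_ls(x, y):
    """Least squares = Gaussian MLE for the model y = w*x + b + e."""
    A = np.column_stack([x, np.ones_like(x)])
    w, b = np.linalg.lstsq(A, y, rcond=None)[0]
    return w, b

# Same conditional model p(y|x,w), two very different input distributions.
for x in [rng.uniform(-1, 1, 5000), rng.normal(0, 3, 5000)]:
    y = w_true * x + b_true + 0.1 * rng.standard_normal(x.size)
    w, b = fit_ls(x, y)
    print(round(w, 2), round(b, 2))  # both recover ~(2.0, 1.0)
```

The fitting code is identical in both cases; only the conditional p(y|x,w) matters.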

P.S. Of course, if you are doing linear regression when the true function is non-linear or the noise is non-Gaussian (i.e., the model is misspecified), this is not true. And even the expected MSE cost of a regression cannot be computed without the input distribution.
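A minimal sketch of the misspecified case (my example, not from the post): fitting a line to y = x^2 without noise, the best-fitting slope depends entirely on where the inputs lie.

```python
import numpy as np

rng = np.random.default_rng(1)

def best_linear_slope(x):
    """Least-squares slope of a line fit to y = x**2 (misspecified model)."""
    y = x ** 2
    A = np.column_stack([x, np.ones_like(x)])
    slope, _ = np.linalg.lstsq(A, y, rcond=None)[0]
    return slope

s1 = best_linear_slope(rng.uniform(0, 1, 10000))  # population slope is 1
s2 = best_linear_slope(rng.uniform(1, 2, 10000))  # population slope is 3
print(round(s1, 1), round(s2, 1))
```

Under a well-specified model the recovered parameters would agree; under misspecification the "best" linear fit is a projection that depends on the input distribution.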

Tuesday, September 02, 2008

[Lecture note] Statistical inference

Squobble

Squobble: Sudoku with color and numbers!
I played twice at the beginner level; the first time took 10 minutes, and the second 5 minutes 15 seconds. At first the overwhelming information confuses the mind, but once you get used to it, the color makes the puzzle much easier to solve. It has been online for quite a few years now, and I am surprised that nobody has replicated the idea.

Monday, September 01, 2008

Computation <--> Learning and Memory

The speed of our thinking gives us an approximate range of the time scales at which we do real-time computation, and our very long-lasting memories tell us the time scale of the corresponding memory processes. The former computation is most likely performed by fast dynamical systems, and the latter is hypothesized to be based on long-term synaptic plasticity. The important question is: how are these time scales connected?

Among the underlying biophysical mechanisms, spike-timing-dependent plasticity (STDP) is one possible candidate. If learning and memory are trying to store and recall the fast dynamics that the system is performing now, the system must be able to predict its own dynamics. Hence, mechanisms like STDP that enhance causal connections would be a very good starting point.
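As a toy illustration (my sketch; the parameter values are arbitrary, not from any particular experiment), the standard exponential STDP window potentiates a synapse when the presynaptic spike precedes the postsynaptic one (a causal pairing) and depresses it otherwise:

```python
import math

# Hypothetical STDP parameters, chosen for illustration only.
A_PLUS, A_MINUS = 0.1, 0.12       # potentiation / depression amplitudes
TAU_PLUS, TAU_MINUS = 20.0, 20.0  # decay time constants (ms)

def stdp_dw(dt_ms):
    """Weight change for spike-time difference dt = t_post - t_pre.

    dt > 0: pre fired before post (causal pairing)  -> potentiation.
    dt < 0: post fired before pre (acausal pairing) -> depression.
    """
    if dt_ms > 0:
        return A_PLUS * math.exp(-dt_ms / TAU_PLUS)
    return -A_MINUS * math.exp(dt_ms / TAU_MINUS)

print(stdp_dw(10.0) > 0, stdp_dw(-10.0) < 0)  # True True
```

The asymmetry of this window is exactly what biases the network toward strengthening connections that predict its own future activity.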