Friday, February 27, 2009

Memory dump from COSYNE day 2

This is my second time at COSYNE, so I know the 7:30am-11:30pm schedule will exhaust my short-term memory very quickly. I am trying to write down my interpretations of and ideas about the interesting talks during the dinner break. Hopefully, I'll still be able to understand them later on. :)

Earl K. Miller from MIT used decoding techniques to determine the order of information processing across the brain areas LIP, FEF, and PRC. The firing rate increases almost immediately after stimulation, but it does not encode the decision that is made until some time after. It seems to be an interesting technique, but it relies heavily on the ability to detect such coding. Extraction of relevant information using conditional entropy might be a good thing to try (Sohan!). He also proposes that LFP oscillation cycles in the beta range might drive the internal attention-shift cycle. Furthermore, he showed that an averaging window time-locked to stimulus onset performs worse than one aligned to the last two LFP cycles. However, an audience member asked an excellent question: the LFP does not look like sine waves unless it is band-pass filtered and smoothed. Also, action potentials and the LFP are always in a chicken-or-egg dilemma.
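Just so I remember what I mean by the decoding idea, here is a minimal sketch of the generic analysis (not Miller's actual pipeline; the data, decoder, and window size are all made up): train a classifier on single-trial firing rates in sliding windows and look for the time at which the decision first becomes decodable above chance.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def decoding_timecourse(rates, labels, window=5):
    """rates: (trials, neurons, time bins) spike counts; labels: (trials,).
    Returns cross-validated decoding accuracy for each sliding window."""
    n_trials, n_neurons, n_bins = rates.shape
    accuracy = []
    for t in range(n_bins - window):
        X = rates[:, :, t:t + window].reshape(n_trials, -1)
        clf = LogisticRegression(max_iter=1000)
        accuracy.append(cross_val_score(clf, X, labels, cv=5).mean())
    return np.array(accuracy)

# toy data: firing rates rise everywhere at stimulus onset, but the
# label-dependent (decision-related) component only appears from bin 30
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
rates = rng.poisson(5.0, size=(200, 20, 60)).astype(float)
rates[:, :5, 30:] += 3.0 * labels[:, None, None]
acc = decoding_timecourse(rates, labels)
print("first window above 60% accuracy:", int(np.argmax(acc > 0.6)))  # expect near bin 26-30
```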

Vikaas Sohal from Stanford showed, using various fancy biological methods, that a cortical microcircuit with a pyramidal neuron and a fast-spiking inhibitory neuron enhances gamma-band oscillation. He had three different stimulation patterns that he injected via dynamic clamp or light-induced current to mimic EPSCs: a non-rhythmic one and two different frequencies. When I asked him whether the non-rhythmic stimulation was a frozen Poisson process, he said it was neither frozen nor Poisson. I shall ask him (if I can find him free somewhere during the conference) how he can be sure that the reduction of variability is not coming from a reduction of variability in the input signal.
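To make the control I have in mind concrete, a rough sketch with a generic leaky integrate-and-fire neuron (nothing to do with his actual circuit or stimuli, and the parameters are invented): measure the variability of the injected current alongside the variability of the output spike counts, so that any drop in output variability can be attributed to the circuit rather than to the input.

```python
import numpy as np

def cv(x):
    return np.std(x) / np.mean(x)

def lif_spike_counts(input_current, dt=1e-4, tau=0.02, v_th=1.0,
                     v_reset=0.0, n_trials=50, noise_sd=0.2):
    """Leaky integrate-and-fire neuron driven by a fixed input current plus
    independent per-trial noise; returns the spike count on each trial."""
    counts = []
    for _ in range(n_trials):
        v, count = 0.0, 0
        noise = noise_sd * np.random.randn(len(input_current))
        for I, n in zip(input_current, noise):
            v += dt / tau * (-v + I + n)
            if v >= v_th:
                v = v_reset
                count += 1
        counts.append(count)
    return np.array(counts)

# hypothetical stimulation patterns (not the ones from the talk)
t = np.arange(0, 1.0, 1e-4)
nonrhythmic = 1.2 + 0.3 * np.random.randn(len(t))        # one noisy, non-rhythmic drive
rhythmic_40hz = 1.2 + 0.3 * np.sin(2 * np.pi * 40 * t)   # 40 Hz rhythmic drive

for name, I in [("non-rhythmic", nonrhythmic), ("40 Hz rhythmic", rhythmic_40hz)]:
    counts = lif_spike_counts(I)
    print(f"{name}: input CV = {cv(I):.3f}, output spike-count CV = {cv(counts):.3f}")
```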

Antonio Rangel from Caltech talked about neuroeconomics. He described a diffusion-based model of how internal value and attention interact, and showed that it fits various psychophysical results. Take-home message: the longer you look at something you like, the more likely you are to choose it. (And the longer you look at a disgusting thing, the less likely you are to eat it.)
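A minimal sketch of what I understood the model to be, written as an attention-weighted drift-diffusion race between two items (my own toy parameterization, not Rangel's exact model): while you fixate on an item, the evidence drifts toward it faster because the unattended item's value is discounted, so spending more time looking at an item raises the probability of choosing it.

```python
import numpy as np

def addm_choice(v_left, v_right, p_look_left=0.5, theta=0.3,
                sigma=0.5, threshold=1.0, dt=0.01, rng=None):
    """Toy attention-weighted drift-diffusion: relative evidence drifts toward
    the fixated item; the unattended item's value is discounted by theta.
    Returns True if the left item is chosen (upper bound hit first)."""
    rng = rng or np.random.default_rng()
    x = 0.0
    while abs(x) < threshold:
        look_left = rng.random() < p_look_left
        drift = (v_left - theta * v_right) if look_left else (theta * v_left - v_right)
        x += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return x > 0

rng = np.random.default_rng(1)
# identical values; only the fraction of time spent fixating the left item changes
for p in (0.3, 0.5, 0.7):
    choices = [addm_choice(1.0, 1.0, p_look_left=p, rng=rng) for _ in range(1000)]
    print(f"P(look left) = {p:.1f}  ->  P(choose left) = {np.mean(choices):.2f}")
```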

Robert Wilson from UPenn talked about the change-point problem: detecting where a signal is non-stationary, in a simple case. The algorithm has to distinguish variation due to noise from actual jumps; to do so it has to estimate the noise level and the frequency of change. Using a Bayesian approach, they were able to come up with an almost exact inference algorithm. I wonder if an autocorrentropy estimator could do similar things without the complicated algorithm. (I think like an engineer too... sometimes.)
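For my own reference, a bare-bones sketch of Bayesian online change-point detection in the Adams & MacKay (2007) style, which I believe is the same family of algorithm (not necessarily the one from the talk): track a posterior over the "run length" since the last change point, for Gaussian data with known observation variance and a constant hazard rate.

```python
import numpy as np
from scipy.stats import norm

def bocpd_gaussian(data, hazard=1/100, mu0=0.0, var0=4.0, obs_var=1.0):
    """Bayesian online change-point detection for Gaussian data with known
    observation variance. Returns the run-length posterior, one row per step."""
    T = len(data)
    R = np.zeros((T + 1, T + 1))
    R[0, 0] = 1.0
    mus = np.array([mu0])      # posterior mean of the signal, per run length
    vars_ = np.array([var0])   # posterior variance of the signal, per run length
    for t, x in enumerate(data, start=1):
        # predictive probability of x under each possible run length
        pred = norm.pdf(x, loc=mus, scale=np.sqrt(vars_ + obs_var))
        growth = R[t - 1, :t] * pred * (1 - hazard)   # run length grows by one
        cp = (R[t - 1, :t] * pred * hazard).sum()     # a change point resets it to 0
        R[t, 0] = cp
        R[t, 1:t + 1] = growth
        R[t] /= R[t].sum()
        # conjugate update of the mean posterior for each surviving run length
        new_vars = 1.0 / (1.0 / vars_ + 1.0 / obs_var)
        new_mus = new_vars * (mus / vars_ + x / obs_var)
        mus = np.concatenate(([mu0], new_mus))
        vars_ = np.concatenate(([var0], new_vars))
    return R

# toy signal: the mean jumps from 0 to 3 at t = 100
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 100)])
R = bocpd_gaussian(data)
print("most likely run length at t=110:", np.argmax(R[110]))  # should be near 10
```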

Misha Tsodyks talked about things most of which I had already read, but listening in person makes a world of difference. He was trying to promote the computational role of short-term plasticity (his dynamic synapse model) via the notion of population spikes. One work I wasn't aware of was the work on auditory coding, which intrigued me because I did some reading on the Meddis hair cell model recently.
After working on the BORNs for a while, the work on working memory in a non-spiking state, which looked like a trivial idea (although published in Nature, I think), sounds very interesting as well. There the past is encoded in the distribution of phases, instead of the facilitation variables of the synapses as in our case. :D
Overall, I didn't like his talk too much, because although it is related to many of my research problems, he does not have strong experimental support that what he describes in the models exists in the real brain.
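For my own notes, a minimal sketch of the Tsodyks-Markram style dynamic synapse, as I remember the equations (the parameters below are made up): available resources x are depleted by each spike and recover with tau_rec, while utilization u jumps at each spike and relaxes with tau_facil, so the effective drive u*x of each spike depends on the recent spike history.

```python
import numpy as np

def tm_synapse(spike_times, U=0.2, tau_rec=0.8, tau_facil=1.0):
    """Tsodyks-Markram short-term plasticity evaluated at spike times.
    Returns the effective efficacy u*x released by each spike."""
    x, u = 1.0, 0.0
    last_t = None
    efficacies = []
    for t in spike_times:
        if last_t is not None:
            dt = t - last_t
            x = 1.0 - (1.0 - x) * np.exp(-dt / tau_rec)    # resources recover toward 1
            u = U + (u - U) * np.exp(-dt / tau_facil)      # utilization relaxes toward U
        u = u + U * (1.0 - u)       # facilitation: utilization jumps at the spike
        efficacies.append(u * x)    # fraction of resources released by this spike
        x = x * (1.0 - u)           # depression: released resources are removed
        last_t = t
    return np.array(efficacies)

# a 20 Hz burst: the efficacy evolves through the interplay of facilitation
# and depression, then partially recovers during a pause
burst = np.arange(0, 0.5, 0.05)
print(np.round(tm_synapse(burst), 3))
print("after a 1 s pause:", np.round(tm_synapse(np.append(burst, 1.5)), 3)[-1])
```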

I also had a random idea about how to better model the inter-population-spike interval using some sort of ghost-of-a-saddle type of dynamics, instead of relying entirely on the depression time constant, which is around the range of 1 second while we need something on the order of 4~10 seconds.
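To make the intuition concrete, a toy sketch (not our actual BORN model): near a saddle-node ghost, dx/dt = eps + x^2 spends roughly pi/sqrt(eps) in the bottleneck, so a small eps buys you seconds-long intervals without needing an explicit slow time constant.

```python
import numpy as np

def ghost_passage_time(eps, x0=-5.0, x1=5.0, dt=1e-3):
    """Integrate dx/dt = eps + x^2 from x0 to x1 and time the slow passage
    through the bottleneck left behind by a saddle-node bifurcation at eps = 0."""
    x, t = x0, 0.0
    while x < x1:
        x += dt * (eps + x * x)
        t += dt
    return t

for eps in (1.0, 0.1, 0.01, 0.001):
    print(f"eps={eps:<6} passage time ~ {ghost_passage_time(eps):.1f}"
          f"  (pi/sqrt(eps) = {np.pi / np.sqrt(eps):.1f})")
```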

Cori Bargmann from Rockefeller University gave a nice talk about neuromodulatory mechanisms that are not described in the full 'connectome' of C. elegans. Using diverse genetic manipulations and calcium imaging, her group showed some amazing results of how a seemingly feedforward network has feedback modulation through modulatory peptides.

P.S. Due to a stupid mistake I missed a flight and, as a consequence, the keynote speech and posters of day 1, which were about olfactory coding. :'(

Sunday, February 22, 2009

Comparing likelihoods of heterogeneous models

Given a dataset X, which model is the best model to fit it?
In a parametric family of models, this problem is often tackled by maximum likelihood (estimation of the parameters). However, if the family of models has too much freedom, it may overfit, the same way function approximation and regression do. If a model A predicts that the dataset will be X and only X, then given the dataset X it would obviously have the maximum likelihood. It is like doing pdf estimation with a kernel density estimator that uses a Dirac delta kernel.
How can we avoid this situation?
Since the problem is caused by the model's lack of generalization ability, one obvious way is to use a test set (or cross-validation). If the likelihood values for the test set and the training set do not differ significantly, the model is not overfitting.
Can we then compare the likelihoods of two MLEs from two different model families, neither of which overfits the data? I think so.
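A rough sketch of what I mean, with two toy model families that I made up for illustration (a Gaussian fit by maximum likelihood, and a kernel density estimate with a deliberately narrow bandwidth): the narrow KDE wins on training likelihood but collapses on held-out data, and comparing the held-out log-likelihoods of the two fitted models seems like a fair way to choose between heterogeneous families.

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 100)
test = rng.normal(0.0, 1.0, 100)

# model family A: Gaussian, fit by maximum likelihood
mu, sigma = train.mean(), train.std()
# model family B: kernel density estimate with a very narrow bandwidth (overfits)
kde = gaussian_kde(train, bw_method=0.05)

def mean_loglik(logpdf, data):
    return np.mean(logpdf(data))

models = [
    ("Gaussian MLE", lambda x: norm.logpdf(x, mu, sigma)),
    ("narrow KDE",   lambda x: np.log(np.maximum(kde(x), 1e-300))),  # clip to avoid log(0)
]
for name, logpdf in models:
    print(f"{name:12s} train: {mean_loglik(logpdf, train):8.2f}"
          f"   test: {mean_loglik(logpdf, test):8.2f}")
```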

Sunday, February 08, 2009

Noise and regularization in function approximation

Problem: in function approximation using a standard kernel method, which kernel should I use? Given a finite training set of data, you can easily overfit by choosing a narrow kernel, which is equivalent to just memorizing the data points.
If we know there's noise in the system, we can do a little better in choosing the right kernel by means of regularization. The idea is that similar points should be mapped to similar points through the function; it's a generalization of continuous functions in a topological-space setting. The noise in the input space can be used to define the similarity in that space. In the case of real-valued functions, the system is of the form Y = f(X + N), and you want to approximate it with y(x) = sum_i alpha_i K(x_i, x). The noise variable inside the true function makes your conditional expectation smoother than the original function.

However, additive (output) noise does not work the same way: Y = f(X) + N does not smooth the function at all! Only when the function is linear would the two be equivalent.
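A quick numerical check of the claim (a toy example I made up, not from any paper): with input noise, the best predictor E[f(x+N) | x] is a smoothed version of f, whereas with additive output noise, E[f(x)+N | x] is just f(x) again.

```python
import numpy as np

f = lambda x: np.sign(np.sin(3 * x))        # a non-smooth (step-like) target function
rng = np.random.default_rng(0)
xs = np.linspace(-2, 2, 8)
noise = rng.normal(0.0, 0.3, size=100_000)  # same noise level in both cases

for x in xs:
    smoothed = np.mean(f(x + noise))        # input noise: conditional mean is a smoothed f
    additive = np.mean(f(x) + noise)        # output noise: conditional mean is f itself
    print(f"x={x:+.1f}  f(x)={f(x):+.0f}  E[f(x+N)]={smoothed:+.2f}  E[f(x)+N]={additive:+.2f}")
```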

The same argument works when the output space consists of class labels or clusters.

Saturday, February 07, 2009

Rozin's talk inspired me

The psychologist Paul Rozin came to UF last week to give a talk in the smell and taste seminar. He started off by talking about the importance of first describing phenomena, as opposed to explaining them. I had forgotten about looking for novel phenomena that have not been properly described in the literature! I have been working too hard on modeling and explaining things, and not paying attention to the spirit of description.
The reason he brought this up is that he is a kind of non-mainstream psychologist. Much of his research ends up not getting published; according to him, this is mostly because it does not address the subjects of mainstream academia.

The main topic was that in humans there are many things that should naturally be avoided but that you learn to like later in life: hotness (almost no animal likes hot food, except for maybe parrots), bitterness, sadness, pain, scariness, disgusting blue cheese, etc. Newborn babies would not like any of these, yet at some age, perhaps due to social context, they develop a preference for them!

The seminar was hilarious, and it was definitely the most entertaining and inspiring talk I have attended in the past two years.

P.S. He also talked about his experience in one of the best restaurants in the world, El Bulli in Spain.