Lesson 3: Building the likelihood (by fitting the covariance matrix)#
For many neuroscientific questions, we are interested in the relationship between neural codes and objective stimulus features. For example, we might want to know how the brain represents numbers, orientations, or spatial positions, and how these representations change as a function of task demands, attention, or prior expectations.
One particularly powerful approach is to decode stimulus features from neural data in a Bayesian fashion (e.g., van Bergen et al., 2015; Barretto-Garcia et al., 2023).
Inverting the encoding models#
Here, we show how to go from a deterministic forward model (i.e., a model that predicts neural responses from stimulus features) to a probabilistic inverse model (i.e., a model that predicts stimulus features from neural responses). We will do so using a Bayesian inversion scheme:

\[
p(s | x; \theta) = \frac{p(x | s; \theta)\,p(s)}{p(x; \theta)}
\]

where \(s\) is a point in stimulus space, \(x\) is an n-dimensional activation pattern in neural space, and \(p(s | x; \theta)\) is the posterior probability of stimulus \(s\) given neural response \(x\) and model parameters \(\theta\).
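To make the scheme concrete, below is a minimal sketch of this inversion over a grid of candidate stimuli (the function `invert_on_grid` and the uniform prior are our own illustration, not braincoder's API). Note that it presupposes a likelihood function \(p(x | s; \theta)\), which is exactly the ingredient we construct in the next section:

```python
import numpy as np

def invert_on_grid(likelihood_fn, x, s_grid):
    # Evaluate p(x | s; theta) for every candidate stimulus on the grid
    likelihood = np.array([likelihood_fn(x, s) for s in s_grid])
    prior = np.ones(len(s_grid)) / len(s_grid)  # uniform prior p(s)
    posterior = likelihood * prior              # numerator of Bayes' rule
    return posterior / posterior.sum()          # normalizing takes care of p(x)
```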
A multivariate likelihood function#
The crucial element that is still lacking for this Bayesian inversion scheme is a likelihood function. Note that standard encoding models do not have a likelihood function, because they are deterministic (\(f(s;\theta): s \mapsto x\)). They give us the average neural response to a certain stimulus, but they don’t tell us how likely a certain neural response is, given a certain stimulus. However, we can easily derive a likelihood function from the forward model by adding Gaussian noise:

\[
x = f(s; \theta) + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \Sigma)
\]

where \(\epsilon\) is a multivariate Normal distribution with covariance matrix \(\Sigma\). The likelihood is then \(p(x | s; \theta) = \mathcal{N}(x; f(s; \theta), \Sigma)\).
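Such a Gaussian likelihood can be evaluated with off-the-shelf tools. A minimal sketch, where `f` stands in for a fitted encoding model's prediction function and `Sigma` for the noise covariance (this is also the `likelihood_fn` assumed in the grid sketch above, after exponentiating):

```python
from scipy.stats import multivariate_normal

def log_likelihood(x, s, f, Sigma):
    # log p(x | s; theta) = log N(x; f(s; theta), Sigma)
    return multivariate_normal.logpdf(x, mean=f(s), cov=Sigma)
```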
Estimating \(\Sigma\)#
The problem with high-dimensional covariance matrices#
How should we set the covariance matrix \(\Sigma\)? One approach would be to assume independent noise across neural dimensions (e.g., fMRI voxels) and use a diagonal covariance matrix \(\Sigma = I \circ \tau\tau^T\), where \(\tau\) is a vector containing the standard deviations of the residuals of the encoding models, \(I\) is the identity matrix, and \(\circ\) is the element-wise product. However, if there is substantial covariance between the noise terms of different neural dimensions (i.e., voxels), this can have severe consequences for decoding performance. In particular, the posterior might be overly confident in its predictions, because it treats sources of information as independent when they are not. Under some circumstances, the mean posterior, the point estimate of the decoded stimulus, can also be affected. Van Bergen et al. (2015, 2017) showed that modeling some of the covariance is indeed crucial for making correct inferences about neural data.

However, we generally have a large number of voxels and very limited data, so estimating the full covariance matrix is not feasible: the number of free parameters scales quadratically with the number of voxels (\(p = \frac{n(n-1)}{2}\) off-diagonal elements alone). Note that this is a general problem of covariance estimation, and not specific to our use case (e.g., Ledoit, …).
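To get a feel for this scaling:

```python
# Number of distinct off-diagonal covariance parameters for n voxels
for n in (10, 100, 1000, 10000):
    print(f"{n:>6} voxels -> {n * (n - 1) // 2:>12,} free parameters")
```

With a thousand voxels we would already need to estimate roughly half a million covariance parameters from (typically) a few hundred measurements.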
Van Bergen et al. (2015, 2017) proposed a two-part solution.
Regularizing the covariance matrix#
The first proposal of van Bergen et al. (2015, 2017), based on the work of Ledoit (…), is to use a shrinkage estimator for the covariance matrix. Specifically, the free parameter \(\rho\) scales between a perfectly-correlated covariance matrix (\(\rho = 1\)) and a diagonal covariance matrix (\(\rho = 0\)):

\[
\Sigma_\rho = \rho\,\tau\tau^T + (1 - \rho)\,I \circ \tau\tau^T
\]
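A direct transcription of this formula in NumPy (a sketch of the math, not braincoder's internals):

```python
import numpy as np

def regularized_cov(tau, rho):
    tau = np.asarray(tau, dtype=float)[:, np.newaxis]  # column vector of residual SDs
    full = tau @ tau.T             # rho = 1: perfectly correlated, tau tau^T
    diag = np.diag(np.diag(full))  # rho = 0: diagonal, I o tau tau^T
    return rho * full + (1 - rho) * diag

print(regularized_cov([1., 2., .5], rho=0.3))
```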
Using the sample covariance#
Note that additional elements can be added to the covariance matrix. For one, we can add a proportion \(\lambda\) of the empirical noise covariance matrix \(S\) to the regularized covariance matrix, to allow for a more sophisticated noise model:

\[
\Sigma = \Sigma_\rho + \lambda S
\]
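Continuing the `regularized_cov` sketch from above (the `residuals` array is a hypothetical stand-in for the residuals of a fitted encoding model):

```python
import numpy as np

rng = np.random.default_rng(0)
residuals = rng.normal(size=(100, 3))  # stand-in residuals (samples x voxels)

S = np.cov(residuals, rowvar=False)    # empirical noise covariance
tau = residuals.std(axis=0)            # per-voxel residual standard deviations
Sigma = regularized_cov(tau, rho=0.3) + 0.1 * S  # lambda = 0.1
```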
Using the anatomical distance#
Similarly, one could add a term that accounts for the physical distance between different neural sources (i.e., voxels).
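For illustration only, one conceivable form of such a term (a hypothetical choice, not necessarily the one used by braincoder or in the literature) lets covariance decay exponentially with anatomical distance:

```python
import numpy as np

def distance_term(D, beta, length_scale):
    # D: (n_voxels, n_voxels) matrix of pairwise anatomical distances (e.g., in mm)
    return beta * np.exp(-D / length_scale)
```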
Fitting \(\Sigma\)#
braincoder contains the `ResidualFitter` class, which can be used to fit a noise covariance matrix, and thereby a likelihood function, to a fitted `EncodingModel`. Here, we first set up a simple `VonMisesPRF` model and simulate some data with covarying noise.
```python
from braincoder.models import VonMisesPRF
import numpy as np
import pandas as pd

# Set up six evenly spaced von Mises PRFs
centers = np.linspace(0.0, 2*np.pi, 6, endpoint=False)
parameters = pd.DataFrame({'mu': centers, 'kappa': 1., 'amplitude': 1.0, 'baseline': 0.0},
                          index=pd.Index([f'Basis function {i+1}' for i in range(6)],
                                         name='population')).astype(np.float32)

# We have only 3 voxels, each a linear combination of the 6 von Mises functions:
weights = np.array([[1, 0, 1],
                    [1, .5, 1],
                    [0, 1, 0],
                    [0, .5, 0],
                    [0, 0, 1],
                    [0, 0, 1]]).astype(np.float32)

model = VonMisesPRF(parameters=parameters, weights=weights)

# 50 random orientations
paradigm = np.random.rand(50) * np.pi * 2

# Arbitrary covariance matrix
cov = np.array([[.5, 0.0, 0.0],
                [.25, .75, .25],
                [.25, .25, .75]])

data = model.simulate(noise=cov, weights=weights, paradigm=paradigm)
```
Now we can import the `ResidualFitter` and estimate \(\Sigma\) (here called `omega` for legacy reasons):
```python
# Import ResidualFitter
from braincoder.optimize import ResidualFitter

fitter = ResidualFitter(model, data, paradigm, parameters, weights)

# omega is the covariance matrix; dof can be estimated when a
# multivariate t-distribution (rather than a normal distribution) is used
omega, dof = fitter.fit(progressbar=False)
print(omega)
```
```
init_tau: 0.5883835554122925, 0.9530800580978394
WWT max: 4.0
[[0.3484391  0.05987196 0.07491002]
 [0.05987195 0.60535425 0.09866075]
 [0.07491002 0.09866075 0.9463547 ]]
```
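With `omega` in hand, the likelihood sketched earlier is fully specified. As a quick (hypothetical) usage example, reusing the `log_likelihood` helper from above and assuming `model.predict(paradigm=...)` returns the predicted response pattern as in the earlier lessons:

```python
import numpy as np

x0 = data.iloc[0].values  # first simulated response pattern

def f(s):
    # Predicted pattern for a single candidate stimulus s
    return model.predict(paradigm=np.atleast_1d(s)).values.ravel()

print(log_likelihood(x0, np.pi / 2, f, omega))
```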
Note: Here we have used the generating parameters and weights to fit the `omega` matrix. Note that with real data, we would use estimated parameters and/or weights.
Note: The complete Python script and its output can be found here.
Summary#
In this lesson we have seen:

- We need to add a noise model to the classical encoding model \(f(s; \theta): s \mapsto x\) to get a likelihood function which we can invert.
- Concretely, we add a multivariate Gaussian noise model to the deterministic predictions of the encoding models.
- We need a noise covariance matrix \(\Sigma\) (or `omega`) for this to work. We use a regularised estimate of the covariance matrix.
In the next lesson, we will further explore how we can use a likelihood function to map from neural responses to stimulus features.