Fisher information and expected uncertainty¶
Why the spiky FI curves you may have seen are real, and how to compute
Fisher information and decoder uncertainty reliably from an
EncodingModel.
Two questions, two APIs¶
After fitting an encoding model + noise covariance you typically want one of two things:
- How much information does the population carry about a stimulus value?
→
EncodingModel.get_fisher_information()
How well will an unbiased decoder recover the true stimulus, averaged over noise draws?
→
EncodingModel.get_expected_uncertainty()
Both work on the same fitted (parameters, omega, dof) tuple. The
Fisher information is the asymptotic lower bound on decoder variance
(Cramér–Rao); the expected uncertainty is an empirical readout from the
actual posterior implied by your model. They agree when the decoder is
well-fit and the posterior is locally Gaussian; they diverge when the
posterior is multimodal, bounded by the grid edge, or the noise model
mismatches the data.
Closed-form Fisher information¶
For Gaussian noise the population FI at stimulus \(s\) is
For multivariate Student-t noise with \(\nu\) degrees of freedom on a \(p\)-voxel residual,
As \(\nu \to \infty\) the prefactor approaches 1 and the Gaussian
formula is recovered. Both are now available as closed-form expressions
in get_fisher_information() (pass analytical=True, default).
The Monte-Carlo branch (analytical=False) is kept for backwards
compatibility and as a sanity check; it adds nothing in this regime
except MC noise and an int32 overflow risk at large
n × n_voxels × n_stim.
Why FI curves can look spiky — and when it’s real¶
Single voxels with Gaussian-like tuning have \(\partial \mu / \partial s = 0\) at the preferred stimulus and peaks on the flanks. A population FI is a sum of squared per-voxel gradients, so the curve dips at every voxel’s mode and bumps on either side. When the fitted mode population clusters — e.g. an ROI where many voxels prefer the same narrow range of values — the population FI gets a sharp dip-bump-dip pattern at the cluster, not because the model is wrong but because that’s what the population actually encodes.
Diagnose this in three steps:
Compute
get_fisher_information()withanalytical=True(no MC noise). If the curve is still spiky, the spikes are population structure.Plot a histogram of the fitted modes on the same x-axis. Spike positions in the FI should sit on the flanks of clusters in the mode histogram.
(Optional) call
get_expected_uncertainty()on the same grid;var_Eshould also be spiky in the same places, since the Cramér–Rao bound is sharp where FI is sharp.
When the spikes are not real:
If they move when you change MC seed or
n, they’re MC noise — use the analytical path.If they live only near
stimulus_range[0]orstimulus_range[-1]for a log-Gaussian model, they’re boundary cliffs from \(\log x / x\) blow-up; restrict the grid or use a model with no singularity at the lower boundary.If they shift when you fit
omegaagain (different random seed in ResidualFitter), they’re noise-model artifact; refit withResidualFitterlambdbetween 0.5 and 1 to bias toward a cleaner empirical covariance.
API reference¶
- class braincoder.models.EncodingModel(paradigm=None, data=None, parameters=None, weights=None, omega=None, verbosity=20)
Abstract base class for encoding models.
- get_expected_uncertainty(stimuli, omega, dof=None, parameters=None, weights=None, n_simulations=1000, batch_stimuli=None, decoder_stimulus_range=None, progress=False)
Simulate noisy responses, decode them, summarise per-stimulus.
Procedure (mirrors the analytical Fisher information but lets you read off posterior quantities directly):
For each stimulus
sinstimuli, drawn_simulationsindependent noisy responsesr ~ p(r | s, Ω, dof).Compute the posterior
p(s | r)over a finedecoder_stimulus_range(defaults tostimuliitself).Take the posterior expected value of each simulated trial and aggregate across the
n_simulationsrepeats per stimulus.
- Parameters:
stimuli (array_like) – Stimulus values to simulate from + use as the aggregation axis. Shape
(n_stim,)for 1-D models or(n_stim, n_dims).omega (array_like) – Noise covariance.
dof (float or None) – Degrees of freedom for multivariate Student-t noise.
Noneuses Gaussian.parameters (DataFrame or None) – Encoding-model parameters (one row per voxel). Defaults to
self.parameters.weights (DataFrame or None) – Basis weights for linear models. Defaults to
self.weights.n_simulations (int) – Noisy repeats per stimulus.
batch_stimuli (int or None) – Process this many stimuli per simulation/decoding batch.
Nonedoes them all at once. Use a small number (e.g. 25) to bound memory when the stimulus grid is large.decoder_stimulus_range (array_like or None) – Grid for posterior evaluation. Defaults to
stimuli. Pass a finer grid to makevar_Ereflect decoder resolution independent of the simulation grid.progress (bool) – If True, print a one-line status per batch.
- Returns:
Index = stimulus value (or MultiIndex for N-D stimuli). Columns:
mean_E, var_E, mean_error, mean_abs_error, n_sims.mean_Eis the average posterior mean;var_Eis its empirical variance across simulations (≈ 1/FI in well-fit regimes);mean_error = mean_E - true_valueis decoder bias;mean_abs_erroris average absolute decode error.- Return type:
DataFrame
Notes
var_Eis the variance of the posterior expected value across simulations — different from the average posterior variance. For unbiased decoders in a Gaussian/CRB regime,var_E ≈ 1 / Fisher_information. The two diverge when the posterior is skewed or bounded by the grid edges.
- get_fisher_information(stimuli, omega=None, dof=None, weights=None, parameters=None, n=1000, analytical=True)
Population Fisher information per stimulus value.
Two computation paths:
Analytical (default) — closed form
FI(s) = J(s)ᵀ Ω⁻¹ J(s)for Gaussian noise, orFI(s) = (ν + p) / (ν + p + 2) · J(s)ᵀ Ω⁻¹ J(s)for multivariate Student-t with degrees of freedomν = dofand dimensionalityp(number of voxels inΩ). Exact, fast, no MC noise; recommended for any fitted noise model.Monte-Carlo (
analytical=False) — average squared score overnsimulated noise draws. Useful as a sanity check and for non-quadratic likelihoods, but in our setup gives essentially identical curves with much higher variance, so prefer the analytical path.
The Student-t correction factor reduces FI relative to Gaussian by exactly
(ν + p) / (ν + p + 2) < 1for finiteν— heavy-tailed noise carries less information per voxel.
Worked example¶
import numpy as np
import pandas as pd
from braincoder.models import LogGaussianPRF
rng = np.random.default_rng(0)
pars = pd.DataFrame({
"mode": rng.normal(20, 4, 100).clip(2, 45).astype(np.float32),
"fwhm": rng.normal(8, 2, 100).clip(2, 25).astype(np.float32),
"amplitude": np.ones(100, dtype=np.float32),
"baseline": np.zeros(100, dtype=np.float32),
})
model = LogGaussianPRF(parameterisation="mode_fwhm_natural", parameters=pars)
omega = (0.25 ** 2) * np.eye(100, dtype=np.float32)
stim = np.linspace(1, 40, 100, dtype=np.float32)
fi_gauss = model.get_fisher_information(stim, omega=omega)
fi_t = model.get_fisher_information(stim, omega=omega, dof=5.0)
out = model.get_expected_uncertainty(
stim, omega=omega, dof=5.0, n_simulations=500, batch_stimuli=20)
# `out` is a DataFrame with mean_E, var_E, mean_error, mean_abs_error
# indexed by true stimulus. 1/out["var_E"] ≈ fi_t for an unbiased
# decoder away from the grid boundary.
Empirical audit¶
A 100-voxel log-Gaussian population with two mode clusters (centered at 12 and 32 CHF) was used to compare paths:
Metric |
Analytic |
MC n=200 |
MC n=1000 |
|---|---|---|---|
Mean FI (across grid) |
9.566 |
9.571 |
9.566 |
Std FI (across grid) |
6.901 |
6.992 |
6.933 |
RMS error vs analytical (raw) |
— |
1.13 |
0.37 |
RMS error vs analytical (% of μ) |
— |
11.8 % |
3.9 % |
The MC curves look spikier than the analytical reference because of
noise; the analytical curve is itself non-monotonic, confirming the
population-structure origin of the bumps. MC also costs an extra factor
of n × n_stim memory and triggered an int32 overflow inside
UnsortedSegmentSum at n × n_voxels × n_stim ≈ 5 × 10⁷ —
historically forcing users to shrink n_voxels to keep MC viable.
See tests/test_fisher_information.py for the regression suite
(Student-t prefactor, MC ≈ analytic agreement, FI ↔ 1/var_E
relationship, batched vs. unbatched get_expected_uncertainty).