Fisher information and expected uncertainty

Why the spiky FI curves you may have seen are real, and how to compute Fisher information and decoder uncertainty reliably from an EncodingModel.

Two questions, two APIs

After fitting an encoding model + noise covariance you typically want one of two things:

How much information does the population carry about a stimulus value?

EncodingModel.get_fisher_information()

How well will an unbiased decoder recover the true stimulus, averaged over noise draws?

EncodingModel.get_expected_uncertainty()

Both work on the same fitted (parameters, omega, dof) tuple. The Fisher information is the asymptotic lower bound on decoder variance (Cramér–Rao); the expected uncertainty is an empirical readout from the actual posterior implied by your model. They agree when the decoder is well-fit and the posterior is locally Gaussian; they diverge when the posterior is multimodal, bounded by the grid edge, or the noise model mismatches the data.

Closed-form Fisher information

For Gaussian noise the population FI at stimulus \(s\) is

\[\mathcal{I}(s) = J(s)^\top \, \Omega^{-1} \, J(s) \quad \text{where } J(s)_v = \frac{\partial \mu_v}{\partial s}.\]

For multivariate Student-t noise with \(\nu\) degrees of freedom on a \(p\)-voxel residual,

\[\mathcal{I}_t(s) = \frac{\nu + p}{\nu + p + 2} \cdot J(s)^\top \, \Omega^{-1} \, J(s).\]

As \(\nu \to \infty\) the prefactor approaches 1 and the Gaussian formula is recovered. Both are now available as closed-form expressions in get_fisher_information() (pass analytical=True, default). The Monte-Carlo branch (analytical=False) is kept for backwards compatibility and as a sanity check; it adds nothing in this regime except MC noise and an int32 overflow risk at large n × n_voxels × n_stim.

Why FI curves can look spiky — and when it’s real

Single voxels with Gaussian-like tuning have \(\partial \mu / \partial s = 0\) at the preferred stimulus and peaks on the flanks. A population FI is a sum of squared per-voxel gradients, so the curve dips at every voxel’s mode and bumps on either side. When the fitted mode population clusters — e.g. an ROI where many voxels prefer the same narrow range of values — the population FI gets a sharp dip-bump-dip pattern at the cluster, not because the model is wrong but because that’s what the population actually encodes.

Diagnose this in three steps:

  1. Compute get_fisher_information() with analytical=True (no MC noise). If the curve is still spiky, the spikes are population structure.

  2. Plot a histogram of the fitted modes on the same x-axis. Spike positions in the FI should sit on the flanks of clusters in the mode histogram.

  3. (Optional) call get_expected_uncertainty() on the same grid; var_E should also be spiky in the same places, since the Cramér–Rao bound is sharp where FI is sharp.

When the spikes are not real:

  • If they move when you change MC seed or n, they’re MC noise — use the analytical path.

  • If they live only near stimulus_range[0] or stimulus_range[-1] for a log-Gaussian model, they’re boundary cliffs from \(\log x / x\) blow-up; restrict the grid or use a model with no singularity at the lower boundary.

  • If they shift when you fit omega again (different random seed in ResidualFitter), they’re noise-model artifact; refit with ResidualFitter lambd between 0.5 and 1 to bias toward a cleaner empirical covariance.

API reference

class braincoder.models.EncodingModel(paradigm=None, data=None, parameters=None, weights=None, omega=None, verbosity=20)

Abstract base class for encoding models.

get_expected_uncertainty(stimuli, omega, dof=None, parameters=None, weights=None, n_simulations=1000, batch_stimuli=None, decoder_stimulus_range=None, progress=False)

Simulate noisy responses, decode them, summarise per-stimulus.

Procedure (mirrors the analytical Fisher information but lets you read off posterior quantities directly):

  1. For each stimulus s in stimuli, draw n_simulations independent noisy responses r ~ p(r | s, Ω, dof).

  2. Compute the posterior p(s | r) over a fine decoder_stimulus_range (defaults to stimuli itself).

  3. Take the posterior expected value of each simulated trial and aggregate across the n_simulations repeats per stimulus.

Parameters:
  • stimuli (array_like) – Stimulus values to simulate from + use as the aggregation axis. Shape (n_stim,) for 1-D models or (n_stim, n_dims).

  • omega (array_like) – Noise covariance.

  • dof (float or None) – Degrees of freedom for multivariate Student-t noise. None uses Gaussian.

  • parameters (DataFrame or None) – Encoding-model parameters (one row per voxel). Defaults to self.parameters.

  • weights (DataFrame or None) – Basis weights for linear models. Defaults to self.weights.

  • n_simulations (int) – Noisy repeats per stimulus.

  • batch_stimuli (int or None) – Process this many stimuli per simulation/decoding batch. None does them all at once. Use a small number (e.g. 25) to bound memory when the stimulus grid is large.

  • decoder_stimulus_range (array_like or None) – Grid for posterior evaluation. Defaults to stimuli. Pass a finer grid to make var_E reflect decoder resolution independent of the simulation grid.

  • progress (bool) – If True, print a one-line status per batch.

Returns:

Index = stimulus value (or MultiIndex for N-D stimuli). Columns: mean_E, var_E, mean_error, mean_abs_error, n_sims. mean_E is the average posterior mean; var_E is its empirical variance across simulations (≈ 1/FI in well-fit regimes); mean_error = mean_E - true_value is decoder bias; mean_abs_error is average absolute decode error.

Return type:

DataFrame

Notes

var_E is the variance of the posterior expected value across simulations — different from the average posterior variance. For unbiased decoders in a Gaussian/CRB regime, var_E 1 / Fisher_information. The two diverge when the posterior is skewed or bounded by the grid edges.

get_fisher_information(stimuli, omega=None, dof=None, weights=None, parameters=None, n=1000, analytical=True)

Population Fisher information per stimulus value.

Two computation paths:

  • Analytical (default) — closed form FI(s) = J(s)ᵀ Ω⁻¹ J(s) for Gaussian noise, or FI(s) = + p) / + p + 2) · J(s)ᵀ Ω⁻¹ J(s) for multivariate Student-t with degrees of freedom ν = dof and dimensionality p (number of voxels in Ω). Exact, fast, no MC noise; recommended for any fitted noise model.

  • Monte-Carlo (analytical=False) — average squared score over n simulated noise draws. Useful as a sanity check and for non-quadratic likelihoods, but in our setup gives essentially identical curves with much higher variance, so prefer the analytical path.

The Student-t correction factor reduces FI relative to Gaussian by exactly + p) / + p + 2) < 1 for finite ν — heavy-tailed noise carries less information per voxel.

Worked example

import numpy as np
import pandas as pd
from braincoder.models import LogGaussianPRF

rng = np.random.default_rng(0)
pars = pd.DataFrame({
    "mode":      rng.normal(20, 4, 100).clip(2, 45).astype(np.float32),
    "fwhm":      rng.normal(8, 2, 100).clip(2, 25).astype(np.float32),
    "amplitude": np.ones(100, dtype=np.float32),
    "baseline":  np.zeros(100, dtype=np.float32),
})
model = LogGaussianPRF(parameterisation="mode_fwhm_natural", parameters=pars)
omega = (0.25 ** 2) * np.eye(100, dtype=np.float32)
stim = np.linspace(1, 40, 100, dtype=np.float32)

fi_gauss = model.get_fisher_information(stim, omega=omega)
fi_t     = model.get_fisher_information(stim, omega=omega, dof=5.0)
out      = model.get_expected_uncertainty(
    stim, omega=omega, dof=5.0, n_simulations=500, batch_stimuli=20)

# `out` is a DataFrame with mean_E, var_E, mean_error, mean_abs_error
# indexed by true stimulus. 1/out["var_E"] ≈ fi_t for an unbiased
# decoder away from the grid boundary.

Empirical audit

A 100-voxel log-Gaussian population with two mode clusters (centered at 12 and 32 CHF) was used to compare paths:

Metric

Analytic

MC n=200

MC n=1000

Mean FI (across grid)

9.566

9.571

9.566

Std FI (across grid)

6.901

6.992

6.933

RMS error vs analytical (raw)

1.13

0.37

RMS error vs analytical (% of μ)

11.8 %

3.9 %

The MC curves look spikier than the analytical reference because of noise; the analytical curve is itself non-monotonic, confirming the population-structure origin of the bumps. MC also costs an extra factor of n × n_stim memory and triggered an int32 overflow inside UnsortedSegmentSum at n × n_voxels × n_stim 5 × 10⁷ — historically forcing users to shrink n_voxels to keep MC viable.

See tests/test_fisher_information.py for the regression suite (Student-t prefactor, MC ≈ analytic agreement, FI ↔ 1/var_E relationship, batched vs. unbatched get_expected_uncertainty).