Posted: 2016-09-24, Modified: 2016-09-24
Tags: neuroscience
fMRI: \(10^5\) voxels, each \((3\text{ mm})^3\), measuring blood flow.
Prior work
- Mitchell 08: pictures of concrete nouns
- Naselaris 09: images of scenes
- Pereira 11: generating words
- N11: reconstruct movie images
- Wehbe 14: chapter of Harry Potter (cf. speed reading)
- Huth 16: auditory stories
Goal
- Decode the semantics of fMRI responses: match fMRI responses to text annotations.
- Data: annotated Sherlock scenes (2000 scenes).
- Leverage multiple subjects' views (16 subjects) to extract better semantics.
Methods
- Pick a few brain regions to focus on. Ex. default mode network (2006): related to narrative.
- Hypothesis: this region does the best.
- \(\approx 2000\) voxels for each mask. (Masks are a small part of \(10^5\).)
- Shared response model \[\amin_{W_i^TW_i = I;\,S} \sumo ik \ve{X_i-W_iS}_F^2.\] (Columns of each \(W_i\) are orthonormal. \(W_i\) is voxels\(\times\)features (\(2000 \times 20\)).)
- Probabilistic model \(s_t\sim N(0,\Si_s)\). \(x_{it}|s_t\sim N(W_is_t+\mu_i, \rh_i^2I)\).
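The deterministic shared response objective above can be fit by alternating minimization. The following is a minimal sketch, not the authors' implementation: it assumes squared Frobenius norms, data already centered over time, and a random orthonormal initialization. The function name `fit_srm` and its parameters are hypothetical.

```python
import numpy as np

def fit_srm(X, k, n_iter=20, seed=0):
    """Alternating minimization for sum_i ||X_i - W_i S||_F^2 with W_i^T W_i = I.

    X: list of (voxels_i x time) arrays, one per subject (assumed centered).
    k: number of shared features (must not exceed the number of voxels).
    Returns the list of maps W_i (voxels_i x k) and the shared response S (k x time).
    """
    rng = np.random.default_rng(seed)
    # Random orthonormal initialization of each W_i via a reduced QR factorization.
    W = [np.linalg.qr(rng.standard_normal((Xi.shape[0], k)))[0] for Xi in X]
    for _ in range(n_iter):
        # With W fixed, the optimal S is the average of the projected data.
        S = np.mean([Wi.T @ Xi for Wi, Xi in zip(W, X)], axis=0)
        # With S fixed, each W_i solves an orthogonal Procrustes problem:
        # take the thin SVD of X_i S^T and keep the orthonormal factor U V^T.
        W = []
        for Xi in X:
            U, _, Vt = np.linalg.svd(Xi @ S.T, full_matrices=False)
            W.append(U @ Vt)
    # Final S consistent with the last W update.
    S = np.mean([Wi.T @ Xi for Wi, Xi in zip(W, X)], axis=0)
    return W, S
```

With 16 subjects, masks of about 2000 voxels, and `k=20`, each `W[i]` would be \(2000\times 20\), matching the dimensions noted above.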
Distributional hypothesis of meaning: meaning comes from co-occurrence.
Each annotation contains multiple words. Approaches to combining their word vectors:
- Unweighted: Averaging
- Weighted: By inverse of frequency
(Note: both approaches ignore polysemy, although words have different meanings. One could use dictionary learning (DL) to split words into atoms.)
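The two averaging schemes can be sketched as follows. This is an illustration under assumptions: `vectors` and `counts` are hypothetical lookup tables (pretrained embeddings and corpus frequencies), and the weight \(1/\text{frequency}\) is only one reading of "inverse of frequency".

```python
import numpy as np

def annotation_vector(words, vectors, counts=None):
    """Combine the word vectors of one annotation into a single vector.

    words: list of tokens in the annotation.
    vectors: dict token -> 1-D np.ndarray (hypothetical pretrained embeddings).
    counts: optional dict token -> corpus frequency. If given, each word is
            weighted by 1 / frequency; otherwise all words get equal weight.
    """
    known = [w for w in words if w in vectors]
    if not known:
        raise ValueError("no known words in annotation")
    V = np.stack([vectors[w] for w in known])
    if counts is None:
        weights = np.ones(len(known))                         # unweighted average
    else:
        weights = np.array([1.0 / counts[w] for w in known])  # rarer words count more
    return weights @ V / weights.sum()
```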
Let \(A=\)fMRI, \(B=\)text. We learn a linear map \(\Om A\approx B\). We can vary the way we constrain the maps.
- Procrustes: \(\Om\) orthogonal.
- Ridge regression (penalizes the norm of the columns of \(\Om\)).
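Both constraints on the linear map have closed-form solutions. The sketch below assumes the columns of \(A\) (features \(\times\) time) and \(B\) (annotation dimension \(\times\) time) are aligned time points, that the orthogonal case means orthonormal columns when \(B\) has more rows than \(A\), and that the ridge penalty is the squared Frobenius norm (the sum of squared column norms). The function names and `lam` are hypothetical.

```python
import numpy as np

def procrustes_map(A, B):
    """Omega with orthonormal columns minimizing ||Omega A - B||_F.

    A: (k x T), B: (d x T), with d >= k. Since ||Omega A||_F is fixed by the
    constraint, this maximizes tr(Omega^T B A^T), solved by the thin SVD of B A^T.
    """
    U, _, Vt = np.linalg.svd(B @ A.T, full_matrices=False)
    return U @ Vt

def ridge_map(A, B, lam=1.0):
    """Omega minimizing ||Omega A - B||_F^2 + lam * ||Omega||_F^2."""
    k = A.shape[0]
    M = A @ A.T + lam * np.eye(k)          # symmetric positive definite for lam > 0
    # Omega M = B A^T  <=>  M Omega^T = A B^T
    return np.linalg.solve(M, A @ B.T).T
```

The Procrustes map preserves norms and cannot rescale features, while ridge can shrink or stretch directions; that difference is what a Procrustes vs. ridge comparison probes.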
Comparisons:
- 20-dimensional SRM vs. averaging
- Weighted vs. unweighted
- Procrustes vs. ridge
- Temporal average subtraction vs. not
Annotation vectors 1000-dimensional.
Evaluation: is the true chunk ranked in the top 5? (See table in paper.)
Average, else correlated
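The top-5 check can be sketched as a ranking test. This is a sketch under assumptions: candidates are ranked by Pearson correlation between predicted and true annotation vectors, which the notes do not specify, and the function name `top_k_accuracy` is hypothetical.

```python
import numpy as np

def top_k_accuracy(pred, true, k=5):
    """Fraction of chunks whose true vector is among the k best matches.

    pred, true: (n_chunks x dim) arrays; row j of `pred` is the vector predicted
    from fMRI for chunk j, row j of `true` is that chunk's annotation vector.
    Similarity is Pearson correlation (an assumed choice).
    """
    def normalize(M):
        # Center and scale each row so that dot products are correlations.
        M = M - M.mean(axis=1, keepdims=True)
        return M / np.linalg.norm(M, axis=1, keepdims=True)

    sim = normalize(pred) @ normalize(true).T      # sim[j, l] = corr(pred[j], true[l])
    top = np.argsort(-sim, axis=1)[:, :k]          # indices of the k best candidates
    hits = (top == np.arange(len(pred))[:, None]).any(axis=1)
    return hits.mean()
```

Chance level for this score is \(k/n\) for \(n\) candidate chunks, which is the baseline to compare against.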
Model
- Unweighted \(\Pj(w|c) = \fc{\exp(v_w^T c)}{Z}\).
- Weighted \(\Pj(w|c) = \al p(w) + (1-\al) \fc{\exp(v_w^T c)}{Z}\), \(\al\in [0,1]\). (Ex. more accurate for common words.)
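Both word-emission models can be computed in a few lines. The sketch below assumes \(Z\) is the sum of \(\exp(v_w^T c)\) over the vocabulary and that \(p(w)\) is a unigram distribution; the names `word_probs`, `V`, and `p_unigram` are hypothetical.

```python
import numpy as np

def word_probs(c, V, p_unigram=None, alpha=0.0):
    """P(w | c) for every word w in the vocabulary.

    c: (dim,) context vector.
    V: (vocab x dim) matrix whose rows are the word vectors v_w.
    p_unigram: optional (vocab,) unigram distribution p(w).
    alpha: mixing weight in [0, 1]; only used when p_unigram is given.
    """
    logits = V @ c
    logits = logits - logits.max()                     # stabilize the softmax
    softmax = np.exp(logits) / np.exp(logits).sum()    # exp(v_w^T c) / Z
    if p_unigram is None:
        return softmax                                 # unweighted model
    return alpha * p_unigram + (1 - alpha) * softmax   # weighted model
```

Setting `alpha=0` recovers the unweighted model, and `alpha=1` ignores the context entirely and returns \(p(w)\).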