Analysis-Ready Preprocessing Pipeline
Overview
A two-layer pipeline that takes fMRIPrep + NORDIC (?) outputs and produces analysis-ready data for three distinct analysis types, all sharing a single quality-control layer.
Layer 1 — Shared base (fMRIPrep outputs + human QC decisions)
↓
Layer 2 — Stream-specific cleaning (applied on output from layer 1)
├── ready/glmsingle/ ← TB sessions + block-design localizers
├── ready/naturalistic/ ← NAT sessions + pRF localizer
└── ready/connectivity/ ← resting-state sessions
Key design principles:
- QC decisions (run exclusions, bad TRs, NORDIC choice) are made once at layer 1 and propagated to all streams. Comparisons across analysis types are valid because the QC substrate is identical.
- The BOLD timeseries is never modified for the GLMSingle stream — only a curated confounds file and TR exclusion flags are produced. GLMSingle handles its own noise modeling internally.
- Bandpass filtering is stream-specific: required for connectivity (Cordes et al., 2001), harmful for GLMSingle betas (Prince et al., 2022), unnecessary for naturalistic ISFC/ISC and pRF.
- Localizer runs route to streams by analysis type, not by session type.
- Dual output space: streams B and C produce both
MNI152NLin2009cAsym res-2(volumetric NIfTI) andfsaverage6(surface gifti) for every run.
Layer 1: Shared Base
For each subject/session/run, layer 1 is the fMRIPrep output (original or NORDIC-denoised) plus a QC decisions file recording all human-review outcomes.
NORDIC: a per-run flag in the QC decisions file controls whether the
NORDIC-denoised or original fMRIPrep BOLD is used as the source. Once NORDIC
is validated and deployed, the default will be true for TB and resting
sessions; NAT sessions are evaluated separately.
QC decisions
One TSV per subject/session at
derivatives/preprocessing_qc/sub-XX/sub-XX_ses-YY_qc_decisions.tsv:
| Column | Type | Description |
|---|---|---|
task |
str | BIDS task label |
run |
str | run index or n/a |
exclude |
bool | Exclude run entirely from all streams |
exclude_reason |
str | Free text reason |
nordic |
bool | Use NORDIC-denoised BOLD as source |
fd_threshold |
float | FD cutoff for TR flagging (default 0.5 mm) |
n_outlier_trs |
int | Number of TRs flagged |
outlier_trs |
str | Comma-separated 0-indexed TR indices |
notes |
str | Reviewer notes |
Stubs are auto-generated from fMRIPrep framewise displacement data (FD stats computed, TRs flagged at default threshold, all runs included by default). A human reviewer then opens the TSV alongside the fMRIPrep HTML report and MRIQC outputs and edits as needed. The file is version-controlled as a record of all QC decisions.
Layer 2: Analysis Streams
Stream A — GLMSingle (ready/glmsingle/)
Sessions: TB (TBencoding, TBretrieval, TBmath, TBresting) + block-design localizers (fLoc, motor, auditory, tone)
What it produces per run:
*_desc-confounds_ready.tsv— curated confounds with one spike regressor per outlier TR*_desc-outliers_mask.tsv— boolean mask of bad TRs
What it does NOT do: modify the BOLD NIfTI. The source BOLD (fMRIPrep or NORDIC fMRIPrep) is read directly by GLMSingle or the localizer GLM.
Confound strategy (36-parameter + spikes):
- 24 motion parameters (Friston 24: 6 realignment params + derivatives + quadratics)
- 6 anatomical CompCor components (combined WM+CSF mask)
- Cosine drift regressors (handles low-frequency drift without bandpass)
- One spike regressor per outlier TR
Spike regressors are the appropriate way to handle outlier TRs in a GLM framework — the affected timepoint is down-weighted without removing it and breaking the temporal structure of the design matrix.
Localizer runs — stream assignment
Localizer sessions (ses-02, ses-03, ses-30) split across streams by analysis type:
- Block-design GLMs (fLoc, motor, auditory, tone) → GLMSingle stream. Same recipe: BOLD untouched, curated confounds + TR flags passed to the analysis tool.
- pRF / travelling wave (prf task) → Naturalistic stream. prfpy expects confounds regressed, high-pass filtered, no bandpass, no smoothing, surface space preferred. The naturalistic stream’s default dual-space output satisfies all of these with no special handling.
Stream B — Naturalistic (ready/naturalistic/)
Sessions: NAT (movie-viewing, free recall, cued recall) + pRF localizer Analyses: pattern similarity (RSA), ISFC, ISC, pRF model fitting
What it produces per run (both spaces):
*_space-MNI152NLin2009cAsym_res-2_desc-preproc_bold.nii.gz*_hemi-L_space-fsaverage6_desc-preproc_bold.func.gii*_hemi-R_space-fsaverage6_desc-preproc_bold.func.gii*_desc-confounds_ready.tsv(shared across spaces)
Cleaning steps:
- Select confounds: 24HMP + 6 aCompCor + cosines
- Interpolate outlier TRs (cubic spline) — before regression, to prevent contamination of confound estimates
- Regress confounds via OLS
- High-pass filter only (0.01 Hz / 100s)
- No spatial smoothing
Why no bandpass: ISFC/ISC operate on shared inter-subject variance across all frequencies; pRF stimuli modulate BOLD at a specific sweep frequency. Both are harmed by lowpass filtering. High-pass only is the standard approach for naturalistic paradigms.
Stream C — Connectivity (ready/connectivity/)
Sessions: TBresting (and any dedicated resting runs added in future) Analyses: seed-based FC, ICA, graph metrics
What it produces per run (both spaces):
*_space-MNI152NLin2009cAsym_res-2_desc-preproc_bold.nii.gz*_hemi-L_space-fsaverage6_desc-preproc_bold.func.gii*_hemi-R_space-fsaverage6_desc-preproc_bold.func.gii*_desc-confounds_ready.tsv(shared across spaces)*_desc-outliers_mask.tsv
Cleaning steps:
- Select confounds: 24HMP + 6 aCompCor + cosines
- Interpolate outlier TRs (cubic spline)
- Regress confounds
- Bandpass filter: 0.01–0.1 Hz
- Scrub interpolated TRs from output (flagged in outliers mask)
- Spatial smoothing: 4mm FWHM (volumetric for MNI; geodesic for fsaverage6)
Global signal regression (GSR): not applied by default due to introduced anticorrelations. Deferred to when FC analyses begin.
Filesystem layout
derivatives/
├── fmriprep/ # Original fMRIPrep outputs (unchanged)
├── fmriprep_nordic/ # NORDIC fMRIPrep outputs (unchanged)
├── preprocessing_qc/
│ └── sub-XX/
│ └── sub-XX_ses-YY_qc_decisions.tsv
└── ready/
├── glmsingle/
│ └── sub-XX/ses-YY/func/
│ ├── *_desc-confounds_ready.tsv
│ └── *_desc-outliers_mask.tsv
├── naturalistic/
│ └── sub-XX/ses-YY/func/
│ ├── *_space-MNI152NLin2009cAsym_res-2_desc-preproc_bold.nii.gz
│ ├── *_hemi-L_space-fsaverage6_desc-preproc_bold.func.gii
│ ├── *_hemi-R_space-fsaverage6_desc-preproc_bold.func.gii
│ └── *_desc-confounds_ready.tsv
└── connectivity/
└── sub-XX/ses-YY/func/
├── *_space-MNI152NLin2009cAsym_res-2_desc-preproc_bold.nii.gz
├── *_hemi-L_space-fsaverage6_desc-preproc_bold.func.gii
├── *_hemi-R_space-fsaverage6_desc-preproc_bold.func.gii
├── *_desc-confounds_ready.tsv
└── *_desc-outliers_mask.tsv
Brain masks are not duplicated — consumers read them from fmriprep/ or
fmriprep_nordic/ directly (same space, same subject).
Confound strategy reference
| Component | Columns | Notes |
|---|---|---|
| Motion (Friston 24) | trans_x/y/z, rot_x/y/z + derivatives + quadratics |
24 total |
| Anatomical CompCor | a_comp_cor_00–a_comp_cor_05 |
Combined WM+CSF mask |
| Cosine drift | all cosine* columns |
Handles drift without bandpass (GLMSingle stream) |
| Spike regressors | Generated per outlier TR | GLMSingle stream only |
Not included by default: global signal (anticorrelations), mean WM/CSF signals (superseded by aCompCor), temporal CompCor.
Open questions
-
NORDIC for NAT sessions: The NORDIC pilot covered a TB session. A NAT session pilot (longer timeseries, ~600+ TRs, movie-viewing) should be run before committing the NORDIC default for those sessions.
-
T1w (native) space output: Desirable for analyses requiring native resolution (HippUnfold, sub-millimetre ROI work). Deferred on disk space grounds — T1w BOLD is 2–4× larger per run than MNI res-2. fsaverage6 is comparatively cheap (~10% of volume size) and is included by default.
-
Global signal regression for connectivity: Deferred to when resting-state FC analyses begin; will be implemented as an opt-in variant.