Analysis-Ready Preprocessing Pipeline

Overview

A two-layer pipeline that takes fMRIPrep outputs (original or NORDIC-denoised) and produces analysis-ready data for three distinct analysis types, all sharing a single quality-control layer.

Layer 1 — Shared base (fMRIPrep outputs + human QC decisions)
    ↓
Layer 2 — Stream-specific cleaning (applied on output from layer 1)
    ├── ready/glmsingle/      ← TB sessions + block-design localizers
    ├── ready/naturalistic/   ← NAT sessions + pRF localizer
    └── ready/connectivity/   ← resting-state sessions

Key design principles:

QC decisions (run exclusions, bad TRs, NORDIC choice) are made once at layer 1 and propagated to all streams. Comparisons across analysis types are valid because the QC substrate is identical.
The BOLD timeseries is never modified for the GLMSingle stream — only a curated confounds file and TR exclusion flags are produced. GLMSingle handles its own noise modeling internally.
Bandpass filtering is stream-specific: required for connectivity (Cordes et al., 2001), harmful for GLMSingle betas (Prince et al., 2022), unnecessary for naturalistic ISFC/ISC and pRF.
Localizer runs route to streams by analysis type, not by session type.
Dual output space: streams B and C produce both MNI152NLin2009cAsym res-2 (volumetric NIfTI) and fsaverage6 (surface GIFTI) for every run.

Layer 1: Shared Base

For each subject/session/run, layer 1 is the fMRIPrep output (original or NORDIC-denoised) plus a QC decisions file recording all human-review outcomes.

NORDIC: a per-run flag in the QC decisions file controls whether the NORDIC-denoised or original fMRIPrep BOLD is used as the source. Once NORDIC is validated and deployed, the default will be true for TB and resting sessions; NAT sessions are evaluated separately.
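The per-run source choice described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function names and the accepted boolean spellings are assumptions, while the two directory names follow the filesystem layout in this document.

```python
from pathlib import Path


def parse_bool(value: str) -> bool:
    """Interpret a TSV cell as a boolean (accepted spellings are an assumption)."""
    return value.strip().lower() in {"true", "1", "yes"}


def select_bold_root(derivatives: Path, use_nordic: bool) -> Path:
    """Return the derivatives subdirectory that holds the source BOLD for a run."""
    return derivatives / ("fmriprep_nordic" if use_nordic else "fmriprep")


# Hypothetical row from a QC decisions file
qc_row = {"task": "TBencoding", "run": "01", "nordic": "True"}
source_root = select_bold_root(Path("derivatives"), parse_bool(qc_row["nordic"]))
print(source_root.as_posix())
```

Keeping the choice in one small function means every stream resolves its source the same way from the same QC row.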

QC decisions

One TSV per subject/session at derivatives/preprocessing_qc/sub-XX/sub-XX_ses-YY_qc_decisions.tsv:

Column	Type	Description
`task`	str	BIDS task label
`run`	str	run index or `n/a`
`exclude`	bool	Exclude run entirely from all streams
`exclude_reason`	str	Free text reason
`nordic`	bool	Use NORDIC-denoised BOLD as source
`fd_threshold`	float	FD cutoff for TR flagging (default 0.5 mm)
`n_outlier_trs`	int	Number of TRs flagged
`outlier_trs`	str	Comma-separated 0-indexed TR indices
`notes`	str	Reviewer notes

Stubs are auto-generated from fMRIPrep framewise displacement data (FD stats computed, TRs flagged at default threshold, all runs included by default). A human reviewer then opens the TSV alongside the fMRIPrep HTML report and MRIQC outputs and edits as needed. The file is version-controlled as a record of all QC decisions.
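A stub row of the kind described above could be produced roughly like this. The sketch uses only the standard library; the function names, the strict `>` comparison against the threshold, and the `nordic` default of false in a fresh stub are assumptions, and the FD values are made up for illustration.

```python
import csv
import io

COLUMNS = ["task", "run", "exclude", "exclude_reason", "nordic",
           "fd_threshold", "n_outlier_trs", "outlier_trs", "notes"]


def make_stub_row(task, run, fd_values, fd_threshold=0.5):
    """Build one QC row from a framewise displacement trace (mm).

    fd_values may contain None where FD is undefined (e.g. the first TR).
    """
    outliers = [i for i, fd in enumerate(fd_values)
                if fd is not None and fd > fd_threshold]  # 0-indexed TRs
    return {
        "task": task,
        "run": run,
        "exclude": False,          # all runs included by default
        "exclude_reason": "",
        "nordic": False,           # assumed default until NORDIC is validated
        "fd_threshold": fd_threshold,
        "n_outlier_trs": len(outliers),
        "outlier_trs": ",".join(str(i) for i in outliers),
        "notes": "",
    }


def rows_to_tsv(rows):
    """Serialise QC rows to a TSV string with the documented column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


fd_trace = [None, 0.12, 0.61, 0.08, 0.93]   # illustrative values only
stub = make_stub_row("TBencoding", "01", fd_trace)
print(rows_to_tsv([stub]))
```

The reviewer then edits `exclude`, `outlier_trs` and `notes` by hand; the stub only records what FD alone suggests.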

Layer 2: Analysis Streams

Stream A — GLMSingle (`ready/glmsingle/`)

Sessions: TB (TBencoding, TBretrieval, TBmath, TBresting) + block-design localizers (fLoc, motor, auditory, tone)

What it produces per run:

*_desc-confounds_ready.tsv — curated confounds with one spike regressor per outlier TR
*_desc-outliers_mask.tsv — boolean mask of bad TRs

What it does NOT do: modify the BOLD NIfTI. The source BOLD (fMRIPrep or NORDIC fMRIPrep) is read directly by GLMSingle or the localizer GLM.

Confound strategy (24 motion + 6 aCompCor + cosines + spikes):

24 motion parameters (Friston 24: 6 realignment params, their temporal derivatives, and the squares of both)
6 anatomical CompCor components (combined WM+CSF mask)
Cosine drift regressors (handles low-frequency drift without bandpass)
One spike regressor per outlier TR

Spike regressors are the appropriate way to handle outlier TRs in a GLM framework: each spike absorbs the signal at its timepoint, so that TR no longer influences the other parameter estimates, while the run keeps its full length and the temporal structure of the design matrix stays intact.
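The column selection and spike construction listed above might look like the following. The motion, aCompCor and cosine names follow fMRIPrep's confounds TSV conventions; the `spike_NNN` naming and both function names are hypothetical.

```python
MOTION_AXES = ["trans_x", "trans_y", "trans_z", "rot_x", "rot_y", "rot_z"]
MOTION_SUFFIXES = ["", "_derivative1", "_power2", "_derivative1_power2"]


def confound_columns(available, n_acompcor=6):
    """Pick the Friston 24, the first aCompCor components, and all cosines."""
    motion = [axis + suffix for axis in MOTION_AXES for suffix in MOTION_SUFFIXES]
    acompcor = [f"a_comp_cor_{i:02d}" for i in range(n_acompcor)]
    cosines = sorted(name for name in available if name.startswith("cosine"))
    wanted = motion + acompcor + cosines
    missing = [name for name in wanted if name not in available]
    if missing:
        raise KeyError(f"confounds file lacks columns: {missing}")
    return wanted


def spike_regressors(n_trs, outlier_trs):
    """One indicator column per outlier TR: 1 at that TR, 0 elsewhere."""
    return {
        f"spike_{k:03d}": [1.0 if t == tr else 0.0 for t in range(n_trs)]
        for k, tr in enumerate(sorted(outlier_trs))
    }


# Illustrative header resembling an fMRIPrep confounds file
header = (
    [axis + suffix for axis in MOTION_AXES for suffix in MOTION_SUFFIXES]
    + [f"a_comp_cor_{i:02d}" for i in range(10)]
    + ["cosine00", "cosine01", "global_signal", "framewise_displacement"]
)
columns = confound_columns(header)
spikes = spike_regressors(n_trs=5, outlier_trs=[2, 4])
print(len(columns), list(spikes))
```

Because each spike column is nonzero at a single TR, fitting it reproduces the data at that TR exactly, which is why the flagged timepoint stops contributing to the other estimates.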

Localizer runs — stream assignment

Localizer sessions (ses-02, ses-03, ses-30) split across streams by analysis type:

Block-design GLMs (fLoc, motor, auditory, tone) → GLMSingle stream. Same recipe: BOLD untouched, curated confounds + TR flags passed to the analysis tool.
pRF / travelling wave (prf task) → Naturalistic stream. prfpy expects input with confounds regressed and a high-pass filter applied, with no bandpass filtering and no smoothing; surface space is preferred. The naturalistic stream’s default dual-space output satisfies all of these with no special handling.
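Routing by analysis type rather than by session can be expressed as a small lookup. This is a sketch under the assumption that the BIDS task labels match the names used above exactly; the function name is hypothetical.

```python
BLOCK_DESIGN_TASKS = {"fLoc", "motor", "auditory", "tone"}
PRF_TASKS = {"prf"}


def localizer_stream(task: str) -> str:
    """Map a localizer task label to the stream that should process it."""
    if task in BLOCK_DESIGN_TASKS:
        return "glmsingle"
    if task in PRF_TASKS:
        return "naturalistic"
    raise ValueError(f"unknown localizer task: {task!r}")


for task in ["fLoc", "tone", "prf"]:
    print(task, "->", localizer_stream(task))
```

Raising on unknown labels, rather than falling back to a default stream, keeps a mislabelled run from being cleaned with the wrong recipe.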

Stream B — Naturalistic (`ready/naturalistic/`)

Sessions: NAT (movie-viewing, free recall, cued recall) + pRF localizer
Analyses: pattern similarity (RSA), ISFC, ISC, pRF model fitting

What it produces per run (both spaces):

*_space-MNI152NLin2009cAsym_res-2_desc-preproc_bold.nii.gz
*_hemi-L_space-fsaverage6_desc-preproc_bold.func.gii
*_hemi-R_space-fsaverage6_desc-preproc_bold.func.gii
*_desc-confounds_ready.tsv (shared across spaces)

Cleaning steps:

Select confounds: 24HMP + 6 aCompCor + cosines
Interpolate outlier TRs (cubic spline) — before regression, to prevent contamination of confound estimates
Regress confounds via OLS
High-pass filter only (0.01 Hz cutoff, i.e. a 100 s period)
No spatial smoothing

Why no bandpass: ISFC/ISC operate on shared inter-subject variance across all frequencies; pRF stimuli modulate BOLD at a specific sweep frequency. Both are harmed by lowpass filtering. High-pass only is the standard approach for naturalistic paradigms.
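The cleaning order above (interpolate, then regress, then high-pass) can be sketched with NumPy and SciPy on a `(n_trs, n_voxels)` array. The document does not specify how the high-pass filter is implemented; regressing out a discrete cosine basis is an assumption made here for illustration, and all function names are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def interpolate_outliers(data, outlier_trs):
    """Replace flagged rows with cubic-spline values fitted to the clean rows."""
    bad = np.zeros(data.shape[0], dtype=bool)
    bad[np.asarray(outlier_trs, dtype=int)] = True
    good_idx = np.flatnonzero(~bad)
    spline = CubicSpline(good_idx, data[good_idx], axis=0)
    out = data.copy()
    out[bad] = spline(np.flatnonzero(bad))  # extrapolates if an edge TR is flagged
    return out


def regress_out(data, design):
    """Return OLS residuals of data on design plus an intercept."""
    X = np.column_stack([np.ones(data.shape[0]), design])
    beta, *_ = np.linalg.lstsq(X, data, rcond=None)
    return data - X @ beta


def dct_highpass_basis(n_trs, t_r, cutoff_hz=0.01):
    """Cosine regressors whose frequency k / (2 * n_trs * t_r) is below the cutoff."""
    n_basis = int(np.floor(2 * n_trs * t_r * cutoff_hz))
    t = np.arange(n_trs)
    cols = [np.cos(np.pi * k * (2 * t + 1) / (2 * n_trs))
            for k in range(1, n_basis + 1)]
    return np.column_stack(cols) if cols else np.empty((n_trs, 0))


def clean_naturalistic(data, confounds, outlier_trs, t_r):
    """Interpolate outlier TRs, regress confounds, then high-pass at 0.01 Hz."""
    data = interpolate_outliers(data, outlier_trs)
    data = regress_out(data, confounds)
    return regress_out(data, dct_highpass_basis(data.shape[0], t_r))


# Synthetic run: slow oscillation + linear drift, with one large spike at TR 50
rng = np.random.default_rng(0)
n_trs, t_r = 200, 1.5
t = np.arange(n_trs)
bold = (np.sin(2 * np.pi * t / 50) + 0.01 * t)[:, None] + np.zeros((n_trs, 3))
bold[50] += 100.0
confounds = rng.standard_normal((n_trs, 4))
cleaned = clean_naturalistic(bold, confounds, outlier_trs=[50], t_r=t_r)
print(cleaned.shape, float(np.abs(cleaned).max()))
```

Interpolating first matters in this toy case: if the spike at TR 50 were left in place, the confound betas would be fitted partly to the artifact rather than to the signal.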

Stream C — Connectivity (`ready/connectivity/`)

Sessions: TBresting (and any dedicated resting runs added in future)
Analyses: seed-based FC, ICA, graph metrics

What it produces per run (both spaces):

*_space-MNI152NLin2009cAsym_res-2_desc-preproc_bold.nii.gz
*_hemi-L_space-fsaverage6_desc-preproc_bold.func.gii
*_hemi-R_space-fsaverage6_desc-preproc_bold.func.gii
*_desc-confounds_ready.tsv (shared across spaces)
*_desc-outliers_mask.tsv

Cleaning steps:

Select confounds: 24HMP + 6 aCompCor + cosines
Interpolate outlier TRs (cubic spline)
Regress confounds
Bandpass filter: 0.01–0.1 Hz
Scrub interpolated TRs from output (flagged in outliers mask)
Spatial smoothing: 4 mm FWHM (volumetric for MNI; geodesic for fsaverage6)
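The bandpass and scrubbing steps above can be sketched as follows; interpolation, confound regression and smoothing are omitted. The filter family is not specified in this document, so the zero-phase Butterworth design and its order are assumptions, as are the function names.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass(data, t_r, low_hz=0.01, high_hz=0.1, order=2):
    """Zero-phase Butterworth bandpass along the time axis (axis 0)."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass",
                 fs=1.0 / t_r, output="sos")
    return sosfiltfilt(sos, data, axis=0)


def scrub(data, outlier_trs):
    """Drop flagged TRs after filtering; also return the boolean keep-mask."""
    keep = np.ones(data.shape[0], dtype=bool)
    keep[np.asarray(outlier_trs, dtype=int)] = False
    return data[keep], keep


# Synthetic run: one in-band and one out-of-band sinusoid plus an offset
n_trs, t_r = 300, 1.5
time = np.arange(n_trs) * t_r
slow = np.sin(2 * np.pi * 0.05 * time)   # inside 0.01-0.1 Hz
fast = np.sin(2 * np.pi * 0.25 * time)   # above the passband
data = (slow + fast + 5.0)[:, None]

filtered = bandpass(data, t_r)
scrubbed, keep = scrub(filtered, outlier_trs=[10, 11, 120])
print(filtered.shape, scrubbed.shape)
```

Filtering runs on the full, interpolated timeseries so the filter sees evenly spaced samples; the flagged TRs are removed only afterwards.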

Global signal regression (GSR): not applied by default because it can introduce artifactual anticorrelations. The decision is deferred until FC analyses begin.

Filesystem layout

derivatives/
├── fmriprep/                     # Original fMRIPrep outputs (unchanged)
├── fmriprep_nordic/              # NORDIC fMRIPrep outputs (unchanged)
├── preprocessing_qc/
│   └── sub-XX/
│       └── sub-XX_ses-YY_qc_decisions.tsv
└── ready/
    ├── glmsingle/
    │   └── sub-XX/ses-YY/func/
    │       ├── *_desc-confounds_ready.tsv
    │       └── *_desc-outliers_mask.tsv
    ├── naturalistic/
    │   └── sub-XX/ses-YY/func/
    │       ├── *_space-MNI152NLin2009cAsym_res-2_desc-preproc_bold.nii.gz
    │       ├── *_hemi-L_space-fsaverage6_desc-preproc_bold.func.gii
    │       ├── *_hemi-R_space-fsaverage6_desc-preproc_bold.func.gii
    │       └── *_desc-confounds_ready.tsv
    └── connectivity/
        └── sub-XX/ses-YY/func/
            ├── *_space-MNI152NLin2009cAsym_res-2_desc-preproc_bold.nii.gz
            ├── *_hemi-L_space-fsaverage6_desc-preproc_bold.func.gii
            ├── *_hemi-R_space-fsaverage6_desc-preproc_bold.func.gii
            ├── *_desc-confounds_ready.tsv
            └── *_desc-outliers_mask.tsv

Brain masks are not duplicated — consumers read them from fmriprep/ or fmriprep_nordic/ directly (same space, same subject).
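The layout above can be captured in a small helper that lists the expected outputs for one run. The suffixes and directory structure are taken from the tree; the function name, the constant names and the example file prefix are hypothetical.

```python
from pathlib import Path

VOLUME = "_space-MNI152NLin2009cAsym_res-2_desc-preproc_bold.nii.gz"
SURFACES = [f"_hemi-{hemi}_space-fsaverage6_desc-preproc_bold.func.gii"
            for hemi in ("L", "R")]
CONFOUNDS = "_desc-confounds_ready.tsv"
OUTLIERS = "_desc-outliers_mask.tsv"

# Files each stream writes per run, in the order shown in the tree
STREAM_SUFFIXES = {
    "glmsingle": [CONFOUNDS, OUTLIERS],
    "naturalistic": [VOLUME, *SURFACES, CONFOUNDS],
    "connectivity": [VOLUME, *SURFACES, CONFOUNDS, OUTLIERS],
}


def ready_outputs(derivatives, stream, sub, ses, prefix):
    """Return the expected output paths for one run in one stream."""
    func_dir = (Path(derivatives) / "ready" / stream
                / f"sub-{sub}" / f"ses-{ses}" / "func")
    return [func_dir / (prefix + suffix) for suffix in STREAM_SUFFIXES[stream]]


# Hypothetical run prefix
paths = ready_outputs("derivatives", "connectivity", "01", "05",
                      "sub-01_ses-05_task-TBresting")
for path in paths:
    print(path.as_posix())
```

A listing like this can double as a completeness check: any path that does not exist after a stream has run points to a failed or skipped step.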

Confound strategy reference

Component	Columns	Notes
Motion (Friston 24)	`trans_x/y/z`, `rot_x/y/z` + derivatives + quadratics	24 total
Anatomical CompCor	`a_comp_cor_00`–`a_comp_cor_05`	Combined WM+CSF mask
Cosine drift	all `cosine*` columns	Handles drift without bandpass (GLMSingle stream)
Spike regressors	Generated per outlier TR	GLMSingle stream only

Not included by default: global signal (anticorrelations), mean WM/CSF signals (superseded by aCompCor), temporal CompCor.

Open questions

NORDIC for NAT sessions: The NORDIC pilot covered a TB session. A NAT session pilot (longer timeseries, ~600+ TRs, movie-viewing) should be run before committing the NORDIC default for those sessions.
T1w (native) space output: Desirable for analyses requiring native resolution (HippUnfold, sub-millimetre ROI work). Deferred on disk space grounds — T1w BOLD is 2–4× larger per run than MNI res-2. fsaverage6 is comparatively cheap (~10% of volume size) and is included by default.
Global signal regression for connectivity: Deferred to when resting-state FC analyses begin; will be implemented as an opt-in variant.