Data Organization
MMMData follows the Brain Imaging Data Structure (BIDS) v1.9.0 standard. Data are organized into three tiers:
- BIDS raw: Converted NIfTI + JSON + events TSV files
- Source data: Original DICOMs, PsychoPy output, audio recordings
- Derivatives: Preprocessed outputs (fMRIPrep, MRIQC)
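As a rough illustration of the three tiers, the sketch below sorts a path (relative to the dataset root) into a tier. It assumes the standard BIDS top-level folder names `sourcedata/` and `derivatives/`. The text above does not state where MMMData keeps these tiers, and `classify_tier` is a hypothetical helper, not one of the project's scripts.

```python
from pathlib import PurePosixPath


def classify_tier(relpath: str) -> str:
    """Return "source", "derivatives", or "raw" for a path relative to the dataset root.

    Assumes the standard BIDS top-level folders: sourcedata/ for original
    data and derivatives/ for processed outputs; everything else is raw BIDS.
    """
    parts = PurePosixPath(relpath).parts
    if not parts:
        raise ValueError("empty path")
    if parts[0] == "sourcedata":
        return "source"
    if parts[0] == "derivatives":
        return "derivatives"
    return "raw"
```

Under that assumption, a path under `derivatives/fmriprep/` maps to `"derivatives"`, and anything outside the two reserved folders counts as raw BIDS.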
Per-Session Scan Inventories (scans.tsv)
Each subject/session directory contains a `sub-XX_ses-YY_scans.tsv` file listing every primary NIfTI scan in that session. These are BIDS-standard scans files extended with custom columns:
| Column | Description |
|---|---|
| `filename` | Relative path to the NIfTI file (BIDS required) |
| `acq_time` | Acquisition timestamp from the JSON sidecar |
| `n_volumes` | Number of volumes (4th dimension; 1 for 3D scans) |
| `duration_s` | Scan duration in seconds |
| `n_events` | Row count of the matching `_events.tsv` (`n/a` if none) |
| `physio_cardiac`, `physio_pulse`, `physio_respiratory` | Whether each physio channel exists |
| `eyetracking` | Whether an eyetracking recording exists |
| `has_sbref`, `has_json`, `has_events_json` | Companion file presence |
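Two of the derived columns follow simple rules. The sketch below shows one way to compute them: `n_volumes` from a NIfTI shape tuple (as returned, for example, by nibabel's `img.shape`), and `n_events` by swapping the scan's suffix for `_events.tsv` and counting data rows. The helper names and the suffix-swapping rule are assumptions; `build_scans_tsv.py` may locate events files differently (for instance, for multi-echo scans).

```python
import csv
from pathlib import Path


def n_volumes(shape: tuple[int, ...]) -> int:
    """Volumes in a NIfTI image: the 4th dimension, or 1 for 3D scans."""
    return shape[3] if len(shape) >= 4 else 1


def events_path(nifti: Path) -> Path:
    """Map e.g. sub-03_..._run-1_bold.nii.gz to sub-03_..._run-1_events.tsv (assumed rule)."""
    name = nifti.name
    for ext in (".nii.gz", ".nii"):
        if name.endswith(ext):
            name = name[: -len(ext)]
            break
    stem = name.rsplit("_", 1)[0]  # drop the BIDS suffix, e.g. "_bold"
    return nifti.with_name(stem + "_events.tsv")


def n_events(nifti: Path) -> str:
    """Row count of the matching events file (header excluded), or 'n/a' if absent."""
    ev = events_path(nifti)
    if not ev.exists():
        return "n/a"
    with ev.open(newline="") as fh:
        rows = list(csv.reader(fh, delimiter="\t"))
    return str(max(len(rows) - 1, 0))
```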
Custom columns are documented in the BIDS-root `scans.json` sidecar (via BIDS inheritance). Regenerate the `scans.tsv` files with:

```bash
.venv/bin/python3 scripts/build_scans_tsv.py
```
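A BIDS column-description sidecar maps each column name to an object with keys such as `Description` and `Units`. Below is a minimal sketch of generating one for the custom columns, with descriptions paraphrased from the table above; the wording in the real `scans.json` may differ, and `CUSTOM_COLUMNS` and `render_scans_sidecar` are hypothetical names. `filename` and `acq_time` are left out because BIDS already defines them.

```python
import json

# Custom (non-BIDS) scans.tsv columns: name -> (description, units or None).
# Descriptions are paraphrased from the column table; the real sidecar may differ.
CUSTOM_COLUMNS = {
    "n_volumes": ("Number of volumes (4th dimension; 1 for 3D scans)", None),
    "duration_s": ("Scan duration", "s"),
    "n_events": ("Row count of matching _events.tsv (n/a if none)", None),
    "physio_cardiac": ("Whether a cardiac physio recording exists", None),
    "physio_pulse": ("Whether a pulse physio recording exists", None),
    "physio_respiratory": ("Whether a respiratory physio recording exists", None),
    "eyetracking": ("Whether an eyetracking recording exists", None),
    "has_sbref": ("Whether a companion sbref file exists", None),
    "has_json": ("Whether a companion JSON sidecar exists", None),
    "has_events_json": ("Whether a companion events JSON exists", None),
}


def render_scans_sidecar() -> str:
    """Build a BIDS-style column-description sidecar as a JSON string."""
    sidecar = {}
    for column, (description, units) in CUSTOM_COLUMNS.items():
        entry = {"Description": description}
        if units is not None:
            entry["Units"] = units  # BIDS uses SI unit symbols, e.g. "s"
        sidecar[column] = entry
    return json.dumps(sidecar, indent=2)
```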
Manifest Database
A SQLite database at inventory/manifest.db aggregates all scans.tsv files,
sourcedata metadata, derivative file listings, and session metadata into a
single queryable store. Key tables:
| Table | Contents |
|---|---|
| `files` | Every BIDS file with parsed entities (subject, session, task, run, suffix) |
| `nifti_meta` | NIfTI header info (dimensions, voxel size, TR) |
| `events_meta` | Events file stats (row count, columns, onset range) |
| `physio_meta` | Physio recording info (channel, sampling rate) |
| `sourcedata` | Raw data inventory (DICOMs, behavioral, audio, eyetracking) |
| `derivatives` | fMRIPrep and MRIQC output files |
| `session_metadata` | Per-session info from `_sessions.tsv` (date, notes, physiology) |
| `validation_results` | Automated check results (pass/fail/warn) |
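The manifest can be queried with Python's built-in `sqlite3` module. Column names are not documented here, so the sketch below runs against an in-memory stand-in whose `files` table has hypothetical `subject`, `session`, `task`, `run`, and `suffix` columns mirroring the parsed entities listed above. Against the real file, open `inventory/manifest.db` instead and check the actual schema first with `SELECT name, sql FROM sqlite_master WHERE type = 'table'`.

```python
import sqlite3


def runs_per_task(conn: sqlite3.Connection, suffix: str = "bold") -> list[tuple]:
    """Count files per (subject, session, task) for one BIDS suffix."""
    query = """
        SELECT subject, session, task, COUNT(*) AS n_files
        FROM files
        WHERE suffix = ?
        GROUP BY subject, session, task
        ORDER BY subject, session, task
    """
    return conn.execute(query, (suffix,)).fetchall()


def demo_connection() -> sqlite3.Connection:
    """In-memory stand-in for inventory/manifest.db with a HYPOTHETICAL files schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (path TEXT, subject TEXT, session TEXT,"
        " task TEXT, run INTEGER, suffix TEXT)"
    )
    conn.executemany(
        "INSERT INTO files VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("func/sub-03_ses-01_task-taskA_run-1_bold.nii.gz", "03", "01", "taskA", 1, "bold"),
            ("func/sub-03_ses-01_task-taskA_run-2_bold.nii.gz", "03", "01", "taskA", 2, "bold"),
            ("func/sub-03_ses-01_task-taskB_run-1_bold.nii.gz", "03", "01", "taskB", 1, "bold"),
            ("anat/sub-03_ses-01_T1w.nii.gz", "03", "01", None, None, "T1w"),
        ],
    )
    return conn
```

With this stand-in, `runs_per_task(demo_connection())` returns `[('03', '01', 'taskA', 2), ('03', '01', 'taskB', 1)]`; the `T1w` row is excluded by the suffix filter.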
The manifest is rebuilt on demand and is not version-controlled:

```bash
.venv/bin/python3 scripts/build_manifest.py               # full rebuild
.venv/bin/python3 scripts/build_manifest.py --skip-nifti  # fast (skip NIfTI headers)
```
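The aggregation step can be pictured as a loop over every session's `scans.tsv` feeding one table. This is a minimal sketch, not `build_manifest.py` itself: the `scans` table name and the column subset are assumptions, and BIDS `n/a` values are stored as SQL `NULL`.

```python
import csv
import sqlite3
from pathlib import Path


def load_scans_tsv(bids_root: Path, conn: sqlite3.Connection) -> int:
    """Insert every row of every sub-*/ses-*/*_scans.tsv into a `scans` table.

    Returns the number of rows inserted. Table and column names are hypothetical.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scans (scans_file TEXT, filename TEXT,"
        " acq_time TEXT, n_volumes INTEGER, duration_s REAL)"
    )
    n = 0
    for tsv in sorted(bids_root.glob("sub-*/ses-*/sub-*_ses-*_scans.tsv")):
        with tsv.open(newline="") as fh:
            for row in csv.DictReader(fh, delimiter="\t"):
                # BIDS uses the literal string "n/a" for missing values.
                clean = {k: (None if v == "n/a" else v) for k, v in row.items()}
                # INTEGER/REAL column affinity converts numeric strings like "300" on insert.
                conn.execute(
                    "INSERT INTO scans VALUES (?, ?, ?, ?, ?)",
                    (
                        str(tsv.relative_to(bids_root)),
                        clean.get("filename"),
                        clean.get("acq_time"),
                        clean.get("n_volumes"),
                        clean.get("duration_s"),
                    ),
                )
                n += 1
    conn.commit()
    return n
```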
Validation
The dataset expectations schema (dataset_expectations.toml) defines what
should exist — expected tasks, run counts, volume counts, event
structures, and physio/eyetracking coverage. The validation engine compares
the manifest against these expectations:
```bash
.venv/bin/python3 -m validation.run                          # full validation
.venv/bin/python3 -m validation.run --checks file_presence   # specific check
.venv/bin/python3 -m validation.run --subjects sub-03        # specific subject
```
Known deviations (documented as `[[exceptions]]` entries in the schema) are automatically matched and downgraded to informational status.