nm000229 NEMAR-native dataset

Gwilliams et al. 2023 — Introducing MEG-MASC: a high-quality magneto-encephalography dataset for evaluating natural speech processing

MEG-MASC is a magnetoencephalography (MEG) dataset comprising raw recordings from 27 English speakers who listened to approximately two hours of naturalistic stories from the Manually Annotated Sub-Corpus (MASC). It includes precise temporal annotations of word and phoneme onsets and offsets, and is organized according to the Brain Imaging Data Structure (BIDS) standard. The dataset supports large-scale encoding and decoding analyses of neural responses to natural speech, and is accompanied by code for validation analyses, including temporal decoding of phonetic features and word frequency effects.

Download this dataset

Pick a method. For large datasets, skip the zip and use the streaming methods below; all of them are resumable. See the full download guide for details.

Download archive (.zip) — 78.5 GB
A single zip of the published version. Best for small/medium datasets.

NEMAR CLI (recommended)
Pulls the pinned version plus annexed data and resumes cleanly. Install nemar-cli first (npm install -g nemar-cli).
```
nemar dataset download nm000229
```

DataLad

Clone the dataset repo and fetch file content on demand.

datalad clone https://github.com/nemarDatasets/nm000229 nm000229
cd nm000229 && datalad get .

git-annex

Use plain git plus git-annex against the dataset repo.

git clone https://github.com/nemarDatasets/nm000229 nm000229
cd nm000229 && git annex get .

Direct files (wget / curl / rclone)
Fetches every file from a stable, range-resumable URL listed in the manifest. Requires curl, jq, and wget (or rclone/aria2c).
```
curl -s https://data.nemar.org/nm000229/v1.0.1/manifest.json | jq -r '.[].bytes_url' > urls.txt
wget -xc -i urls.txt
```
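
To fetch a single participant rather than the full dataset, the URL list produced above can be filtered before it is handed to wget. The following is a minimal sketch, not part of the NEMAR tooling: the helper name `filter_urls_for_subject` is hypothetical, and it assumes each URL contains a BIDS-style path component such as `/sub-01/`, which is not confirmed for this manifest.

```shell
#!/bin/sh
# Hypothetical helper: keep only the URLs that belong to one BIDS subject.
# Assumption: every URL for a subject contains a path component like /sub-01/.
filter_urls_for_subject() {
    subject="$1"    # subject label, e.g. sub-01
    url_list="$2"   # text file with one URL per line, e.g. urls.txt
    # -F matches the pattern as a fixed string, so no regex escaping is needed.
    grep -F "/${subject}/" "$url_list"
}

# Example usage (needs network access, so it is left commented out):
# filter_urls_for_subject sub-01 urls.txt > sub-01-urls.txt
# wget -xc -i sub-01-urls.txt
```

Matching on the surrounding slashes avoids accidentally selecting files whose names merely start with the same label. Dataset-level files such as a top-level description are not matched by this filter and would need to be fetched separately.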

Compute on this dataset

Two routes today, with a third (in-browser one-click submission) landing soon.

NeuroScience Gateway (NSG) portal.
NSG runs EEGLAB / Brainstorm / MNE pipelines on supercomputing time donated by SDSC. Create an account, point a job at this dataset's S3 prefix (s3://nemar/nm000229), and submit.
nsgportal.org
Local processing with nemar-cli.
Pull the dataset to your machine and run any toolbox locally. Honors the published version pinning.
```
npm install -g nemar-cli
nemar dataset clone nm000229
cd nm000229 && nemar dataset get
```
Just the files.
rclone, aria2c, or any HTTPS client works against data.nemar.org/nm000229/ — the manifest carries presigned S3 URLs.
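
After any of the local routes above, a quick sanity check is to count the subject directories in the downloaded tree. This is a minimal sketch under the assumption that the dataset follows the usual BIDS layout, with one top-level `sub-<label>` directory per participant; the helper name `count_bids_subjects` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count top-level BIDS subject directories.
# Assumption: each participant has one directory named sub-<label> at the dataset root.
count_bids_subjects() {
    dataset_root="$1"
    # Only directories directly under the root are counted; files named sub-* are ignored.
    find "$dataset_root" -mindepth 1 -maxdepth 1 -type d -name 'sub-*' | wc -l | tr -d ' '
}

# Example usage:
# count_bids_subjects nm000229
```

The dataset description mentions 27 participants, so a noticeably smaller count may point to an interrupted or partial download. The check only looks at directory names and does not verify that the annexed file content has been fetched.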

Direct compute access is coming soon. One-click NSG submission from this page is scoped for a follow-up phase. Tracked on nemarOrg/website#6.

Cite this dataset

DOI 10.82901/nemar.nm000229

Gwilliams, L., Flick, G., Marantz, A., Pylkkänen, L., Poeppel, D., & King, J. (2026). Gwilliams et al. 2023 — Introducing MEG-MASC: a high-quality magneto-encephalography dataset for evaluating natural speech processing (Version v1.0.1) [Data set]. NEMAR. https://doi.org/10.82901/nemar.nm000229

License

CC0

Modalities

ANAT MEG

Tasks

0 1 2 3

BIDS datatypes

anat meg

Sessions

Published

Jun 3, 2026

Authors

Laura Gwilliams
Graham Flick
Alec Marantz
Liina Pylkkänen
David Poeppel
Jean-Rémi King

Funding

ANR-17-EURE-0017
G1001

Keywords

Magnetoencephalography, MEG, speech processing, natural language, phonetic decoding, brain imaging, BIDS

Related identifiers

IsDerivedFrom 10.1038/s41597-023-02752-5
IsDescribedBy github.com/nemarDatasets/nm000229…
IsDescribedBy nemar.org/dataexplorer/detail?dataset_id=nm000229…