nm000104 NEMAR-native dataset

emg2qwerty: A Large Dataset with Baselines for Touch Typing using Surface Electromyography

emg2qwerty is the largest public surface electromyography (sEMG) dataset to date, comprising 1,135 sessions from 108 participants performing touch typing on a QWERTY keyboard. The dataset captures wrist-based sEMG signals (32 channels, 2000 Hz sampling rate) synchronized with keystroke ground truth, totaling 346.4 hours of data and 5.26 million keystrokes. Designed to enable keyboard-free text input through decoding of typing intent from neuromuscular activity, the dataset supports research in sequence-to-sequence learning, cross-user generalization, domain adaptation, and neuromotor interfaces for AR/VR and accessibility applications.
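To put the headline numbers in perspective, the sketch below derives a few averages from the totals quoted above. The variable names are illustrative. The total-sample figure assumes that all 32 channels were recorded for the full 346.4 hours, which the description does not state explicitly.

```python
# Figures quoted in the dataset description
sessions = 1135
participants = 108
channels = 32
sampling_rate_hz = 2000
total_hours = 346.4
keystrokes = 5.26e6

total_seconds = total_hours * 3600

# Derived averages
minutes_per_session = total_hours * 60 / sessions
sessions_per_participant = sessions / participants
keystrokes_per_second = keystrokes / total_seconds

# Assumption: every channel is sampled for the whole recording time
total_sample_values = channels * sampling_rate_hz * total_seconds

print(f"{minutes_per_session:.1f} min per session on average")
print(f"{sessions_per_participant:.1f} sessions per participant on average")
print(f"{keystrokes_per_second:.2f} keystrokes per second on average")
print(f"{total_sample_values:.3g} sample values in total")
```

These are dataset-wide averages only; individual sessions and participants may differ considerably.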

Download this dataset

Archive (>100 GB) removed to save space; use the per-file direct download (#752). Use one of the streaming methods below; all are resumable.

NEMAR CLI (recommended)
Pulls the pinned version plus annexed data and resumes cleanly.
```
nemar dataset download nm000104
```

DataLad

Clone the dataset repo and fetch file content on demand.
```
datalad clone https://github.com/nemarDatasets/nm000104 nm000104
cd nm000104 && datalad get .
```

git-annex

Plain git plus git-annex against the dataset repo.
```
git clone https://github.com/nemarDatasets/nm000104 nm000104
cd nm000104 && git annex get .
```

Direct files (wget / curl / rclone)
Every file has a stable, range-resumable URL listed in the manifest. Requires curl, jq, and wget (or rclone/aria2c).
```
curl -s https://data.nemar.org/nm000104/v2.0.0/manifest.json | jq -r '.[].bytes_url' > urls.txt
wget -xc -i urls.txt
```
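For readers who prefer not to depend on jq, the same extraction can be sketched in Python. This assumes the manifest is a JSON array of objects that each carry a `bytes_url` field, which is what the jq filter above implies but is not otherwise confirmed here. The function names and the sample entries are hypothetical.

```python
import json
import urllib.request

MANIFEST_URL = "https://data.nemar.org/nm000104/v2.0.0/manifest.json"


def extract_urls(manifest):
    """Mirror jq's '.[].bytes_url': one URL per manifest entry."""
    return [entry["bytes_url"] for entry in manifest]


def fetch_manifest(url=MANIFEST_URL):
    """Download and parse the manifest (requires network access)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


# Hypothetical two-entry manifest, for illustration only
sample = [
    {"bytes_url": "https://example.org/file-a"},
    {"bytes_url": "https://example.org/file-b"},
]
urls = extract_urls(sample)
print("\n".join(urls))
```

Calling `extract_urls(fetch_manifest())` and writing the result to `urls.txt`, one URL per line, would produce the input that `wget -i` expects.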

Compute on this dataset

Two routes are available today; a third (in-browser one-click submission) is landing soon.

NeuroScience Gateway (NSG) portal.
NSG runs EEGLAB / Brainstorm / MNE pipelines on supercomputing time donated by SDSC. Create an account, point a job at this dataset's S3 prefix (s3://nemar/nm000104), and submit.
nsgportal.org
Local processing with nemar-cli.
Pull the dataset to your machine and run any toolbox locally. Honors the published version pinning.
```
npm install -g nemar-cli
nemar dataset clone nm000104
cd nm000104 && nemar dataset get
```
Just the files.
rclone, aria2c, or any HTTPS client works against data.nemar.org/nm000104/ — the manifest carries presigned S3 URLs.
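As a rough illustration of what "range-resumable" means for a plain HTTPS client, the sketch below resumes a partial download with an HTTP `Range` header, using only the Python standard library. The helper names `range_header` and `resume_download` are hypothetical, and the code assumes the server answers range requests with status 206, as the page's description of the URLs suggests.

```python
import os
import urllib.request


def range_header(local_path):
    """Return a Range header asking for the bytes not yet on disk."""
    have = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    return {"Range": f"bytes={have}-"} if have else {}


def resume_download(url, local_path, chunk=1 << 20):
    """Fetch url into local_path, continuing a partial file if one exists."""
    req = urllib.request.Request(url, headers=range_header(local_path))
    # Note: a server typically rejects a range that starts at the end of an
    # already complete file (HTTP 416), which urlopen raises as HTTPError.
    with urllib.request.urlopen(req) as resp:
        # 206: the range was honored, so append.
        # 200: the server sent the whole file, so overwrite.
        mode = "ab" if resp.status == 206 else "wb"
        with open(local_path, mode) as out:
            while True:
                block = resp.read(chunk)
                if not block:
                    break
                out.write(block)
```

Dedicated tools such as `wget -c`, rclone, or aria2c handle retries and parallelism as well, so this is mainly useful for understanding the mechanism or for embedding downloads in a script.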

Direct compute access is coming soon. One-click NSG submission from this page is scoped for a follow-up phase. Tracked on nemarOrg/website#6.

Cite this dataset

DOI 10.82901/nemar.nm000104

Sivakumar, V., Seely, J., Du, A., Bittner, S. R., Berenzweig, A., Bolarinwa, A., Gramfort, A., & Mandel, M. I. (2026). emg2qwerty: A Large Dataset with Baselines for Touch Typing using Surface Electromyography (Version v2.0.0) [Data set]. NEMAR. https://doi.org/10.82901/nemar.nm000104

License

CC-BY-NC-SA-4.0

Modalities

EMG

Tasks

typing

BIDS datatypes

emg

Sessions

1135

Published

Feb 23, 2026

Authors

Viswanath Sivakumar
Jeffrey Seely
Alan Du
Sean R. Bittner
Adam Berenzweig
Anuoluwapo Bolarinwa
Alexandre Gramfort
Michael I. Mandel

Funding

CTRL-labs

Keywords

Electromyography, motor control, sequence-to-sequence learning, domain adaptation, brain-computer interfaces, keystroke decoding, keystroke dynamics, keystroke recognition, transfer learning, sEMG, wrist-based EMG, wearable sensors, accessibility, neuromotor interfaces

Related identifiers

IsVersionOf 10.5281/zenodo.17287903
IsIdenticalTo 10.5281/zenodo.17613953
IsVersionOf 10.82901/nemar.nm000104
IsDescribedBy 10.48550/arXiv.2410.20081
IsDescribedBy github.com/nemarDatasets/nm000104…
IsDescribedBy nemar.org/dataexplorer/detail?dataset_id=nm000104…