nm000104 NEMAR-native dataset

emg2qwerty: A Large Dataset with Baselines for Touch Typing using Surface Electromyography

emg2qwerty is the largest public surface electromyography (sEMG) dataset to date, comprising 1,135 sessions from 108 participants performing touch typing on a QWERTY keyboard. The dataset captures wrist-based sEMG signals (32 channels, 2000 Hz sampling rate) synchronized with keystroke ground truth, totaling 346.4 hours of data and 5.26 million keystrokes. Designed to enable keyboard-free text input through decoding of typing intent from neuromuscular activity, the dataset supports research in sequence-to-sequence learning, cross-user generalization, domain adaptation, and neuromotor interfaces for AR/VR and accessibility applications.

AI-generated description, may include mistakes
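
For a rough sense of scale before pulling the data, the headline figures in the description can be turned into per-channel sample counts and per-session averages. The sketch below uses only the numbers quoted above. It assumes that the 346.4 hours are total recording time and that all 32 channels are sampled simultaneously at 2000 Hz; the page does not confirm either point, and the variable names are illustrative.

```python
# Back-of-the-envelope statistics derived from the figures in the description.
# Assumption: 346.4 h is total recording time across all sessions, and every
# one of the 32 channels is sampled at 2000 Hz for that whole duration.
SESSIONS = 1135
PARTICIPANTS = 108
CHANNELS = 32
SAMPLING_RATE_HZ = 2000
TOTAL_HOURS = 346.4
KEYSTROKES = 5_260_000

total_seconds = TOTAL_HOURS * 3600

# Number of time steps per channel, and scalar samples across all channels.
samples_per_channel = round(total_seconds * SAMPLING_RATE_HZ)
total_samples = samples_per_channel * CHANNELS

# Averages; individual sessions and participants will differ from these.
minutes_per_session = TOTAL_HOURS * 60 / SESSIONS
sessions_per_participant = SESSIONS / PARTICIPANTS
keystrokes_per_second = KEYSTROKES / total_seconds

print(f"samples per channel:      {samples_per_channel:,}")
print(f"samples, all channels:    {total_samples:,}")
print(f"mean session length:      {minutes_per_session:.1f} min")
print(f"sessions per participant: {sessions_per_participant:.1f}")
print(f"keystrokes per second:    {keystrokes_per_second:.2f}")
```

Under these assumptions the averages come out to roughly 18 minutes per session, between 10 and 11 sessions per participant, and a little over 4 keystrokes per second. They are averages only and say nothing about how sessions are distributed across participants.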

Compute on this dataset

Two compute routes are available today, and the raw files can also be downloaded directly. A third compute route (in-browser one-click submission) is landing soon.

  1. NeuroScience Gateway (NSG) portal.

    NSG runs EEGLAB / Brainstorm / MNE pipelines on supercomputing time donated by SDSC. Create an account, point a job at this dataset's S3 prefix (s3://nemar/nm000104), and submit.
    nsgportal.org

  2. Local processing with nemar-cli.

    Pull the dataset to your machine and run any toolbox locally. Honors the published version pinning.

    npm install -g nemar-cli
    nemar dataset clone nm000104
    cd nm000104 && nemar dataset get

  3. Just the files.

    rclone, aria2c, or any HTTPS client works against data.nemar.org/nm000104/ — the manifest carries presigned S3 URLs.

Direct compute access is coming soon. One-click NSG submission from this page is scoped for a follow-up phase. Tracked on nemarOrg/website#6.

