Demo Data Guide

This document provides a comprehensive guide to the demo data included with PRISM. The demo folder contains synthetic data designed to help you learn PRISM workflows, test validation, and understand how to structure your own datasets.

Overview

The demo/ folder demonstrates the complete PRISM ecosystem:

Component

Purpose

templates/

Multilingual JSON sidecars (survey, biometrics)

docs/examples/

Excel templates for importing new surveys/biometrics

raw_data/

Example source files + bad examples for testing

derivatives/

Scoring recipes for computed measures

prism_structure_example/

Correctly organized PRISM dataset

flat_structure_example/

Common disorganized format (for comparison)


Excel Import Templates

If you are starting from scratch with your own survey or biometrics data, use these Excel templates to define your variables.

Both files include a Help sheet that explains every column name and provides examples of valid options.


Templates: Multilingual Survey & Biometrics

Location

demo/templates/
├── participants.json          # Participant variable definitions
├── survey/
│   └── survey-wellbeing.json  # Well-being questionnaire (DE/EN)
└── biometrics/
    └── biometrics-fitness.json  # Fitness assessment (DE/EN)

The Well-Being Survey (survey-wellbeing.json)

A 5-item synthetic questionnaire measuring general well-being:

Item

English Description

German Description

Scale

WB01

Life satisfaction

Lebenszufriedenheit

1-5

WB02

Happiness frequency

Häufigkeit von Glück

1-5

WB03

Stress level (reverse)

Stressniveau (umgekehrt)

1-5

WB04

Energy level

Energieniveau

1-5

WB05

Sleep quality satisfaction

Schlafqualität

1-5

Scale anchors (example for WB01):

  • 1 = Not at all satisfied / Überhaupt nicht zufrieden

  • 2 = Slightly satisfied / Wenig zufrieden

  • 3 = Moderately satisfied / Mäßig zufrieden

  • 4 = Very satisfied / Sehr zufrieden

  • 5 = Extremely satisfied / Äußerst zufrieden

I18n Structure

PRISM templates store both languages in a single file using nested objects:

{
  "Study": {
    "TaskName": "wellbeing",
    "OriginalName": {
      "de": "Wohlbefindens-Kurzskala",
      "en": "Well-Being Short Scale"
    },
    "Description": {
      "de": "5-Item Fragebogen zur Erfassung des allgemeinen Wohlbefindens",
      "en": "5-item questionnaire measuring general well-being"
    }
  },
  "I18n": {
    "Languages": ["de", "en"],
    "DefaultLanguage": "en"
  },
  "WB01": {
    "Description": {
      "de": "Wie zufrieden sind Sie im Allgemeinen mit Ihrem Leben?",
      "en": "In general, how satisfied are you with your life?"
    },
    "Levels": {
      "1": {
        "de": "Überhaupt nicht zufrieden",
        "en": "Not at all satisfied"
      },
      "5": {
        "de": "Äußerst zufrieden",
        "en": "Extremely satisfied"
      }
    }
  }
}

Benefits:

  • Single source of truth for translations

  • No sync issues between separate files

  • Compile to target language at export time

Compiling to Single Language

# Extract German version
python prism_tools.py library-compile demo/templates/survey/survey-wellbeing.json --lang de

# Extract English version
python prism_tools.py library-compile demo/templates/survey/survey-wellbeing.json --lang en

Raw Data: Source Files

Location

demo/raw_data/
├── survey_wellbeing_data.tsv      # Valid survey responses
├── biometrics_fitness_data.tsv    # Valid biometrics data
└── bad_examples/                  # 13 intentionally broken files

Valid Survey Data (survey_wellbeing_data.tsv)

10 synthetic participants with complete responses:

participant_id  session   age  sex  education  handedness  WB01  WB02  WB03  WB04  WB05  completion_date
DEMO001         baseline  28   f    4          r           4     4     2     3     4     2025-01-15
DEMO002         baseline  34   m    5          r           3     3     3     3     3     2025-01-16
DEMO003         baseline  22   f    3          r           5     5     1     5     5     2025-01-17
...

Column descriptions:

  • participant_id: Unique identifier (converted to sub-DEMO001 format)

  • session: Data collection session (becomes ses-baseline)

  • age, sex, education, handedness: Demographic variables

  • WB01-WB05: Survey item responses (1-5 Likert scale)

  • completion_date: When the survey was completed

Bad Examples for Testing

The bad_examples/ folder contains 13 intentionally malformed files to test error handling:

File

Issue

Expected Error

01_missing_id_column.tsv

No participant_id column

Cannot identify participants

02_wrong_delimiter.tsv

Semicolons instead of tabs

Column parsing failure

03_string_values.tsv

“very high” instead of 5

Non-numeric data error

04_out_of_range_values.tsv

Values like 99, -5, 300

Out of range warning

05_empty_values.tsv

Missing cells

Missing value handling

06_unknown_columns.tsv

Extra columns not in template

Unknown column warning

07_duplicate_ids.tsv

Same participant twice

Duplicate entry warning

08_inconsistent_columns.tsv

Varying column counts

Parsing error

09_mixed_types.tsv

Mix of numbers, N/A, NULL, #REF!

Type validation errors

10_empty_file.tsv

Completely empty

No data error

11_headers_only.tsv

Headers but no rows

No participant data

12_special_characters.tsv

HTML tags, quotes

Sanitization test

13_wrong_id_format.tsv

IDs not in sub-XXX format

Format warning

Usage:

# Test error handling via CLI
python prism_tools.py survey-convert \
    demo/raw_data/bad_examples/04_out_of_range_values.tsv \
    --library demo/templates \
    --output /tmp/test_output

# Or use the web interface Data Conversion page

Derivatives: Scoring Recipes

Location

demo/derivatives/
├── README.md
├── surveys/
│   └── wellbeing.json    # Wellbeing subscales & reverse coding
└── biometrics/
    └── fitness.json      # Fitness composite scores

Wellbeing Scoring Recipe (wellbeing.json)

Computes derived scores from raw survey items:

{
  "RecipeVersion": "1.0",
  "Kind": "survey",
  "Survey": {
    "Name": "Well-Being Short Scale",
    "TaskName": "wellbeing"
  },
  "Transforms": {
    "Invert": {
      "Scale": {"min": 1, "max": 5},
      "Items": ["WB03"]
    }
  },
  "Scores": [
    {
      "Name": "WB_total",
      "Description": "Total Well-Being Score",
      "Method": "sum",
      "Items": ["WB01", "WB02", "WB03", "WB04", "WB05"],
      "Range": {"min": 5, "max": 25},
      "Interpretation": {
        "5-10": "Low well-being",
        "11-17": "Moderate well-being",
        "18-25": "High well-being"
      }
    },
    {
      "Name": "WB_positive",
      "Description": "Positive Affect Subscale",
      "Method": "mean",
      "Items": ["WB02", "WB04"]
    },
    {
      "Name": "WB_satisfaction",
      "Description": "Life Satisfaction Subscale",
      "Method": "mean",
      "Items": ["WB01", "WB05"]
    }
  ]
}

Key features:

  • Reverse coding: WB03 (stress) is inverted so higher = better

  • Subscales: Total, Positive Affect, Life Satisfaction

  • Methods: sum or mean aggregation

  • Missing data: ignore skips missing values

Computing Derivatives

# Generate scored output from a PRISM dataset
python prism_tools.py derivatives-surveys /path/to/dataset \
    --recipe demo/derivatives/surveys/wellbeing.json \
    --output derivatives/

# Outputs: CSV, Excel (with codebook), and optional SPSS format

PRISM Structure Example

Location

demo/prism_structure_example/
├── .bidsignore
├── dataset_description.json
├── participants.json
├── participants.tsv
├── sub-001/
│   ├── eyetrack/
│   │   ├── sub-001_task-reading_eyetrack.tsv.gz
│   │   └── sub-001_task-reading_eyetrack.json
│   └── physio/
│       ├── sub-001_task-rest_physio.tsv.gz
│       └── sub-001_task-rest_physio.json
└── sub-002/
    ├── eyetrack/
    │   └── ...
    └── physio/
        └── ...

This is a correctly organized PRISM dataset demonstrating:

Dataset-Level Files

dataset_description.json - Required metadata:

{
  "Name": "PRISM Demo Dataset",
  "BIDSVersion": "1.8.0",
  "DatasetType": "raw",
  "License": "CC0",
  "Authors": ["Demo Author"],
  "Acknowledgements": "Synthetic demo data for PRISM testing"
}

participants.tsv - Participant demographics:

participant_id  age  sex  handedness
sub-001         28   F    R
sub-002         34   M    R

participants.json - Variable definitions:

{
  "age": {
    "Description": "Age of participant in years",
    "Units": "years"
  },
  "sex": {
    "Description": "Biological sex",
    "Levels": {
      "F": "Female",
      "M": "Male"
    }
  }
}

Subject-Level Structure

Each subject folder contains modality subfolders:

  • eyetrack/ - Eye tracking data

  • physio/ - Physiological recordings (ECG, EDA, etc.)

Naming convention:

sub-<id>_[ses-<session>_]task-<task>_<modality>.<ext>

Examples:

  • sub-001_task-reading_eyetrack.tsv.gz

  • sub-001_task-rest_physio.tsv.gz

Each data file has a corresponding .json sidecar with metadata.

Validating the Example

# Should pass with no errors
python prism.py demo/prism_structure_example/

# Output:
# ✓ Dataset validation complete
# 0 errors, 0 warnings

Flat Structure Example (Anti-Pattern)

Location

demo/flat_structure_example/
├── README.md
└── [messy files with inconsistent naming]

This folder demonstrates how data often arrives from experiments:

  • No standardized naming

  • Mixed file formats

  • No metadata sidecars

  • No participant organization

Purpose: Compare with prism_structure_example/ to understand why standardization matters.

# Will show many validation errors
python prism.py demo/flat_structure_example/

Hands-On Tutorials

Tutorial 1: Validate Demo Dataset

# Activate environment
source .venv/bin/activate

# Validate the well-organized example
python prism.py demo/prism_structure_example/
# → Should pass

# Try the flat structure
python prism.py demo/flat_structure_example/
# → Will show errors

Tutorial 2: Convert Survey Data

# Convert raw survey data to PRISM format
python prism_tools.py survey-convert \
    demo/raw_data/survey_wellbeing_data.tsv \
    --library demo/templates \
    --output /tmp/converted_survey \
    --force

# Validate the result
python prism.py /tmp/converted_survey

Tutorial 3: Compute Derivatives

# After converting, compute subscale scores
python prism_tools.py derivatives-surveys /tmp/converted_survey \
    --recipe demo/derivatives/surveys/wellbeing.json \
    --output /tmp/converted_survey/derivatives

# Check the output
ls /tmp/converted_survey/derivatives/surveys/
# → wellbeing_scores.csv, wellbeing_scores.xlsx, codebook.json

Tutorial 4: Test Error Handling

# Try importing a bad file
python prism_tools.py survey-convert \
    demo/raw_data/bad_examples/04_out_of_range_values.tsv \
    --library demo/templates \
    --output /tmp/bad_test

# Should produce clear error about values outside 1-5 range

Tutorial 5: Web Interface

  1. Start: python prism-studio.py

  2. Open: http://localhost:5001

  3. Go to Validate tab

  4. Upload demo/prism_structure_example/ → should pass

  5. Go to Data Conversion tab

  6. Select library: demo/templates

  7. Upload demo/raw_data/survey_wellbeing_data.tsv

  8. Watch real-time conversion log


Summary

Demo Component

What It Teaches

templates/survey/

Multilingual JSON sidecar format

templates/participants.json

Participant variable definitions

raw_data/survey_*.tsv

Typical source data format

raw_data/bad_examples/

Error handling & validation

derivatives/surveys/

Scoring recipe format

prism_structure_example/

Correct PRISM organization

flat_structure_example/

Why standardization matters

All demo data is completely synthetic and safe to share, modify, or use for testing.