Participant Variable Mapping Guide

Overview

When importing raw participant data into PRISM, your source files (e.g., wellbeing.tsv, fitness_data.tsv) may use custom encodings for demographic variables (numeric codes, different column names, etc.) that don’t match the standardized PRISM schema.

The participants_mapping.json file allows you to:

Specify which columns in your raw data represent participant demographics
Map those columns to standardized PRISM variable names
Transform numeric codes (e.g., 1, 2, 4) to standard codes (e.g., M, F, O)
Auto-convert participant data during dataset validation

Quick Start

Step 1: Create the mapping file

Place a participants_mapping.json file in your project’s code/library/ directory:

my_dataset/
├── code/
│   └── library/
│       └── participants_mapping.json    ← PUT IT HERE
├── sourcedata/
│   └── raw_data/
│       ├── wellbeing.tsv
│       └── fitness_data.tsv
├── dataset_description.json
├── participants.tsv
├── sub-001/
│   └── ...
└── ...

Alternatively, you can place it in sourcedata/ if that’s more convenient for your workflow:

my_dataset/
├── sourcedata/
│   ├── participants_mapping.json        ← OR HERE
│   └── raw_data/
│       └── wellbeing.tsv
└── ...

Step 2: Define your mappings

{
  "version": "1.0",
  "description": "Mapping for our study participant data",
  "mappings": {
    "participant_id": {
      "source_column": "participant_id",
      "standard_variable": "participant_id",
      "type": "string"
    },
    "sex": {
      "source_column": "sex",
      "standard_variable": "sex",
      "type": "string",
      "value_mapping": {
        "1": "M",
        "2": "F",
        "4": "O"
      }
    }
  }
}

Step 3: Run conversion

When you validate or convert your dataset, PRISM will:

✓ Auto-detect the participants_mapping.json file
✓ Apply transformations (numeric → standard codes)
✓ Generate participants.tsv with standardized values
✓ Log all transformations to the web terminal

Complete Example: Wellbeing Survey

Source data (raw_data/wellbeing.tsv):

participant_id   session   age   sex   education   handedness   WB01   ...
DEMO001          baseline  28    2     4           1            4      ...
DEMO002          baseline  34    1     5           1            3      ...

Mapping file (participants_mapping.json):

{
  "version": "1.0",
  "description": "Mapping for wellbeing survey raw data to PRISM standard",
  "source_file": "raw_data/wellbeing.tsv",
  "mappings": {
    "participant_id": {
      "source_column": "participant_id",
      "standard_variable": "participant_id",
      "type": "string",
      "description": "Unique participant identifier"
    },
    "age": {
      "source_column": "age",
      "standard_variable": "age",
      "type": "integer",
      "units": "years",
      "description": "Age in years"
    },
    "sex": {
      "source_column": "sex",
      "standard_variable": "sex",
      "type": "string",
      "value_mapping": {
        "1": "M",
        "2": "F",
        "4": "O"
      },
      "description": "Biological sex: 1=M, 2=F, 4=O"
    },
    "education": {
      "source_column": "education",
      "standard_variable": "education_level",
      "type": "string",
      "value_mapping": {
        "1": "1",
        "2": "2",
        "3": "3",
        "4": "4",
        "5": "5",
        "6": "6"
      },
      "description": "ISCED 2011 level"
    },
    "handedness": {
      "source_column": "handedness",
      "standard_variable": "handedness",
      "type": "string",
      "value_mapping": {
        "1": "R",
        "2": "L"
      },
      "description": "Hand preference: 1=R, 2=L"
    }
  }
}

Output (participants.tsv):

participant_id   age   sex   education_level   handedness
DEMO001          28    F     4                 R
DEMO002          34    M     5                 R

✓ Numeric codes automatically converted to standard codes!

Mapping Specification Format

Root level

Field	Type	Required	Description
`version`	string	Yes	Version of the mapping schema (e.g., “1.0”)
`description`	string	No	Human-readable description of this mapping
`source_file`	string	No	Path to the source file this mapping applies to
`instructions`	object	No	Custom instructions for users
`mappings`	object	Yes	Dictionary of column mappings

Per-column mapping

Each entry in mappings object:

{
  "my_mapping_name": {
    "source_column": "column_name_in_raw_data",
    "standard_variable": "standardized_variable_name",
    "type": "string|integer|float",
    "units": "years|cm|kg|...",
    "value_mapping": {
      "raw_value": "standard_value",
      "1": "M",
      "2": "F"
    },
    "description": "What this variable represents"
  }
}

Field	Type	Required	Description
`source_column`	string	Yes	Column name in the raw TSV file
`standard_variable`	string	Yes	Name of the standardized variable (from `participants.json` schema)
`type`	string	Yes	Data type: `string`, `integer`, `float`
`units`	string	No	Unit of measurement (e.g., “years”, “cm”)
`value_mapping`	object	No	Maps source values to standard values (for recoding)
`description`	string	No	Explanation of the variable

Standard PRISM Participant Variables

These are the standard variable names recognized by PRISM:

Core Demographics

participant_id - Unique identifier
age - Age in years
sex - Biological sex (M, F, O, n/a)
gender - Gender identity

Education & Employment

education_level - ISCED 2011 (0-8, n/a)
education_years - Years of formal education
employment_status - Employment category

Physical Traits

handedness - Hand dominance (R, L, A, n/a)
height - Height in cm
weight - Weight in kg
bmi - Body Mass Index

Health & Lifestyle

smoking_status - Smoking history
alcohol_consumption - Alcohol use frequency
physical_activity - Exercise frequency
medication_current - Current medications (yes/no)

Clinical

psychiatric_diagnosis - Mental health diagnosis history
neurological_diagnosis - Neurological condition history
vision - Visual acuity status
hearing - Hearing ability status

Other

group - Study group (control, patient, etc.)
marital_status - Partnership status
native_language - Language code (e.g., en, de)
country_of_birth - ISO 3166-1 code (e.g., DE, US)
country_of_residence - ISO 3166-1 code
ethnicity - Ethnic background category
income_bracket - Income range
residence_type - Urban/suburban/rural

See official/participants.json in the PRISM repository for complete definitions and expected values.

Value Mapping Reference

Common value mappings you might need:

Sex / Gender

"value_mapping": {
  "1": "M",
  "2": "F",
  "3": "O",
  "m": "M",
  "f": "F",
  "male": "M",
  "female": "F"
}

Handedness

"value_mapping": {
  "1": "R",
  "2": "L",
  "3": "A",
  "right": "R",
  "left": "L"
}

Education Level (ISCED 2011)

"value_mapping": {
  "1": "1",
  "2": "2",
  "3": "3",
  "4": "4",
  "5": "5",
  "6": "6",
  "7": "7",
  "8": "8"
}

Yes/No fields

"value_mapping": {
  "1": "yes",
  "0": "no",
  "yes": "yes",
  "no": "no",
  "y": "yes",
  "n": "no"
}

Workflow Integration

In the Web Interface

Upload dataset folder
PRISM detects participants_mapping.json at root
Shows: “Found participants mapping - Review transformations?”
Displays mapping summary and logs transformation
Auto-generates participants.tsv with standardized values

In the CLI

python prism-validator /path/to/my_dataset --apply-mapping

Or automatically applied during validation:

python prism-validator /path/to/my_dataset

File Location

The participants_mapping.json file should be placed in one of these project infrastructure locations:

code/library/participants_mapping.json (recommended)
- Standard location for preprocessing specifications in PRISM/BIDS YODA layout
- Part of the project’s code/methodology
- Not transferred to final dataset
sourcedata/participants_mapping.json (alternative)
- Alternative location if raw data lives in sourcedata/
- Clear association with raw data

Why not in the dataset root? The mapping file is a conversion specification, not part of the final BIDS/PRISM dataset:

It’s used to transform raw data INTO standardized format
Once data is imported into the dataset root structure, the mapping is no longer needed
Keeping it in code/ or sourcedata/ makes this clear
It’s automatically excluded from BIDS validation

BIDS Compatibility

The mapping file location (code/, sourcedata/) is standard for BIDS and automatically excluded from validation.

Troubleshooting

“No mapping found - continue without?”

Place participants_mapping.json in the dataset root
Check file name spelling (case-sensitive)
Ensure valid JSON syntax

“Source column ‘X’ not found”

Verify column name in raw TSV file matches exactly
Check for typos or whitespace
Column names are case-sensitive

“Values don’t match mapping”

Check numeric values (e.g., “1” vs 1)
Include all expected values in value_mapping
Use descriptive key names for troubleshooting

“Mapping validation failed”

Ensure version and mappings fields exist
Check JSON syntax (use online JSON validator)
Each mapping needs source_column and standard_variable

Examples

Example 1: Simple numeric sex coding

Raw data:

participant_id   sex
sub-001          1
sub-002          2

Mapping:

{
  "mappings": {
    "sex": {
      "source_column": "sex",
      "standard_variable": "sex",
      "type": "string",
      "value_mapping": {
        "1": "M",
        "2": "F"
      }
    }
  }
}

Output:

participant_id   sex
sub-001          M
sub-002          F

Example 2: Rename and recode education

Raw data:

participant_id   school_years
sub-001          12
sub-002          16

Mapping:

{
  "mappings": {
    "education": {
      "source_column": "school_years",
      "standard_variable": "education_years",
      "type": "integer",
      "units": "years"
    }
  }
}

Output:

participant_id   education_years
sub-001          12
sub-002          16

Example 3: Complex multipart mapping

Raw data:

participant_id   pid   visit   age_years   sex_code   handed
sub-001          P001  1       28          2          2
sub-002          P002  1       34          1          1

Mapping:

{
  "mappings": {
    "participant_id": {
      "source_column": "pid",
      "standard_variable": "participant_id"
    },
    "session": {
      "source_column": "visit",
      "standard_variable": "session",
      "value_mapping": {
        "1": "baseline",
        "2": "followup"
      }
    },
    "age": {
      "source_column": "age_years",
      "standard_variable": "age",
      "type": "integer",
      "units": "years"
    },
    "sex": {
      "source_column": "sex_code",
      "standard_variable": "sex",
      "value_mapping": {
        "1": "M",
        "2": "F"
      }
    },
    "handedness": {
      "source_column": "handed",
      "standard_variable": "handedness",
      "value_mapping": {
        "1": "R",
        "2": "L"
      }
    }
  }
}

Output:

participant_id   session   age   sex   handedness
P001             baseline  28    F     L
P002             baseline  34    M     R

Best Practices

Document your encoding - Always explain what numeric codes mean in the mapping
Be consistent - Use the same mapping across all files in a study
Keep it simple - Only map participant variables, not survey items
Validate mappings - Check for typos and missing value mappings
Version control - Commit participants_mapping.json to git with your dataset
Test first - Dry-run on a small dataset before full conversion

Getting Help

Check the examples in examples/workshop/
See official/participants.json for all standard variables and definitions
Run with verbose logging to see transformation details
Check .prismrc.json for validation settings