Participant Variable Mapping Guide

Overview

When importing raw participant data into PRISM, your source files (e.g., wellbeing.tsv, fitness_data.tsv) may use custom encodings for demographic variables (numeric codes, different column names, etc.) that don’t match the standardized PRISM schema.

The participants_mapping.json file allows you to:

  1. Specify which columns in your raw data represent participant demographics

  2. Map those columns to standardized PRISM variable names

  3. Transform numeric codes (e.g., 1, 2, 4) to standard codes (e.g., M, F, O)

  4. Auto-convert participant data during dataset validation


Quick Start

Step 1: Create the mapping file

Place a participants_mapping.json file in your project’s code/library/ directory:

my_dataset/
├── code/
│   └── library/
│       └── participants_mapping.json    ← PUT IT HERE
├── sourcedata/
│   └── raw_data/
│       ├── wellbeing.tsv
│       └── fitness_data.tsv
├── dataset_description.json
├── participants.tsv
├── sub-001/
│   └── ...
└── ...

Alternatively, you can place it in sourcedata/ if that’s more convenient for your workflow:

my_dataset/
├── sourcedata/
│   ├── participants_mapping.json        ← OR HERE
│   └── raw_data/
│       └── wellbeing.tsv
└── ...

Step 2: Define your mappings

{
  "version": "1.0",
  "description": "Mapping for our study participant data",
  "mappings": {
    "participant_id": {
      "source_column": "participant_id",
      "standard_variable": "participant_id",
      "type": "string"
    },
    "sex": {
      "source_column": "sex",
      "standard_variable": "sex",
      "type": "string",
      "value_mapping": {
        "1": "M",
        "2": "F",
        "4": "O"
      }
    }
  }
}

Step 3: Run conversion

When you validate or convert your dataset, PRISM will:

  • ✓ Auto-detect the participants_mapping.json file

  • ✓ Apply transformations (numeric → standard codes)

  • ✓ Generate participants.tsv with standardized values

  • ✓ Log all transformations to the web terminal


Complete Example: Wellbeing Survey

Source data (raw_data/wellbeing.tsv):

participant_id   session   age   sex   education   handedness   WB01   ...
DEMO001          baseline  28    2     4           1            4      ...
DEMO002          baseline  34    1     5           1            3      ...

Mapping file (participants_mapping.json):

{
  "version": "1.0",
  "description": "Mapping for wellbeing survey raw data to PRISM standard",
  "source_file": "raw_data/wellbeing.tsv",
  "mappings": {
    "participant_id": {
      "source_column": "participant_id",
      "standard_variable": "participant_id",
      "type": "string",
      "description": "Unique participant identifier"
    },
    "age": {
      "source_column": "age",
      "standard_variable": "age",
      "type": "integer",
      "units": "years",
      "description": "Age in years"
    },
    "sex": {
      "source_column": "sex",
      "standard_variable": "sex",
      "type": "string",
      "value_mapping": {
        "1": "M",
        "2": "F",
        "4": "O"
      },
      "description": "Biological sex: 1=M, 2=F, 4=O"
    },
    "education": {
      "source_column": "education",
      "standard_variable": "education_level",
      "type": "string",
      "value_mapping": {
        "1": "1",
        "2": "2",
        "3": "3",
        "4": "4",
        "5": "5",
        "6": "6"
      },
      "description": "ISCED 2011 level"
    },
    "handedness": {
      "source_column": "handedness",
      "standard_variable": "handedness",
      "type": "string",
      "value_mapping": {
        "1": "R",
        "2": "L"
      },
      "description": "Hand preference: 1=R, 2=L"
    }
  }
}

Output (participants.tsv):

participant_id   age   sex   education_level   handedness
DEMO001          28    F     4                 R
DEMO002          34    M     5                 R

✓ Numeric codes automatically converted to standard codes!


Mapping Specification Format

Root level

Field

Type

Required

Description

version

string

Yes

Version of the mapping schema (e.g., “1.0”)

description

string

No

Human-readable description of this mapping

source_file

string

No

Path to the source file this mapping applies to

instructions

object

No

Custom instructions for users

mappings

object

Yes

Dictionary of column mappings

Per-column mapping

Each entry in mappings object:

{
  "my_mapping_name": {
    "source_column": "column_name_in_raw_data",
    "standard_variable": "standardized_variable_name",
    "type": "string|integer|float",
    "units": "years|cm|kg|...",
    "value_mapping": {
      "raw_value": "standard_value",
      "1": "M",
      "2": "F"
    },
    "description": "What this variable represents"
  }
}

Field

Type

Required

Description

source_column

string

Yes

Column name in the raw TSV file

standard_variable

string

Yes

Name of the standardized variable (from participants.json schema)

type

string

Yes

Data type: string, integer, float

units

string

No

Unit of measurement (e.g., “years”, “cm”)

value_mapping

object

No

Maps source values to standard values (for recoding)

description

string

No

Explanation of the variable


Standard PRISM Participant Variables

These are the standard variable names recognized by PRISM:

Core Demographics

  • participant_id - Unique identifier

  • age - Age in years

  • sex - Biological sex (M, F, O, n/a)

  • gender - Gender identity

Education & Employment

  • education_level - ISCED 2011 (0-8, n/a)

  • education_years - Years of formal education

  • employment_status - Employment category

Physical Traits

  • handedness - Hand dominance (R, L, A, n/a)

  • height - Height in cm

  • weight - Weight in kg

  • bmi - Body Mass Index

Health & Lifestyle

  • smoking_status - Smoking history

  • alcohol_consumption - Alcohol use frequency

  • physical_activity - Exercise frequency

  • medication_current - Current medications (yes/no)

Clinical

  • psychiatric_diagnosis - Mental health diagnosis history

  • neurological_diagnosis - Neurological condition history

  • vision - Visual acuity status

  • hearing - Hearing ability status

Other

  • group - Study group (control, patient, etc.)

  • marital_status - Partnership status

  • native_language - Language code (e.g., en, de)

  • country_of_birth - ISO 3166-1 code (e.g., DE, US)

  • country_of_residence - ISO 3166-1 code

  • ethnicity - Ethnic background category

  • income_bracket - Income range

  • residence_type - Urban/suburban/rural

See official/participants.json in the PRISM repository for complete definitions and expected values.


Value Mapping Reference

Common value mappings you might need:

Sex / Gender

"value_mapping": {
  "1": "M",
  "2": "F",
  "3": "O",
  "m": "M",
  "f": "F",
  "male": "M",
  "female": "F"
}

Handedness

"value_mapping": {
  "1": "R",
  "2": "L",
  "3": "A",
  "right": "R",
  "left": "L"
}

Education Level (ISCED 2011)

"value_mapping": {
  "1": "1",
  "2": "2",
  "3": "3",
  "4": "4",
  "5": "5",
  "6": "6",
  "7": "7",
  "8": "8"
}

Yes/No fields

"value_mapping": {
  "1": "yes",
  "0": "no",
  "yes": "yes",
  "no": "no",
  "y": "yes",
  "n": "no"
}

Workflow Integration

In the Web Interface

  1. Upload dataset folder

  2. PRISM detects participants_mapping.json at root

  3. Shows: “Found participants mapping - Review transformations?”

  4. Displays mapping summary and logs transformation

  5. Auto-generates participants.tsv with standardized values

In the CLI

python prism-validator /path/to/my_dataset --apply-mapping

Or automatically applied during validation:

python prism-validator /path/to/my_dataset

File Location

The participants_mapping.json file should be placed in one of these project infrastructure locations:

  1. code/library/participants_mapping.json (recommended)

    • Standard location for preprocessing specifications in PRISM/BIDS YODA layout

    • Part of the project’s code/methodology

    • Not transferred to final dataset

  2. sourcedata/participants_mapping.json (alternative)

    • Alternative location if raw data lives in sourcedata/

    • Clear association with raw data

Why not in the dataset root? The mapping file is a conversion specification, not part of the final BIDS/PRISM dataset:

  • It’s used to transform raw data INTO standardized format

  • Once data is imported into the dataset root structure, the mapping is no longer needed

  • Keeping it in code/ or sourcedata/ makes this clear

  • It’s automatically excluded from BIDS validation

BIDS Compatibility

The mapping file location (code/, sourcedata/) is standard for BIDS and automatically excluded from validation.


Troubleshooting

“No mapping found - continue without?”

  • Place participants_mapping.json in the dataset root

  • Check file name spelling (case-sensitive)

  • Ensure valid JSON syntax

“Source column ‘X’ not found”

  • Verify column name in raw TSV file matches exactly

  • Check for typos or whitespace

  • Column names are case-sensitive

“Values don’t match mapping”

  • Check numeric values (e.g., “1” vs 1)

  • Include all expected values in value_mapping

  • Use descriptive key names for troubleshooting

“Mapping validation failed”

  • Ensure version and mappings fields exist

  • Check JSON syntax (use online JSON validator)

  • Each mapping needs source_column and standard_variable


Examples

Example 1: Simple numeric sex coding

Raw data:

participant_id   sex
sub-001          1
sub-002          2

Mapping:

{
  "mappings": {
    "sex": {
      "source_column": "sex",
      "standard_variable": "sex",
      "type": "string",
      "value_mapping": {
        "1": "M",
        "2": "F"
      }
    }
  }
}

Output:

participant_id   sex
sub-001          M
sub-002          F

Example 2: Rename and recode education

Raw data:

participant_id   school_years
sub-001          12
sub-002          16

Mapping:

{
  "mappings": {
    "education": {
      "source_column": "school_years",
      "standard_variable": "education_years",
      "type": "integer",
      "units": "years"
    }
  }
}

Output:

participant_id   education_years
sub-001          12
sub-002          16

Example 3: Complex multipart mapping

Raw data:

participant_id   pid   visit   age_years   sex_code   handed
sub-001          P001  1       28          2          2
sub-002          P002  1       34          1          1

Mapping:

{
  "mappings": {
    "participant_id": {
      "source_column": "pid",
      "standard_variable": "participant_id"
    },
    "session": {
      "source_column": "visit",
      "standard_variable": "session",
      "value_mapping": {
        "1": "baseline",
        "2": "followup"
      }
    },
    "age": {
      "source_column": "age_years",
      "standard_variable": "age",
      "type": "integer",
      "units": "years"
    },
    "sex": {
      "source_column": "sex_code",
      "standard_variable": "sex",
      "value_mapping": {
        "1": "M",
        "2": "F"
      }
    },
    "handedness": {
      "source_column": "handed",
      "standard_variable": "handedness",
      "value_mapping": {
        "1": "R",
        "2": "L"
      }
    }
  }
}

Output:

participant_id   session   age   sex   handedness
P001             baseline  28    F     L
P002             baseline  34    M     R

Best Practices

  1. Document your encoding - Always explain what numeric codes mean in the mapping

  2. Be consistent - Use the same mapping across all files in a study

  3. Keep it simple - Only map participant variables, not survey items

  4. Validate mappings - Check for typos and missing value mappings

  5. Version control - Commit participants_mapping.json to git with your dataset

  6. Test first - Dry-run on a small dataset before full conversion


Getting Help

  • Check the examples in examples/workshop/

  • See official/participants.json for all standard variables and definitions

  • Run with verbose logging to see transformation details

  • Check .prismrc.json for validation settings