# Participant Variable Mapping Guide

## Overview

When importing raw participant data into PRISM, your source files (e.g., `wellbeing.tsv`, `fitness_data.tsv`) may use custom encodings for demographic variables (numeric codes, different column names, etc.) that don't match the standardized PRISM schema.

The **`participants_mapping.json`** file allows you to:
1. **Specify** which columns in your raw data represent participant demographics
2. **Map** those columns to standardized PRISM variable names
3. **Transform** numeric codes (e.g., 1, 2, 4) to standard codes (e.g., M, F, O)
4. **Auto-convert** participant data during dataset validation

---

## Quick Start

### Step 1: Create the mapping file

Place a `participants_mapping.json` file in your project's **code/library/** directory:

```
my_dataset/
├── code/
│   └── library/
│       └── participants_mapping.json    ← PUT IT HERE
├── sourcedata/
│   └── raw_data/
│       ├── wellbeing.tsv
│       └── fitness_data.tsv
├── dataset_description.json
├── participants.tsv
├── sub-001/
│   └── ...
└── ...
```

Alternatively, you can place it in **sourcedata/** if that's more convenient for your workflow:

```
my_dataset/
├── sourcedata/
│   ├── participants_mapping.json        ← OR HERE
│   └── raw_data/
│       └── wellbeing.tsv
└── ...
```

### Step 2: Define your mappings

```json
{
  "version": "1.0",
  "description": "Mapping for our study participant data",
  "mappings": {
    "participant_id": {
      "source_column": "participant_id",
      "standard_variable": "participant_id",
      "type": "string"
    },
    "sex": {
      "source_column": "sex",
      "standard_variable": "sex",
      "type": "string",
      "value_mapping": {
        "1": "M",
        "2": "F",
        "4": "O"
      }
    }
  }
}
```

### Step 3: Run conversion

When you validate or convert your dataset, PRISM will:
- ✓ Auto-detect the `participants_mapping.json` file
- ✓ Apply transformations (numeric → standard codes)
- ✓ Generate `participants.tsv` with standardized values
- ✓ Log all transformations to the web terminal
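
The rename-and-recode step above can be pictured with a short Python sketch. This is not PRISM's actual implementation: the function name `apply_mapping`, the in-memory TSV handling, and the choice to pass unmapped values through unchanged are all assumptions made for illustration.

```python
import csv
import io


# Hypothetical sketch; PRISM's real converter may behave differently.
def apply_mapping(raw_tsv: str, spec: dict) -> str:
    """Rename and recode the columns of a TSV string according to a mapping spec."""
    reader = csv.DictReader(io.StringIO(raw_tsv), delimiter="\t")
    entries = list(spec["mappings"].values())
    out_columns = [entry["standard_variable"] for entry in entries]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=out_columns, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        converted = {}
        for entry in entries:
            raw_value = row[entry["source_column"]]
            recode = entry.get("value_mapping", {})
            # Assumption: values without a mapping entry are kept as-is.
            converted[entry["standard_variable"]] = recode.get(raw_value, raw_value)
        writer.writerow(converted)
    return out.getvalue()


spec = {
    "version": "1.0",
    "mappings": {
        "participant_id": {"source_column": "participant_id",
                           "standard_variable": "participant_id",
                           "type": "string"},
        "sex": {"source_column": "sex", "standard_variable": "sex",
                "type": "string",
                "value_mapping": {"1": "M", "2": "F", "4": "O"}},
    },
}
raw = "participant_id\tsex\nDEMO001\t2\nDEMO002\t1\n"
result = apply_mapping(raw, spec)
print(result)
```

Note that every cell read from a TSV file is a string, which is why the keys of `value_mapping` are written as `"1"` and `"2"` rather than as numbers.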

---

## Complete Example: Wellbeing Survey

**Source data** (`raw_data/wellbeing.tsv`):
```
participant_id   session   age   sex   education   handedness   WB01   ...
DEMO001          baseline  28    2     4           1            4      ...
DEMO002          baseline  34    1     5           1            3      ...
```

**Mapping file** (`participants_mapping.json`):
```json
{
  "version": "1.0",
  "description": "Mapping for wellbeing survey raw data to PRISM standard",
  "source_file": "raw_data/wellbeing.tsv",
  "mappings": {
    "participant_id": {
      "source_column": "participant_id",
      "standard_variable": "participant_id",
      "type": "string",
      "description": "Unique participant identifier"
    },
    "age": {
      "source_column": "age",
      "standard_variable": "age",
      "type": "integer",
      "units": "years",
      "description": "Age in years"
    },
    "sex": {
      "source_column": "sex",
      "standard_variable": "sex",
      "type": "string",
      "value_mapping": {
        "1": "M",
        "2": "F",
        "4": "O"
      },
      "description": "Biological sex: 1=M, 2=F, 4=O"
    },
    "education": {
      "source_column": "education",
      "standard_variable": "education_level",
      "type": "string",
      "value_mapping": {
        "1": "1",
        "2": "2",
        "3": "3",
        "4": "4",
        "5": "5",
        "6": "6"
      },
      "description": "ISCED 2011 level"
    },
    "handedness": {
      "source_column": "handedness",
      "standard_variable": "handedness",
      "type": "string",
      "value_mapping": {
        "1": "R",
        "2": "L"
      },
      "description": "Hand preference: 1=R, 2=L"
    }
  }
}
```

**Output** (`participants.tsv`):
```
participant_id   age   sex   education_level   handedness
DEMO001          28    F     4                 R
DEMO002          34    M     5                 R
```

✓ Numeric codes automatically converted to standard codes!

---

## Mapping Specification Format

### Root level

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `version` | string | Yes | Version of the mapping schema (e.g., "1.0") |
| `description` | string | No | Human-readable description of this mapping |
| `source_file` | string | No | Path to the source file this mapping applies to |
| `instructions` | object | No | Custom instructions for users |
| `mappings` | object | Yes | Dictionary of column mappings |

### Per-column mapping

Each entry in the `mappings` object has the following form:

```json
{
  "my_mapping_name": {
    "source_column": "column_name_in_raw_data",
    "standard_variable": "standardized_variable_name",
    "type": "string|integer|float",
    "units": "years|cm|kg|...",
    "value_mapping": {
      "raw_value": "standard_value",
      "1": "M",
      "2": "F"
    },
    "description": "What this variable represents"
  }
}
```

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `source_column` | string | Yes | Column name in the raw TSV file |
| `standard_variable` | string | Yes | Name of the standardized variable (from `participants.json` schema) |
| `type` | string | Yes | Data type: `string`, `integer`, `float` |
| `units` | string | No | Unit of measurement (e.g., "years", "cm") |
| `value_mapping` | object | No | Maps source values to standard values (for recoding) |
| `description` | string | No | Explanation of the variable |
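
The required fields in the two tables above can be checked before running a conversion. The following is a minimal sketch of such a check; `check_mapping_spec` is a hypothetical helper, not part of PRISM, and PRISM's own validation may enforce different or additional rules.

```python
REQUIRED_ROOT = ("version", "mappings")
REQUIRED_ENTRY = ("source_column", "standard_variable", "type")
ALLOWED_TYPES = ("string", "integer", "float")


# Hypothetical helper based only on the field tables in this guide.
def check_mapping_spec(spec: dict) -> list[str]:
    """Return a list of readable problems; an empty list means none were found."""
    problems = []
    for field in REQUIRED_ROOT:
        if field not in spec:
            problems.append(f"missing root field '{field}'")
    for name, entry in spec.get("mappings", {}).items():
        for field in REQUIRED_ENTRY:
            if field not in entry:
                problems.append(f"mapping '{name}': missing '{field}'")
        if "type" in entry and entry["type"] not in ALLOWED_TYPES:
            problems.append(f"mapping '{name}': unknown type '{entry['type']}'")
    return problems


bad_spec = {"mappings": {"sex": {"source_column": "sex", "type": "text"}}}
for problem in check_mapping_spec(bad_spec):
    print(problem)
```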

---

## Standard PRISM Participant Variables

These are the standard variable names recognized by PRISM:

### Core Demographics
- `participant_id` - Unique identifier
- `age` - Age in years
- `sex` - Biological sex (M, F, O, n/a)
- `gender` - Gender identity

### Education & Employment
- `education_level` - ISCED 2011 (0-8, n/a)
- `education_years` - Years of formal education
- `employment_status` - Employment category

### Physical Traits
- `handedness` - Hand dominance (R, L, A, n/a)
- `height` - Height in cm
- `weight` - Weight in kg
- `bmi` - Body Mass Index

### Health & Lifestyle
- `smoking_status` - Smoking history
- `alcohol_consumption` - Alcohol use frequency
- `physical_activity` - Exercise frequency
- `medication_current` - Current medications (yes/no)

### Clinical
- `psychiatric_diagnosis` - Mental health diagnosis history
- `neurological_diagnosis` - Neurological condition history
- `vision` - Visual acuity status
- `hearing` - Hearing ability status

### Other
- `group` - Study group (control, patient, etc.)
- `marital_status` - Partnership status
- `native_language` - Language code (e.g., en, de)
- `country_of_birth` - ISO 3166-1 code (e.g., DE, US)
- `country_of_residence` - ISO 3166-1 code
- `ethnicity` - Ethnic background category
- `income_bracket` - Income range
- `residence_type` - Urban/suburban/rural

See `official/participants.json` in the PRISM repository for complete definitions and expected values.
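
A typo in `standard_variable` (for example `education` instead of `education_level`) is easy to miss. The sketch below flags target names that are not in the list above. The set is copied by hand from this section and `unknown_standard_variables` is a hypothetical helper; `official/participants.json` remains the authoritative source.

```python
# Names copied from the list in this guide; treat the official schema as authoritative.
STANDARD_VARIABLES = {
    "participant_id", "age", "sex", "gender",
    "education_level", "education_years", "employment_status",
    "handedness", "height", "weight", "bmi",
    "smoking_status", "alcohol_consumption", "physical_activity",
    "medication_current",
    "psychiatric_diagnosis", "neurological_diagnosis", "vision", "hearing",
    "group", "marital_status", "native_language", "country_of_birth",
    "country_of_residence", "ethnicity", "income_bracket", "residence_type",
}


def unknown_standard_variables(spec: dict) -> list[str]:
    """Return the sorted standard_variable names that are not in the known set."""
    targets = (entry.get("standard_variable")
               for entry in spec.get("mappings", {}).values())
    return sorted({t for t in targets
                   if t is not None and t not in STANDARD_VARIABLES})


example_spec = {
    "version": "1.0",
    "mappings": {
        "age": {"source_column": "age", "standard_variable": "age",
                "type": "integer"},
        "education": {"source_column": "edu", "standard_variable": "education",
                      "type": "string"},
    },
}
print(unknown_standard_variables(example_spec))
```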

---

## Value Mapping Reference

Common value mappings you might need:

### Sex / Gender
```json
"value_mapping": {
  "1": "M",
  "2": "F",
  "3": "O",
  "m": "M",
  "f": "F",
  "male": "M",
  "female": "F"
}
```

### Handedness
```json
"value_mapping": {
  "1": "R",
  "2": "L",
  "3": "A",
  "right": "R",
  "left": "L"
}
```

### Education Level (ISCED 2011)
```json
"value_mapping": {
  "0": "0",
  "1": "1",
  "2": "2",
  "3": "3",
  "4": "4",
  "5": "5",
  "6": "6",
  "7": "7",
  "8": "8"
}
```

### Yes/No fields
```json
"value_mapping": {
  "1": "yes",
  "0": "no",
  "yes": "yes",
  "no": "no",
  "y": "yes",
  "n": "no"
}
```
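
This guide does not state whether PRISM matches `value_mapping` keys case-sensitively. If matching is exact, variants such as `Male` or ` f ` would each need their own key. One way to keep the mapping short is to normalize raw values before lookup, as in this sketch (`recode` is a hypothetical helper, and the trimming and lowercasing are assumptions, not documented PRISM behavior):

```python
# Hypothetical pre-processing helper; not part of PRISM.
def recode(raw_value: str, value_mapping: dict) -> str:
    """Look up a raw value after trimming whitespace and lowercasing it."""
    lowered = {key.lower(): value for key, value in value_mapping.items()}
    # Assumption: values with no matching key are returned unchanged.
    return lowered.get(raw_value.strip().lower(), raw_value)


sex_mapping = {"1": "M", "2": "F", "m": "M", "f": "F",
               "male": "M", "female": "F"}
print([recode(v, sex_mapping) for v in [" Male ", "F", "2", "unknown"]])
```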

---

## Workflow Integration

### In the Web Interface

1. **Upload dataset folder**
2. PRISM detects `participants_mapping.json` in `code/library/` or `sourcedata/`
3. Shows: "Found participants mapping - Review transformations?"
4. Displays a mapping summary and logs each transformation
5. Auto-generates `participants.tsv` with standardized values

### In the CLI

```bash
python prism-validator /path/to/my_dataset --apply-mapping
```

The mapping is also applied automatically during regular validation:
```bash
python prism-validator /path/to/my_dataset
```

---

## File Location

The `participants_mapping.json` file should be placed in one of these **project infrastructure locations**:

1. **`code/library/participants_mapping.json`** (recommended)
   - Standard location for preprocessing specifications in the PRISM/BIDS YODA layout
   - Part of the project's code/methodology
   - Not transferred to the final dataset

2. **`sourcedata/participants_mapping.json`** (alternative)
   - Alternative location if raw data lives in sourcedata/
   - Clear association with raw data

**Why not in the dataset root?**
The mapping file is a **conversion specification**, not part of the final BIDS/PRISM dataset:
- It's used to transform raw data INTO standardized format
- Once data is imported into the dataset root structure, the mapping is no longer needed
- Keeping it in `code/` or `sourcedata/` makes this clear
- It's automatically excluded from BIDS validation
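
To check which of the two locations a dataset actually uses, a lookup like the following can help. It is only a sketch: `find_mapping_file` is a hypothetical helper, and checking `code/library/` before `sourcedata/` is an assumption based on the "recommended" label above, not on documented PRISM behavior.

```python
import tempfile
from pathlib import Path
from typing import Optional

# The two locations named in this guide, recommended one first (order is assumed).
CANDIDATE_LOCATIONS = (
    Path("code") / "library" / "participants_mapping.json",
    Path("sourcedata") / "participants_mapping.json",
)


def find_mapping_file(dataset_root) -> Optional[Path]:
    """Return the first existing mapping file under dataset_root, or None."""
    root = Path(dataset_root)
    for relative in CANDIDATE_LOCATIONS:
        candidate = root / relative
        if candidate.is_file():
            return candidate
    return None


# Small demonstration with a throwaway dataset folder.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "sourcedata" / "participants_mapping.json"
    target.parent.mkdir(parents=True)
    target.write_text("{}")
    found = find_mapping_file(tmp)
    print(found.relative_to(tmp))
```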

## BIDS Compatibility

Both supported locations, `code/` and `sourcedata/`, are standard BIDS directories that the BIDS validator ignores by default, so the mapping file does not cause validation errors.

---

## Troubleshooting

### "No mapping found - continue without?"
- Place `participants_mapping.json` in `code/library/` (recommended) or `sourcedata/`
- Check file name spelling (case-sensitive)
- Ensure valid JSON syntax

### "Source column 'X' not found"
- Verify column name in raw TSV file matches exactly
- Check for typos or whitespace
- Column names are case-sensitive
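
Stray whitespace and case differences are hard to spot in a spreadsheet view. The sketch below compares the `source_column` names in a mapping against the header row of a TSV and prints the header cells with `repr()` so trailing spaces become visible. `missing_source_columns` is a hypothetical helper, not a PRISM function.

```python
import csv
import io


# Hypothetical diagnostic helper; not part of PRISM.
def missing_source_columns(tsv_text: str, spec: dict):
    """Return (missing source columns, repr of each header cell)."""
    header = next(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    wanted = [entry["source_column"] for entry in spec["mappings"].values()]
    missing = [column for column in wanted if column not in header]
    return missing, [repr(cell) for cell in header]


spec = {"version": "1.0", "mappings": {
    "sex": {"source_column": "sex", "standard_variable": "sex",
            "type": "string"},
}}
missing, header_cells = missing_source_columns("participant_id\tSex \tage\n", spec)
print("missing:", missing)
print("header:", header_cells)
```

Here the raw header contains `'Sex '` (capitalized, with a trailing space), so the mapping's `sex` is reported as missing.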

### "Values don't match mapping"
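- Write numeric codes as quoted strings: JSON object keys must be strings, so use `"1"`, not `1`
- Include every value that occurs in the raw column in `value_mapping`
- Give each mapping entry a descriptive name so it is easy to identify in the logs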
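
To find out which raw values are missing from a `value_mapping`, it can help to count them before converting. A minimal sketch, assuming the column has already been read into a list of strings (`unmapped_values` is a hypothetical helper):

```python
from collections import Counter


# Hypothetical diagnostic helper; not part of PRISM.
def unmapped_values(column_values, value_mapping) -> Counter:
    """Count raw values that have no key in value_mapping."""
    return Counter(v for v in column_values if v not in value_mapping)


sex_mapping = {"1": "M", "2": "F", "4": "O"}
raw_sex = ["1", "2", "3", "2", "", "3"]
report = unmapped_values(raw_sex, sex_mapping)
print(dict(report))
```

In this example the code `3` and an empty cell appear in the data but not in the mapping, so both show up in the report.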

### "Mapping validation failed"
- Ensure `version` and `mappings` fields exist
- Check JSON syntax (use online JSON validator)
- Each mapping needs `source_column`, `standard_variable`, and `type`

---

## Examples

### Example 1: Simple numeric sex coding

**Raw data:**
```
participant_id   sex
sub-001          1
sub-002          2
```

**Mapping:**
```json
{
  "version": "1.0",
  "mappings": {
    "sex": {
      "source_column": "sex",
      "standard_variable": "sex",
      "type": "string",
      "value_mapping": {
        "1": "M",
        "2": "F"
      }
    }
  }
}
```

**Output:**
```
participant_id   sex
sub-001          M
sub-002          F
```

### Example 2: Rename and recode education

**Raw data:**
```
participant_id   school_years
sub-001          12
sub-002          16
```

**Mapping:**
```json
{
  "version": "1.0",
  "mappings": {
    "education": {
      "source_column": "school_years",
      "standard_variable": "education_years",
      "type": "integer",
      "units": "years"
    }
  }
}
```

**Output:**
```
participant_id   education_years
sub-001          12
sub-002          16
```
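
Example 2 has no `value_mapping`, so the only transformation beyond renaming is the declared `type`. How PRISM applies `type` is not described in this guide; the sketch below shows one plausible reading, with the hypothetical helper `coerce` and the assumption that empty or `n/a` cells are treated as missing.

```python
CASTERS = {"string": str, "integer": int, "float": float}


# Hypothetical helper; the handling of missing values is an assumption.
def coerce(value: str, declared_type: str):
    """Cast a raw TSV cell to the declared type; empty or 'n/a' cells become None."""
    cleaned = value.strip()
    if cleaned in ("", "n/a"):
        return None
    # Raises ValueError for cells that do not fit the type, e.g. "12.5" as integer.
    return CASTERS[declared_type](cleaned)


school_years = [coerce(v, "integer") for v in ["12", "16", "n/a"]]
print(school_years)
```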

### Example 3: Complex multipart mapping

**Raw data:**
```
participant_id   pid   visit   age_years   sex_code   handed
sub-001          P001  1       28          2          2
sub-002          P002  1       34          1          1
```

**Mapping:**
```json
{
  "version": "1.0",
  "mappings": {
    "participant_id": {
      "source_column": "pid",
      "standard_variable": "participant_id",
      "type": "string"
    },
    "session": {
      "source_column": "visit",
      "standard_variable": "session",
      "type": "string",
      "value_mapping": {
        "1": "baseline",
        "2": "followup"
      }
    },
    "age": {
      "source_column": "age_years",
      "standard_variable": "age",
      "type": "integer",
      "units": "years"
    },
    "sex": {
      "source_column": "sex_code",
      "standard_variable": "sex",
      "type": "string",
      "value_mapping": {
        "1": "M",
        "2": "F"
      }
    },
    "handedness": {
      "source_column": "handed",
      "standard_variable": "handedness",
      "type": "string",
      "value_mapping": {
        "1": "R",
        "2": "L"
      }
    }
  }
}
```

**Output:**
```
participant_id   session   age   sex   handedness
P001             baseline  28    F     L
P002             baseline  34    M     R
```

---

## Best Practices

1. **Document your encoding** - Always explain what numeric codes mean in the mapping
2. **Be consistent** - Use the same mapping across all files in a study
3. **Keep it simple** - Only map participant variables, not survey items
4. **Validate mappings** - Check for typos and missing value mappings
5. **Version control** - Commit `participants_mapping.json` to git with your dataset
6. **Test first** - Dry-run on a small dataset before full conversion

---

## Getting Help

- Check the examples in `examples/workshop/`
- See `official/participants.json` for all standard variables and definitions
- Run with verbose logging to see transformation details
- Check `.prismrc.json` for validation settings